
Seminars in Perinatology 45 (2021) 151456

Available online at www.sciencedirect.com

Seminars in Perinatology
www.seminperinat.com

Multi ‘omic data integration: A review of concepts, considerations, and approaches
Tasha M. Santiago-Rodriguez, and Emily B. Hollister*
Diversigen, Inc, 3 Greenway Plaza, Suite 1575, Houston, TX 77046, USA

ARTICLE INFO

Keywords:
Omics
multi’omic integration
metagenomics
metatranscriptomics
proteomics
metabolomics

ABSTRACT

The application of ‘omic techniques including, but not limited to, genomics/metagenomics, transcriptomics/meta-transcriptomics, proteomics/meta-proteomics, and metabolomics to generate multiple datasets from a single sample has facilitated hypothesis generation leading to the identification of biological, molecular and ecological functions and mechanisms, as well as associations and correlations. Despite their power and promise, a variety of challenges must be considered in the successful design and execution of a multi-omics study. In this review, various ‘omic technologies applicable to single- and meta-organisms (i.e., host + microbiome) are described, and considerations for sample collection, storage and processing prior to data generation and analysis, as well as approaches to data storage, dissemination and analysis are discussed. Finally, case studies are included as examples of multi-omic applications providing novel insights and a more holistic understanding of biological processes.
© 2021 Elsevier Inc. All rights reserved.

An introduction to ‘omic technologies

‘Omics refers to the comprehensive or global assessment of a collection of features. An ‘omic-based analysis may include genomics or metagenomics (i.e., all genes in an organism or community), transcriptomics or meta-transcriptomics, metabolomics, proteomics, or other features studied in a global, typically high-throughput, manner.1 Although reductionist approaches focused on single genes, proteins, and pathways have successfully identified key features influencing health and disease, rarely is a single entity the underlying cause of a particular phenotype or complex disease. Whereas analyte-by-analyte approaches were common previously, many ‘omics platforms have reached a state where high-throughput, high-resolution, cost-efficient profiling of samples is the norm. The ability to study information-rich ‘omic data sets has been driven, in large part, by the development and increasing availability of array technologies, high-throughput mass spectrometry and sequencing platforms, and the development of data and computer science techniques for the movement, management, and integration of high-dimensional data.2

Biological systems rely upon the transfer of information from nucleic acids to proteins and metabolites in order to shape function and phenotype. This is referred to as the ‘omics cascade (Fig. 1), and ‘omic-driven studies are facilitating a new, more holistic understanding of systems biology. This understanding recognizes that many diseases result from complex and/or heterogeneous processes and that a combined ‘lens’ may be more powerful than single ‘omes alone.3-7 Recent studies also suggest that outliers in multi-omic data sets may signal disease progression and could be leveraged as early diagnostic indicators.3,8,9 Such early detection capabilities may help

*Corresponding author.
E-mail address: ehollister@diversigen.com (E.B. Hollister).

https://doi.org/10.1016/j.semperi.2021.151456
0146-0005/© 2021 Elsevier Inc. All rights reserved.

Fig. 1 – ‘Omics cascade. Figure depicts the flow of information and various ‘omic technologies used in the characterization of genomes/metagenomes, transcriptomes/meta-transcriptomes, proteomes/meta-proteomes, and metabolomes in single organisms and meta-organisms (i.e., host + microbiome). Figure also shows the utility of each analyte and their half-lives.

refine approaches for public health screening and diagnosis, as well as pinpoint and personalize interventions or treatments.10 In the following sections, we present commonly studied ‘omes, discuss methods and strategies for their analysis and integration, and highlight multi-omic case studies.

Genomics and metagenomics

Genomics is the characterization of an organism’s entire genetic content. Typically composed of DNA (RNA in some viruses), genomes consist of coding (i.e., genes) and non-coding regions, and the relative proportions of these vary across the tree of life. Non-coding regions include introns, regulatory elements, and repetitive DNA and tend to be more common in eukaryotes.11 Protein-coding sequences represent approximately 1% of the human genome, but 80-90% of most bacterial genomes.12 Genomes are characterized through sequencing, assembly, and comparison with reference genomes. Whole genome sequencing can be used to identify novel genes or gene variants, and previously uncharacterized traits may be identified through genome-wide association studies (GWAS), which link genomic variants with disease states or other phenotypes. Multiple approaches exist within genomics. For example, exome sequencing is a genome sequencing strategy in which exons (i.e., protein-coding regions) are targeted instead of whole genomes. Targeted approaches like this can lead to faster, cheaper, more sensitive gene variant discovery.12

In contrast to single genomes, a metagenome is the collective genomic content of a community (typically microbial). The human microbiome is the collection of the microbes that live in and on the body. Metagenomic studies of the human microbiome have focused largely on its bacterial members (i.e., bacteriomics) to date. Despite this, both fungi (mycobiome) and viruses (virome) are part of the human microbiome. Although human-associated microbes have been studied for centuries, interest in the human microbiome has grown steadily in recent years. Fueled by the recognition that microbes, both pathogenic and commensal, have the potential to influence nutrient acquisition, immune and neurological development, and long-term health,13-15 as well as advances in sequencing technology, metagenomic studies of the human microbiome have become commonplace.

Metagenomic data are generated through untargeted approaches that extract whole community DNA and use shotgun sequencing to generate taxonomic and functional profiles; however, amplicon-based approaches [e.g., 16S ribosomal RNA (rRNA) gene or internal transcribed spacer (ITS) sequencing] can be used to generate bacterial and archaeal (16S rRNA gene) or fungal (ITS) taxonomic profiles.16 Genomic (or metagenomic) data provide information about taxonomy, phylogenetic relationships, potential function, mutations, and allelic variations, which can be used to shape hypotheses, guide experiments, and/or suggest mechanisms underlying the appearance or behavior of a system. Although mutations occur within genomes and metagenomic content can change as community composition shifts, genomes are typically considered static in the sense that they do not turn over rapidly and have long half-lives relative to other ‘omes (e.g., RNA, proteins and metabolites). For instance, the half-life of DNA in bones has been estimated at 521 years, whereas soil microbial DNA has a half-life ranging from 9.1 to 49.7 h.17
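To make the half-life comparison concrete, the following is a minimal sketch of how much of an analyte would remain after a given holding time. It assumes simple first-order (exponential) decay, which the text does not state and which real samples may not follow. The function name `fraction_remaining`, the 24-hour holding time, and the 365.25-day year are illustrative choices, not values from the source.

```python
def fraction_remaining(elapsed: float, half_life: float) -> float:
    """Fraction of an analyte left after `elapsed` time.

    Assumes first-order decay: N(t) / N0 = 0.5 ** (t / t_half).
    `elapsed` and `half_life` must be in the same unit.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    if elapsed < 0:
        raise ValueError("elapsed must be non-negative")
    return 0.5 ** (elapsed / half_life)


# Half-lives quoted in the text, converted to hours.
HOURS_PER_YEAR = 24 * 365.25          # assumption: Julian year
BONE_DNA_HALF_LIFE_H = 521 * HOURS_PER_YEAR
SOIL_DNA_HALF_LIFE_FAST_H = 9.1
SOIL_DNA_HALF_LIFE_SLOW_H = 49.7

HOLDING_TIME_H = 24.0                 # hypothetical delay before preservation

bone = fraction_remaining(HOLDING_TIME_H, BONE_DNA_HALF_LIFE_H)
soil_fast = fraction_remaining(HOLDING_TIME_H, SOIL_DNA_HALF_LIFE_FAST_H)
soil_slow = fraction_remaining(HOLDING_TIME_H, SOIL_DNA_HALF_LIFE_SLOW_H)

for label, value in [
    ("bone DNA", bone),
    ("soil microbial DNA (9.1 h half-life)", soil_fast),
    ("soil microbial DNA (49.7 h half-life)", soil_slow),
]:
    print(f"{label}: {value:.4f} of the starting amount after {HOLDING_TIME_H:g} h")
```

Under this simplifying model, the same one-day delay leaves bone DNA essentially untouched while removing a large share of soil microbial DNA, which is why holding times matter more for labile sample types.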

Viromics

Viromics refers to the study of viral communities (i.e., the virome) and may include viruses that infect bacterial, archaeal, fungal, plant, animal, and/or human cells. Viral metagenomics typically employs shotgun metagenomic or meta-transcriptomic sequencing to characterize DNA or RNA viruses, respectively. Viral metagenomics is challenging given the small genome and structural sizes of viruses compared to bacteria, their high degree of genome variation (e.g., double stranded or single stranded, DNA or RNA-based genomes), and biases of reference databases toward pathogens and well-characterized bacteriophages. These constraints often require additional isolation, concentration, and annotation steps relative to general metagenomics pipelines.18 Despite these challenges, the importance of the virome in health and disease is increasingly being discovered using viral metagenomics.18 Viral metagenomics may be used to detect pathogenic viruses,19 and when it is paired with bacterial community profiling, correlations between viruses and bacteria can be used to understand potential cross-kingdom relationships.20-22 Bacteria-virus interactions have also been used to understand horizontal gene transfer mediated by bacteriophages, a process also known as transduction (and transductomics).23

Transcriptomics and meta-transcriptomics

Transcriptomics is the study of expressed RNA. Transcriptomics often focuses on protein-coding RNA [i.e., messenger RNA (mRNA)] but can include non-coding RNAs, which coordinate and tune gene expression. Given that expressed genes represent a fraction of the total genome, transcriptomic signals can provide focus on the genes and potential mechanisms involved in a biological process of interest. Transcriptomic profiling can be performed on host tissue or the microbiome (i.e., meta-transcriptomics), and each provides different information regarding a system’s biology. Transcriptomic approaches can be targeted, focusing on one or a few genes; semi-targeted, using pre-defined arrays (e.g., sequence capture panels and microarrays); or untargeted, using shotgun sequencing.24

Transcriptomic (or meta-transcriptomic) data can be used to confirm or refute (meta)genomic-based hypotheses. However, a transcript’s presence does not guarantee translation into viable protein. RNases are nearly ubiquitous, and RNA molecules tend to be highly labile. mRNA half-lives range from seconds to minutes in well-characterized bacterial and fungal strains, with this range extending to hours for some human transcripts.25-29 Immediate sample processing or preservation is crucial for ensuring RNA quantity, quality, and signal detection.30,31 Adequate depletion of ribosomal RNA and other non-target molecules is also important for (meta)transcriptomic data generation.31 Additionally, given that transcriptomic data reflect responses to conditions present at a specific point in time, careful experimentation, sample handling, and time series analysis are often required.

Proteomics and meta-proteomics

Proteomics, the study of expressed proteins, is used to characterize protein identity, abundance, post-translational modifications, and/or the interactions of proteins (or peptides) in a system, sample, or matrix. Meta-proteomics characterizes proteins originating from multiple sources (e.g., host plus microbiome). Protein abundances are the result of upstream transcriptional activity,32,33 translational effects, and post-translational modifications (e.g., phosphorylation, glycosylation, nitrosylation, and/or ubiquitination), which are difficult to predict from transcriptional signals alone.34 Proteomic measurements are typically generated using coupled liquid chromatography (LC) and mass spectrometry (MS), and they can be targeted or untargeted. Unlike RNA, proteins are typically more stable and have longer half-lives (e.g., 10 to 1000 h, depending on origin).35,36 Proteomics technologies also tend to be lower-throughput and more expensive in nature.37

Metabolomics

Metabolomics characterizes the small molecules (i.e., metabolites) present in a sample or matrix, including amino acids, fatty acids, carbohydrates, and other compounds. Metabolites can be derived from host and/or microbiome metabolism, ingested through diet or medication, and/or acquired from environmental sources, and they are not limited to a single source (i.e., some metabolites may be produced by both host and microbe). Metabolites reflect a variety of upstream biological processes and can link genotype with phenotype.38 Metabolomic measurements are made using nuclear magnetic resonance (NMR) or gas or liquid chromatography (GC, LC) paired with mass spectrometry (GC-MS, LC-MS), and metabolomic studies can be performed in a targeted or untargeted manner. Metabolite half-lives vary depending on the metabolite, collection site, sample matrix, and preservation conditions, with the range covering hours39-41 to days.42 Most metabolites are highly labile, necessitating careful handling and preservation to maintain signal integrity.43

Epigenomics

Epigenomics addresses genome-wide characterization of reversible chemical modifications of DNA or DNA-associated proteins impacting gene expression and regulation.44 Epigenomic modifications are associated with several diseases including Prader-Willi syndrome, Angelman syndrome, and certain types of cancer.45,46 Epigenomic modifications can provide information regarding disease status and/or environmental exposure and act as heritable traits. Epigenomic modifications may be characterized through methylome sequencing, a modified genomic sequencing approach which captures methylated genomic regions using restriction enzymes or bisulfite treatment prior to sequencing. Alternatively, chromatin immunoprecipitation coupled with sequencing (ChIP-seq) represents a more targeted approach, employing antibodies specific to the modification(s) of interest. In both cases, deep sequencing is required to maximize information gain.
Exposomics

Exposomics is the study of environmental exposures (i.e., the exposome) and their effects on health and disease. Exposures may include aspects of the natural (e.g., air, water and soil quality) and built environments (e.g., quality of workplace and housing and access to fresh produce), as well as other chemical exposures and/or pollutants. These factors may elicit biological responses, including inflammation, pro-inflammatory cytokine secretion, methylation, and gene expression changes, as well as increased responsiveness to cortisol. These responses can result in modifications to the microbiome, epigenome, and downstream gene expression.47 Exposomics is frequently linked to other ‘omes, including metabolomics, as certain molecules may be linked with functional changes associated with chronic illness.48

Additional ‘omes

Additional ‘omic fields, many of which are sub-disciplines of those described above, exist and continue to emerge. Examples of these include, but are not limited to, the lipidome, inflammasome, and interactome, each of which is described below.

Lipidomics, a sub-discipline of metabolomics, focuses on the structure and function of lipids in a cell or organism. Lipidomics also assesses interactions among lipids, as well as between lipids and proteins or metabolites.49 Lipids are known to influence a number of diseases and their progression, including cardiovascular disease, obesity, asthma, diabetes, and hypertension.49 Lipidomics may complement metagenomics, as a comprehensive analysis of lipid composition can reveal the current state of a cell’s metabolism. Techniques such as MS, LC-MS, MS imaging, and ion-mobility MS are often used in lipidomics.49

The inflammasome is defined as the receptors and sensors, often intracellular proteins, that induce inflammation in response to pathogens, host protein-derived molecules, and other stimuli.50 Inflammasome sensors recognize and respond to lipopolysaccharides, microbial/viral DNA and bacterial flagellins, providing protection against microbial infections.51 Altered inflammasomes are associated with infection, IBD, cancer, and autoimmune conditions,51 and studies are increasingly demonstrating associations between these conditions and the gut microbiota.52,53

Interactome studies examine protein-protein interactions, as well as tight complex-forming macromolecule-macromolecule interactions. This developing field can aid in the discovery of biomarkers and therapeutic approaches.54 One approach used to study the interactome involves breaking cells by cryolysis to provide a snapshot of their complexes at different stages. Analyses are performed as in proteomics and meta-proteomics, usually using MS.55

Considerations for multi-omic studies

Study design, sample collection and sample storage

Careful planning is required to maximize information gained and minimize bias in multi-omic studies.56 Well-designed studies account for the inclusion of sufficient samples for statistical power, appropriate controls, replication (biological and technical), collection of enough biomass to support multiple assays, comprehensive documentation of project metadata and experimental procedures, and appropriate sample handling. ‘Omic analytes vary with respect to their inherent stability, and many require specific collection and preservation conditions to maintain sample integrity and reliability. This is particularly important when samples are collected in field- and community-based settings which preclude immediate sample processing or -80°C storage. The collection, pre-treatment, or preservation approaches appropriate for some ‘omes or analytes may not suit others.57 For instance, preservation of fecal metabolomic samples in 95% ethanol results in strong concordance with immediate preservation at -80°C, but degradation can occur when 95% ethanol is used for field preservation of DNA.58 Just as sample collection and preservation are key considerations, so are methods for analyte extraction and isolation. Variation among DNA extraction methods can impact metagenomic profiles,59,60 and protein quantity and recovery vary according to proteomic extraction method.61

Data resources, storage, and dissemination

Data resources, storage, and dissemination are critical for multi-omic studies, too. Best practices for data management and dissemination are evolving and often dictated by funding agencies, repository owners, and publication requirements. Data management and dissemination are also shaped by consented use and government-mandated data privacy regulations. Implementing good multi-omic data management practices is important for data sharing, support of further analyses and new data interpretation, and development of new software, tools, and workflows. The FAIR principles (Findability, Accessibility, Interoperability, and Reusability) have been proposed to guide scientific data management and stewardship.62 Other data standards and guides, including the minimum information about any sequence (MIxS) checklists (e.g., MIGS, MIMS, and MIMARKS),63 seek to record key information regarding genomic, metagenomic, and marker gene sequencing projects. Similarly, MINSEQE (Minimum Information about a high-throughput nucleotide SEQuencing Experiment; http://fged.org/projects/minseqe/), MIAME (Minimum Information about a Microarray Experiment),64 and MIAPE (Minimum Information About a Proteomics Experiment)65 describe the information needed to support interpretation and reproducibility for sequencing, microarray, and proteomics studies, respectively.

Databases

Databases are essential for any ‘omics study, as they aid in the identification of analytes and provide contextual

information. Although a variety of specialized databases exist, some of the most frequently used ones are described below.

Nucleotide and protein sequences

The National Center for Biotechnology Information (NCBI) houses multiple nucleotide and protein databases, including GenBank and the Reference Sequence collection (RefSeq). GenBank is an open access, annotated collection of publicly deposited nucleotide and protein sequences.66 RefSeq is a non-redundant version of GenBank. Other repositories include the Sequence Read Archive (SRA), the European Bioinformatics Institute (EMBL-EBI), and the DNA Database of Japan (DDBJ), each of which contributes to the International Nucleotide Sequence Database Collaboration (INSDC). The SRA is specific for sequence data generated on high-throughput, next-generation sequencing platforms. Specialized databases exist as well, including databases for 16S and 18S rRNA gene sequence data, like SILVA;67 the Genome Taxonomy Database (GTDB), which has sought to quantitatively define species via average nucleotide identity (ANI);68 the MetaHIT (METAgenomics of the Human Intestinal Tract) gene catalog;69 and the Integrated Microbial Genomes and Microbiomes (IMG/M) database.70

Metabolomics

Metabolite information can be accessed through a variety of repositories. For instance, the Human Metabolome Database (HMDB) includes predicted MS/MS and GC-MS data, and metabolite structural information.71 A related database, the Human Fecal Metabolome database, focuses specifically on metabolite information from human stool. Each entry contains over 110 data fields that are hyperlinked to other reference databases, and it allows users to browse concentration data and associated diseases.72 Similarly, the Metabolomics Workbench Metabolite Database contains structures and annotations of biologically relevant metabolites, with approximately 136,000 entries collected from public sources like Chemical Entities of Biological Interest (ChEBI), HMDB, Biological Magnetic Resonance Bank (BMRB), PubChem, and the Kyoto Encyclopedia of Genes and Genomes (KEGG).73 METLIN is a resource for metabolite and other molecule data analysis, representing over 350 chemical classes,74,75 and MetaboLights covers structures, reference spectra, biological roles, and data from metabolic experiments.76

Proteins and proteomics

The Universal Protein Resource (UniProt) provides protein sequence and annotation information, as well as tools including local BLAST, alignments, downloadable releases, and data submission options.77 Other databases aggregate functional information into functional modules and pathways, including KEGG,78 the Clusters of Orthologous Genes (COGs) database,79 and Pfam.80 In addition, multiple easily searchable, curated databases support proteomics research. Among these, the PRoteomics IDEntification Database (PRIDE) contains proteomics datasets that are searchable, with a summary containing the description of the project, sample and processing protocols, as well as contact information.81

Approaches and platforms for multi-omic analysis and data integration

Despite the increased availability of ‘omic data sets and tools for their analysis, multi-omic integration remains challenging.2 Study design, noise, and data interoperability contribute to this, as do varying definitions of what constitutes a multi-omic study. “Multi-omic integration” may be used to describe the analysis of a single ‘ome across multiple studies (e.g., a meta-analysis), as well as the integration of multiple ‘omes generated on the same set of samples (i.e., “vertical integration”).82 Similarly, multi-omic integration can be conceptual, statistical, and/or model-based.57,83 Conceptual integration combines insights obtained from single ‘omes to form a more comprehensive understanding of the biology of a system. Statistical integration focuses on statistical relationships within and across ‘omes, and model-based integration is the layering of ‘omic data onto pre-defined system models to understand molecular organization and function.

In selecting methods for multi-omic data integration, the nature of one’s data and how it should be handled must be considered. Many ‘omic data are noisy, sparse (i.e., contain many zeros), and high-dimensional, and they often contain batch effects. ‘Omic data can be highly heterogeneous, have signals of differing scale across ‘omes, and vary with respect to being quantitative, qualitative, and/or compositional.84 These qualities are computationally challenging and often require pre-processing, including filtering, imputation of missing values, transformation, normalization, and/or scaling prior to downstream analysis. These steps limit impacts of outliers, reduce the number of features considered, and prevent one ‘ome’s signals from overwhelming others.85,86 Additionally, spike-in standards may provide a “ground truth” to facilitate harmonization across ‘omes.56

Interoperability (i.e., the ability of the data sets to ‘speak’ to one another) between ‘omic data sets also represents an analysis obstacle. Although numeric relationships can be considered without explicit links between ‘omes, the ability to leverage knowledge-based metabolic networks, like those published by KEGG,78 MetaCyc,87 Reactome,88 and Ingenuity Pathway Analysis (IPA, QIAGEN Inc., https://digitalinsights.qiagen.com/products-overview/discovery-insights-portfolio/analysis-and-visualization/qiagen-ipa/), relies upon one’s ability to map or overlay data onto pre-existing models and frameworks. It also requires that identifiers (e.g., gene, enzyme, compound) be classified using specific ontologies. Although such models and frameworks are incredibly useful, it should be noted that they are not comprehensive and may not fully represent complex systems.

Through conceptual integration, results from one ‘ome are used to build upon signals observed in another.89 Beyond seeking consensus (i.e., maximum agreement), studies may search for complementarity (i.e., useful information found in each ‘ome but not necessarily shared across them) or leverage information contained in one ‘ome to provide context for another. Consensus-seeking approaches can pinpoint

features that contribute to mechanism. Complementarity-maximizing approaches can provide useful information about molecular processes and levels at which they are carried out.3 Leveraging ‘omes versus one another is often done with paired metagenomic and meta-transcriptomic data to determine transcript to gene ratios. In paired metagenomic/meta-proteomic studies, metagenome-assembled genes facilitate protein identification from MS spectra.90,91

With statistical integration, multiple existing approaches and models have been applied to ‘omic data sets,82,86,92 and new algorithms and platforms are introduced regularly. Statistical integration includes model-based (i.e., statistical models), multi-variate, and network-based methods, including many Bayesian approaches. Bayesian approaches leverage a priori assumptions about the data to calculate posterior probability distributions, refining our ability to model and predict outcomes. Bayesian approaches are commonly applied to high-dimensional data sets, including single and multiple ‘omes, for feature selection. Such approaches seek to identify features for which support across multiple ‘omes is collectively high, and they are attractive in that they can incorporate existing biological knowledge.

Statistical integration includes both unsupervised and supervised analyses. Unsupervised analyses seek to identify subgroups within a data set, irrespective of known phenotypes or clinical features. For example, cluster analysis and principal coordinates analysis seek to identify groups such that samples within a group are more like one another than those in other groups. Typically, these similarities are based on a similarity (or distance) metric calculated among samples in an all versus all manner. Importantly, cluster number and size are not defined a priori. Clustering can be used to identify outliers or sub-groups, or to assess the distribution of known sample characteristics across groups.

Network, or association, analysis is another unsupervised technique used to identify numerical relationships among ‘omic features and samples.84 Analytes and/or samples are represented as nodes, and relationships (e.g., Pearson or Spearman correlations) are depicted by edges connecting them. Network analysis can be performed on single or multiple ‘omes. When performed with multiple ‘omes, this is referred to as concatenation, “early integration”,92 or “single block” analysis.2 With this approach, all data are combined into a single matrix prior to downstream analysis, and signal normalization and scaling are particularly important. Network size, degree of connectivity, and overall topology can be evaluated, and potential relationships can be inferred from these results. The numerical relationships identified using correlation-based approaches may not reflect direct, physical relationships, nor do they specifically account for complex interactions.86 Despite the simplicity of these approaches, association analyses can identify potentially co-regulated features93,94 and features regulating one another,95 and can highlight dysregulation in the context of disease.82,96

Supervised analyses attempt to model features that can be used to predict traits (e.g., phenotype or clinical outcome).82 Examples include regression analysis and its multi-variate extensions, as well as some machine learning algorithms (e.g., Random Forests). Variable selection (i.e., feature selection) techniques are an extension of supervised analysis and can be achieved using a variety of methods, including partial least squares regression, canonical correlation analysis (CCA), sparse CCA, and partial least squares discriminant analysis. Feature selection can serve as a discovery tool to identify candidate biomarkers and may provide mechanistic clues related to poorly understood phenomena. Variable selection methods are frequently applied in the analysis of single ‘omic data sets and may be used as a filtering step preceding the integration of multiple ‘omes.82 Additional models and techniques are described in the platforms below and reviewed at length elsewhere.82,86,92

Model-based integration leverages pre-defined system models to understand molecular organization and function. ‘Omics data may be pre-processed or pre-filtered (e.g., retaining differentially abundant or expressed features identified through feature set enrichment analysis or similar approaches) prior to model mapping, or all data may be mapped with the intent of identifying reactions and/or pathways for which evidence exists. Tools such as the Integrated Molecular Pathway-Level Analysis (IMPaLA),97 PaintOmics,98 IPA (Qiagen), and the KEGG mapper78 can be leveraged for this type of analysis. These approaches are appealing due to the highly visual nature of their output, and they are useful in that they can place large amounts of data in biological context. However, as noted above, these models are not fully comprehensive, may not facilitate the integration of all types of ‘omic data, and are often limited in their representation of complex systems.

Software and bioinformatic tools for multi ‘omic analyses

Multi ‘omic data generation has fueled a need for tools and platforms to support analysis, integration, and visualization. A variety of tools have been developed and are becoming available. Some perform correlation analyses, while others facilitate covariance-based and/or multiple co-inertia analyses.99 Other tools support data visualization, increase interpretability, and support data sharing and accessibility.85,100 Several of these tools are described in Table 1, and a more comprehensive list can be found at https://github.com/mikelove/awesome-multi-omics. In addition, multiple commercial platforms are available and will likely continue to emerge (for example, Seven Bridges, https://www.sevenbridges.com/).

Case studies and future directions

Multi-omic approaches have been applied in a variety of contexts to date, including studies of pregnancy and neonatal health and disease. As described above, conceptual, statistical, and model-based integration approaches have been used to gain insight from layers of ‘omic data. Multi-omic applications to date have included biomarker discovery, the characterization of microbial dynamics, and host-microbiome interactions. In most cases, it is anticipated that these discoveries will be translated to therapeutics or distilled down to a

Table 1 – Description of several multi-omics tools for data integration, analysis and/or visualization.

| Tool | Interface | Description | Input | Supports microbiome | Supports single organisms | Reference |
| --- | --- | --- | --- | --- | --- | --- |
| IMPaLA | Web | A web interface for the joint analysis of transcriptomics or proteomics and metabolomics data. Enrichment analysis is performed with user-specified lists of metabolites and genes as input. | List of pathways that contain at least one gene/protein and/or metabolite | Yes | Yes | 97 |
| Cytoscape | Java | A platform for the visualization of interactions of any data type through networks. | Network file formats | Yes | Yes | 101 |
| PaintOmics | Web | A web tool for the visualization of multi-omics datasets onto KEGG pathways. | Feature quantification file (e.g., proteomics, transcriptomics and metabolomics) | No | Yes | 98 |
| VANTED | Java | A Java-based framework which generates network-based views of large-scale datasets. | Network file formats | Yes* | Yes | 102 |
| mixOmics | R package | An R package which classifies sample groups, identifies discriminant features, and predicts the class of new samples. | Normalized microarray, mass spectrometry-based proteomics, metabolomics, or sequence count data (RNA-seq, 16S, metagenomic) | Yes | Yes* | 103 |
| MoCluster | R package | Applies multiblock multivariate analysis to define groups of variables that represent shared patterns across 'omics datasets. A clustering algorithm then helps to discover joint clusters. | Normalized multiple omics datasets | Yes* | Yes | 104 |
| Multiple co-inertia analysis (mCIA) | R package | A multivariate analysis method to investigate relationship patterns in multiple omics datasets. It utilizes both sparse and structured sparse methods. | Normalized multiple omics datasets | Yes* | Yes | 99 |
| MultiOmics factor analysis (MOFA) | R package | A statistical approach for integrating multi-omics datasets in an unsupervised manner. | Normalized multiple omics datasets | Yes* | Yes | 105 |
| Pathview Web | Web | A web-based tool that maps, integrates, and renders biological data to generate publication-ready visuals. | Gene or compound data in tab- or comma-delimited format (txt or csv) with matrix of genes or compounds as rows and samples as columns | Yes* | Yes | 106 |
| BioMiner | Web | Integrates various cross-omics high-throughput data sets, focusing on cancer and personalized medicine. | Data import must be performed by a dedicated specialist and requires a predefined cross-omics relationship | Yes* | Yes | 107 |
| trackViewer | Bioconductor package | Can be used to visualize coverage and annotation tracks and generate lollipop and dandelion plots to visualize methylation, mutation, or variant data. | Mapped reads, genomic coverage, SNP information, or methylation status of a genomic region converted into GRanges from various file formats | No | Yes | 108 |
| STATegraEMS | Web | Supports annotation and tracking of analysis pipelines, including RNA-seq, miRNA-seq, ChIP-seq, Methyl-seq, or DNase-seq, as well as proteomics and metabolomics data. | Data as produced by the omics equipment | Yes* | Yes | 109 |
| gNOMO | Snakemake | Integrates and analyzes metagenomics, metatranscriptomics and metaproteomics data from the microbiome and non-model host organisms. | DNA and RNA paired-end reads, as well as MS/MS spectra | Yes | Yes | 110 |

* Although the program has been developed specifically for the analysis of single organisms or microbiome data, both analysis types may be supported.
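
Several tools in Table 1 take normalized feature matrices from more than one ‘ome as input. As a minimal sketch of the concatenation (“early integration”) idea described in the text, the Python example below scales each feature, combines two blocks into a single matrix, and reports feature pairs whose absolute Spearman correlation meets a cutoff. The function names, the toy values, and the 0.8 cutoff are illustrative assumptions and are not drawn from any tool listed above.

```python
from itertools import combinations
from statistics import mean, pstdev


def zscore(values):
    """Scale one feature to mean 0 and unit variance across samples."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def ranks(values):
    """Return 1-based average ranks, so tied values share one rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def early_integration_network(blocks, threshold=0.8):
    """Combine scaled blocks into one matrix and list correlated feature pairs."""
    combined = {}
    for ome, features in blocks.items():
        for name, values in features.items():
            # Rank correlations are unaffected by this scaling, but it keeps
            # the combined matrix comparable for other downstream methods.
            combined[f"{ome}:{name}"] = zscore(values)
    edges = []
    for a, b in combinations(sorted(combined), 2):
        rho = spearman(combined[a], combined[b])
        if abs(rho) >= threshold:
            edges.append((a, b, round(rho, 3)))
    return edges


# Toy data: every feature is measured on the same five samples (invented values).
blocks = {
    "metagenome": {"taxon_A": [5, 9, 14, 20, 31], "taxon_B": [8, 3, 9, 2, 7]},
    "metabolome": {"bile_acid_X": [0.2, 0.5, 0.9, 1.4, 2.2],
                   "sterol_Y": [4.0, 3.1, 2.5, 1.9, 0.7]},
}
edges = early_integration_network(blocks)
for edge in edges:
    print(edge)
```

Each reported edge is only a numerical association between two features; as noted in the text, it need not reflect a direct physical relationship.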

few key analytes for the purposes of diagnosis and/or patient stratification (Fig. 2).
Multi-omic approaches are being used to understand a variety of biological phenomena across the lifespan, including gestational clocks and preterm birth. Preterm birth is a major cause of neonatal death, and researchers routinely seek to understand the factors contributing to it. Establishing a thorough understanding of the molecular underpinnings and chronological changes occurring during term pregnancy may help identify (molecular) deviations associated with preterm birth and other pregnancy-related pathologies. Leveraging a combination of metabolomics, proteomics, transcriptomics, and statistical models based on “single block” analysis, Ghaemi et al. found that immune cell signaling responses model gestational age better than single ‘omes alone.111 Their results highlight the immunomodulatory capacity of the steroid hormone pregnanolone sulfate as a possible mechanism for maintaining pregnancy.
In the context of microbiome development in the days and weeks following birth, another multi-omic analysis used metagenomics, proteomics, and metabolomics in meconium and early stool. This study, an example of conceptual integration, demonstrated transient versus persistent presence of certain gut microbes, as well as key features reflecting microbial metabolism.112 Understanding this window of development is particularly important as the human gut microbiome develops rapidly following birth and has the potential to impact health and disease outcomes later in life.113 Similarly, meta-proteomics has been used to identify key functional differences in the gut microbiota of preterm infants.114 As with the preterm birth study described above, deviations from “healthy” or “normal” progression in microbial community assembly or metabolism may serve as indicators for poor health outcomes and could implicate key molecular mechanisms associated with them.
Beyond pregnancy and early-life programming, biomarkers of birthweight, a characteristic having implications for health later in life, have been identified using correlation analysis of various ‘omic data sets including DNA methylation profiling, transcriptomics, inflammation-related proteins, cholesterol, and anthropometric measurements.5 Abnormal birthweight is associated with increased mortality, risk of cardiovascular diseases, mental health problems, and certain cancers later in life. Recent work has found that both the metabolome and methylome carry common signatures associated with abnormal birthweight, as well as novel biomarkers, including a macrophage-derived chemokine. Additionally, cholesterol levels in cord blood were correlated with birthweight, such

Fig. 2 – Example of multi-omic applications in health and disease. Multi-omic approaches lend themselves to the discovery of
biomarkers and signals contributing to underlying mechanism. Distilled signals based on these discoveries are likely to con-
tribute to the development of therapeutics or become key analytes which serve as the basis for diagnostic tests and/or
patient stratification.

that higher birthweight is associated with increased high-density lipoprotein cholesterol.5
Another example of network analysis and statistical integration for biomarker discovery can be found in a study examining taxonomic, functional, and metabolic differences in children with irritable bowel syndrome (IBS).4 Key features differentiating cases from controls included microbial species and metabolites like secondary bile acids, sterols, and steroid-like compounds, and combinations of these suggested promise in their potential to predict case versus control, an important step forward in diagnosing a condition that is identified based upon self-reported symptoms.
Moving forward, as the cost of the various ‘omic data sets described continues to decrease, and more tools are developed for data integration, storage, sharing, analysis and visualization, we anticipate that the number of studies employing multi-omic strategies will continue to grow. Multi-omic information will aid in the search for markers of health and disease and help to explain biological phenomena including the dynamics of microbial succession and microbe-host interactions.

Funding

The author(s) received no specific funding for this work.

Disclosure

T.M.S.-R. and E.B.H. are current employees of Diversigen, Inc. E.B.H. owns OraSure stock/stock options and has received honoraria for speaking at symposia.

References

1. Hasin Y, Seldin M, Lusis A. Multi-omics approaches to disease. Genome Biol. 2017. https://doi.org/10.1186/s13059-017-1215-1.
2. Wörheide MA, Krumsiek J, Kastenmüller G, Arnold M. Multi-omics integration in biomedical research - a metabolomics-centric review. Anal Chim Acta. 2021. https://doi.org/10.1016/j.aca.2020.10.038.
3. Zhou W, Sailani MR, Contrepois K, et al. Longitudinal multi-omics of host-microbe dynamics in prediabetes. Nature. 2019. https://doi.org/10.1038/s41586-019-1236-x.
4. Hollister EB, Oezguen N, Chumpitazi BP, et al. Leveraging Human Microbiome Features to Diagnose and Stratify Children with Irritable Bowel Syndrome. J Mol Diagnostics. 2019. https://doi.org/10.1016/j.jmoldx.2019.01.006.
5. Alfano R, Chadeau-Hyam M, Ghantous A, et al. A multi-omic analysis of birthweight in newborn cord blood reveals new underlying mechanisms related to cholesterol metabolism. Metabolism. 2020. https://doi.org/10.1016/j.metabol.2020.154292.
6. Walejko JM, Koelmel JP, Garrett TJ, Edison AS, Keller-Wood M. Multiomics approach reveals metabolic changes in the heart at birth. Am J Physiol - Endocrinol Metab. 2018. https://doi.org/10.1152/ajpendo.00297.2018.
7. Neu J. Multiomics-based strategies for taming intestinal inflammation in the neonate. Curr Opin Clin Nutr Metab Care. 2019. https://doi.org/10.1097/MCO.0000000000000559.
8. Guo L, Milburn MV, Ryals JA, et al. Plasma metabolomic profiles enhance precision medicine for volunteers of normal health. Proc Natl Acad Sci U S A. 2015. https://doi.org/10.1073/pnas.1508425112.
9. Brechtmann F, Mertes C, Matusevičiūtė A, et al. OUTRIDER: A Statistical Method for Detecting Aberrantly Expressed Genes in RNA Sequencing Data. Am J Hum Genet. 2018. https://doi.org/10.1016/j.ajhg.2018.10.025.
10. Karczewski KJ, Snyder MP. Integrative omics for health and disease. Nat Rev Genet. 2018. https://doi.org/10.1038/nrg.2018.4.
11. Lewin B. Genes VIII. 8th ed. Upper Saddle River, NJ: Pearson/Prentice Hall; 2004.
12. Ng SB, Turner EH, Robertson PD, et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009. https://doi.org/10.1038/nature08250.
13. Belkaid Y, Hand TW. Role of the microbiota in immunity and inflammation. Cell. 2014. https://doi.org/10.1016/j.cell.2014.03.011.
14. Hollister EB, Gao C, Versalovic J. Compositional and functional features of the gastrointestinal microbiome and their effects on human health. Gastroenterology. 2014. https://doi.org/10.1053/j.gastro.2014.01.052.
15. Hsiao EY, McBride SW, Hsien S, et al. Microbiota modulate behavioral and physiological abnormalities associated with neurodevelopmental disorders. Cell. 2013. https://doi.org/10.1016/j.cell.2013.11.024.
16. Liu YX, Qin Y, Chen T, et al. A practical guide to amplicon and metagenomic analysis of microbiome data. Protein Cell. 2020. https://doi.org/10.1007/s13238-020-00724-8.
17. Halter MC, Zahn JA. Degradation and half-life of DNA present in biomass from a genetically-modified organism during land application. J Ind Microbiol Biotechnol. 2017. https://doi.org/10.1007/s10295-016-1876-x.
18. Santiago-Rodriguez TM, Hollister EB. Potential Applications of Human Viral Metagenomics and Reference Materials: considerations for Current and Future Viruses. Appl Environ Microbiol. 2020. https://doi.org/10.1128/AEM.01794-20.
19. Cheval J, Sauvage V, Frangeul L, et al. Evaluation of high-throughput sequencing for identifying known and unknown viruses in biological samples. J Clin Microbiol. 2011. https://doi.org/10.1128/JCM.00850-11.
20. Santiago-Rodriguez TM, Hollister EB. Human virome and disease: High-throughput sequencing for virus discovery, identification of phage-bacteria dysbiosis and development of therapeutic approaches with emphasis on the human gut. Viruses. 2019. https://doi.org/10.3390/v11070656.
21. Bajaj JS, Sikaroodi M, Shamsaddini A, et al. Interaction of bacterial metagenome and virome in patients with cirrhosis and hepatic encephalopathy. Gut. 2020. https://doi.org/10.1136/gutjnl-2020-322470.
22. Emlet C, Ruffin M, Lamendella R. Enteric Virome and Carcinogenesis in the Gut. Dig Dis Sci. 2020. https://doi.org/10.1007/s10620-020-06126-4.
23. Kleiner M, Bushnell B, Sanderson KE, Hooper LV, Duerkop BA. Transductomics: sequencing-based detection and analysis of transduced DNA in pure cultures and microbial communities. Microbiome. 2020. https://doi.org/10.1186/s40168-020-00935-5.
24. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009. https://doi.org/10.1038/nrg2484.
25. Laalami S, Zig L, Putzer H. Initiation of mRNA decay in bacteria. Cell Mol Life Sci. 2014. https://doi.org/10.1007/s00018-013-1472-4.
26. Baudrimont A, Voegeli S, Viloria EC, et al. Multiplexed gene control reveals rapid mRNA turnover. Sci Adv. 2017. https://doi.org/10.1126/sciadv.1700006.
27. Arava Y, Wang Y, Storey JD, Liu CL, Brown PO, Herschlag D. Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 2003. https://doi.org/10.1073/pnas.0635171100.
28. Bernstein JA, Khodursky AB, Lin PH, Lin-Chao S, Cohen SN. Global analysis of mRNA decay and abundance in Escherichia coli at single-gene resolution using two-color fluorescent DNA microarrays. Proc Natl Acad Sci U S A. 2002. https://doi.org/10.1073/pnas.112318199.
29. Schwanhäusser B, Busse D, Li N, et al. Global quantification of mammalian gene expression control. Nature. 2011. https://doi.org/10.1038/nature10098.
30. Carvalhais LC, Schenk PM. Sample processing and cDNA preparation for microbial metatranscriptomics in complex soil communities. Methods in Enzymology. 2013. https://doi.org/10.1016/B978-0-12-407863-5.00013-7.
31. Shakya M, Lo CC, Chain PSG. Advances and challenges in metatranscriptomic analysis. Front Genet. 2019. https://doi.org/10.3389/fgene.2019.00904.
32. Vogel C, Marcotte EM. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat Rev Genet. 2012. https://doi.org/10.1038/nrg3185.
33. Mauger DM, Joseph Cabral B, Presnyak V, et al. mRNA structure regulates protein expression through changes in functional half-life. Proc Natl Acad Sci U S A. 2019. https://doi.org/10.1073/pnas.1908052116.
34. O’Donnell ST, Ross RP, Stanton C. The Progress of Multi-Omics Technologies: Determining Function in Lactic Acid Bacteria Using a Systems Level Approach. Front Microbiol. 2020. https://doi.org/10.3389/fmicb.2019.03084.
35. Mathieson T, Franken H, Kosinski J, et al. Systematic analysis of protein turnover in primary cells. Nat Commun. 2018. https://doi.org/10.1038/s41467-018-03106-1.
36. Moran MA, Satinsky B, Gifford SM, et al. Sizing up metatranscriptomics. ISME J. 2013. https://doi.org/10.1038/ismej.2012.94.
37. Ting YS, Egertson JD, Bollinger JG, et al. PECAN: Library-free peptide detection for data-independent acquisition tandem mass spectrometry data. Nat Methods. 2017. https://doi.org/10.1038/nmeth.4390.
38. Fiehn O. Metabolomics - the link between genotypes and phenotypes. Plant Mol Biol. 2002. https://doi.org/10.1023/A:1013713905833.
39. Leyva A, van Groeningen CJ, Kraal I, et al. Phase I and Pharmacokinetic Studies of High-Dose Uridine Intended for Rescue from 5-Fluorouracil Toxicity. Cancer Res. 1984.
40. Rasmussen K, Moller J. Total homocysteine measurement in clinical practice. Ann Clin Biochem. 2000. https://doi.org/10.1258/0004563001899915.
41. Muniyappa R. Oral carnitine therapy and insulin resistance. Hypertension. 2010. https://doi.org/10.1161/HYPERTENSIONAHA.109.147504.
42. Johansson S, Lindstedt S, Register U, Wadstrom L. Studies on the metabolism of labeled pyridoxine in man. Am J Clin Nutr. 1966;18(3):185–196.
43. Guasch-Ferre M, Bhupathiraju SN, Hu FB. Use of metabolomics in improving assessment of dietary intake. Clin Chem. 2018. https://doi.org/10.1373/clinchem.2017.272344.
44. Bernstein BE, Stamatoyannopoulos JA, Costello JF, et al. The NIH roadmap epigenomics mapping consortium. Nat Biotechnol. 2010. https://doi.org/10.1038/nbt1010-1045.
45. Buiting K, Saitoh S, Gross S, et al. Inherited microdeletions in the Angelman and Prader-Willi syndromes define an imprinting centre on human chromosome 15. Nat Genet. 1995. https://doi.org/10.1038/ng0495-395.
46. Jones PA, Baylin SB. The Epigenomics of Cancer. Cell. 2007. https://doi.org/10.1016/j.cell.2007.01.029.
47. Renz H, Holt PG, Inouye M, Logan AC, Prescott SL, Sly PD. An exposome perspective: Early-life events and immune development in a changing world. J Allergy Clin Immunol. 2017. https://doi.org/10.1016/j.jaci.2017.05.015.
48. Smith MT, de la Rosa R, Daniels SI. Using exposomics to assess cumulative risks and promote health. Environ Mol Mutagen. 2015. https://doi.org/10.1002/em.21985.
49. Han X. Lipidomics for studying metabolism. Nat Rev Endocrinol. 2016. https://doi.org/10.1038/nrendo.2016.98.
50. Guo H, Callaway JB, Ting JPY. Inflammasomes: Mechanism of action, role in disease, and therapeutics. Nat Med. 2015. https://doi.org/10.1038/nm.3893.
51. Man SM. Inflammasomes in the gastrointestinal tract: infection, cancer and gut microbiota homeostasis. Nat Rev Gastroenterol Hepatol. 2018. https://doi.org/10.1038/s41575-018-0054-1.
52. Elinav E, Strowig T, Kau AL, et al. NLRP6 inflammasome regulates colonic microbial ecology and risk for colitis. Cell. 2011. https://doi.org/10.1016/j.cell.2011.04.022.
53. Henao-Mejia J, Elinav E, Jin C, et al. Inflammasome-mediated dysbiosis regulates progression of NAFLD and obesity. Nature. 2012. https://doi.org/10.1038/nature10809.
54. Huang TL, Lin CC. Advances in Biomarkers of Major Depressive Disorder. Adv Clin Chem. 2015. https://doi.org/10.1016/bs.acc.2014.11.003.
55. Aitchison JD, Rout MP. The interactome challenge. J Cell Biol. 2015. https://doi.org/10.1083/jcb.201510108.
56. Pinu FR, Beale DJ, Paten AM, et al. Systems biology and multi-omics integration: viewpoints from the metabolomics research community. Metabolites. 2019. https://doi.org/10.3390/metabo9040076.
57. Canzler S, Schor J, Busch W, et al. Prospects and challenges of multi-omics data integration in toxicology. Arch Toxicol. 2020. https://doi.org/10.1007/s00204-020-02656-y.
58. Song SJ, Amir A, Metcalf JL, et al. Preservation Methods Differ in Fecal Microbiome Stability, Affecting Suitability for Field Studies. mSystems. 2016. https://doi.org/10.1128/msystems.00021-16.
59. Sinha R, Abu-Ali G, Vogtmann E, et al. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat Biotechnol. 2017. https://doi.org/10.1038/nbt.3981.
60. Costea PI, Zeller G, Sunagawa S, et al. Towards standards for human fecal sample processing in metagenomic studies. Nat Biotechnol. 2017. https://doi.org/10.1038/nbt.3960.
61. Zhang X, Li L, Mayne J, Ning Z, Stintzi A, Figeys D. Assessing the impact of protein extraction methods for human gut metaproteomics. J Proteomics. 2018. https://doi.org/10.1016/j.jprot.2017.07.001.
62. Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. Comment: the FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016. https://doi.org/10.1038/sdata.2016.18.
63. Yilmaz P, Kottmann R, Field D, et al. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol. 2011. https://doi.org/10.1038/nbt.1823.
64. Brazma A, Hingamp P, Quackenbush J, et al. Minimum information about a microarray experiment (MIAME) - toward standards for microarray data. Nat Genet. 2001. https://doi.org/10.1038/ng1201-365.
65. Taylor CF, Paton NW, Lilley KS, et al. The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol. 2007. https://doi.org/10.1038/nbt1329.
66. Benson DA, Cavanaugh M, Clark K, et al. GenBank. Nucleic Acids Res. 2018. https://doi.org/10.1093/nar/gkx1094.
67. Quast C, Pruesse E, Yilmaz P, et al. The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. Nucleic Acids Res. 2013. https://doi.org/10.1093/nar/gks1219.
68. Parks DH, Chuvochina M, Chaumeil PA, Rinke C, Mussig AJ, Hugenholtz P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat Biotechnol. 2020. https://doi.org/10.1038/s41587-020-0501-8.
69. Ehrlich SD. MetaHIT: The European Union project on metagenomics of the human intestinal tract. Metagenomics of the Human Body. 2011. https://doi.org/10.1007/978-1-4419-7089-3_15.
70. Markowitz VM, Chen IMA, Palaniappan K, et al. IMG: The integrated microbial genomes database and comparative analysis system. Nucleic Acids Res. 2012. https://doi.org/10.1093/nar/gkr1044.
71. Wishart DS, Feunang YD, Marcu A, et al. HMDB 4.0: The human metabolome database for 2018. Nucleic Acids Res. 2018. https://doi.org/10.1093/nar/gkx1089.
72. Karu N, Deng L, Slae M, et al. A review on human fecal metabolomics: Methods, applications and the human fecal metabolome database. Anal Chim Acta. 2018. https://doi.org/10.1016/j.aca.2018.05.031.
73. Sud M, Fahy E, Cotter D, et al. Metabolomics Workbench: an international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training, and analysis tools. Nucleic Acids Res. 2016. https://doi.org/10.1093/nar/gkv1042.
74. Guijas C, Montenegro-Burke JR, Domingo-Almenara X, et al. METLIN: a Technology Platform for Identifying Knowns and Unknowns. Anal Chem. 2018. https://doi.org/10.1021/acs.analchem.7b04424.
75. Smith CA, O’Maille G, Want EJ, et al. METLIN: a metabolite mass spectral database. Therapeutic Drug Monitoring. 2005. https://doi.org/10.1097/01.ftd.0000179845.53213.39.
76. Haug K, Cochrane K, Nainala VC, et al. MetaboLights: a resource evolving in response to the needs of its scientific community. Nucleic Acids Res. 2020. https://doi.org/10.1093/nar/gkz1019.
77. Bateman A. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019. https://doi.org/10.1093/nar/gky1049.
78. Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 2020. https://doi.org/10.1093/nar/gkaa970.
79. Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science. 1997. https://doi.org/10.1126/science.278.5338.631.
80. Finn RD, Bateman A, Clements J, et al. Pfam: the protein families database. Nucleic Acids Res. 2014. https://doi.org/10.1093/nar/gkt1223.
81. Perez-Riverol Y, Csordas A, Bai J, et al. The PRIDE database and related tools and resources in 2019: Improving support for quantification data. Nucleic Acids Res. 2019. https://doi.org/10.1093/nar/gky1106.
82. Wu C, Zhou F, Ren J, Li X, Jiang Y, Ma S. A selective review of multi-level omics data integration using variable selection. High-Throughput. 2019. https://doi.org/10.3390/ht8010004.
83. Ebbels TMD, Cavill R. Bioinformatic methods in NMR-based metabolic profiling. Prog Nucl Magn Reson Spectrosc. 2009. https://doi.org/10.1016/j.pnmrs.2009.07.003.
84. Jiang D, Armour CR, Hu C, et al. Microbiome Multi-Omics network analysis: statistical considerations, limitations, and opportunities. Front Genet. 2019. https://doi.org/10.3389/fgene.2019.00995.
85. Subramanian I, Verma S, Kumar S, Jere A, Anamika K. Multi-omics Data Integration, Interpretation, and Its Application. Bioinform Biol Insights. 2020. https://doi.org/10.1177/1177932219899051.
86. Eicher T, Kinnebrew G, Patt A, et al. Metabolomics and multi-omics integration: A survey of computational methods and resources. Metabolites. 2020. https://doi.org/10.3390/metabo10050202.
87. Caspi R, Altman T, Billington R, et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 2014. https://doi.org/10.1093/nar/gkt1103.
88. Fabregat A, Jupe S, Matthews L, et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 2018. https://doi.org/10.1093/nar/gkx1132.
89. Mars RAT, Yang Y, Ward T, et al. Longitudinal Multi-omics Reveals Subset-Specific Mechanisms Underlying Irritable Bowel Syndrome. Cell. 2020. https://doi.org/10.1016/j.cell.2020.08.007.
90. Erickson AR, Cantarel BL, Lamendella R, et al. Integrated metagenomics/metaproteomics reveals human host-microbiota signatures of Crohn’s disease. PLoS One. 2012;7(11):e49138.
91. Zhang X, Ning Z, Mayne J, et al. MetaPro-IQ: A universal metaproteomic approach to studying human and mouse gut microbiota. Microbiome. 2016. https://doi.org/10.1186/s40168-016-0176-z.
92. Rappoport N, Shamir R. Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res. 2018. https://doi.org/10.1093/nar/gky889.
93. Sharma A, Laxman B, Naureckas ET, et al. Associations between fungal and bacterial microbiota of airways and asthma endotypes. J Allergy Clin Immunol. 2019. https://doi.org/10.1016/j.jaci.2019.06.025.
94. Summers KM, Bush SJ, Hume DA. Network analysis of transcriptomic diversity amongst resident tissue macrophages and dendritic cells in the mouse mononuclear phagocyte system. PLoS Biol. 2020. https://doi.org/10.1371/JOURNAL.PBIO.3000859.
95. Morgan XC, Kabakchiev B, Waldron L, et al. Associations between host gene expression, the mucosal microbiome, and clinical outcome in the pelvic pouch of patients with inflammatory bowel disease. Genome Biol. 2015. https://doi.org/10.1186/s13059-015-0637-x.
96. Li L, Wang Z, He P, Ma S, Du J, Jiang R. Construction and Analysis of Functional Networks in the Gut Microbiome of Type 2 Diabetes Patients. Genomics, Proteomics Bioinforma. 2016. https://doi.org/10.1016/j.gpb.2016.02.005.
97. Kamburov A, Cavill R, Ebbels TMD, Herwig R, Keun HC. Integrated pathway-level analysis of transcriptomics and metabolomics data with IMPaLA. Bioinformatics. 2011. https://doi.org/10.1093/bioinformatics/btr499.
98. García-Alcalde F, García-López F, Dopazo J, Conesa A. Paintomics: a web based tool for the joint visualization of transcriptomics and metabolomics data. Bioinformatics. 2011. https://doi.org/10.1093/bioinformatics/btq594.
99. Min EJ, Long Q. Sparse multiple co-Inertia analysis with application to integrative analysis of multi-Omics data. BMC Bioinformatics. 2020. https://doi.org/10.1186/s12859-020-3455-4.
100. Conesa A, Beck S. Making multi-omics data accessible to researchers. Sci Data. 2019. https://doi.org/10.1038/s41597-019-0258-4.
101. Shannon P, Markiel A, Ozier O, et al. Cytoscape: a software Environment for integrated models of biomolecular interaction networks. Genome Res. 2003. https://doi.org/10.1101/gr.1239303.
102. Rohn H, Junker A, Hartmann A, et al. VANTED v2: a framework for systems biology applications. BMC Syst Biol. 2012. https://doi.org/10.1186/1752-0509-6-139.
103. Rohart F, Gautier B, Singh A, Lê Cao KA. mixOmics: an R package for ‘omics feature selection and multiple data integration. PLoS Comput Biol. 2017. https://doi.org/10.1371/journal.pcbi.1005752.
104. Meng C, Helm D, Frejno M, Kuster B. MoCluster: identifying joint patterns across multiple omics data sets. J Proteome Res. 2016. https://doi.org/10.1021/acs.jproteome.5b00824.
105. Argelaguet R, Velten B, Arnol D, et al. Multi-Omics factor analysis - a framework for unsupervised integration of multi-omics data sets. Mol Syst Biol. 2018. https://doi.org/10.15252/msb.20178124.
106. Luo W, Pant G, Bhavnasi YK, Blanchard SG, Brouwer C. Pathview Web: user friendly pathway visualization and data integration. Nucleic Acids Res. 2017. https://doi.org/10.1093/nar/gkx372.
107. Bauer C, Stec K, Glintschert A, et al. Biominer: Paving the way for personalized medicine. Cancer Inform. 2015. https://doi.org/10.4137/CIN.S20910.
108. Ou J, Zhu LJ. trackViewer: a Bioconductor package for interactive and integrative visualization of multi-omics data. Nat Methods. 2019. https://doi.org/10.1038/s41592-019-0430-y.
109. Hernández-de-Diego R, Boix-Chova N, Gómez-Cabrero D, Tegner J, Abugessaisa I, Conesa A. STATegra EMS: an Experiment Management System for complex next-generation omics experiments. BMC Syst Biol. 2014. https://doi.org/10.1186/1752-0509-8-S2-S9.
110. Muñoz-Benavent M, Hartkopf F, Van Den Bossche T, et al. gNOMO: a multi-omics pipeline for integrated host and microbiome analysis of non-model organisms. NAR Genomics Bioinforma. 2020. https://doi.org/10.1093/nargab/lqaa058.
111. Ghaemi MS, DiGiulio DB, Contrepois K, et al. Multiomics modeling of the immunome, transcriptome, microbiome, proteome and metabolome adaptations during human pregnancy. Bioinformatics. 2019. https://doi.org/10.1093/bioinformatics/bty537.
112. Bittinger K, Zhao C, Li Y, et al. Bacterial colonization reprograms the neonatal gut metabolome. Nat Microbiol. 2020. https://doi.org/10.1038/s41564-020-0694-0.
113. Wopereis H, Oozeer R, Knipping K, Belzer C, Knol J. The first thousand days - intestinal microbiology of early life: Establishing a symbiosis. Pediatr Allergy Immunol. 2014. https://doi.org/10.1111/pai.12232.
114. Zwittink RD, Van Zoeren-Grobben D, Martin R, et al. Metaproteomics reveals functional differences in intestinal microbiota development of preterm infants. Mol Cell Proteomics. 2017. https://doi.org/10.1074/mcp.RA117.000102.
