MetaboliticsDB IEEE TCBB-2

MetaboliticsDB: A Database of Metabolomics Analyses
M. Hasan Celik², Onurcan Ersen¹, Taj Saleh², Alper Dokay², and Ali Cakmak¹*
¹Istanbul Technical University, ²Istanbul Sehir University
*Corresponding Author: ali.cakmak@itu.edu.tr
Abstract—Web-based metabolomics databases enable researchers to disseminate metabolite concentration datasets measured
under different physiological conditions. In addition, many of these databases offer a number of tools to process (e.g., normalization,
outlier elimination, etc.) and analyze (e.g., clustering, enrichment studies, etc.) users’ metabolomics data sets. Nevertheless, none of
the existing metabolomics databases offer infrastructure and tools to store, manage, compare, and search metabolomics analysis
results. Besides, their pathway-level analysis capabilities are mostly limited to superimposing the measurements onto the pathways of
the measured metabolites. In this paper, we present MetaboliticsDB, which features a database of metabolomics analyses and a set of
associated analytics tools. It enables users to store their metabolomics analysis results, and compare them against their own or other
publicly available analysis results to study, for instance, the progression of a disease, the effect of a drug, similarities between
well-known physiological conditions and the currently studied data. In addition, MetaboliticsDB allows querying the metabolomics
analysis results database with flexible criteria, such as listing all analyses in which a certain pathway shows a major
increase or decrease in activity, to help researchers identify conditions that share a similar metabolic mechanism. Moreover, MetaboliticsDB
offers a genome-scale metabolic network-based analysis tool that significantly extends the capabilities of the existing databases.
Finally, MetaboliticsDB employs AI-based as well as distance-based methods to associate the studied metabolomics data with
diseases stored in its database. To this end, it automatically trains, manages, and updates a machine learning model for each disease
based on the metabolomics analysis data stored in its database. We demonstrate the use of MetaboliticsDB with a case
study on Hepatocellular Carcinoma. Our results show that MetaboliticsDB provides biologically relevant metabolic network-level
analysis results, disease association with high accuracy, and a scalable architecture supporting hundreds of simultaneous users.
Availability: MetaboliticsDB is available online at http://metabolitics.itu.edu.tr/.
Web interface source code is available at https://github.com/itu-bioinformatics-database-lab/metabolitics-client.
Web API source code is available at https://github.com/itu-bioinformatics-database-lab/metabolitics-api.
Source code of the Metabolitics data analysis algorithm is available at
https://github.com/itu-bioinformatics-database-lab/metabolitics.
Index Terms—Metabolomics, Biological Databases, Personalized Medicine.
• M.H. Celik, T. Saleh, and A. Dokay were with the Department of Computer Science, Istanbul Sehir University, Istanbul, Turkey.
• O. Ersen and A. Cakmak are with the Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey. E-mail: ali.cakmak@itu.edu.tr
1 INTRODUCTION
METABOLOMICS is the study of concentration changes for a large number of metabolites in cells as well as in extracellular environments (e.g., blood). It provides invaluable insights into physiological conditions, as disease phenotypes are often reflected in the metabolome of an organism. With recent advances in experimental methods, researchers are now able to measure the amounts of many metabolites with high accuracy at a relatively low cost. The main challenge has been interpreting these measurements to understand the health and disease states of cells and, accordingly, to develop new diagnosis, treatment, and prognosis methods. Many methods and algorithms have been proposed to analyze metabolomics data at different levels of granularity (e.g., [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18]). Despite the large variety of metabolomics data analysis algorithms in the literature, relatively few of them ([12], [13], [14], [15], [16]) have been made available to researchers through web-based databases. At a high level, these works fall into two categories: (i) metabolic data resources and (ii) database-enabled metabolomics data analysis resources.
Metabolic data resources ([19], [20], [21], [22], [23], [24], [25], [26]) have been around for a relatively longer time, and they usually act as data management and dissemination hubs for the research community. The maintainers of such resources compile their data by (i) generating it (fully or partially) in their own labs, (ii) collecting it from the literature (manually or in a semi-automated manner), and (iii) integrating data from multiple external sources. Such resources usually feature basic browsing, searching, and visualization capabilities. Among these, the most widely used pathway resources are KEGG [19], BioCyc [20], Reactome [21], etc. Other data resources focus on metabolites rather than pathways, such as HMDB [22], ChEBI [23], PubChem [25], etc.
Database-enabled metabolomics data analysis resources ([12], [13], [14], [15], [16], [27]) usually work on data that is imported from one or more of the above metabolic data
resources. In addition to the basic searching and visualization capabilities provided by the metabolic data resources, this category of tools also allows users to upload their own metabolomics data, and the analysis results are then provided to the user in tabular and/or graphical form.
One of the most comprehensive analysis resources in this category is MetaboAnalyst 5.0 [28]. In addition to basic statistical significance and discrimination analysis tools at the metabolite level, it also features pathway-level analysis in the form of pathway enrichment and topology-based assessment. Moreover, MetaboAnalyst 5.0 allows integrated analysis of transcriptomics and metabolomics datasets at the pathway level. Metabolomics Workbench [13] is another comprehensive metabolomics data repository and analysis resource. It offers a wide array of statistical analysis tools at the metabolite level, ranging from ANOVA to hierarchical clustering. It also allows the creation of various plots on top of user data, such as boxplots, volcano plots, bar graphs, etc. However, it does not provide any pathway-level analysis. MeltDB 2.0 [14] offers a number of features to annotate metabolites, detect peaks, eliminate noise, etc. in user-submitted raw data. Once the data is preprocessed, users may perform statistical significance tests, classification and clustering, and various forms of visualization on their data. One differentiating feature of MeltDB 2.0 is its built-in support for forming project groups, sharing particular data files among group members or with other project groups, and creating and managing different experiments. MetabolomeExpress [15] offers capabilities similar to those of MeltDB 2.0 in raw data processing and statistical analysis of the processed data. XCMS Online [16] takes multi-omics integration one step further than MetaboAnalyst 5.0 and accommodates proteomics (in addition to transcriptomics) along with metabolomics measurements. It also offers other common metabolite- and pathway-level analysis features (e.g., raw data processing, statistical analysis, pathway enrichment analysis, etc.) similar to the above tools. Caleydo [17] maps omics data onto pathways and visualizes them so that users can see which pathways are covered by the uploaded omics data, and to what degree their activities change based on the concentration changes of metabolites in a metabolomics dataset or the changes in mRNA levels in a gene expression dataset. WebSpecmine [29] is a web-based metabolomics data analysis and mining tool built on the specmine R package. WebSpecmine enables users to upload datasets to process, visualize, and perform various analyses; it also trains machine learning models and predicts classes for future samples. 3Omics [30] is a web-based human metabolomics data analysis tool that offers visualization and analysis capabilities on uploaded datasets, with an emphasis on combining different omics data. Workflow4Metabolomics [31] offers data preprocessing with retention time alignment and peak extraction, and univariate analysis with nonparametric and parametric tests. POMAShiny [32] provides data preprocessing with missing value imputation, normalization, and outlier detection; univariate analysis with the t-test, ANOVA, Mann-Whitney U-test, and Kruskal-Wallis test; clustering with the k-means algorithm; and classification with the random forest algorithm. In addition to the above web tools, several others offer somewhat similar metabolomics analysis features but are available only as stand-alone desktop applications, such as Pathway Tools [18], SIMCA-P+ (Umetrics, Umea, Sweden), etc.
Even though the above database-enabled metabolomics analysis resources are quite extensive in the number and variety of the raw data processing and statistical analysis features that they offer, we note the following gaps: (i) Although some of the existing resources allow users to store their raw metabolomics data, none of them enables users to store, manage, compare, and search metabolomics analysis results. (ii) Their pathway-level analysis capabilities are limited to superimposing measured metabolite changes onto the corresponding pathways. Moreover, all of the above resources consider each pathway independently of the metabolic network that it belongs to. Even though MetaboAnalyst computes network-level measures, such as betweenness centrality, it only uses them for ranking pathways. Hence, the assessment in the above resources is limited to those pathways whose metabolites overlap, at least to some extent, with the user-submitted measurements. Even then, the included pathways are evaluated individually, ignoring the production/consumption/regulation relationships among them. (iii) Existing resources do not attempt to associate metabolomics analysis results with known diseases or physiological conditions.
In this paper, we present a novel database-enabled web resource, MetaboliticsDB, that performs metabolic activity analysis of user-provided metabolomics data in a holistic manner, considering the interconnections between pathways while preserving mass balances. To this end, it employs a state-of-the-art systems-level algorithm [2] under the hood. MetaboliticsDB stores analysis results in its database, and users may compare current analysis results with their own previously stored results or with publicly available results shared by other users. Furthermore, MetaboliticsDB allows users to compare different analysis methods (e.g., Metabolitics vs. pathway enrichment) on the same dataset. Users may also flexibly search the stored analysis results to list those in which certain pathways experience a significant increase or decrease in activity. As another novel feature, MetaboliticsDB enables users to associate their metabolomics datasets with diseases based on AI models that it creates and maintains.
We evaluate the features of MetaboliticsDB on a real metabolomics dataset obtained from individuals with Hepatocellular Carcinoma (HCC). Our results demonstrate that MetaboliticsDB provides (i) biologically relevant metabolic network-level analysis results along with visual markings on a metabolic graph, (ii) high-accuracy disease association for analysis results, and (iii) a scalable architecture supporting hundreds of simultaneous users. This paper is organized as follows. The next section summarizes the data management, metabolomics analysis features, and architecture of MetaboliticsDB. Then, we discuss our results from the evaluation of MetaboliticsDB on an HCC dataset, as well as the performance and accuracy of different features. Finally, we conclude with a discussion on how MetaboliticsDB may be employed in a wider scope with the proposed features.
2 METHODS
2.1 Database Model and Data Management
MetaboliticsDB employs various data sources, namely, a genome-scale reconstructed human metabolic network dataset, Recon3D [33], a human disease ontology dataset [34], a customized metabolite synonym mapping dataset, and a metabolomics analysis database. To manage its data, MetaboliticsDB employs a hybrid approach that couples the traditional relational model with more recent NoSQL implementations.
2.1.1 Genome-Scale Metabolic Network Data
The Recon3D dataset contains 10600 reactions, 5835 metabolites, and 106 pathways/subsystems. The metabolic network data is kept as a single JSON file on the server side and loaded to the client side only once at the beginning of the website life cycle. It is stored in the browser cache (local storage). The original data size on disk is 5.3 MB. To increase storage and network efficiency, the data file is compressed with gzip, which reduces the final data size to 877 KB.
To optimize query performance, the original Recon3D data model is transformed into a carefully designed JSON schema (see Fig. 1). In this schema, there are three main top-level properties, namely, pathways, reactions, and metabolites. Each of these properties holds a hashmap whose key is the id of a component and whose value is the object of that component. Each object has a set of foreign keys referring to related components, such as reaction ids in pathways, metabolite ids in reactions, and reaction ids in metabolites. After the data is retrieved, these relations are replaced on the client side with object references to the related components. Then, all queries are executed on the client side on this data model in the corresponding web interfaces.
2.1.2 Human Disease Ontology Data
Presently, the database contains a large set of diseases along with their parent diseases (if any) and synonyms. If known to them, users may associate their metabolomics datasets with a disease from the database. MetaboliticsDB periodically updates the list of diseases stored in the relational database.
2.1.3 Metabolomics Analysis Data
MetaboliticsDB employs a relational database to store users, metabolomics analysis methods, diseases, uploaded datasets, metabolomics measurements, metabolomics analysis results, and trained machine learning models for diseases. Currently, the database contains metabolomics measurements and the corresponding analysis results for 2174 individuals associated with 40 distinct diseases. Our relational database is hosted on a PostgreSQL instance, which offers hybrid NoSQL features such as JSON fields. The database schema (see Fig. 2) involves seven tables, namely, User, Methods, Datasets, MetabolomicsData, Analysis, Diseases, and DiseaseModels. The User table stores user information and has a one-to-many relation with the Analysis table. The Methods table stores the metabolomics analysis methods available in MetaboliticsDB, namely Metabolitics, Direct Pathway Mapping, and Pathway Enrichment. The Datasets table holds details of uploaded datasets. Metabolomics measurements uploaded with datasets are stored in the MetabolomicsData table. Metabolomics analysis results for each sample uploaded with a dataset are stored in the Analysis table. This is a hybrid table with two JSON columns that store reaction- and pathway-level analysis results. Both JSON columns hold hashmaps whose keys are pathway or reaction names and whose values are the flux scores computed during the analysis. Diseases known by MetaboliticsDB are stored in the Diseases table. Machine learning models trained on metabolomics analysis results are recorded in the DiseaseModels table.
2.1.4 Multi-source Metabolite Synonym Mapping
Since most metabolites have several synonyms and lack a standard name adopted by all sources, user-provided metabolite names first need to be matched to those in MetaboliticsDB's database. The original Recon3D data does not contain synonym information for the metabolites. MetaboliticsDB uses the BiGG [35] IDs of metabolites in the Recon3D dataset for metabolomics data analysis. Increasing the number of recognized metabolite synonyms is vital for a more comprehensive metabolomics data analysis. MetaboliticsDB enhances metabolite name mapping in two distinct ways. First, it combines alternative synonyms for metabolites from datasets such as HMDB [22], KEGG [19], PubChem [25], and ChEBI [23], and stores them in a synonym repository. Second, if the local synonym repository fails to match a name, the RefMet nomenclature [36] is checked for that name through HTTP POST requests. If a RefMet match is found, the metabolite names returned from RefMet are appended to the customized synonym mapping dataset of MetaboliticsDB. As a result, MetaboliticsDB is able to map most of the metabolites from different datasets to the metabolites in our database.
2.2 Metabolomics Data Analysis
One of the main functionalities of MetaboliticsDB is the analysis of user-provided metabolomics datasets to facilitate the interpretation of measured changes. MetaboliticsDB offers three different types of metabolomics analysis methods. In this section, we present the usage and working principles of the analysis tool.
2.2.1 Uploading Datasets
MetaboliticsDB expects the metabolomics measurements in the form of fold-change values for each metabolite. The fold changes may be computed in reference to different baselines, such as average measurements of healthy individuals or before-treatment measurements, depending on the nature of the corresponding study. Users may either manually specify the fold change for each metabolite through the web interface or upload them in a file. If users upload their datasets in MetaboliticsDB's own custom file format, fold-change values are computed by MetaboliticsDB in reference to the average measurements of healthy samples. MetaboliticsDB accepts four different types of input file formats including
Metabolomics Workbench mwTab files; JSON files in the form of a dictionary whose keys are metabolite names and whose values are the corresponding fold changes; CSV files as a matrix of samples and metabolites; and MetaboliticsDB's own custom file format. Examples of these input formats are provided online in the Documentation section of MetaboliticsDB. Once the user data is uploaded, MetaboliticsDB employs the approaches discussed above to maximize the mapping of metabolites in the input file to the genome-scale metabolic network (Recon3D) maintained in its database. Once the mapping is completed, the final list of metabolites is presented to the user along with the percentage of mapped metabolites as well as the list of unmapped metabolites. The user may choose to remove any of the entries before the analysis takes place.
Fig. 1. The customized schema for the MetaboliticsDB version of the Recon3D JSON file
2.2.2 Metabolic Flux Differentiation Analysis
The analysis feature of MetaboliticsDB dynamically builds a linear programming model [37] on the whole metabolic network based on user-provided measurements. To achieve this, it integrates an algorithm that we have recently developed [2]. More specifically, the total production flux of each measured metabolite is added as a term to the objective function of the constructed linear model, and the fold change of each metabolite is registered in the objective function as the coefficient of the corresponding term. Then, on the constructed model, flux variability analysis (FVA) [38] is performed to determine the upper and lower flux limits for each reaction in the metabolic network. Next, a metabolic flux differentiation (diff) score is computed for each reaction based on how its flux boundaries differ from those of healthy/control samples. For each pathway, the mean of its reactions' diff scores is set as the pathway's diff score. Pathway diff scores represent how much the activity of a pathway deviates in a given individual in comparison to that of healthy/control people. This allows a personalized evaluation for each individual. Negative diff values indicate a decrease in activity, while positive diff values signify an increase in pathway activity. The computed diff scores are stored in the database as part of the corresponding analysis run.
2.2.3 Pathway Enrichment Analysis
Another metabolomics data analysis approach offered by MetaboliticsDB is pathway enrichment analysis. This approach identifies pathways that are significantly enriched with the metabolites included in the user-uploaded metabolomics datasets. The hypergeometric distribution is used to calculate a p-value for each pathway. MetaboliticsDB does not strictly apply a significance threshold; instead, it reports the computed p-values for all pathways in a table.
2.2.4 Direct Pathway Mapping Analysis
Direct pathway mapping is a simple linear model that directly maps the metabolites included in a metabolomics dataset to the pathways that they participate in. Each pathway is assigned a score that is the sum of the concentrations of the metabolites that it contains.
2.3 Tabular and Visual Analysis Results
The analysis results are presented at multiple levels of detail in both visual and tabular forms (Fig. 3). A bar plot (at the top) charts diff values for the top-20 pathways sorted by the absolute value of their diff scores. Below the bar plot, all the pathways are listed in tabular form with their computed diff values. Next to each pathway, there are two buttons. The left button visualizes the corresponding pathway, where each node represents a metabolite and each edge represents a reaction. Edges are colored and thickened according to the corresponding reactions' diff values (see Fig. 6 for an example). This form of visualization allows the user to quickly inspect which parts of the pathway experience the most change. The right button, on the other hand, lists all the pathway reactions and their diff values in tabular form. Each analysis result set is stored in the database, and the user may access past analysis results through the user account menu (visible after signing in) or the "Browse Analysis Results" menu item. The user may mark an analysis result as "public" to make it available to all other users, or keep it "private" for their own access (default setting).
2.4 Similarity-based Disease Association
Another novel feature of MetaboliticsDB is that, for each metabolomics analysis result, the closest set of diseases and physiological conditions is computed based on the similarity of the pathway-level metabolic behavior. More specifically, MetaboliticsDB stores metabolomics analysis results for a number of diseases (pre-computed and recorded in its database). Then, for each user-submitted analysis run, the user is presented with the top 5 diseases (e.g., diabetes) that are most similar to the currently analyzed metabolomics data in terms of the activity level distribution over the metabolic
pathways (see Fig. 3). To quantify the similarity, MetaboliticsDB turns both the current analysis results and the disease analysis results stored in the database into numeric vectors, where each number represents the diff value for a particular pathway. Then, the Pearson correlation is computed between each disease vector and the current analysis result vector. Finally, the diseases in the database are sorted according to their correlation values, and the top 5 are presented to the user.
Fig. 2. Relational database schema of MetaboliticsDB
2.5 Machine Learning-based Disease Prediction
As an alternative to similarity-based disease association, MetaboliticsDB also offers machine learning-based disease status prediction for each individual. To this end, MetaboliticsDB periodically trains machine learning models for each disease based on previously stored metabolomics data analysis results. More specifically, for each disease, four types of models based on Logistic Regression, Random Forest, Support Vector Machines, and XGBoost are trained. 10-fold cross-validation is used to tune the parameters of each model and to evaluate its classification performance based on F1-scores. Then, the best-performing model is chosen, serialized with the Python pickle package, and stored in the database as a binary file. The models use the computed reaction metabolic differentiation scores as features. The reaction diff values of subsequent metabolomics data analysis results are then used to predict potential diseases associated with an individual. Next to each disease, a value between 0 and 1 is provided, which reflects the probability that the underlying AI model assigns to its disease prediction for that individual.
2.6 Comparison of Analysis Results
MetaboliticsDB allows users to compare and contrast their metabolomics data analysis results with (i) their previous analysis results and (ii) other users' publicly available analysis results stored in its database. From the analysis results page, the user may choose any number of analysis results to compare by clicking on the checkbox next to each study. Then, clicking on the "compare" button at the top leads to the comparison page. The comparison interface features a heatmap at the top, where the rows represent the pathways with the highest variance in their diff values among the compared analysis results, and the columns represent the compared analysis results (see Fig. 4). Each cell in the heatmap is colored according to the corresponding pathway diff value. The bottom part of the comparison interface includes a table that lists all the pathways with their computed diff values for each selected study (similar to the bottom part of Fig. 3).
2.7 Advanced Search Interface
Unlike similar tools, MetaboliticsDB, as a novel feature, allows users to search the metabolomics analysis results in the database in terms of the metabolic activity changes
of involved pathways. For instance, users may search for metabolomics analysis results where the Urea Cycle experiences decreased activity, whereas Fatty Acid Synthesis experiences increased activity. Optionally, the user may also specify the magnitude of the increase or decrease. Users can flexibly add and remove pathways from consideration, leading to multiple conditions that are combined with SQL AND semantics.
Fig. 3. Analysis results page: top-20 pathways with the highest absolute diff values.
2.8 Basic Searching, Browsing, and Other Visualization Features
Similar to many other metabolic databases, MetaboliticsDB includes a search interface to locate metabolites, reactions, and pathways of interest. The search interface features auto-completion (similar to Google Search) that automatically suggests names from the database as the user types in their search terms. The suggestions are not a flat list of mixed names, but are categorized into different groups based on the matching entity types (i.e., metabolites, reactions, etc.). The search results are presented in a similar manner, where the matching items are categorized based on their entity types. Clicking on any search result entry takes the user to the details and visualizations of the clicked entity.
Users may also choose to browse the database pathway by pathway. The browsing page lists all the pathways in the database on the left, and clicking on a pathway displays its reactions and a graphical visualization that shows a network view of the pathway (see Fig. 6 for an example). The pathway visualizations in MetaboliticsDB are drawn using the Escher library [39] and saved in the database.
Fig. 4. MetaboliticsDB comparison interface featuring a heatmap that compares three common cancer types: Lung, Breast, and Hepatocellular
2.9 Architecture
The architecture of MetaboliticsDB (Fig. 5) is carefully designed to meet the demanding computational and data management requirements of its various features in an efficient and flexible manner. All frontend interfaces are implemented in Angular, a JavaScript framework for developing sophisticated single-page web applications. Most of the rendering and application logic runs on the client side to enhance performance and user experience. The relational database stores analysis results and user accounts, and it is hosted on a PostgreSQL instance. The frontend communicates with the database via a RESTful API developed in Flask, a micro web framework in Python. One advantage of adopting a RESTful API is that it offers programmatic access to other researchers who may want to use the analysis algorithms and services of MetaboliticsDB in their own projects. This feature provides another interface to MetaboliticsDB for users with programming skills. Documentation for MetaboliticsDB's RESTful API is available at http://metabolitics.itu.edu.tr/api/spec in the OpenAPI specification format. In addition, MetaboliticsDB has a sophisticated system to manage analysis requests, as each analysis task is computationally expensive and cannot be handled within the regular life cycle of an HTTP request.
More specifically, MetaboliticsDB employs Celery, a distributed task queue, together with a Redis database that stores information about the tasks. Celery workers subscribe to Redis and perform the metabolomics analysis whenever a new analysis is submitted to Redis.
Scalability is a major consideration in the design of MetaboliticsDB's architecture. All components of the architecture are dockerized. Moreover, Celery workers can be distributed across multiple instances. Lastly, the choice of Angular for the frontend significantly reduces the load on the servers, since most of the application logic runs on the client side.
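The request-handling pattern described above (the API enqueues an analysis and returns immediately, while separate workers pick tasks off a queue) can be sketched with the Python standard library alone. This is a hypothetical stand-in, not MetaboliticsDB's Celery/Redis code: `queue.Queue` plays the role of the Redis broker, daemon threads play the role of Celery workers, and the names `AnalysisQueue` and `toy_analysis` are illustrative only.

```python
import queue
import threading
from typing import Callable, Dict, Optional


class AnalysisQueue:
    """Hypothetical stdlib stand-in for the Celery + Redis setup.

    queue.Queue plays the broker; daemon threads play the Celery workers.
    """

    def __init__(self, analyze: Callable[[dict], dict], workers: int = 2) -> None:
        self._tasks = queue.Queue()
        self._results: Dict[int, dict] = {}
        self._lock = threading.Lock()
        self._next_id = 0
        self._analyze = analyze
        for _ in range(workers):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, payload: dict) -> int:
        """Called from the HTTP handler: enqueue and return a task id immediately."""
        with self._lock:
            task_id = self._next_id
            self._next_id += 1
        self._tasks.put((task_id, payload))
        return task_id

    def result(self, task_id: int) -> Optional[dict]:
        """Polled by the client; None while the task is still pending."""
        with self._lock:
            return self._results.get(task_id)

    def wait_all(self) -> None:
        """Block until every submitted task has been processed."""
        self._tasks.join()

    def _work(self) -> None:
        while True:
            task_id, payload = self._tasks.get()
            try:
                output = self._analyze(payload)
            except Exception as exc:  # record the failure and keep the worker alive
                output = {"error": str(exc)}
            with self._lock:
                self._results[task_id] = output
            self._tasks.task_done()


def toy_analysis(payload: dict) -> dict:
    # Placeholder for the expensive LP/FVA analysis.
    return {"n_metabolites": len(payload["fold_changes"])}


jobs = AnalysisQueue(toy_analysis, workers=2)
ids = [jobs.submit({"fold_changes": {"cit": 1.4}}),
       jobs.submit({"fold_changes": {"cit": 1.4, "lac__L": 0.7}})]
jobs.wait_all()
print([jobs.result(i) for i in ids])  # [{'n_metabolites': 1}, {'n_metabolites': 2}]
```

In the deployed system, the queue is an external broker (Redis), which is what allows workers to run in separate containers; an in-process queue like this one only illustrates the control flow.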
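The client-side data model of Sec. 2.1.1 can be illustrated with a toy network. The sketch assumes hypothetical field names (`id`, `reactions`, `metabolites`, `pathway`) and BiGG-style toy identifiers; the real Recon3D-derived file is far larger, and its exact schema appears only in Fig. 1. It serializes the id-based form with `json` and `gzip`, then, on the "client", replaces each foreign-key id with a reference to the related object so that later queries are plain traversals.

```python
import gzip
import json

# Toy network in the id-keyed form of Sec. 2.1.1 (hypothetical field names and ids).
network = {
    "pathways": {
        "P1": {"id": "P1", "reactions": ["CSm", "ACONTm"]},
    },
    "reactions": {
        "CSm": {"id": "CSm", "metabolites": ["oaa_m", "cit_m"], "pathway": "P1"},
        "ACONTm": {"id": "ACONTm", "metabolites": ["cit_m"], "pathway": "P1"},
    },
    "metabolites": {
        "oaa_m": {"id": "oaa_m", "reactions": ["CSm"]},
        "cit_m": {"id": "cit_m", "reactions": ["CSm", "ACONTm"]},
    },
}


def resolve_references(net: dict) -> dict:
    """Replace foreign-key id lists with references to the related objects (in place)."""
    pathways, reactions, metabolites = net["pathways"], net["reactions"], net["metabolites"]
    for p in pathways.values():
        p["reactions"] = [reactions[rid] for rid in p["reactions"]]
    for r in reactions.values():
        r["metabolites"] = [metabolites[mid] for mid in r["metabolites"]]
        r["pathway"] = pathways[r["pathway"]]
    for m in metabolites.values():
        m["reactions"] = [reactions[rid] for rid in m["reactions"]]
    return net


def metabolites_in_pathway(net: dict, pathway_id: str) -> list:
    """A client-side query: follow object references, no id lookups needed."""
    pathway = net["pathways"][pathway_id]
    return sorted({m["id"] for r in pathway["reactions"] for m in r["metabolites"]})


# Server side: serialize the id-based form and gzip it once.
payload = gzip.compress(json.dumps(network).encode("utf-8"))

# Client side: decompress and parse, then swap ids for object references once.
client_net = resolve_references(json.loads(gzip.decompress(payload).decode("utf-8")))
print(metabolites_in_pathway(client_net, "P1"))  # ['cit_m', 'oaa_m']
```

Serialization has to happen before the ids are resolved, because the resolved structure contains reference cycles (pathway to reaction to pathway) that JSON cannot encode.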
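The two-stage name mapping of Secs. 2.1.4 and 2.2.1 (local synonym repository first, RefMet lookup as a fallback, then a report of the mapped percentage and the unmapped names) is sketched below. The `refmet_lookup` callable is a hypothetical stand-in for the HTTP POST request to RefMet, whose request format is not reproduced here; the normalization rule and the toy synonyms are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def normalize(name: str) -> str:
    """Assumed normalization: case-insensitive, collapsed whitespace."""
    return " ".join(name.lower().split())


def map_metabolites(
    names: Iterable[str],
    synonyms: Dict[str, str],
    refmet_lookup: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str], float]:
    """Map user-provided names to BiGG IDs.

    synonyms: local repository, normalized synonym -> BiGG ID; extended in place
        whenever the fallback resolves a new name.
    refmet_lookup: hypothetical stand-in for the RefMet request; returns a
        standardized metabolite name or None.
    Returns (name -> BiGG ID, unmapped names, percentage of names mapped).
    """
    names = list(names)
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for name in names:
        key = normalize(name)
        bigg_id = synonyms.get(key)
        if bigg_id is None:
            standard_name = refmet_lookup(name)
            if standard_name is not None:
                bigg_id = synonyms.get(normalize(standard_name))
                if bigg_id is not None:
                    synonyms[key] = bigg_id  # remember the new synonym locally
        if bigg_id is None:
            unmapped.append(name)
        else:
            mapped[name] = bigg_id
    percent = 100.0 * len(mapped) / len(names) if names else 0.0
    return mapped, unmapped, percent


# Toy repository and a fake RefMet that only knows "Citrate".
local_synonyms = {"l-lactic acid": "lac__L", "lactate": "lac__L", "citric acid": "cit"}
fake_refmet = {"Citrate": "Citric acid"}.get

mapped, unmapped, percent = map_metabolites(
    ["Lactate", "Citrate", "Unknown X"], local_synonyms, fake_refmet
)
print(mapped, unmapped, round(percent, 1))
# {'Lactate': 'lac__L', 'Citrate': 'cit'} ['Unknown X'] 66.7
```

Caching successful fallback matches in the local repository means each unfamiliar name costs at most one remote lookup.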
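For datasets in the custom format, fold changes are computed against the average of the healthy samples (Sec. 2.2.1). A minimal sketch, assuming a plain ratio to the healthy mean (the text does not state whether a log transform or rescaling follows) and a hypothetical nested-dictionary input; healthy samples also receive fold changes, which is what enables the per-individual evaluation.

```python
from statistics import mean
from typing import Dict, List

Samples = Dict[str, Dict[str, float]]  # sample id -> {metabolite -> raw concentration}


def fold_changes(samples: Samples, healthy_ids: List[str]) -> Samples:
    """Divide each measurement by the mean of the healthy samples (assumed plain ratio)."""
    metabolites = {m for values in samples.values() for m in values}
    baseline: Dict[str, float] = {}
    for m in metabolites:
        healthy_values = [samples[s][m] for s in healthy_ids if m in samples[s]]
        if healthy_values and mean(healthy_values) != 0:
            baseline[m] = mean(healthy_values)
    # Metabolites without a usable healthy baseline are left out rather than guessed.
    return {
        sid: {m: v / baseline[m] for m, v in values.items() if m in baseline}
        for sid, values in samples.items()
    }


samples = {
    "h1": {"glc": 4.0, "lac": 1.0},
    "h2": {"glc": 6.0, "lac": 3.0},
    "p1": {"glc": 10.0, "lac": 1.0},
}
fc = fold_changes(samples, healthy_ids=["h1", "h2"])
print(fc["p1"])  # {'glc': 2.0, 'lac': 0.5}
```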
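The aggregation step of Sec. 2.2.2, where a pathway's diff score is the mean of its reactions' diff scores, and the ranking by absolute value used for the top-20 bar plot of Sec. 2.3 can be sketched as follows. The reaction-level function `reaction_diff` is a placeholder of our own (the shift of the FVA interval midpoint relative to the healthy reference); it is not the formula of the Metabolitics algorithm [2], and the FVA bounds are assumed to be precomputed.

```python
from statistics import mean
from typing import Dict, List, Tuple

Bounds = Tuple[float, float]  # (lower, upper) flux limits from FVA


def reaction_diff(sample: Bounds, healthy: Bounds) -> float:
    """PLACEHOLDER, not the formula of [2]: shift of the FVA interval midpoint."""
    return (sample[0] + sample[1]) / 2 - (healthy[0] + healthy[1]) / 2


def pathway_diffs(
    sample_fva: Dict[str, Bounds],
    healthy_fva: Dict[str, Bounds],
    pathways: Dict[str, List[str]],
) -> Dict[str, float]:
    """Pathway diff = mean of its reactions' diff scores (negative: decreased activity)."""
    reaction_scores = {
        r: reaction_diff(b, healthy_fva[r]) for r, b in sample_fva.items() if r in healthy_fva
    }
    scores = {}
    for pathway, reactions in pathways.items():
        values = [reaction_scores[r] for r in reactions if r in reaction_scores]
        if values:
            scores[pathway] = mean(values)
    return scores


def top_by_magnitude(scores: Dict[str, float], k: int = 20) -> List[Tuple[str, float]]:
    """Order used for the bar plot: largest |diff| first."""
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


sample_fva = {"R1": (0.0, 4.0), "R2": (1.0, 3.0), "R3": (-2.0, 0.0)}
healthy_fva = {"R1": (0.0, 2.0), "R2": (1.0, 3.0), "R3": (0.0, 2.0)}
scores = pathway_diffs(sample_fva, healthy_fva, {"Pathway A": ["R1", "R2"], "Pathway B": ["R3"]})
print(scores)                    # {'Pathway A': 0.5, 'Pathway B': -2.0}
print(top_by_magnitude(scores))  # [('Pathway B', -2.0), ('Pathway A', 0.5)]
```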
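The two baseline methods of Secs. 2.2.3 and 2.2.4 are simple enough to state directly. The sketch assumes the usual one-sided over-representation test, p = Σ_{i=k}^{min(K,n)} C(K,i)·C(N−K,n−i) / C(N,n), where N is the number of metabolites in the background universe, K the number of them in the pathway, n the number of measured metabolites found in the universe, and k the number of those that fall in the pathway; the choice of background universe is an assumption. Direct pathway mapping sums the measured values of each pathway's metabolites. The data are toy values.

```python
from math import comb
from typing import Dict, Set


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N population, K in pathway, n drawn)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def enrichment_pvalues(
    measured: Set[str], pathways: Dict[str, Set[str]], universe: Set[str]
) -> Dict[str, float]:
    hits = measured & universe  # measured metabolites that exist in the background
    N, n = len(universe), len(hits)
    return {
        p: hypergeom_upper_tail(len(members & hits), N, len(members & universe), n)
        for p, members in pathways.items()
    }


def direct_mapping_scores(values: Dict[str, float], pathways: Dict[str, Set[str]]) -> Dict[str, float]:
    """Sec. 2.2.4: each pathway scores the sum of its measured metabolites' values."""
    return {p: sum(values[m] for m in members if m in values) for p, members in pathways.items()}


universe = {f"m{i}" for i in range(10)}
pathways = {"A": {"m0", "m1", "m2", "m3", "m4"}, "B": {"m5", "m6", "m7", "m8", "m9"}}
pvals = enrichment_pvalues({"m0", "m1", "m2", "m3", "m4"}, pathways, universe)
print(pvals)  # p_A = 1/252, p_B = 1.0
print(direct_mapping_scores({"m0": 2.0, "m1": 0.5, "m7": 1.5}, pathways))  # {'A': 2.5, 'B': 1.5}
```

Consistent with the text, no significance cutoff is applied; all p-values are returned for tabulation.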
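The similarity-based ranking of Sec. 2.4 reduces to a Pearson correlation between pathway diff vectors followed by a top-k sort. In this sketch, the vectors are aligned on the pathways shared by both profiles and a constant vector is assigned a correlation of 0; both choices are assumptions, and the disease profiles are toy data.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant vector; treated as "no similarity" (assumption)
    return cov / (sx * sy)


def closest_diseases(
    current: Dict[str, float], profiles: Dict[str, Dict[str, float]], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Rank diseases by Pearson correlation of pathway diff vectors, highest first."""
    ranked = []
    for disease, profile in profiles.items():
        shared = sorted(set(current) & set(profile))  # align vectors on common pathways
        if len(shared) < 2:
            continue
        r = pearson([current[p] for p in shared], [profile[p] for p in shared])
        ranked.append((disease, r))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


current = {"Pathway A": 1.0, "Pathway B": -1.0, "Pathway C": 0.5}
profiles = {
    "Disease 1": {"Pathway A": 2.0, "Pathway B": -2.0, "Pathway C": 1.0},
    "Disease 2": {"Pathway A": -1.0, "Pathway B": 1.0, "Pathway C": -0.5},
    "Disease 3": {"Pathway A": 0.0, "Pathway B": 1.0, "Pathway C": 0.0},
}
result = closest_diseases(current, profiles)
print([d for d, _ in result])  # ['Disease 1', 'Disease 3', 'Disease 2']
```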
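The heatmap of Sec. 2.6 keeps the pathways whose diff values vary most across the selected analyses. A sketch assuming population variance (sample variance would give the same ranking), only pathways present in every selected analysis, alphabetical tie-breaking, and illustrative toy values:

```python
from statistics import pvariance
from typing import Dict, List, Tuple


def heatmap_rows(
    analyses: Dict[str, Dict[str, float]], top_n: int = 20
) -> Tuple[List[str], List[List[float]]]:
    """Pick the pathways whose diff values vary most across the selected analyses.

    Returns the row labels (pathways) and a matrix with one column per analysis,
    in the order the analyses were given.
    """
    names = list(analyses)
    shared = set.intersection(*(set(diffs) for diffs in analyses.values()))
    # Highest variance first; ties broken alphabetically for a stable layout.
    ranked = sorted(shared, key=lambda p: (-pvariance([analyses[a][p] for a in names]), p))
    rows = ranked[:top_n]
    return rows, [[analyses[a][p] for a in names] for p in rows]


selected = {
    "Lung": {"A": 1.0, "B": 0.0, "C": 2.0},
    "Breast": {"A": -1.0, "B": 0.1, "C": 2.0},
    "HCC": {"A": 3.0, "B": -0.1, "C": 2.0},
}
rows, matrix = heatmap_rows(selected, top_n=2)
print(rows, matrix[0])  # ['A', 'B'] [1.0, -1.0, 3.0]
```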
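The advanced search of Sec. 2.7 combines per-pathway conditions with AND semantics. The sketch shows the same filter twice: in memory, and as a parameterized PostgreSQL WHERE clause over a JSON column. The table name `analysis`, the column name `pathways`, the strict-inequality thresholds, and the DB-API `%s` placeholder style (as used by psycopg2) are assumptions, not the actual MetaboliticsDB queries; the stored diff values are toy data.

```python
from typing import Dict, List, Tuple

# A search condition: (pathway, "increase" | "decrease", minimum magnitude).
Condition = Tuple[str, str, float]


def _check_direction(direction: str) -> None:
    if direction not in ("increase", "decrease"):
        raise ValueError(f"unknown direction: {direction!r}")


def matches(diffs: Dict[str, float], conditions: List[Condition]) -> bool:
    """In-memory AND filter over one analysis's pathway diff values."""
    for pathway, direction, magnitude in conditions:
        _check_direction(direction)
        value = diffs.get(pathway)
        if value is None:
            return False
        if direction == "increase" and not value > magnitude:
            return False
        if direction == "decrease" and not value < -magnitude:
            return False
    return True


def to_sql(conditions: List[Condition]) -> Tuple[str, List[object]]:
    """Equivalent parameterized PostgreSQL query (hypothetical table/column names)."""
    clauses: List[str] = []
    params: List[object] = []
    for pathway, direction, magnitude in conditions:
        _check_direction(direction)
        op, bound = (">", magnitude) if direction == "increase" else ("<", -magnitude)
        clauses.append(f"(pathways ->> %s)::float {op} %s")
        params.extend([pathway, bound])
    return "SELECT id FROM analysis WHERE " + " AND ".join(clauses), params


stored = {
    1: {"Urea cycle": -0.4, "Fatty acid synthesis": 0.3},
    2: {"Urea cycle": 0.2, "Fatty acid synthesis": 0.5},
    3: {"Urea cycle": -0.1, "Fatty acid synthesis": 0.9},
}
query = [("Urea cycle", "decrease", 0.2), ("Fatty acid synthesis", "increase", 0.0)]
print([aid for aid, diffs in stored.items() if matches(diffs, query)])  # [1]
sql, params = to_sql(query)
```

A pathway missing from an analysis fails the condition in both versions: the Python filter returns False, and in SQL `->>` yields NULL, whose comparison is never true.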
Fig. 5. Architecture overview diagram of the project.

3 RESULTS
In this section, we evaluate MetaboliticsDB from different aspects and demonstrate, via a case study on Hepatocellular Carcinoma, that it provides biologically relevant insights.

3.1 A Case Study on Hepatocellular Carcinoma
To demonstrate the capabilities of MetaboliticsDB, we perform a case study on a Hepatocellular Carcinoma (HCC) dataset [40] that contains metabolomics measurements obtained from 177 individuals (71 healthy individuals and 106 patients). During analysis, the measurements are converted into fold changes relative to the average measurement of the healthy samples. We upload this file through MetaboliticsDB’s analysis interface and submit it for analysis after choosing ‘Metabolitics’ as the analysis method. On the analysis result page, MetaboliticsDB lists the pathways with the highest absolute diff values for the submitted measurements, both as a bar chart and in tabular form (available online at http://metabolitics.itu.edu.tr/past-analysis/1761). For brevity, we discuss here the top 10 pathways with the highest absolute diff scores in this list and relate them to the literature to illustrate the effectiveness of the MetaboliticsDB analysis tool.

The Protein formation pathway has the largest absolute diff value in the negative direction. The PROTEIN_BS reaction in the Protein formation pathway yields the Torasemide-M3 metabolite, an activated metabolite formed by the oxidation of Torasemide [41]. Liver disease has been reported to affect the activity of Torasemide, with an increased recovery of Torasemide observed in urine [42]. This may explain the decreased activity of Torasemide activation in HCC patients.

Fig. 6. Change in citric acid cycle

The Fatty acid synthesis pathway has the second largest diff value in the positive direction. Hepatocellular tumorigenesis has been reported to increase with abnormal activity in Fatty acid synthesis, and treatments inhibiting the Fatty acid synthase enzyme might be utilized in HCC therapies [43]. The increased activity of the Fatty acid synthase C180 reaction in the Fatty acid synthesis pathway observed in the HCC analysis results supports this observation.

Nucleotide metabolism has the third largest diff value in the positive direction. To keep up with the fast pace of cell proliferation during tumorigenesis, increased de novo nucleotide synthesis is essential for large-scale RNA production and DNA replication [44]. In addition, Nucleotide interconversion has a substantial diff value in the positive direction. This observation is plausible, as the essential building blocks of Nucleotide metabolism are produced by this pathway through the transformation (d)NMP ↔ (d)NDP ↔ (d)NTP.

Another pathway with a large diff value in the positive direction is Hippurate metabolism. Reduced amounts of Hippurate have been quantified in HCC patients due to decreased Benzoate binding proficiency [45]. Even though this is not fully aligned with our observation, elevated Hippurate levels are also related to fruit and whole-grain consumption [46]. Therefore, the diet of the individuals during the sample collection period may be one factor behind this observation.

Heme degradation is another pathway with a large diff value in the positive direction. Heme oxygenase 1, the enzyme that catalyzes heme degradation, has been reported to be related to cancer progression [47], and inhibiting its activity has been claimed to decrease HCC progression [48].

Another pathway with a large diff value in the positive direction is Vitamin B6 metabolism. The amount of Vitamin B6 compounds present in cancer cases has been reported
to be lower than that in control cases, and the activation levels of the Pyridoxal kinase enzyme have been reported to support disease progression [49]. The increased activity of the Pyridoxal kinase reaction in the Vitamin B6 metabolism pathway seen in the HCC analysis results supports these statements.

Another pathway with a large diff value in the positive direction is Limonene and pinene degradation. Limonene has been reported to inhibit the progression of HCC by suppressing cell proliferation [50], and Pinene has also been reported to inhibit cancer cell development in vitro and in vivo [51]. Decreased levels of Limonene and Pinene, due to the increased activity of the Limonene and pinene degradation pathway, might therefore contribute to HCC progression.

Cytochrome metabolism is another pathway with a large diff value in the positive direction. Intrinsic clearance values, which indicate activity levels, show increased activity of the CYP2E1, CYP2D6, and CYP2C9 cytochrome P450 types in HCC samples [52]. The increased activity of the Cytochrome P450 2E1, Cytochrome P450 2C9, and Cytochrome P450 2D6 reactions in the Cytochrome metabolism pathway seen in the HCC analysis results supports these observations.

Another pathway with a large diff value in the positive direction is Thiamine metabolism. The activity levels of Thiamine-dependent enzymes have been reported to increase in cancer cases [53]. The increased activity of the Thiamine diphosphokinase, Thiamine diphosphate kinase, and Thiamine-triphosphatase reactions in the Thiamine metabolism pathway seen in the HCC analysis results supports these findings.

Finally, Fructose and mannose metabolism is the last of the top 10 pathways with a large diff score in the positive direction. Diets rich in fructose have been reported to promote HCC development, since fructose enhances the activity of the lipogenic pathway and lipid accumulation [54].

The above brief discussion illustrates that MetaboliticsDB is useful and effective in analyzing metabolomics datasets and provides insights into the underlying metabolic mechanisms.

3.2 Disease Association Evaluation
In this section, we evaluate the disease association feature of MetaboliticsDB along two distinct dimensions.

3.2.1 Similarity-based Disease Association
MetaboliticsDB reports the diseases and physiological conditions that are most similar to the analyzed metabolomics dataset, based on the correlation between the current analysis results and the previously computed analysis results of disease metabolomics datasets. To evaluate the relevance of the results provided by the proposed scheme, we cluster all disease samples in the database using the same vector representation and similarity measure (i.e., agglomerative clustering with Pearson correlation as the similarity measure and complete linkage). We then compare the resulting clustering to a “ground-truth clustering” that includes one cluster per distinct disease. That is, patient characteristics such as gender and age are ignored, and metabolomics samples are assigned to the same cluster as long as the corresponding patients have the same disease. Presently, there are 40 distinct diseases stored in MetaboliticsDB; hence, the ground-truth clustering includes 40 clusters. To compare the clusterings, we employ two intuitive comparison metrics, namely homogeneity and completeness [55]. Briefly, homogeneity checks whether each cluster contains only patients with the same disease, and completeness checks whether all patients with the same disease are assigned to the same cluster. Both measures take values between 0 and 1, where 1 represents the best score and 0 the worst. In our evaluation, both homogeneity and completeness are 0.94, which indicates that the cluster assignments are mostly accurate.

3.2.2 Machine Learning-based Disease Association
In this section, we present the prediction performance of the machine learning models that MetaboliticsDB creates and manages in its database. We employ k-fold cross validation (with k = 10 or k = 5, depending on sample size) to test the classification accuracy. Table 1 summarizes the precision, recall, and F1 scores of the disease prediction models stored in the database, where the F1 score is the harmonic mean of precision and recall. The results show that healthy and disease samples are classified with high accuracy for most conditions.

Disease                              Precision  Recall  F1    Alg.  K
Hepatocellular Carcinoma             0.89       0.91    0.90  LR    10
Colon Carcinoma                      0.96       0.99    0.98  RF    5
Breast Cancer                        0.88       0.94    0.91  LR    10
Stomach Cancer                       0.94       0.99    0.96  LR    10
Ovarian Cancer                       0.94       0.92    0.91  RF    10
Crohn’s Disease                      0.81       0.91    0.84  RF    10
Asthma                               0.99       0.99    0.99  RF    10
Rheumatoid Arthritis                 0.89       0.85    0.85  RF    10
Steatotic Liver Disease              0.82       0.95    0.87  RF    10
Type 2 Diabetes Mellitus             0.86       0.96    0.91  LR    10
Wilson Disease                       0.80       0.90    0.83  RF    5
Adult Respiratory Distress Syndrome  0.88       1.00    0.93  LR    5
Androgenic Alopecia                  0.81       0.94    0.86  LR    10
Ankylosing Spondylitis               0.83       1.00    0.89  RF    5
Autistic Disorder                    0.75       0.88    0.79  RF    5
Chronic Fatigue Syndrome             0.73       0.74    0.72  LR    10
Cystic Fibrosis                      0.75       1.00    0.83  RF    5
Intermediate Coronary Syndrome       0.77       1.00    0.85  RF    5
Peanut Allergy                       0.83       0.92    0.83  RF    10
Placental Abruption                  0.88       1.00    0.92  RF    5
Pre-eclampsia                        0.75       0.88    0.75  RF    5
Sarcoidosis                          0.81       0.86    0.78  SVM   10
Schizophrenia                        0.90       1.00    0.95  SVM   10
TABLE 1
Average results of k-fold cross validation
Alg.: classification algorithm (LR: Logistic Regression, RF: Random Forest, SVM: Support Vector Machine); K: number of cross-validation folds.

We further evaluate the prediction performance of the models by slightly relaxing the definition of a true positive. Since MetaboliticsDB provides a list of possible associated diseases sorted by their predicted likelihoods, a prediction is considered accurate if the true disease is listed among the top 3 predictions for a patient sample, or if the disease is not listed at all in the predictions for a healthy sample. In Table 2, the precision column gives the number of accurately predicted samples divided by the total number of samples, and the recall column gives the number of accurate predictions for disease samples divided by the total number of disease samples.
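The core computations described in Sections 3.1 and 3.2.1 can be illustrated with a minimal Python sketch: converting raw measurements into fold changes against the healthy mean, picking the pathways with the largest absolute diff, and ranking stored diseases by Pearson correlation. It is not the MetaboliticsDB implementation. The function names are hypothetical. Fold change is assumed to be a plain ratio to the healthy mean. The pathway diff scores are taken as given, since they are produced by the Metabolitics algorithm, which is not reproduced here. Each analysis is assumed to be a vector of pathway diff scores, compared over the pathways that both analyses share.

```python
from statistics import mean


def fold_changes(samples, healthy_ids):
    """Convert raw concentrations into fold changes w.r.t. the healthy mean.

    samples     : {sample_id: {metabolite: concentration}}
    healthy_ids : ids of the healthy (control) samples
    Assumption: fold change = value / mean healthy value (no log transform).
    """
    healthy_values = {}
    for sid in healthy_ids:
        for met, value in samples[sid].items():
            healthy_values.setdefault(met, []).append(value)
    baseline = {met: mean(vals) for met, vals in healthy_values.items()}
    return {
        sid: {met: value / baseline[met]
              for met, value in profile.items()
              # skip metabolites without a non-zero healthy baseline
              if baseline.get(met)}
        for sid, profile in samples.items()
    }


def top_pathways(diff_scores, k=10):
    """Names of the k pathways with the largest absolute diff value."""
    return sorted(diff_scores, key=lambda p: abs(diff_scores[p]),
                  reverse=True)[:k]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def most_similar(query, disease_profiles, n=3):
    """Rank stored disease profiles by Pearson correlation with the query.

    query            : {pathway: diff} of the current analysis
    disease_profiles : {disease: {pathway: diff}} of stored analyses
    Only pathways present in both profiles are compared.
    """
    scores = {}
    for disease, profile in disease_profiles.items():
        shared = sorted(set(query) & set(profile))
        if len(shared) >= 2:  # a correlation needs at least two points
            scores[disease] = pearson([query[p] for p in shared],
                                      [profile[p] for p in shared])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

For example, with healthy samples {a: 2, b: 4} and {a: 4, b: 4}, a patient measured at {a: 6, b: 2} gets fold changes {a: 2.0, b: 0.5}.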
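The evaluation metrics of Sections 3.2.1 and 3.2.2 can also be sketched in standard-library Python. Homogeneity and completeness use the common entropy-based definitions h = 1 − H(C|K)/H(C) and c = 1 − H(K|C)/H(K), where C is the ground-truth disease labelling and K the clustering. The relaxed scoring follows the top-3 description above, and F1 is the harmonic mean of precision and recall. The function names and the input format, a list of (is_patient, ranked_predictions) pairs, are assumptions made for illustration. If SciPy and scikit-learn are available, `scipy.cluster.hierarchy.linkage(X, method="complete", metric="correlation")` performs the complete-linkage clustering on correlation distances, and `sklearn.metrics.homogeneity_score` / `completeness_score` compute the two scores directly.

```python
from collections import Counter
from math import log


def _entropy(counts, total):
    """Shannon entropy (natural log) of a label distribution."""
    return -sum(c / total * log(c / total) for c in counts if c)


def _conditional_entropy(pair_counts, given_counts, total, given_pos):
    """H(X | Y) = -sum_{x,y} n_xy / N * log(n_xy / n_y).

    pair_counts  : Counter of (true_label, cluster_label) pairs
    given_counts : Counter of the conditioning labels (n_y)
    given_pos    : index of the conditioning label inside each pair
    """
    return -sum(n / total * log(n / given_counts[pair[given_pos]])
                for pair, n in pair_counts.items())


def homogeneity_completeness(labels_true, labels_pred):
    """Entropy-based homogeneity and completeness of a clustering."""
    total = len(labels_true)
    classes = Counter(labels_true)
    clusters = Counter(labels_pred)
    pairs = Counter(zip(labels_true, labels_pred))
    h_c = _entropy(classes.values(), total)
    h_k = _entropy(clusters.values(), total)
    h_c_given_k = _conditional_entropy(pairs, clusters, total, 1)
    h_k_given_c = _conditional_entropy(pairs, classes, total, 0)
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    return homogeneity, completeness


def relaxed_scores(samples, disease, k=3):
    """Relaxed top-k precision, recall and F1 for one disease model.

    samples : list of (is_patient, ranked_predictions) pairs, where
              ranked_predictions lists diseases by decreasing likelihood.
    A patient sample is accurate if `disease` is in its top-k predictions;
    a healthy sample is accurate if `disease` is not predicted at all.
    """
    accurate = accurate_patients = n_patients = 0
    for is_patient, ranked in samples:
        if is_patient:
            n_patients += 1
            hit = disease in ranked[:k]
            accurate_patients += hit
        else:
            hit = disease not in ranked
        accurate += hit
    precision = accurate / len(samples)
    recall = accurate_patients / n_patients if n_patients else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

As a sanity check, putting every sample into one cluster yields a homogeneity of 0 and a completeness of 1. Putting every sample into its own cluster yields the reverse extreme for homogeneity (1).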
Disease                              Precision  Recall  F1
Hepatocellular Carcinoma             0.89       0.84    0.87
Colon Carcinoma                      1.00       1.00    1.00
Breast Cancer                        0.81       0.75    0.78
Stomach Cancer                       0.98       0.98    0.98
Ovarian Cancer                       0.92       0.88    0.90
Crohn’s Disease                      0.88       0.76    0.82
Asthma                               0.99       0.99    0.99
Rheumatoid Arthritis                 0.88       0.76    0.82
Steatotic Liver Disease              0.91       0.84    0.88
Type 2 Diabetes Mellitus             0.90       0.87    0.88
Wilson Disease                       0.92       0.83    0.87
Adult Respiratory Distress Syndrome  0.91       0.90    0.91
Androgenic Alopecia                  0.88       0.77    0.82
Ankylosing Spondylitis               0.75       0.50    0.60
Autistic Disorder                    0.92       0.83    0.87
Chronic Fatigue Syndrome             0.80       0.63    0.71
Cystic Fibrosis                      0.89       0.67    0.76
Intermediate Coronary Syndrome       1.00       1.00    1.00
Peanut Allergy                       0.91       0.82    0.86
Placental Abruption                  0.92       0.83    0.87
Pre-eclampsia                        1.00       1.00    1.00
Sarcoidosis                          0.75       0.69    0.72
Schizophrenia                        0.93       0.99    0.96
TABLE 2
Disease prediction results

3.3 Responsiveness Evaluation
In this section, we evaluate the responsiveness of MetaboliticsDB with varying numbers of simultaneous users. Each simulated user sends one request per second and 100 requests in total during its browsing life cycle. Table 3 reports the average response time in seconds and the percentage of successful responses. For this simulation, we use Locust, an open-source load testing tool written in Python. The server hosting MetaboliticsDB during these tests has the following configuration: a Dell R720 with 2 × Intel Xeon E5-2620 v2 2.10 GHz CPUs and 80 GB RAM, running Ubuntu Linux.

Number of Users  Average Response Time (s)  Success Rate (%)
10               0.28                       100
100              0.36                       100
1000             1.51                       100
TABLE 3
Responsiveness evaluation results

3.4 Load Test
In this section, we perform additional performance tests on the MetaboliticsDB analysis feature. Its running time depends on two factors: the size of the underlying metabolic network and the number of metabolites in the input metabolomics data. As part of this evaluation, we run the analysis feature on several metabolic networks of different sizes from various organisms, obtained from BiGG [24] (Table 4).

Metabolic Network (BiGG ID)  Num. of Reactions  Num. of Metabolites
e_coli_core                  95                 72
iAB_RBC_283                  342                469
iRC1080                      1706               2191
RECON1                       3742               2766
RECON2                       7785               5324
TABLE 4
Metabolic network size

Moreover, to test the effect of the metabolomics data size, random metabolites are selected from each network and assigned random fold-change values. The number of metabolite measurements included in the tests varies between 5 and 150 in increments of 5, leading to 30 different metabolomics datasets. The analysis is then run with these artificial metabolomics datasets on each evaluated metabolic network. Fig. 7 charts the average analysis running time (in seconds) over all metabolomics datasets for each metabolic network.

Fig. 7. Running time of MetaboliticsDB analysis feature on different networks

4 COMPARISON
In this section, we compare MetaboliticsDB with some of the well-known tools in the field along different aspects. In particular, the comparison includes MetaboAnalyst 5.0 [28], Metabolomics Workbench [13], MeltDB 2.0 [14], MetabolomeExpress [15], XCMS Online [16], Caleydo [17], WebSpecmine [29], 3Omics [30], Workflow4Metabolomics [31], and POMAShiny [32]. Table 5 summarizes the aspects considered in the comparison.

In terms of data preprocessing, MetaboliticsDB offers metabolite name mapping, fold change scaling, and reaction diff conversion. Data filtration, normalization, metabolite name mapping, and peak alignment and detection are among the data preprocessing steps supported by MetaboAnalyst 5.0. Normalization and scaling are also available for data preprocessing in Metabolomics Workbench. Peak alignment and detection on raw data are also offered by MeltDB 2.0 and MetabolomeExpress. Data filtration is available in XCMS Online and Caleydo; however, data normalization is not supported. WebSpecmine performs peak alignment and detection during data upload and also offers other steps such as normalization and scaling. Normalization is supported by Workflow4Metabolomics, whereas data filtration is not available. Detection and cleaning of outliers are available in POMAShiny, unlike Workflow4Metabolomics and MetaboAnalyst 5.0.

Fold-change analysis is the univariate analysis method available in MetaboliticsDB. Volcano plots, t-tests, and fold-change analysis are among the univariate analysis methods
offered by MetaboAnalyst 5.0 and MeltDB 2.0. Volcano plots and ANOVA are available in Metabolomics Workbench. t-tests and fold-change analysis are provided by MetabolomeExpress, but volcano plots are not supported. Fold-change analysis is also offered by XCMS Online. ANOVA is also available in WebSpecmine, along with t-tests and fold-change analysis. Parametric and non-parametric tests and ANOVA are provided by Workflow4Metabolomics and POMAShiny.

MetaboliticsDB offers automatically managed machine learning classification models of type Logistic Regression, Support Vector Machine, Random Forest, and XGBoost. MetaboAnalyst 5.0 provides Support Vector Machine, Random Forest, and Partial Least Squares Discriminant Analysis classification methods. Random Forest and Orthogonal Partial Least Squares Discriminant Analysis classification methods are provided by Metabolomics Workbench. Support Vector Machine and Random Forest classification methods are also supported by MeltDB 2.0. Linear Discriminant Analysis and Support Vector Machine methods are provided by WebSpecmine. The Random Forest algorithm is also available for classification in POMAShiny.

Both enrichment analysis and pathway analysis are available in MetaboliticsDB. Enrichment analysis is supported by MetaboAnalyst 5.0, Metabolomics Workbench, MeltDB 2.0, XCMS Online, and 3Omics. MetaboAnalyst 5.0, MeltDB 2.0, MetabolomeExpress, XCMS Online, Caleydo, WebSpecmine, and 3Omics provide pathway analysis.

MetaboliticsDB, MetaboAnalyst 5.0, Metabolomics Workbench, MeltDB 2.0, XCMS Online, Caleydo, 3Omics, and POMAShiny support genome-scale metabolic networks. Python or R packages are available for MetaboliticsDB, MetaboAnalyst 5.0, Metabolomics Workbench, XCMS Online, WebSpecmine, and POMAShiny. MetaboliticsDB, MetaboAnalyst 5.0, WebSpecmine, and POMAShiny train machine learning models and use them for sample prediction; MetaboliticsDB periodically trains and stores predictive models on analysis results for disease prediction.

MetaboliticsDB, Metabolomics Workbench, MeltDB 2.0, MetabolomeExpress, XCMS Online, WebSpecmine, and 3Omics enable users to compare analysis results. An advanced analysis search interface is available in MetaboliticsDB, Metabolomics Workbench, MetabolomeExpress, and XCMS Online. Metabolic flux change prediction is only available in MetaboliticsDB.

Tool                    1  2  3  4  5  6  7  8  9  10 11 12
MetaboliticsDB          ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
MetaboAnalyst 5.0       ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✗  ✓  ✗  ✗
Metabolomics Workbench  ✓  ✓  ✓  ✓  ✗  ✓  ✓  ✗  ✓  ✗  ✓  ✗
MeltDB 2.0              ✓  ✓  ✓  ✓  ✓  ✓  ✗  ✗  ✓  ✗  ✗  ✗
MetabolomeExpress       ✓  ✓  ✗  ✗  ✓  ✗  ✗  ✗  ✓  ✗  ✓  ✗
XCMS Online             ✓  ✓  ✗  ✓  ✓  ✓  ✓  ✗  ✓  ✗  ✓  ✗
Caleydo                 ✓  ✗  ✗  ✗  ✓  ✓  ✗  ✗  ✗  ✗  ✗  ✗
WebSpecmine             ✓  ✓  ✓  ✗  ✓  ✗  ✓  ✓  ✓  ✓  ✗  ✗
3Omics                  ✗  ✗  ✗  ✓  ✓  ✓  ✗  ✗  ✓  ✗  ✗  ✗
Workflow4Metabolomics   ✓  ✓  ✗  ✗  ✗  ✗  ✗  ✗  ✗  ✗  ✗  ✗
POMAShiny               ✓  ✓  ✓  ✗  ✗  ✓  ✓  ✓  ✗  ✓  ✗  ✗
TABLE 5
Comparison of MetaboliticsDB with existing tools
Columns: (1) Data Preprocessing; (2) Univariate Analysis; (3) Classification; (4) Enrichment Analysis; (5) Pathway Analysis; (6) Genome-Scale Metabolic Network Support; (7) Python/R Package Availability; (8) Disease Prediction; (9) Analysis Comparison; (10) AI Model Management; (11) Advanced Analysis Search; (12) Metabolic Flux Change Prediction. ✓: supported; ✗: not supported.

5 DISCUSSION
MetaboliticsDB is novel in a number of aspects compared with existing similar works. In particular, it offers a data management platform for metabolomics analysis results, along with a set of associated web-based tools that allow users to effectively query, visualize, and study the analysis results at the network level. Treating metabolomics analysis results as the main object and designing tools around them enables, in particular, three useful features:

(i) With the comparison feature, MetaboliticsDB enables researchers to compare their datasets with known diseases or with other users’ public analysis results representing different physiological conditions. In particular, the comparison feature allows researchers to make connections between seemingly different conditions, and to gain insight into what kind of condition the current metabolomics dataset may belong to. In the former case, recognizing common mechanisms between two different conditions may pave the way for sharing known therapies designed for each condition. In the latter case, it may provide researchers with pointers on where to look in the existing knowledge while interpreting a new and possibly complex case. Besides, the comparison interface can be used to understand the differences between sub-types of a disease, the progression of disease stages, and the effect of possible drugs through before-and-after comparison. Finally, existing or possible common patterns across the same class of diseases may be observed (e.g., Fig. 4 compares different cancers); for instance, the Warburg effect in different types of cancer may be studied in depth.

(ii) With the disease and physiological condition association tools, MetaboliticsDB may help clinicians get
one step closer to the “personalized medicine” goal. More specifically, MetaboliticsDB may help a patient’s medical care provider narrow down the number of possible conditions to consider for that patient.

(iii) The advanced search interface on metabolomics analysis results may help discover commonly repeating patterns of metabolic fluctuations across different conditions in terms of pathway activity changes.

MetaboliticsDB integrates a powerful analysis interface, which is central to its functioning. As illustrated with the case study on HCC, it helps researchers make biologically relevant interpretations of their metabolomics data holistically, over the entire metabolic network. This way, users are no longer limited to biomarkers that differentiate a physiological condition from others; they may now drill down into differences in metabolic mechanisms, even in parts of the metabolism where no metabolite measurements are available in the data, owing to its state-of-the-art personalized metabolic analysis algorithm.

Furthermore, MetaboliticsDB may be useful for drug design research. Based on the highlighted changes in the metabolic networks, MetaboliticsDB may provide pointers and ideas for possible drug targets. Similarly, it may help explain why a certain drug works or does not work through the visualization of the relevant parts of the pathways.

Last but not least, MetaboliticsDB is designed to be easy to generalize. In the near future, MetaboliticsDB will support multi-omics analysis by incorporating the interpretation of gene expression and proteomics measurements with respect to changes in metabolism, using the same underlying database and associated analysis tools with minimal development effort. Given the enormous amount of publicly available gene expression and proteomics datasets, this extension is expected to multiply the impact of MetaboliticsDB. Finally, combining multi-omics datasets that belong to the same patient through the MetaboliticsDB analysis interfaces is likely to provide valuable insights into different physiological conditions.

6 CONCLUSION
In this paper, we present MetaboliticsDB, which incorporates a novel pathway-level metabolomics data analysis results database and a set of powerful associated tools running on this database. In particular, MetaboliticsDB allows users to analyze their metabolomics datasets with three different methods, store the results in their private user area or share them with other users, compare them with known diseases in terms of the underlying metabolic mechanisms, visualize the changes on the metabolic network, perform basic and advanced search on metabolomics analysis results, and associate their datasets with different diseases (if any). We evaluate the effectiveness of MetaboliticsDB’s analysis features on a real dataset obtained from HCC patients, and show that (i) it provides biologically relevant results, as supported by the existing literature, (ii) its disease associations have high homogeneity and completeness scores, and (iii) it can scale to large numbers of simultaneous users.

ACKNOWLEDGEMENTS
We would like to thank Beyza Turk for contributing some of the visualizations stored in MetaboliticsDB.

7 DECLARATIONS
7.1 Ethics approval and consent to participate
Not applicable.
7.2 Consent for publication
Not applicable.
7.3 Availability of data and materials
MetaboliticsDB is available online at http://metabolitics.itu.edu.tr/.
The web interface source code is available at https://github.com/itu-bioinformatics-database-lab/metabolitics-client.
The web API source code is available at https://github.com/itu-bioinformatics-database-lab/metabolitics-api.
The source code of the Metabolitics data analysis algorithm is available at https://github.com/itu-bioinformatics-database-lab/metabolitics.
7.4 Competing interests
None.
7.5 Funding
This work was in part supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) [Grant Number: 114E115] and the National Center for High Performance Computing (UHEM) [Grant Number: 1009742021].
7.6 Authors’ contributions
MHC implemented the metabolomics analysis features. OE worked on associating datasets with diseases. AD worked on the pathway mapping method. TS implemented the data ingestion features. AC conceived the study. All authors contributed to the writing of the manuscript.

REFERENCES
[1] A. J. Carroll, R. M. Salek, M. Arita, J. Kopka, and D. Walther, “Metabolome informatics and statistics: current state and emerging trends,” Frontiers in Bioengineering and Biotechnology, vol. 4, 2016.
[2] A. Cakmak and M. H. Celik, “Personalized metabolic analysis of diseases,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 18, no. 3, pp. 1014–1025, 2021.
[3] J. Wang, D. Duncan, Z. Shi, and B. Zhang, “Web-based gene set analysis toolkit (WebGestalt): update 2013,” Nucleic Acids Research, vol. 41, no. W1, pp. W77–W83, 2013.
[4] A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy, T. R. Golub, E. S. Lander et al., “Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles,” Proceedings of the National Academy of Sciences, vol. 102, no. 43, pp. 15545–15550, 2005.
[5] Y. Drier, M. Sheffer, and E. Domany, “Pathway-based personalized analysis of cancer,” Proceedings of the National Academy of Sciences, vol. 110, no. 16, pp. 6388–6393, 2013.
[6] E. Lee, H.-Y. Chuang, J.-W. Kim, T. Ideker, and D. Lee, “Inferring pathway activity toward precise disease classification,” PLoS Computational Biology, vol. 4, no. 11, p. e1000217, 2008.
[7] P. Khatri, S. Sellamuthu, P. Malhotra, K. Amin, A. Done, and S. Draghici, “Recent additions and improvements to the Onto-Tools,” Nucleic Acids Research, vol. 33, no. suppl 2, pp. W762–W765, 2005.
[8] S. Draghici, P. Khatri, A. L. Tarca, K. Amin, A. Done, C. Voichita, C. Georgescu, and R. Romero, “A systems biology approach for pathway level analysis,” Genome Research, vol. 17, no. 10, pp. 1537–1545, 2007.
[9] A. L. Tarca, S. Draghici, P. Khatri, S. S. Hassan, P. Mittal, J.-s. Kim, C. J. Kim, J. P. Kusanovic, and R. Romero, “A novel signaling pathway impact analysis,” Bioinformatics, vol. 25, no. 1, pp. 75–82, 2008.
[10] C. J. Vaske, S. C. Benz, J. Z. Sanborn, D. Earl, C. Szeto, J. Zhu, D. Haussler, and J. M. Stuart, “Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM,” Bioinformatics, vol. 26, no. 12, pp. i237–i245, 2010.
[11] L. M. Heiser, A. Sadanandam, W.-L. Kuo, S. C. Benz, T. C. Goldstein, S. Ng, W. J. Gibb, N. J. Wang, S. Ziyad, F. Tong et al., “Subtype and pathway specific responses to anticancer compounds in breast cancer,” Proceedings of the National Academy of Sciences, vol. 109, no. 8, pp. 2724–2729, 2012.
[12] J. Xia, I. V. Sinelnikov, B. Han, and D. S. Wishart, “MetaboAnalyst 3.0—making metabolomics more meaningful,” Nucleic Acids Research, vol. 43, no. W1, pp. W251–W257, 2015.
[13] M. Sud, E. Fahy, D. Cotter, K. Azam, I. Vadivelu, C. Burant, A. Edison, O. Fiehn, R. Higashi, K. S. Nair et al., “Metabolomics Workbench: An international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training, and analysis tools,” Nucleic Acids Research, vol. 44, no. D1, pp. D463–D470, 2015.
[14] N. Kessler, H. Neuweger, A. Bonte, G. Langenkämper, K. Niehaus, T. W. Nattkemper, and A. Goesmann, “MeltDB 2.0–advances of the metabolomics software system,” Bioinformatics, vol. 29, no. 19, pp. 2452–2459, 2013.
[15] A. J. Carroll, M. R. Badger, and A. H. Millar, “The MetabolomeExpress project: enabling web-based processing, analysis and transparent dissemination of GC/MS metabolomics datasets,” BMC Bioinformatics, vol. 11, no. 1, p. 376, 2010.
[16] R. Tautenhahn, G. J. Patti, D. Rinehart, and G. Siuzdak, “XCMS Online: a web-based platform to process untargeted metabolomic data,” Analytical Chemistry, vol. 84, no. 11, pp. 5035–5039, 2012.
[17] M. Streit, A. Lex, M. Kalkusch, K. Zatloukal, and D. Schmalstieg, “Caleydo: connecting pathways and gene expression,” Bioinformatics, vol. 25, no. 20, pp. 2760–2761, 2009.
[18] P. D. Karp, S. M. Paley, M. Krummenacker, M. Latendresse, J. M. Dale, T. J. Lee, P. Kaipa, F. Gilham, A. Spaulding, L. Popescu et al., “Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology,” Briefings in Bioinformatics, vol. 11, no. 1, pp. 40–79, 2009.
[19] M. Kanehisa, M. Furumichi, M. Tanabe, Y. Sato, and K. Morishima, “KEGG: new perspectives on genomes, pathways, diseases and drugs,” Nucleic Acids Research, vol. 45, no. D1, pp. D353–D361, 2017.
[20] R. Caspi, R. Billington, L. Ferrer, H. Foerster, C. A. Fulcher, I. M. Keseler, A. Kothari, M. Krummenacker, M. Latendresse, L. A. Mueller et al., “The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases,” Nucleic Acids Research, vol. 44, no. D1, pp. D471–D480, 2015.
[21] A. Fabregat, K. Sidiropoulos, P. Garapati, M. Gillespie, K. Hausmann, R. Haw, B. Jassal, S. Jupe, F. Korninger, S. McKay et al., “The Reactome pathway knowledgebase,” Nucleic Acids Research, vol. 44, no. D1, pp. D481–D487, 2015.
[22] D. S. Wishart, T. Jewison, A. C. Guo, M. Wilson, C. Knox, Y. Liu, Y. Djoumbou, R. Mandal, F. Aziat, E. Dong et al., “HMDB 3.0—the human metabolome database in 2013,” Nucleic Acids Research, vol. 41, no. D1, pp. D801–D807, 2012.
[23] J. Hastings, P. de Matos, A. Dekker, M. Ennis, B. Harsha, N. Kale, V. Muthukrishnan, G. Owen, S. Turner, M. Williams et al., “The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013,” Nucleic Acids Research, vol. 41, no. D1, pp. D456–D463, 2012.
[24] Z. A. King, J. Lu, A. Dräger, P. Miller, S. Federowicz, J. A. Lerman, A. Ebrahim, B. O. Palsson, and N. E. Lewis, “BiGG Models: A platform for integrating, standardizing and sharing genome-scale models,” Nucleic Acids Research, vol. 44, no. D1, pp. D515–D522, 2015.
[25] S. Kim, P. A. Thiessen, E. E. Bolton, J. Chen, G. Fu, A. Gindulyte, L. Han, J. He, S. He, B. A. Shoemaker et al., “PubChem substance and compound databases,” Nucleic Acids Research, vol. 44, no. D1, pp. D1202–D1213, 2015.
[26] B. Elliott, M. Kirac, A. Cakmak, G. Yavas, S. Mayes, E. Cheng, Y. Wang, C. Gupta, G. Ozsoyoglu, and Z. Meral Ozsoyoglu, “PathCase: pathways database system,” Bioinformatics, vol. 24, no. 21, pp. 2526–2533, 2008.
[27] A. E. Cicek, X. Qi, A. Cakmak, S. R. Johnson, X. Han, S. Alshalwi, Z. M. Ozsoyoglu, and G. Ozsoyoglu, “An online system for metabolic network analysis,” Database, vol. 2014, p. bau091, 2014.
[28] Z. Pang, J. Chong, G. Zhou, D. A. de Lima Morais, L. Chang, M. Barrette, C. Gauthier, P.-É. Jacques, S. Li, and J. Xia, “MetaboAnalyst 5.0: narrowing the gap between raw spectra and functional insights,” Nucleic Acids Research, vol. 49, no. W1, pp. W388–W396, 2021.
[29] S. Cardoso, T. Afonso, M. Maraschin, and M. Rocha, “WebSpecmine: A website for metabolomics data analysis and mining,” Metabolites, vol. 9, no. 10, 2019. [Online]. Available: https://www.mdpi.com/2218-1989/9/10/237
[30] T.-C. Kuo, T.-F. Tian, and Y. J. Tseng, “3Omics: a web-based systems biology tool for analysis, integration and visualization of human transcriptomic, proteomic and metabolomic data,” BMC Systems Biology, vol. 7, pp. 1–15, 2013.
[31] F. Giacomoni, G. Le Corguille, M. Monsoor, M. Landi, P. Pericard, M. Pétéra, C. Duperier, M. Tremblay-Franco, J.-F. Martin, D. Jacob et al., “Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics,” Bioinformatics, vol. 31, no. 9, pp. 1493–1495, 2015.
[32] P. Castellano-Escuder, R. González-Domínguez, F. Carmona-Pontaque, C. Andrés-Lacueva, and A. Sánchez-Pla, “POMAShiny: A user-friendly web-based workflow for metabolomics and proteomics data analysis,” PLOS Computational Biology, vol. 17, no. 7, p. e1009148, 2021.
[33] E. Brunk, S. Sahoo, D. C. Zielinski, A. Altunkaya, A. Dräger, N. Mih, F. Gatto, A. Nilsson, G. A. Preciat Gonzalez, M. K. Aurich et al., “Recon3D enables a three-dimensional view of gene variation in human metabolism,” Nature Biotechnology, vol. 36, no. 3, pp. 272–281, 2018.
[34] J. A. Baron, C. S.-B. Johnson, M. A. Schor, D. Olley, L. Nickel, V. Felix, J. B. Munro, S. M. Bello, C. Bearer, R. Lichenstein et al., “The DO-KB knowledgebase: a 20-year journey developing the disease open science ecosystem,” Nucleic Acids Research, vol. 52, no. D1, pp. D1305–D1314, 2024.
[35] Z. A. King, J. Lu, A. Dräger, P. Miller, S. Federowicz, J. A. Lerman, A. Ebrahim, B. O. Palsson, and N. E. Lewis, “BiGG Models: A platform for integrating, standardizing and sharing genome-scale models,” Nucleic Acids Research, vol. 44, no. D1, pp. D515–D522, 2016.
[36] E. Fahy and S. Subramaniam, “RefMet: a reference nomenclature for metabolomics,” Nature Methods, vol. 17, no. 12, pp. 1173–1174, 2020.
[37] J. D. Orth, I. Thiele, and B. Ø. Palsson, “What is flux balance analysis?” Nature Biotechnology, vol. 28, no. 3, pp. 245–248, 2010.
[38] A. C. Müller and A. Bockmayr, “Fast thermodynamically constrained flux variability analysis,” Bioinformatics, vol. 29, no. 7, pp. 903–909, 2013.
[39] Z. A. King, A. Dräger, A. Ebrahim, N. Sonnenschein, N. E. Lewis, and B. O. Palsson, “Escher: a web application for building, sharing, and embedding data-rich visualizations of biological pathways,” PLoS Computational Biology, vol. 11, no. 8, p. e1004321, 2015.
[40] T. Chen, G. Xie, X. Wang, J. Fan, Y. Qiu, X. Zheng, X. Qi, Y. Cao, M. Su, X. Wang et al., “Serum and urine metabolite profiling reveals potential biomarkers of human hepatocellular carcinoma,” Molecular & Cellular Proteomics, vol. 10, no. 7, p. M110.004945, 2011.
[41] S. Sahoo, H. S. Haraldsdóttir, R. M. Fleming, and I. Thiele, “Modeling the effects of commonly used drugs on human metabolism,” The FEBS Journal, vol. 282, no. 2, pp. 297–317, 2015.
[42] H. Knauf and E. Mutschler, “Clinical pharmacokinetics and pharmacodynamics of torasemide,” Clinical Pharmacokinetics, vol. 34, pp. 1–24, 1998.
[43] L. Che, P. Paliogiannis, A. Cigliano, M. G. Pilo, X. Chen, and D. F. Calvisi, “Pathogenetic, prognostic, and therapeutic role of fatty acid synthase in human hepatocellular carcinoma,” Frontiers in Oncology, vol. 9, p. 1412, 2019.
[44] X. Tong, F. Zhao, and C. B. Thompson, “The molecular determinants
of de novo nucleotide biosynthesis in cancer cells,” Current
opinion in genetics & development, vol. 19, no. 1, pp. 32–37, 2009.
[45] K.-x. Wang, G.-h. Du, X.-m. Qin, and L. Gao, “1H-NMR-based
metabolomics reveals the biomarker panel and molecular mechanism
of hepatocellular carcinoma progression,” Analytical and
Bioanalytical Chemistry, vol. 414, no. 4, pp. 1525–1537, 2022.
[46] T. Pallister, M. A. Jackson, T. C. Martin, J. Zierer, A. Jennings, R. P.
Mohney, A. MacGregor, C. J. Steves, A. Cassidy, T. D. Spector et al.,
“Hippurate as a metabolomic marker of gut microbiome diversity:
Modulation by diet and relationship to metabolic syndrome,”
Scientific reports, vol. 7, no. 1, p. 13670, 2017.
[47] C.-S. Park, D.-W. Eom, Y. Ahn, H. J. Jang, S. Hwang, and S.-G. Lee,
“Can heme oxygenase-1 be a prognostic factor in patients with
hepatocellular carcinoma?” Medicine, vol. 98, no. 26, 2019.
[48] G. Sass, P. Leukel, V. Schmitz, E. Raskopf, M. Ocker, D. Neureiter,
M. Meissnitzer, E. Tasika, A. Tannapfel, and G. Tiegs, “Inhibition
of heme oxygenase 1 expression by small interfering RNA decreases
orthotopic tumor growth in livers of mice,” International journal of
cancer, vol. 123, no. 6, pp. 1269–1277, 2008.
[49] L. Galluzzi, E. Vacchelli, J. Michels, P. Garcia, O. Kepp, L. Senovilla,
I. Vitale, and G. Kroemer, “Effects of vitamin B6 metabolism
on oncogenesis, tumor progression and therapeutic responses,”
Oncogene, vol. 32, no. 42, pp. 4995–5004, 2013.
[50] I. Kaji, M. Tatsuta, H. Iishi, M. Baba, A. Inoue, and H. Kasugai,
“Inhibition by D-limonene of experimental hepatocarcinogenesis
in Sprague-Dawley rats does not involve p21ras plasma membrane
association,” International journal of cancer, vol. 93, no. 3, pp. 441–
444, 2001.
[51] W. Chen, Y. Liu, M. Li, J. Mao, L. Zhang, R. Huang, X. Jin,
and L. Ye, “Anti-tumor effect of α-pinene on human hepatoma
cell lines through inducing G2/M cell cycle arrest,” Journal of
pharmacological sciences, vol. 127, no. 3, pp. 332–338, 2015.
[52] J. Zhou, Q. Wen, S.-F. Li, Y.-F. Zhang, N. Gao, X. Tian, Y. Fang,
J. Gao, M.-Z. Cui, X.-P. He et al., “Significant change of cytochrome
P450s activities in patients with hepatocellular carcinoma,” Oncotarget,
vol. 7, no. 31, p. 50612, 2016.
[53] J. A. Zastre, R. L. Sweet, B. S. Hanberry, and S. Ye, “Linking
vitamin B1 with cancer cell metabolism,” Cancer & metabolism,
vol. 1, no. 1, pp. 1–14, 2013.
[54] L. Chávez-Rodríguez, A. Escobedo-Calvario, S. Salas-Silva, R. U.
Miranda-Labra, L. Bucio, V. Souza, M. C. Gutiérrez-Ruiz, and
L. E. Gomez-Quiroz, “Fructose consumption and hepatocellular
carcinoma promotion,” Livers, vol. 1, no. 4, pp. 250–262, 2021.
[55] A. Rosenberg and J. Hirschberg, “V-measure: A conditional
entropy-based external cluster evaluation measure,” in EMNLP-CoNLL,
vol. 7, 2007, pp. 410–420.
