Accepted Article

Received Date: 28-May-2013
Revised Date: 31-May-2013
Accepted Date: 04-Jun-2013
Article type: MiniReview
Editor: Jeff Cole
Corresponding author email: ron.caspi@sri.com

The challenge of constructing, classifying and representing metabolic pathways

Ron Caspi, Kate Dreher and Peter D. Karp


Bioinformatics Research Group
SRI International
333 Ravenswood Avenue
Menlo Park CA 94025

This article has been accepted for publication and undergone full peer review but has not been
through the copyediting, typesetting, pagination and proofreading process, which may lead to
differences between this version and the Version of Record. Please cite this article as doi:
10.1111/1574-6968.12194
This article is protected by copyright. All rights reserved.

Abstract
Scientists, educators, and students benefit from having free and centralized access to the wealth of
metabolic information that has been gathered over the decades. Curators of the MetaCyc database
work to present this information in an easily understandable pathway-based framework. MetaCyc is
used not only as an encyclopedic resource for metabolic information but also as a template for the
pathway prediction software that generates pathway/genome databases for thousands of organisms
with sequenced genomes (available at www.biocyc.org). Curators need to define pathway boundaries
and classify pathways within a broader pathway ontology to maximize the utility of the pathways to
both users and the pathway prediction software. These seemingly simple tasks pose several challenges.
This review describes these challenges as well as the criteria that need to be considered and the rules
that have been developed by MetaCyc curators as they make decisions regarding the representation and
classification of metabolic pathway information in MetaCyc. The functional consequences of these
decisions in regard to pathway prediction in new species are also discussed.

Introduction
The accumulated knowledge of the metabolic processes employed by living organisms, including their
metabolic enzymes and pathways, spans many decades of research. Individuals trying to navigate
through this vast wealth of knowledge looking for specific information or seeking to perform broad
analyses may be stymied when data are scattered broadly, presented without a relevant biological
context, described using alternative compound or gene names, or locked up in out-of-print books, in
manuscripts which are often not easily accessible, or in other hard-to-reach resources. Therefore,
researchers, metabolic engineers, teachers, and students can benefit when this knowledge is presented
in an easily and freely accessible, highly integrated manner. The importance of such resources has
become even greater with the genomic revolution, which enables us to project computationally
knowledge obtained from one organism to thousands of organisms with sequenced and annotated
genomes. However, these new uses for the data present new challenges and require the development
of new tools. This review describes some of the challenges that we face while curating and categorizing
metabolic pathways in the MetaCyc database (Caspi, et al., 2012) and while predicting the presence of
these pathways in the various organisms that make up the BioCyc collection of pathway/genome
databases. We summarize the guidelines and solutions we have developed to deal with these
challenges.

The common definition of metabolic pathways appears misleadingly straightforward. A well-accepted
definition describes a metabolic pathway as a series of enzyme-catalyzed chemical reactions
occurring within an organism, in which a principal chemical is modified. Most people with some
background in biology will recall some well-defined key pathways of central metabolism, such as
glycolysis and the citric acid cycle. However, a close inspection of pathways described by different
sources, such as the biomedical literature, textbooks, and online databases, quickly reveals that
uniformity in pathway description is limited. After all, the metabolic network inside a living cell is very
complex, and a pathway is a somewhat abstract concept: a simplification showing a very small subset
of that network, intended to make it easier for us to focus on that part. It is up to an investigator or a
curator attempting to describe a pathway to decide which network reactions or interactions should be
included in the pathway and which should be omitted. Similarly, when trying to classify pathways into
meaningful categories such as biosynthetic or degradative pathways, often there can be differences of
opinion on the proper categorization(s) depending on which of the principal chemical(s) are valued or, at
least, well-recognized by the intended audience. The results of the decisions made regarding these
issues affect pathway database contents, ontology design, and ease of use for different audiences.
Moreover, they affect the computational inference of metabolic pathways. As a result, curators of the
MetaCyc pathway database need to continually grapple with the definitions and classifications of
metabolic pathways.

Pathway boundaries
The first and perhaps most controversial decision that needs to be made when attempting to describe a
metabolic pathway is determining the biochemical start and end points that define the boundaries of
the pathway. We use several guidelines to help us make this decision. The first guideline, which is often
used in the primary literature, is to describe the pathway using the essential subset of enzymes required
to achieve a particular biochemical goal. A second guideline is to start biosynthetic pathways and end
degradation pathways with common intermediates of central metabolism. Thus, a pathway that
describes the degradation of (R)-mevalonate by the bacterium Pseudomonas mevalonii will start with
(R)-mevalonate and end with acetyl-CoA, a common intermediate of central metabolism that feeds into
the citric acid (TCA) cycle (Figure 1). It should be noted that in many cases the second guideline is
not applicable, since the pathway may utilize input compounds whose biosynthesis has not yet been
described in the literature.

A potential problem that arises when specifying pathways in this manner is the fact that many of the
pathways will contain large overlapping segments. For example, consider the degradation of the
aromatic compounds L-tryptophan, naphthalene, and L-quinate. These compounds, along with hundreds
of related compounds, can be fully degraded to acetyl-CoA, a compound of central metabolism, in
degradation pathways that involve an initial conversion to either catechol or protocatechuate (both
extremely common intermediates in the degradation of aromatic compounds), followed by further
degradation to acetyl-CoA via 2-oxopent-4-enoate. If we specify the full pathway from each compound
to acetyl-CoA, the three reactions involved in degradation of 2-oxopent-4-enoate to acetyl-CoA would
have to be repeated over and over. These repetitions result in redundancy, a highly undesirable quality in
a database. To avoid the redundancy issue, we implemented a procedure that uncouples the data
encoding and data display in this respect. We will explain this by continuing with the same example.
Instead of repeating the last three steps in all of the pathways, we remove this segment from all
pathways and curate it as the standalone pathway 2-oxopentenoate degradation (PWY-5162). Instead
of the missing last three steps we terminate the original pathways with a pathway link leading to the
2-oxopentenoate degradation pathway. The pathway link is a simple arrow that indicates the name of the
pathway(s) that continue(s) downstream. Since pathway links function as hyperlinks on a computer,
clicking on them allows the reader to navigate to the next segment in the metabolic network (Figure 1).

By replacing repeated pathway segments with pathway links, we eliminate the redundancy in
encoding the data. To enable users to see the full pathway in one diagram we introduced the concept of
superpathways. A superpathway is constructed by combining an individual base pathway (e.g.
2-oxopentenoate degradation) with one or more additional pathways and/or individual reactions, to show
a larger part of the metabolic network. Because the superpathways are treated differently by the
software than non-superpathways (a.k.a. base pathways), they do not contribute to data redundancy,
and we can define as many superpathways as we find useful.

An example may help illustrate this concept. The full pathway from naphthalene to acetyl-CoA is
cleaved into four base pathways in MetaCyc: naphthalene degradation (PWY-5427), salicylate
degradation (PWY-6183), catechol degradation to 2-oxopentenoate (P183-PWY), and 2-oxopentenoate
degradation (PWY-5162). (The values in parentheses following the pathway names are IDs. Every object
in the MetaCyc database has a unique ID.) Note that these pathways were broken at salicylate, catechol,
and 2-oxopent-4-enoate, all of which are branching points into which multiple pathways are known to
feed, and from which multiple pathways are known to depart. The superpathway naphthalene
degradation to acetyl-CoA (PWY-6956) contains these four base pathways and provides an overall view
of the full pathway from naphthalene to acetyl-CoA in one diagram.
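
The separation between data encoding and data display described above can be sketched as a small data model. The following Python sketch is illustrative only and is not the Pathway Tools implementation. The class names, the `expand` helper, and the reaction labels are assumptions made for this example. The labels are placeholders rather than MetaCyc reaction IDs, and apart from the three reactions of 2-oxopentenoate degradation mentioned in the text, the number of reactions per pathway is invented. Only the pathway names and pathway IDs come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BasePathway:
    pathway_id: str
    name: str
    reactions: List[str]  # placeholder labels, NOT real MetaCyc reaction IDs
    links: List[str] = field(default_factory=list)  # IDs of pathways that continue downstream


@dataclass
class Superpathway:
    pathway_id: str
    name: str
    components: List[str]  # IDs of the base pathways that are combined for display


def expand(superpathway: Superpathway, registry: Dict[str, BasePathway]) -> List[str]:
    """Return the reactions of a superpathway in order, listing each reaction once."""
    seen, ordered = set(), []
    for component_id in superpathway.components:
        for reaction in registry[component_id].reactions:
            if reaction not in seen:
                seen.add(reaction)
                ordered.append(reaction)
    return ordered


# Each shared segment is stored exactly once; upstream pathways only link to it.
registry = {p.pathway_id: p for p in [
    BasePathway("PWY-5427", "naphthalene degradation",
                ["naph-1", "naph-2"], links=["PWY-6183"]),
    BasePathway("PWY-6183", "salicylate degradation",
                ["sal-1"], links=["P183-PWY"]),
    BasePathway("P183-PWY", "catechol degradation to 2-oxopentenoate",
                ["cat-1", "cat-2"], links=["PWY-5162"]),
    BasePathway("PWY-5162", "2-oxopentenoate degradation",
                ["oxo-1", "oxo-2", "oxo-3"]),
]}

naphthalene_super = Superpathway(
    "PWY-6956", "naphthalene degradation to acetyl-CoA",
    ["PWY-5427", "PWY-6183", "P183-PWY", "PWY-5162"],
)

full_route = expand(naphthalene_super, registry)
print(full_route)
```

In this sketch a superpathway holds only references to its components, so defining additional superpathways adds no duplicate reaction data.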

It should be noted that different pathway databases differ in how they define pathway boundaries.
On one end of the spectrum is the KEGG database (Kanehisa, 2002), which prefers complex metabolic
maps that involve all known reactions that are related to a general topic regardless of whether they
occur within the same species or even the same kingdom (e.g. methane metabolism,
http://133.103.100.191/kegg/pathway/map/map00680.html). On the other end is UniPathway (Morgat,
et al., 2012), which defines every branching point as the boundary of a linear sub-pathway. MetaCyc
lies in between these two extremes.

Pathway variants
Another issue that pathway curators face involves pathway variants. It is well documented that different
organisms often achieve the same metabolic goal by implementing different pathways. Sometimes
multiple routes for achieving the same goal are found even within the same organism.

For example, salicylate can be used by multiple organisms as the source of carbon and energy.
However, different organisms degrade salicylate in different ways. The bacterium Ralstonia sp. U2
hydroxylates salicylate to gentisate in a single reaction (Fuenmayor, et al., 1998) (see MetaCyc pathway
salicylate degradation II, PWY-6224). Gentisate is then processed to pyruvate and fumarate (Zhou, et al.,
2001). The bacterium Streptomyces sp. WA46 also converts salicylate to gentisate, but does so by
activation to salicylate-CoA, which is hydroxylated to gentisyl-CoA and eventually converted to gentisate
(salicylate degradation IV, PWY-6640) (Ishiyama, et al., 2004). The yeast Trichosporon moniliiforme
decarboxylates salicylate to generate phenol, then hydroxylates the latter to catechol, which is
processed to the central metabolites succinyl-CoA and acetyl-CoA (pathways salicylate degradation III,
phenol degradation I (aerobic), and catechol degradation III (ortho-cleavage pathway), PWY-6636, PWY5418, PWY-5417) (Iwasaki, et al., 2010). The bacterium Pseudomonas reinekei employs a single
decarboxylating hydroxylase that converts salicylate to catechol in a single step (salicylate degradation I,
PWY-6183) (Camara, et al., 2007). Should all of these routes be part of a single salicylate degradation
pathway, or are they different pathways? Once again, different databases treat this topic differently. In
KEGG all of these routes would be combined into a single pathway. In MetaCyc we curate a different
pathway for each known route. To emphasize that these pathways relate to each other, we define them
as pathway variants. They are often labeled with a Roman numeral (e.g. salicylate degradation III), and
the web page of each of these pathways contains links to the other variants. In addition to providing a
more accurate and precise representation of which pathways have been biochemically characterized in
which species, the inclusion of distinct variant pathways within MetaCyc would be expected to improve
the quality of the pathway/genome databases that are predicted using MetaCyc as a reference. Having
multiple pathway variants to choose from permits the prediction software to determine which of the
variants is the best fit for the enzyme complement of a particular organism, and incorporate only the
appropriate variant(s) in the database (see Pathway Prediction below).
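
To illustrate how distinct variants let software pick the best fit for an organism's enzyme complement, here is a minimal Python sketch. It is not the PathoLogic algorithm: the `coverage` and `best_variants` functions, the 0.8 cutoff, and the enzyme labels are hypothetical names chosen for this example, loosely based on the routes described above. Only the pathway IDs come from the text.

```python
def coverage(pathway_enzymes, genome_enzymes):
    """Fraction of a variant's enzymes that are found in the genome annotation."""
    present = sum(1 for enzyme in pathway_enzymes if enzyme in genome_enzymes)
    return present / len(pathway_enzymes)


def best_variants(variants, genome_enzymes, min_coverage=0.8):
    """Return the IDs of the best-covered variant(s), or [] if none is covered well enough."""
    scored = {pid: coverage(enzymes, genome_enzymes) for pid, enzymes in variants.items()}
    top = max(scored.values())
    if top < min_coverage:
        return []
    return sorted(pid for pid, score in scored.items() if score == top)


# Hypothetical enzyme labels for the four salicylate degradation variants.
variants = {
    "PWY-6183": ["salicylate hydroxylase (decarboxylating)"],  # variant I
    "PWY-6224": ["salicylate 5-hydroxylase"],                  # variant II
    "PWY-6636": ["salicylate decarboxylase"],                  # variant III
    "PWY-6640": ["salicylate-CoA ligase",                      # variant IV
                 "salicylyl-CoA hydroxylase",
                 "gentisyl-CoA thioesterase"],
}

# An imaginary genome annotated with the three CoA-route enzymes only.
genome = {"salicylate-CoA ligase", "salicylyl-CoA hydroxylase", "gentisyl-CoA thioesterase"}
print(best_variants(variants, genome))
```

If all routes were merged into a single pathway, the same genome would cover only part of it, and the software could not say which route the organism actually uses.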

Chimeric and conspecific pathways


Sometimes it is useful to combine pathways from different organisms into a single diagram that provides
an overview of a metabolic field. For example, the pathway superpathway of CDP-3,6-dideoxyhexose
biosynthesis (PWY-5823) brings together several pathways found in Gram-negative bacteria that
produce these unusual sugars (CDP-paratose, CDP-abequose, CDP-ascarylose and CDP-tyvelose) as the
O-antigens of their lipopolysaccharides. Note that no single organism identified to date can naturally
produce all of these sugars.

Although it is simple to create such a diagram by generating a superpathway composed of the
various base pathways, there is an important distinction between pathways that are expected to occur
in their entirety in a single organism, and pathways that are not. In order to maintain this distinction, we
have established the concepts of conspecific pathways vs. chimeric pathways. While a conspecific
(meaning belonging to the same species) pathway comprises a set of reactions that are expected to be
found within each organism that has the pathway, a chimeric pathway comprises reactions from
multiple organisms, and is not expected to occur in its entirety in a single organism. Chimeric MetaCyc
pathways are clearly labeled as such to ensure that the user is aware of their special status (Figure 2).

In addition to their role as a resource for human readers, the pathways in MetaCyc are used by the
Pathway Tools software as a reference for prediction of the metabolic networks of organisms with a
sequenced genome, enabling the software to generate organism-specific Pathway/Genome Databases.
Currently, chimeric pathways are excluded from the prediction process. However, future software will
be able to construct conspecific versions of the chimeric pathways. When the software predicts that
a certain part of a chimeric pathway occurs in an organism (in a connected set of reactions), it will
remove the extraneous reactions from the pathway, producing a truncated conspecific version of it.
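
One possible reading of this planned truncation step is sketched below in Python. Because the text describes future software, everything here is an assumption: the function name, the representation of a reaction as the set of compounds it touches, the rule of keeping the largest connected set, and the placeholder reaction and compound names, which do not describe real CDP-sugar chemistry.

```python
from collections import defaultdict


def truncate_to_conspecific(reactions, predicted_present):
    """Keep the largest connected set of reactions predicted to be present.

    reactions: dict mapping a reaction label to the set of compounds it touches.
    predicted_present: set of reaction labels predicted for the organism.
    Two reactions are treated as connected when they share a compound.
    """
    kept = {r: cpds for r, cpds in reactions.items() if r in predicted_present}
    by_compound = defaultdict(set)
    for reaction, compounds in kept.items():
        for compound in compounds:
            by_compound[compound].add(reaction)

    unvisited, best = set(kept), set()
    while unvisited:
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:  # depth-first walk over reactions that share compounds
            current = stack.pop()
            for compound in kept[current]:
                for neighbor in by_compound[compound]:
                    if neighbor not in component:
                        component.add(neighbor)
                        stack.append(neighbor)
        unvisited -= component
        if len(component) > len(best):
            best = component
    return best


# Placeholder chimeric pathway: one shared trunk and three organism-specific branches.
chimeric = {
    "trunk-1": {"precursor", "shared-intermediate"},
    "branch-A": {"shared-intermediate", "sugar-A"},
    "branch-B": {"shared-intermediate", "sugar-B"},
    "branch-C": {"sugar-B", "sugar-C"},
}

# branch-C is predicted but disconnected from the rest, so it is dropped.
conspecific = truncate_to_conspecific(chimeric, {"trunk-1", "branch-A", "branch-C"})
print(sorted(conspecific))
```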

Engineered pathways
Another type of important pathway that demands special presentation is the engineered pathway.
These pathways are constructed artificially by modifying naturally-occurring enzymes, and/or by
introducing enzymes from different sources into a host organism. They share some characteristics with
chimeric pathways, but have the distinction that they operate within a single organism. Engineered
pathways are clearly labeled as such in MetaCyc, and are excluded from the pathway prediction process
as well.

Pathway prediction
A major focus of the BioCyc project is to provide high quality predictions of what subset of well-curated
pathways in MetaCyc are likely to exist in a target species chosen from any kingdom of life, based
primarily on its complete genomic sequence (Paley & Karp, 2002, Krummenacker, et al., 2005). This task
is accomplished using the PathoLogic tool of the Pathway Tools software (Karp, et al., 2010). The basic
procedure for predicting whether a particular pathway occurs in an organism is based on the presence
of the enzymes of the pathway in that organism (usually deduced by the presence of genes predicted to
encode such enzymes in the annotated genome). Since it is expected that some of the enzymes may not
have been properly recognized and annotated, owing to limited knowledge and variations in sequence,
an arbitrary threshold is defined. For example, one can demand that 80% of the enzymes of a pathway
must be present in order to predict that the pathway is present in the organism. As can be expected,
using such a simple rule often results in false predictions. Take for example the incomplete reductive
TCA cycle that operates in methanogenic archaea (P42-PWY) (Shieh & Whitman, 1987). The majority of
the enzymes that participate in this pathway are a subset of the enzymes of the tricarboxylic acid (TCA)
cycle, a pathway that operates in aerobic organisms. However, the reductive pathway includes only a
subset of the reactions of the TCA cycle, and operates in the reverse direction, functioning as a carbon
dioxide assimilating mechanism. If we simply look for the presence of the enzymes, we will find the
majority of them in most aerobic organisms, and thus might erroneously predict the existence of the
incomplete reductive TCA cycle in aerobic organisms that possess the TCA cycle. To avoid such errors we
routinely use the following two features:

Taxonomic Range. For many of the pathways it is possible to define which taxa are likely to possess
them. By adding this information to the MetaCyc pathway, it is possible to direct the PathoLogic
program to avoid predicting the pathway in species outside its estimated taxonomic range. The
taxonomic range is not curated for engineered pathways.

Key Reactions. Many pathways require unique enzymes that do not participate in other pathways.
By designating the reactions catalyzed by these enzymes as key reactions in MetaCyc, and insisting on
their presence, PathoLogic can refrain from predicting the pathway in organisms whose genomes do not
seem to encode the corresponding key enzymes. For example, some methylotrophic bacteria are able to
assimilate formaldehyde in a complex pathway known as the RuMP (ribulose monophosphate) cycle
(PWY-1861) (Strom, et al., 1974). Most of the enzymes that participate in this pathway also catalyze
reactions of the pentose phosphate pathway, a central metabolic pathway that is widespread. However,
the RuMP cycle requires two unique enzymes, 3-hexulose-6-phosphate synthase (EC 4.1.2.43) and
6-phospho-3-hexuloisomerase (EC 5.3.1.27). By specifying the reactions catalyzed by these enzymes as key
reactions, we can prevent the erroneous prediction of the pathway in non-methylotrophic organisms
that possess the pentose phosphate pathway.
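
The interplay of the enzyme threshold, the taxonomic range, and the key reactions can be sketched as follows. This Python sketch is a deliberately simplified stand-in and not the PathoLogic algorithm. The `PathwayTemplate` class, the `predict_present` function, the placeholder reaction labels, the assumed total of ten RuMP reactions, and the `{"Archaea"}` range assigned to P42-PWY are all assumptions made for illustration. The pathway IDs, the two EC numbers, and the 80% example threshold come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PathwayTemplate:
    pathway_id: str
    reactions: List[str]
    key_reactions: Set[str] = field(default_factory=set)
    taxonomic_range: Set[str] = field(default_factory=set)  # empty set = unrestricted


def predict_present(pathway, catalyzed, lineage, threshold=0.8):
    """Simplified presence call for one pathway in one organism.

    catalyzed: reactions for which the genome encodes an enzyme.
    lineage: taxon names on the organism's lineage.
    """
    # 1. Taxonomic range: skip pathways not expected in this lineage.
    if pathway.taxonomic_range and not (pathway.taxonomic_range & lineage):
        return False
    # 2. Key reactions: every one of them must be catalyzed.
    if not pathway.key_reactions <= catalyzed:
        return False
    # 3. Threshold on the fraction of reactions with an enzyme.
    found = sum(1 for reaction in pathway.reactions if reaction in catalyzed)
    return found / len(pathway.reactions) >= threshold


shared = [f"shared-ppp-{i}" for i in range(1, 9)]  # placeholder shared reactions
keys = {"EC 4.1.2.43", "EC 5.3.1.27"}

rump = PathwayTemplate("PWY-1861", shared + sorted(keys), key_reactions=keys)
rump_without_keys = PathwayTemplate("PWY-1861", shared + sorted(keys))
reductive_tca = PathwayTemplate("P42-PWY", ["tca-like-1", "tca-like-2"],
                                taxonomic_range={"Archaea"})  # assumed range

non_methylotroph = set(shared)           # has only the shared enzymes
methylotroph = set(shared) | keys        # also has the two unique enzymes
lineage = {"Bacteria", "Proteobacteria"}

print(predict_present(rump_without_keys, non_methylotroph, lineage))  # threshold alone
print(predict_present(rump, non_methylotroph, lineage))               # key reactions enforced
print(predict_present(rump, methylotroph, lineage))
```

With the invented counts used here, the threshold alone would accept the RuMP cycle in an organism that has only the shared enzymes, while the key-reaction check rejects it.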

While the use of these features substantially reduces the number of false positive predictions, it can
also contribute to false negative predictions, leading the PathoLogic program to reject candidate
pathways that are legitimately present in a target organism. For example, a curator may set the
expected taxonomic range of a plant growth regulator biosynthetic pathway to the plant kingdom
(Viridiplantae). However, there are many known examples of specific fungal and bacterial pathogens
producing plant compounds to modulate the growth and/or defense responses of their plant hosts.
Such a case is found in the rice pathogen Gibberella fujikuroi, which can synthesize the plant hormone
gibberellin (see gibberellin biosynthesis IV, PWY-5047) (Rojas, et al., 2001). To help avoid these negative
results, the PathoLogic program uses a complex algorithm that employs more than 16 different heuristic
rules (Dale, et al., 2010), which may override the negative taxonomic range data if other strong support
exists for the presence of the pathway. In addition, the taxonomic pruning option can be turned off by
the user during the creation of a new database.

As discussed above, curated information, the complexity of the PathoLogic program, and user
decisions can all work together to produce a more accurate final set of predicted pathways for any
target species. It should be noted that some false positive predictions are likely to be made despite all of
these tools and guidelines. For example, even if an organism contains all the genes necessary for a
pathway, differential regulation of these genes may prevent them from being expressed at the same
time, making it unlikely that the predicted pathway would be physiologically functional (Constantinidou,
et al., 2006).

Classification of pathways
MetaCyc version 17.0 of March 2013 contains over 2,000 base pathways. Many more will surely be
studied in the years to come, so it is essential that a pathway classification system be employed to group
pathways into meaningful categories that can aid researchers, educators, and students. In the absence
of a universally-accepted classification system for pathways, we have developed a Pathway Ontology in
MetaCyc. The ontology is continually updated to reflect curation needs. Currently the ontology contains
six top-level categories (or classes): Biosynthesis, Degradation/Utilization/Assimilation, Generation of
Precursor Metabolites and Energy, Detoxification, Activation/Inactivation/Interconversion, and
Metabolic Clusters. The definitions of these categories are provided in Table 1. It should be noted that
the inclusion of Metabolic Clusters is a compromise. These collections of unconnected reactions are not
true pathways, and thus one could argue that they should not be part of the ontology. However, we
have chosen to include them because we find that they serve an important purpose that traditional
pathways do not (see for example the metabolic cluster tRNA charging (TRNA-CHARGING-PWY)).

Pathways present in these different ontological categories (and increasingly granular subcategories)
can be easily browsed from the Search menu present on every page in Pathway Tools-generated
databases by selecting Ontologies and then Pathway Ontology. The same ontology can be used to
initiate a search under the Pathways option on the Search menu. Moreover, users can perform very
complex searches by combining selected ontological categories with additional pathway features such
as desired evidence codes, key compounds, and target organisms. For example, it is possible to search
for pathways for siderophore production found in -proteobacteria, or for pathways for the biosynthesis
of secondary metabolites with the expected taxonomic range of Viridiplantae (green plants).

Further uses of pathways within pathway-genome databases


As its name implies, the Pathway Tools software includes multiple tools that can enhance the usability of
metabolic pathways. One such tool (the Cellular Overview) is a metabolic map that shows all the
pathways in the pathway/genome database (PGDB) in one integrated diagram. When combined with
the Omics Viewer tool, users can display transcriptomic, proteomic, metabolomic, and fluxomic data
values, as well as many other types of data, superimposed on the pathways in this diagram. If groups of
enzymes, compounds, etc., fall into specific categories, e.g. those up-regulated versus down-regulated in
a particular experimental condition, these different categories can be displayed on the diagram using a
Highlight feature. Finer control is obtained by defining numerical values for each category (e.g. highlight
pathway reactions catalyzed by enzymes whose genes are up-regulated more than 3-fold but less than
10-fold in blue). While using Pathway Tools online, users can apply these tools to any of the 3000 PGDBs
that are currently available in the BioCyc PGDB collection. Users who prefer to use the software on their
own desktops, perhaps because they create their own PGDBs, are able to easily download any of the
BioCyc PGDBs and install them locally by using a tool called the PGDB Registry. Using the desktop
version of the software allows more control, since the user is able to modify the data in any of these
PGDBs. For instance, if additional steps in a metabolic pathway are uncovered, the pathway can be
modified using the desktop version of the program, adding the new reactions to the pathway and
assigning the enzymes to them. Since the pathway diagrams are generated automatically by the
software as soon as the new reactions are added, a pathway diagram can be easily prepared for a
publication describing the new enzymes. Once the Cellular Overview is regenerated by the program,
experimental data can then be visually displayed in the context of the extended pathway, which may
allow co-expression patterns or metabolic bottlenecks in mutants to be more easily discerned.
Additional pathway figures can be prepared, showing omics data superimposed on the new pathway
diagram.

Sharing the new data is easy as well. Pathway Tools can be configured to run as a web server,
allowing remote users on an intranet or the Internet to connect via their web browsers and browse the
data. Pathways can also be exported to files that can be imported by collaborators who want to analyze
the same pathway using their own local installation of Pathway Tools.

As discussed at the beginning of this article, while the definition and use of metabolic pathways are
clearly beneficial, in reality there are no strictly defined pathways that operate in isolation. Rather, all
of the reactions performed within an organelle, cell, and/or set of cells are connected at some level.
Fortunately, the pathway boundaries imposed by MetaCyc curators provide no barrier to more global
analyses. A Metabolite Tracer tool (that is available only on the desktop version of the program) can
help researchers to see the connections between pathways. The Tracer enables users to define a
starting compound and track all of its metabolic routes in increments of one, two, or more reactions
away, no matter what pre-defined pathway they occur in.
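
The idea behind such tracing can be sketched as a breadth-first search over reactions that ignores pathway boundaries. The Python sketch below is an illustration of the general technique, not the Metabolite Tracer's actual implementation. The `trace` function and the step labels are hypothetical, and the edges are a coarse simplification of the routes discussed earlier, in which some edges stand for several reactions.

```python
from collections import deque


def trace(start, reactions, max_steps):
    """Return {compound: minimum number of steps from start}, up to max_steps.

    reactions: list of (label, substrate, product) tuples, taken from any pathway.
    """
    outgoing = {}
    for _label, substrate, product in reactions:
        outgoing.setdefault(substrate, []).append(product)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        compound = queue.popleft()
        if distance[compound] == max_steps:
            continue  # do not expand beyond the requested number of steps
        for product in outgoing.get(compound, []):
            if product not in distance:
                distance[product] = distance[compound] + 1
                queue.append(product)
    return distance


# Coarse, illustrative edges drawn from several different pathways.
edges = [
    ("step-1", "naphthalene", "salicylate"),
    ("step-2", "salicylate", "catechol"),
    ("step-3", "salicylate", "gentisate"),
    ("step-4", "catechol", "2-oxopent-4-enoate"),
    ("step-5", "2-oxopent-4-enoate", "acetyl-CoA"),
]

print(trace("salicylate", edges, 1))
```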

The new MetaFlux tool in the Pathway Tools software (also available only on the desktop version)
uses the data in a PGDB to define a flux-balance analysis (FBA) model that considers the densely
interconnected web of metabolites, enzymes, and reactions that exist in that organism. Researchers can
also use the data files available for each PGDB to generate their own networks outside of the Pathway
Tools software. While there are some ready-made tools to facilitate this process (such as the BioCyc
plugin for the Cytoscape program, released in 2010), users have countless options for programmatically
bringing together the reactions and compounds that MetaCyc curators have separated into more
comprehensible pathways. Regardless of the tool used, once results are obtained from these more
global analyses, their biological significance can often be fruitfully examined with the help of the
annotated MetaCyc pathways and their accompanying summaries and lists of references.

Conclusions
Curators of metabolic pathway databases face fundamental problems pertaining to the definition,
classification, and representation of metabolic pathways. In this paper we described the tools and
guidelines that we use in the MetaCyc database for representing these pathways, as well as the
ontology that we developed for classifying metabolic pathways. In addition, we described how the
pathways in MetaCyc are used by the Pathway Tools software for prediction of the metabolic network of
thousands of sequenced organisms, and some of the tools that the software offers for pathway analysis.
We hope that our approach to pathway classification and representation and the tools that we
developed help make pathway data analysis easier and more powerful for researchers, students, and
educators.

References
Camara B, Bielecki P, Kaminski F, dos Santos VM, Plumeier I, Nikodem P & Pieper DH (2007) A gene cluster involved
in degradation of substituted salicylates via ortho cleavage in Pseudomonas sp. strain MT1 encodes enzymes
specifically adapted for transformation of 4-methylcatechol and 3-methylmuconate. J Bacteriol 189: 1664-1674.

Caspi R, Altman T, Dreher K, et al. (2012) The MetaCyc database of metabolic pathways and enzymes and the
BioCyc collection of pathway/genome databases. Nucleic Acids Res 40: D742-753.

Constantinidou C, Hobman JL, Griffiths L, Patel MD, Penn CW, Cole JA & Overton TW (2006) A reassessment of the
FNR regulon and transcriptomic analysis of the effects of nitrate, nitrite, NarXL, and NarQP as Escherichia coli
K12 adapts from aerobic to anaerobic growth. J Biol Chem 281: 4802-4815.

Dale JM, Popescu L & Karp PD (2010) Machine learning methods for metabolic pathway prediction. BMC
Bioinformatics 11: 15.

Fuenmayor SL, Wild M, Boyes AL & Williams PA (1998) A gene cluster encoding steps in conversion of naphthalene
to gentisate in Pseudomonas sp. strain U2. J Bacteriol 180: 2522-2530.

Ishiyama D, Vujaklija D & Davies J (2004) Novel pathway of salicylate degradation by Streptomyces sp. strain
WA46. Appl Environ Microbiol 70: 1297-1306.

Iwasaki Y, Gunji H, Kino K, Hattori T, Ishii Y & Kirimura K (2010) Novel metabolic pathway for salicylate
biodegradation via phenol in yeast Trichosporon moniliiforme. Biodegradation 21: 557-564.

Kanehisa M (2002) The KEGG database. Novartis Found Symp 247: 91-101; discussion 101-103, 119-128, 244-152.

Karp PD, Paley SM, Krummenacker M, et al. (2010) Pathway Tools version 13.0: integrated software for
pathway/genome informatics and systems biology. Brief Bioinform 11: 40-79.

Krummenacker M, Paley S, Mueller L, Yan T & Karp PD (2005) Querying and computing with BioCyc databases.
Bioinformatics 21: 3454-3455.

Morgat A, Coissac E, Coudert E, et al. (2012) UniPathway: a resource for the exploration and annotation of
metabolic pathways. Nucleic Acids Res 40: D761-D769.

Paley SM & Karp PD (2002) Evaluation of computational metabolic-pathway predictions for H. pylori.
Bioinformatics 18: 715-724.

Rojas MC, Hedden P, Gaskin P & Tudzynski B (2001) The P450-1 gene of Gibberella fujikuroi encodes a
multifunctional enzyme in gibberellin biosynthesis. Proc Natl Acad Sci U S A 98: 5838-5843.

Shieh JS & Whitman WB (1987) Pathway of acetate assimilation in autotrophic and heterotrophic methanococci. J
Bacteriol 169: 5327-5329.

Strom T, Ferenci T & Quayle JR (1974) The carbon assimilation pathways of Methylococcus capsulatus,
Pseudomonas methanica and Methylosinus trichosporium (OB3B) during growth on methane. Biochem J 144:
465-476.

Zhou NY, Fuenmayor SL & Williams PA (2001) nag genes of Ralstonia (formerly Pseudomonas) sp. strain U2
encoding enzymes for gentisate catabolism. J Bacteriol 183: 700-708.

Funding
Funding for this research was provided by the National Institute of General Medical Sciences of the
National Institutes of Health (grants GM080746, GM077678, GM088849 and GM075742); Department
of Energy (bioenergy-related pathway curation, grant DE-SC0004878); National Science Foundation
(plant pathway curation performed by Carnegie Institution for Science, grants IOS-1026003 and
DBI-0640769). Funding for open access charge: A grant from the National Institute of General Medical
Sciences of the National Institutes of Health (NIH).

Table 1: The higher-level classes in the MetaCyc pathway ontology. The left column lists the master
classes; the right column lists direct subclasses. Numbers indicate the number of MetaCyc pathways in
each class in version 17.0 of MetaCyc. Note that the sum of pathways in all subclasses of a particular top
class is likely to be smaller than the total number of pathways for that class, since many pathways are
listed directly under the top class. The 13 precursor metabolites mentioned in the definitions include
D-glucose 6-phosphate, D-fructose 6-phosphate, D-ribose 5-phosphate, D-erythrose 4-phosphate,
D-glyceraldehyde 3-phosphate, 3-phospho-D-glycerate, phosphoenolpyruvate, pyruvate, acetyl-CoA,
2-oxoglutarate, succinyl-CoA, oxaloacetate, and D-sedoheptulose 7-phosphate.

| Top Category | Definition | Subcategories |
|---|---|---|
| Generation of Precursor Metabolites and Energy (183) | This class contains the pathways of central metabolism (glycolysis, pentose phosphate pathways, and TCA cycle), which collectively produce the 13 starting materials, sometimes termed precursor metabolites, for all cellular biosyntheses. Other degradative pathways, sometimes termed feeder pathways, feed into central metabolism. This class also contains the pathways that generate energy under various conditions of growth. | Fermentation (48); Respiration (28); Chemoautotrophic Energy Metabolism (17); Electron Transfer (14); Methanogenesis (13); TCA cycle (9); Hydrogen Production (9); Photosynthesis (8); Glycolysis (7); Other (5); Acetyl-CoA Biosynthesis (4); Pentose Phosphate Pathways (4) |
| Biosynthesis (1318) | This class contains pathways that constitute a complete spectrum of the biosynthetic capacities of a cell, including the routes of synthesis of small molecules, macromolecules and cell structure components. It does not contain the pathways that generate the 13 starting materials, sometimes termed precursor metabolites, for all cellular biosyntheses. | Secondary Metabolites Biosynthesis (530); Cofactors, Prosthetic Groups, Electron Carriers Biosynthesis (209); Fatty Acids and Lipids Biosynthesis (140); Amino Acids Biosynthesis (114); Carbohydrates Biosynthesis (114); Hormones Biosynthesis (57); Nucleosides and Nucleotides Biosynthesis (48); Other Biosynthesis (41); Cell Structures Biosynthesis (47); Amines and Polyamines Biosynthesis (37); Aromatic Compounds Biosynthesis (32); Siderophore Biosynthesis (17); Metabolic Regulators Biosynthesis (6); Aminoacyl-tRNA Charging (4) |
| Degradation/Utilization/Assimilation (872) | This class contains pathways by which various organisms degrade substrates to serve as sources of nutrients and energy, utilize exogenous sources of essential metabolites, or assimilate certain sources of essential bioelements. | Aromatic Compounds Degradation (182); Amino Acids Degradation (119); Inorganic Nutrients Metabolism (101); Secondary Metabolites Degradation (96); Carbohydrates Degradation (94); Amines and Polyamines Degradation (54); Carboxylates Degradation (44); Chlorinated Compounds Degradation (38); Polymeric Compounds Degradation (37); Nucleosides and Nucleotides Degradation (36); Hormones Degradation (32); C1 Compounds Utilization and Assimilation (29); Degradation/Utilization/Assimilation - Other (27); Fatty Acids and Lipids Degradation (24); Alcohols Degradation (19); Aldehyde Degradation (11); Cofactors, Prosthetic Groups, Electron Carriers Degradation (5); Protein Degradation (3) |
| Activation/Inactivation/Interconversion (33) | This class holds pathways for activation, inactivation, and interconversion of metabolic compounds. In contrast to a standard "biosynthesis" pathway in which a biologically active compound is synthesized from precursor molecules, activation pathways involve relatively minor chemical modifications to existing compounds that result in a substantial increase in their biological activity. Similarly, in contrast to standard "degradation" pathways in which a more complex compound is broken down into a set of simple metabolites, inactivation pathways involve relatively minor chemical modifications to existing biologically active compounds that result in a substantial decrease in their biological activity. Interconversion pathways describe the bidirectional conversion of a biomolecule to a different form. The forward and backward conversions often result in significant changes in the biological activity of the compound, thus resulting in its activation and deactivation, respectively. These modifications may be either reversible or irreversible. | Interconversion (17); Activation (8); Inactivation (8) |
| Detoxification (39) | This class contains pathways by which various organisms protect themselves against the harmful effects of toxic compounds. The sole purpose of these pathways is to avoid toxicity, with no other benefit (such as energy or useful metabolites) to the organism. | Methylglyoxal Detoxification (8); Arsenate Detoxification (4); Antibiotic Resistance (7); Acid Resistance (2); Cyanide Detoxification (2); Mercury Detoxification (1) |
| Metabolic Clusters (32) | A metabolic cluster is a set of biochemical reactions that are biologically related, but are largely unconnected, and therefore do not constitute a pathway in the traditional sense of the word. | |

Figure Legends

Figure 1. A simple pathway showing the degradation of (R)-mevalonate by the bacterium Pseudomonas
mevalonii. The pathway starts with the compound that is being degraded and ends with acetyl-CoA, a
common intermediate of central metabolism that feeds into the citric acid cycle (TCA cycle). The other
end product, acetoacetate, is linked via a pathway link to the pathway that degrades that compound
further into acetyl-CoA. Compounds are shown in red, enzymes in yellow, genes in purple, EC numbers
in blue, and pathway links in green.

Figure 2. Chimeric pathways, like this one describing the synthesis of different CDP-3,6-dideoxyhexoses,
comprise reactions and enzymes from multiple organisms, and are not expected to occur in their
entirety in a single organism. The title of chimeric MetaCyc pathways is labeled as such to ensure that
the reader is aware of their special status. In addition, rather than a taxonomic range, the pathway
comments provide a list of taxa known to possess parts of the pathway (not shown). Compounds are
shown in red, enzymes in yellow, genes in purple, and EC numbers in blue.
