SPOKE123
Why is there a need for a biomedical knowledge graph like SPOKE?: Start by explaining that the
biomedical field generates vast amounts of data. This data includes information about genes, proteins,
diseases, drugs, and more. However, these data are often scattered across various databases and
repositories, leading to a fragmented and compartmentalized landscape. SPOKE was developed to address
this issue by creating a unified knowledge graph that connects this scattered information.
How is the complexity, size, and heterogeneity of biomedical information a challenge?: Describe the
challenges posed by the sheer volume of data and the diversity of data types (e.g., genetic, clinical,
pharmacological). Emphasize that managing and integrating such diverse and complex data is a formidable
task.
Why is connecting seemingly disparate information essential for precision medicine efforts?: Explain
that precision medicine aims to tailor medical treatments to individual patients. To achieve this, it's crucial to
connect and cross-reference a wide range of information, from a patient's genetic makeup to the latest drug
research. SPOKE provides the infrastructure to make these connections, which can lead to more informed
and personalized healthcare decisions.
Scalable Precision Medicine Open Knowledge Engine (SPOKE)
An overview of how SPOKE is built
Continuous Updates: The construction process is ongoing, with weekly updates, ensuring
that SPOKE stays current with the latest information from the source databases.
Organism Identification: It identifies organisms using the NCBI Taxonomy ID and determines
species of interest from multiple sources, which is essential for linking biological data. For
example, Escherichia coli and Bacillus subtilis are identified by their unique Taxonomy IDs,
allowing researchers to study specific bacterial species.
Protein Information: SPOKE includes protein data from UniProt, which is a well-known
protein database. This means it can provide details about proteins found in various
organisms. For instance, it might offer information about the structure, function, and known
interactions of a particular protein.
Protein Interactions: The graph incorporates information about protein interactions from
sources like STRING and IntAct, which helps to establish relationships between proteins.
Consider two proteins, A and B. SPOKE might contain information from sources like STRING
and IntAct that indicates that protein A interacts with protein B. This helps researchers
understand how different proteins work together in biological processes.
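The integration steps above can be pictured as building a small typed graph. The following is a minimal sketch under stated assumptions, not SPOKE's actual schema or build code: the `TinyGraph` class, the `Organism` and `Protein` labels, the `INTERACTS_WITH` edge name, and the placeholder proteins `A` and `B` are all hypothetical. The two Taxonomy IDs are the NCBI identifiers for the species named in the text.

```python
from collections import defaultdict

# NCBI Taxonomy IDs for the two example species mentioned above.
ORGANISMS = {
    562: "Escherichia coli",
    1423: "Bacillus subtilis",
}


class TinyGraph:
    """Toy typed graph (hypothetical): nodes are keyed by (label, identifier)."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(set)

    def add_node(self, label, identifier, **properties):
        self.nodes[(label, identifier)] = properties

    def add_edge(self, source, edge_type, target):
        self.edges[source].add((edge_type, target))

    def neighbors(self, node, edge_type):
        # Sorted so the result is deterministic.
        return sorted(t for (e, t) in self.edges[node] if e == edge_type)


def build_example():
    graph = TinyGraph()
    # Organisms are identified by their Taxonomy ID, not by name.
    for tax_id, name in ORGANISMS.items():
        graph.add_node("Organism", tax_id, name=name)
    # Placeholder proteins A and B, as in the example in the text.
    protein_a = ("Protein", "A")
    protein_b = ("Protein", "B")
    graph.add_node(*protein_a)
    graph.add_node(*protein_b)
    # An interaction is symmetric, so this sketch stores both directions.
    graph.add_edge(protein_a, "INTERACTS_WITH", protein_b)
    graph.add_edge(protein_b, "INTERACTS_WITH", protein_a)
    return graph
```

Keying organisms by Taxonomy ID rather than by name is what lets records from different source databases land on the same node.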
Genes and Diseases: SPOKE integrates data from NCBI Gene to provide information about human
genes. For instance, it might link a specific gene to diseases it is associated with. This
allows researchers to study the genetic basis of diseases and understand which genes are
involved in particular health conditions.
Compound Information: SPOKE includes data about compounds from sources like ChEMBL,
DrugBank, and the Connectivity Map project, which is crucial for
participates-Pathway" edges.
Metabolic Pathways:
• Reads data from resources such as KEGG, MetaCyc, and PATRIC to incorporate
metabolic pathways.
• Introduces a "Reaction" node that links to metabolites through "Reaction-
Food Data:
• Integrates information from two food databases, FooDB and the Australian
Food Composition Database.
• Establishes relationships with compounds and nutrients in the knowledge
graph through "Food-contains-Compound" and "Food-contains-Nutrient"
edges.
• Aims to incorporate the FoodOn ontology to standardize food data mapping.
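The two food edge types named above can be sketched as plain records. This is an illustrative sketch only: the `Edge` tuple and the helper functions are hypothetical, and the food, compound, and nutrient names are placeholders rather than entries taken from FooDB or the Australian Food Composition Database. Only the edge type strings come from the text.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """One directed, typed edge (hypothetical representation)."""
    source: str
    edge_type: str
    target: str


def food_edges(food, compounds=(), nutrients=()):
    """Build the two kinds of food edges described above for one food item."""
    edges = [Edge(food, "Food-contains-Compound", c) for c in compounds]
    edges += [Edge(food, "Food-contains-Nutrient", n) for n in nutrients]
    return edges


def targets(edges, food, edge_type):
    """Return what a food links to through one edge type, in insertion order."""
    return [e.target for e in edges if e.source == food and e.edge_type == edge_type]


# Placeholder names, not real database records.
example = food_edges(
    "ExampleFood",
    compounds=["ExampleCompound"],
    nutrients=["ExampleNutrient"],
)
```

Keeping compounds and nutrients on separate edge types means a query can ask for one without filtering out the other.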
API Access: The REST API provides a means for users to access and query the nodes and
edges within SPOKE, allowing for data exploration and retrieval.
Support for Various Use Cases: The API offers different types of queries, including
meta-information, information about specific nodes, and network-related queries.
This flexibility supports a wide range of use cases in biomedical research and analysis.
The SPOKE REST API
• The SPOKE REST API is like a bridge that allows you to access and interact with the SPOKE
knowledge graph, which contains a lot of information about biology and
medicine. It was mainly designed to work with a tool called the
Neighborhood Explorer. This API has three main parts:
1. Meta-Information: You can ask for general information about the
knowledge graph, kind of like checking a map to see what's in it.
2. Node Information: You can search for specific things in the graph, like
finding all the proteins related to a certain disease. This is a bit like
searching for specific places on a map.
3. Network Information: This part is a bit more complex, but it allows you to
explore how different things in the graph are connected. It's like looking at
a subway map and figuring out how to get from one place to another.
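The three parts listed above map naturally onto three kinds of request URL. The sketch below only builds URL strings and makes no network calls. The base URL and the path names (`types`, `node`, `neighborhood`) are assumptions made for illustration and should be checked against the actual SPOKE API documentation before use.

```python
from urllib.parse import quote

# Hypothetical base URL; not the real SPOKE endpoint.
BASE_URL = "https://spoke.example.org/api"


def _segment(value):
    """Percent-encode one path segment so spaces and slashes are safe."""
    return quote(str(value), safe="")


def meta_url():
    """1. Meta-information: what node and edge types exist (assumed path)."""
    return f"{BASE_URL}/types"


def node_url(node_type, attribute, value):
    """2. Node information: look up nodes of one type by an attribute (assumed path)."""
    return f"{BASE_URL}/node/{_segment(node_type)}/{_segment(attribute)}/{_segment(value)}"


def neighborhood_url(node_type, attribute, value):
    """3. Network information: a node together with its connections (assumed path)."""
    return (
        f"{BASE_URL}/neighborhood/"
        f"{_segment(node_type)}/{_segment(attribute)}/{_segment(value)}"
    )
```

For example, `node_url("Disease", "name", "multiple sclerosis")` yields a URL whose last segment is `multiple%20sclerosis`, which a client could then fetch with an ordinary HTTP GET.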
The main differentiation factors between a REST API and a SPARQL API
Factor            REST API                        Hypothetical SPARQL API
Purpose           Accessing data from a database  Querying linked data (knowledge graph)
Access Method     HTTP methods (GET, POST, etc.)  SPARQL query language
Query Complexity  Limited complexity in queries   Highly complex, graph pattern matching
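To make the contrast concrete, the sketch below phrases a similar question both ways: as a single REST request and as a SPARQL graph pattern. Everything specific here is an assumption for illustration: the base URL, the `ex:` prefix, and the predicates `ex:name`, `ex:associatesWith`, and `ex:encodedBy` are hypothetical and do not describe a real SPOKE endpoint or vocabulary.

```python
from urllib.parse import urlencode


def rest_request(base_url, node_type, name):
    """REST style: the question is encoded in the HTTP method, path, and parameters."""
    return "GET", f"{base_url}/node/{node_type}?{urlencode({'name': name})}"


def sparql_query(disease_name):
    """SPARQL style: the question is a graph pattern matched against the data."""
    # Escape backslashes and double quotes so the name is a valid string literal.
    literal = disease_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX ex: <http://example.org/spoke#>\n"  # hypothetical vocabulary
        "SELECT ?protein WHERE {\n"
        f'  ?disease a ex:Disease ; ex:name "{literal}" .\n'
        "  ?gene ex:associatesWith ?disease .\n"
        "  ?protein ex:encodedBy ?gene .\n"
        "}"
    )
```

The REST call can only ask what the endpoint was designed to answer, while the SPARQL pattern chains three relationships in one query, which is the "graph pattern matching" the table refers to.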