Professional Documents
Culture Documents
PubChem Database BioInformatics Notes
PubChem Database BioInformatics Notes
PubChem Database BioInformatics Notes
Introduction
PubChem is a database of chemical molecules and their activities. The National Centre for
Biotechnology Information (NCBI) maintains the system, a component of the National Library of
Medicine, which is part of the United States National Institutes of Health (NIH). PubChem is a
primary database; hence the data can be accessed for free. PubChem contains small molecules, but
also larger molecules such as nucleotides, carbohydrates, lipids, peptides, and chemically
modified macromolecules. The three main databases that make up PubChem are Compounds,
Substances, and BioAssay. All three are constantly expanding.
PubChem Homepage
The search interface on the PubChem homepage enables users to look for any phrase,
keyword, or identifier. Multiple hits from a search are displayed on an Entrez DocSum page.
The user will be sent to the website that has information on the single record that the search
turned up. This page is referred to as the Compound Summary, Substance Record, or
BioAssay Record page, depending on the kind of record. The PubChem homepage is a
central location for all PubChem services.
Data in the database is stored in the form of a table with rows and columns like an Excel spreadsheet.
The rows correspond to compounds and the columns correspond to descriptions for the compounds
(e.g. melting and boiling points, chemical names, toxicity, bioactivity, target proteins, and so on).
These columns are called ‘data fields’. To search the (depositor-provided) chemical name field of the
records in the PubChem Compound database, a chemical name query needs to be suffixed with either
the “[synonym]” or “[completesynonym]” index. The “[synonym]” index invokes a search for
molecules whose names contain the query chemical name as a part (that is, partial matching), and the
“[completesynonym]” index invokes a search for those whose names completely match the query
(that is, exact matching). The “[completesynonym]” or “[synonym]” are the “depositor-provided
synonyms” fields of the compound records in PubChem that are searched for the query string.
aspirin[completesynonym] (1 hit)
Many of MeSH terms are chemical names (e.g., for drugs, nutrients, metabolites, toxic chemicals, and
so on). PubChem performs an automated annotation of PubChem records with MeSH terms (through
chemical name matching), creating associations between PubChem records and PubMed articles that
share the same MeSH annotation. The MeSH term that matches a (depositor-provided) synonym of a
compound in PubChem is presented with its entry terms under the “MeSH Synonyms” section of the
Compound Summary page of that compound.
The PubChem Classification Browser allows the user to navigate or search PubChem records
associated to a hierarchical classification system of interest. It can be accessed from the PubChem
Homepage.
The Classification Browser can retrieve records annotated with terms in the following classification
systems:
The Classification Browser provides a powerful way to find a desired subset of PubChem records.
An important feature of the Classification Browser is that the Table of Contents presented in the
Compound Summary is integrated into the Classification Browser, allowing users to quickly retrieve
compounds with a particular type of information available.
3.2. PubChem Data Sources Page
The PubChem Data Sources page helps users determine who provided what information. The page
can be used to retrieve the data provided by a data depositor or to download the annotations collected
from a data source. For example, the boiling point data can be downloaded from the DrugBank.