
Review

iq.unesp.br/ecletica

| Vol. 45 | n. 3 | 2020 |

A comprehensive review of database resources in chemistry


Syed Sauban Ghani¹⁺
1. Jubail Industrial College, General Studies Department, Jubail, Saudi Arabia

+ Corresponding author: Syed Sauban Ghani, Phone: +96658392-0235, Email address: Syed_SG@jic.edu.sa

ARTICLE INFO

Article history:
Received: December 11, 2019
Accepted: May 8, 2020
Published: July 1, 2020

Keywords:
1. database
2. scopus
3. google scholar
4. citation

ABSTRACT: As the scientific community worldwide publishes a huge number of research articles in various fields, it is necessary to distinguish between databases that are efficient and objective for literature searches. This review offers information on the important features of these databases. None of the databases is complete or perfect, but they complement each other. If a library can afford only one, the choice must be based on the priorities of institutional needs. The benefits that databases can provide in the preparation of a literature review, in developing future studies, and in the dissemination of research are discussed. This paper provides an overview of the most frequently used free chemistry databases, namely PubChem, Crystallography Open Database, PubMed, ZINC, ChemSpider, and Google Scholar. It also gives a brief description of three major commercial databases: Scopus, Web of Science, and SciFinder. Substance and citation databases, which together cover almost all areas of chemistry, have thus become invaluable tools in bibliometric analysis.

1. Introduction

The amount of information available today is growing at an exponential rate, and the ability to search for the necessary information is one of the basic needs of knowledge. The abundance of technological and Internet resources can both simplify and complicate a researcher's world. Chemistry is an interdisciplinary subject upon which the other scientific disciplines depend to a certain extent. This vast body of information must be systematically organized by experts in the field. A database is an organized collection of data in any field¹. In addition to the basic search techniques that are used today in almost everyday life, such as keyword searching in search engines, there are some areas of chemistry in which searching is not so simple or meaningful.

Chemical databases have now become a powerful tool in drug discovery. Database searches based on potential requirements for biological activity identify compounds that are suitable for detailed analysis or indicate novel ways to achieve the desired activity². Accessing chemical information stored in different kinds of databases by means of computers is gradually becoming more significant. The size of all databases, in terms of both structures and reactions, is growing tremendously each year³. The efficiency of the search algorithms executed within these databases is crucial. Computer-supported databases are therefore becoming useful tools for research laboratories in industry as well as in academia.

57 Eclética Química Journal, vol. 45, n. 3, 2020, 57-68


ISSN: 1678-4618
DOI: 10.26850/1678-4618eqj.v45.3.2020.p57-68
Review

To search any database effectively, it is essential to understand how the data is organized and interconnected. Databases fall broadly into two major groups, full-text and structured, based on the category of data they contain⁴. A full-text database is generally a set of documents for which indexes are created to facilitate fast searching. This type of database is commonly run by publishers of magazines and books, by patent offices for patents, or by academic institutions. The largest database of this type is operated by Google, which indexes documents and other accessible files uploaded to the internet in the form of websites⁵. Structured databases, on the other hand, normally comprise a set of tables that contain records or rows, all of which have the same structure, well defined by a set of fields or columns⁶. Each record is assigned a unique "ID" or "Number", called an identifier or primary key, by which it is easily referenced. The Chemical Abstracts Service Registry Number (CAS RN) is an example of a primary key for a structure in the REGISTRY database⁷.

Structured databases are generally classified by content into two large groups: bibliographic and factographic. Bibliographic databases usually do not hold the full text of a document; instead, they record information about a single publication, patent, or similar document⁸. Typical fields in bibliographic records are author, article title, journal name, volume, issue, year of publication, pages, etc. The Digital Object Identifier (DOI) is a comparatively new field, which describes the unique location of a document on the Web⁹. Bibliographic databases are secondary sources that interpret, analyze, and summarize primary source information to increase usability and speed of delivery, much like an online encyclopedia. Factographic databases, in contrast, consist of specific information extracted from primary documents; in chemistry, these are details about chemical reactions and chemical compounds, such as toxicological, spectral, physical, or chemical characterization¹⁰.

To be regarded as an ideal database system, a database must allow the user to search for records by all field values and to build search queries with logical operators. Additionally, thorough knowledge of the database structure, the syntax of the search language, and specific IT skills are prerequisites for this search method. The approach frequently adopted by database creators is to include "forms" that already carry the names of the usual fields, such as "bibliographic data" or "physical parameters", to give a user-friendly interface.

Search by chemical identifier is probably the simplest search, similar to a keyword search, with the difference that chemical identifiers are a little more difficult to define. However, freely available programs (e.g. ChemSketch or MarvinSketch) can generate these identifiers if the user is able to enter the structure of the compound¹¹. The most common search for information on chemical compounds or their products is the search by structure or substructure. Less commonly, structured databases are also searched by the chemical, physical, or biological properties of the compounds. Many considerations are involved in the construction and searching of chemical databases. Chemical structures differ significantly from other entities commonly stored in databases, such as text; the search modes therefore differ significantly too, although some parallels can be drawn. Different databases exist because each has its own function, and none of them is a perfect subset of any other. The next step in any chemical database is to use a specialized structure editor to create a search query by drawing the chemical structure or substructure of the compound sought. JME Editor was the most widely used structure editor of this kind, but for the last couple of years or more database providers have been phasing out this technology¹². Consequently, the creators of chemical databases opt for JavaScript-based editors, which are the recognizable technology of the future. The best of the structure editors nowadays is Marvin JS, which is widely used in Reaxys. The most exciting commercially available program for drawing chemical formulas is ChemDraw, which is marketed by CambridgeSoft. The most recent version of this editor permits the user to search for the drawn structures directly in SciFinder¹³.

The most popular and most widely used commercial chemistry resources are the web applications Scopus, Reaxys, SciFinder, and Web of Science (WoS), in which the above search technologies are available and which are most closely related to this article. Chemical databases are nowadays searched to give novel
ideas for prime discovery. A comparison of the search possibilities of Reaxys, SciFinder, and Web of Science was published in 2016¹⁴, and a comparison of the two more chemically oriented sources, Reaxys and SciFinder, was published by Jaroslav Silhanek¹⁵. In this paper, we will first describe the types of databases used in chemistry and the possibilities of the most important commercial tools. The author chose some of the databases as the object of study on the assumption that these databases might provide the most informative and relevant results for a specific query. The selected databases serve as the main sources for retrieving results from published journals, books, patents, conference abstracts, and other available relevant resources. After a quick description of several existing databases, we will also provide an overview of freely available alternative chemical resources, which in some cases may replace the commercial resources. Finally, we will conclude by highlighting the weaknesses and shortcomings of the databases and by recommending ways to make the best possible use of them. The research criteria adopted were based on qualitative and quantitative characteristics of the databases, such as source, citations, searching, and special features, drawn from an analysis of previous studies.

2. Experimental

The open Web offers a rich pool of various chemical data sources if the user knows where to look. Many years after the emerging chemical database landscape was dominated by a handful of established players, the field has practically opened up to a variety of innovative newcomers. Although some of the original databases are no longer active, it is inspiring to see that several of them continue to run and even flourish. It is of course likely that many more services will be created and that some of them will become irrelevant in the coming years. The Internet now offers a varied range of free online chemistry databases, and this list is continuously updated with new information and new entries. The following list summarizes some of the databases that are freely available to users.

2.1. PubChem

PubChem is a public repository for information on chemical substances and their biological activities. It is regarded as the grandfather of all free chemistry databases; it allows over 8 million compounds to be searched by a variety of criteria and is organized as three linked databases, namely PubChem Substance, PubChem Compound, and PubChem BioAssay¹⁶. PubChem is a database of chemical molecules and their activities against biological assays. The system is maintained by the National Center for Biotechnology Information (NCBI). PubChem can be freely accessed through a web user interface, and millions of compound structures and descriptive datasets can be freely downloaded via FTP. PubChem contains descriptions of substances and small molecules with fewer than 1000 atoms and 1000 bonds. More than 350 database vendors contribute to the growing PubChem database. PubChem holds a significant amount of literature-derived bioactivity data on chemical substances, which is manually extracted from several thousand scientific articles by data contributors such as ChEMBL and BindingDB and, additionally, integrated from DrugBank, the Hazardous Substances Data Bank, and other databases¹⁷. The contents of these databases complement those of PubChem.

2.2. Crystallography Open Database (COD)

The Crystallography Open Database (COD) is an open-access collection of crystal structures of organic, inorganic, and metal-organic compounds and minerals, excluding biopolymers. This database is specifically designed to store information about the structure of molecules and crystals¹⁸. All data on the site have been placed in the public domain by the contributors. The COD can provide a link to a Crystallographic Information File (CIF) if one is available somewhere on the internet. The COD has more than 360,000 entries from various contributors and contains CIFs as prescribed by the International Union of Crystallography¹⁹. The COD website, http://www.crystallography.net, provides facilities for all registered users to deposit published or unpublished crystallographic structures as personal communications or pre-publication depositions. Having such a
setup enables several users to extend the COD database simultaneously. It also increases the chances for growth of the COD database and may be considered one step towards creating a worldwide Internet-based collaborative platform committed to the collection and curation of structural knowledge. Each structure deposited into the COD receives a unique seven-digit number, called the COD number, which identifies a particular instance of a structure determination. In general, COD does not accept duplicate structures.

2.3. PubMed

PubMed is a freely accessible web interface (available since 1997) designed to search for records located primarily in the MEDLINE database of references and abstracts. It comprises more than 28 million citations for biomedical literature from MEDLINE, life science journals, and online books²⁰. Citations may include links to full-text content from PubMed Central and publisher web sites. In addition to MEDLINE, PubMed provides access to older references from the print version of Index Medicus, dating back to 1951 or earlier. This bibliographic database indexes journal articles and other primary sources related to medicine. It also contains information about publications in the field of medicinal chemistry or biochemistry. The PubMed identifier (PMID) is the primary key used in PubMed to identify a record in this database. The tools provided in PubMed facilitate saving searches, filtering search results, saving sets of references retrieved as part of a PubMed search, configuring display formats, and an extensive range of further options. PubMed also highlights records with recent increases in activity.

2.4. ZINC

ZINC is a free database of commercially available compounds for virtual screening, and it has brought virtual screening libraries to a wide range of structural biologists and medicinal chemists. It contains more than 35 million available compounds in ready-to-dock, 3D formats²¹. Structure-based virtual screening has had numerous significant successes in recent years and is nowadays a common technique in the initial stage of drug discovery in many pharmaceutical companies. ZINC can be downloaded from the website http://zinc.docking.org in several common file formats, including SMILES, mol2, 3D SDF, and DOCK flexibase format. It is currently built from the catalogues of ten major compound vendors, and the number of molecules in ZINC is continuously growing. The database organizes its data relationally in order to achieve efficient loading, incremental updates, querying, and data subsetting. Exporting subsets directly from the database can be slow; this problem has been resolved by exporting the molecule subsets into ready-to-download compressed files, with the database-intensive work scheduled in batch mode. Downloads thus bypass the relational database entirely and are fast once the files are ready. The web-based interface is fast, supports moderately complex queries, and lets users search ZINC by several criteria. The ZINC server also enables users to upload and process their own molecules, since researchers often encounter molecules, such as positive and negative controls, that need to be docked but are not part of the existing database²². ZINC is therefore very useful for virtual screening by experts and non-specialists alike and helps more researchers to attempt computational ligand discovery.

2.5. ChemSpider

ChemSpider is an open-access chemical structure database that provides rapid text and structure search access to over 67 million structures from hundreds of data sources. ChemSpider is one of the chemistry community's primary online public compound databases. It serves data to tens of thousands of chemists every day and lays the foundation for many important international projects to integrate chemistry and biological data, facilitate drug discovery efforts, and help to identify new chemicals from under the ocean²³. It is not just a search engine built on terabytes of chemistry data but also acts as a crowdsourcing community of chemists who contribute their data, skills, and knowledge to the enhancement and curation of the database. It can therefore be said that ChemSpider resembles Wikipedia in encouraging participation and contributions from the
scientific community. ChemSpider can link open- and closed-access chemistry journals, environmental data, PubChem, Chemical Entities of Biological Interest (ChEBI), chemical vendors, Wikipedia, the Kyoto Encyclopedia of Genes and Genomes (KEGG), and a few patent databases²⁴. These links allow a ChemSpider user to collect information of interest, such as where to buy a chemical, chemical toxicity, metabolism data, and so on. Amassing this level of related information through a general search engine like Google or Bing is a time-consuming process. Additional features have been added to each of the chemical structures within the database, such as structure identifiers like SMILES, InChI, IUPAC, and Index Names, as well as many physico-chemical properties²⁵. ChemSpider also offers access to a series of property prediction algorithms. The user can access this database at http://www.chemspider.com/. The ChemSpider homepage as it appears on the desktop is shown in Fig. 1. The provider of this service has been the Royal Society of Chemistry (RSC) since 2009, which adds value through its other useful services.

For a known compound, ChemSpider provides extensive information such as all its possible names and identifiers (both standard and nonstandard), experimental and calculated physicochemical properties, toxicity and biological activity data, spectra (NMR, IR, MS, UV-vis), publications, patents, etc. The information available depends on what has been obtained from the original sources, to which links are provided. The role of ChemSpider is to bring information about all the compounds available on the web to one central location, to make it easy to search, and to standardize structures and names. It also improves the quality of chemical sources through automated structure checking and manual curation by collaborating experts, and it provides a platform for data input and storage. Additionally, it tries to make all data easy to access through a web interface optimized for mobile devices, mobile applications, and web services for data capture. It integrates data into RSC publications through links and uses validated chemical names to search Google Scholar, PubMed, and RSC books, journals, and databases.

Figure 1. ChemSpider homepage as on desktop.
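
The structure identifiers mentioned above for ChemSpider records, such as InChI, are plain text strings with a layered layout. As a minimal sketch (not ChemSpider's own code), the following Python snippet splits a standard InChI string into its version, formula, and prefixed layers. The function name `split_inchi` is hypothetical, the ethanol string serves only as an illustrative input, and the sketch assumes that the second segment is the formula layer.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into its slash-separated layers.

    Hypothetical helper for illustration only; it performs no chemical
    validation and assumes the second segment is the formula layer.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    # Later layers carry a one-letter prefix, e.g. c (connectivity), h (hydrogens).
    for part in parts[2:]:
        if part:
            layers[part[0]] = part[1:]
    return layers


# Illustrative input: the standard InChI of ethanol.
ethanol = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol)
```

Keeping the layers in a dictionary makes it easy to compare, for example, only the formula or only the connectivity of two records.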


2.6. Google Scholar

Google Scholar is a freely available web-based search engine, available since 2004, that indexes the full text or metadata of scholarly literature across a range of publishing formats and disciplines. The indexed resources include online journals, conference papers, books, dissertations, theses, patents, and any other significant literature. It is estimated to contain approximately 389 million documents, comprising articles, citations, and patents, which made it the world's largest academic search engine in 2018²⁶. Google Scholar has now become indispensable for research and research dissemination, as it provides a systematized and instant process on which users can build, through a sort of digital snowballing, for literature retrieval. Google's success can be attributed to its sophisticated natural language processing. In addition to searching, Google Scholar users can create a personal profile with a list of their own publications and can generate citation statistics and h-indexes like those of Web of Science. Nevertheless, if a user wishes to run a structured query on the field values of the bibliographic record or to find documents that have not been published, it is preferable to resort to paid databases. The difficulties faced by Google Scholar users are that they do not know when it is updated, that results include old articles, and that no suggestions are provided for limiting searches.

3. Results and Discussion

The three major commercial web database applications that are widely used in the field of chemistry are Scopus, Web of Science (WoS), and SciFinder. All three databases offer extensive search options and are remarkably similar in their chemical content as well as in their search modes, search effectiveness, and interface. They have periodically undergone significant overhauls and compete intensely with one another. This competition has led to improvements in the services they offer, which is advantageous for users. As these databases are expensive, it is not feasible to subscribe to all of them; scientific libraries must therefore decide which citation database will meet the requests of their users most effectively. Despite the similarities between them, there are differences that are worth a detailed analysis.

3.1. Scopus

Scopus, launched in 2004, is Elsevier's abstract and citation database of peer-reviewed literature and the largest of its kind. It covers more than 49 million records, including scientific journals, conference proceedings, and books²⁷. Scopus offers a complete summary of the global research output in science, engineering, medicine, humanities, and social sciences. The Scopus database is the leading searchable citation and abstract source for literature searching and is continuously expanded and updated. Scopus offers smart tools with sorting and refining features to track, analyze, and visualize research across more than 27 million citations and abstracts dating back to the 1960s²⁸. Researchers across the globe believe that the use of Scopus has had a positive influence on their research findings, as it is easy to use, saves time, and provides quality outcomes. The content on Scopus is derived from over 5,000 publishers and is reviewed by an independent Content Selection and Advisory Board (CSAB) before being selected for indexing. The metadata provided by publishers includes the following: author names, affiliations, document title, volume, issue, pages, year, electronic identification (EID), source title, citation count, document type, and digital object identifier (DOI). This metadata is integrated into different websites and platforms, which provides more precise searching and enables retrieval of scientific information. Scopus provides the International Standard Serial Number (ISSN) for serial publications (journals, conference series, or book series) and the International Standard Book Number (ISBN) for one-time conference or book publications²⁹. An overall view of the working pattern of Scopus is given in Fig. 2.

However, the coverage of a journal by Scopus may be discontinued for a certain period, i.e. there are gaps for some journals, whereas for other journals Scopus provides only partial coverage. Several studies in the literature describe the main features of Scopus in detail and compare the databases with the aim of assessing the number of citations obtained by a particular set of documents in each of them.
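
The ISSN that Scopus records for serial publications carries a check digit, so an ISSN field can be sanity-checked before it is used as a search key. The sketch below assumes the standard ISSN weighting (weights 8 down to 2 over the first seven digits, modulo 11, with X standing for 10). The function name `issn_is_valid` is hypothetical, and the sample value is the ISSN printed in this journal's footer.

```python
def issn_is_valid(issn: str) -> bool:
    """Return True if the ISSN check digit is consistent (hypothetical helper)."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8:
        return False
    body, last = chars[:7], chars[7]
    if not (body.isascii() and body.isdigit()):
        return False
    if not (last == "X" or (last.isascii() and last.isdigit())):
        return False
    # Weighted sum of the first seven digits with weights 8, 7, ..., 2.
    total = sum(int(d) * w for d, w in zip(body, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return last == expected


print(issn_is_valid("1678-4618"))  # ISSN taken from this journal's footer
print(issn_is_valid("1678-4619"))  # same digits with an altered check digit
```

A check of this kind only detects typing errors; it does not confirm that the ISSN is actually registered or indexed in any database.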


Studies have analyzed the set of journals covered by each database, as well as interface accessibility and usability compared with Scopus, from the point of view of the number of items included and of testing the breadth of coverage. The rankings from Scopus and WoS match at the top and the bottom but deviate considerably in the middle positions³⁰. If the user is familiar with search devices such as drop-down boxes and check boxes, Scopus is easy to navigate even for a beginner. Scopus has some extraordinary features: it allows the user to go both forwards and backwards in time by linking to both citing and cited documents; it can link to the publisher's web site to view the document; its citation matching is so accurate that 99% of citing references and cited articles match exactly; and it works in all the common web browsers, such as Chrome, Internet Explorer, or Mozilla.

Figure 2. Working pattern of Scopus.
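
Going "forwards and backwards in time" amounts to following citation links in both directions. As a minimal sketch, assuming a toy set of records with hypothetical identifiers (not real Scopus data or the Scopus API), the reference lists can be inverted to obtain the citing documents:

```python
from collections import defaultdict

# Toy data: each hypothetical record ID maps to the records it cites (backward links).
references = {
    "paper_2020": ["paper_2016", "paper_2012"],
    "paper_2016": ["paper_2012"],
    "paper_2012": [],
}


def build_citing_index(refs: dict) -> dict:
    """Invert reference lists to get, for each record, the records that cite it."""
    citing = defaultdict(list)
    for source, targets in refs.items():
        for target in targets:
            citing[target].append(source)
    return dict(citing)


citing_index = build_citing_index(references)

# Backward in time: which records does paper_2016 cite?
print(references["paper_2016"])
# Forward in time: which later records cite paper_2012?
print(citing_index.get("paper_2012", []))
```

The citation count of a record is then simply the length of its entry in the inverted index.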

3.2. Web of Science

Web of Science is an ideal place to search the citation universe across subjects and across the world, as it provides access to the most reliable, integrated, multidisciplinary research, connected through linked content citation metrics from multiple sources within a single interface³¹. Because Web of Science adheres to a strict evaluation process, it ensures that only the most influential, relevant, and credible information is included. The selection is based on impact evaluations, i.e. the impact factor (IF), a measure reflecting the yearly average number of citations to recent articles published in a journal, and it includes open-access journals. It therefore allows the user to uncover the next big idea more quickly. WoS connects the entire search and discovery process through Multidisciplinary Content, Subject Specific Content, Emerging Trends, Analysis Tools, and Research Data. WoS precisely indexes the most significant literature in the world and has become the standard for research discovery and analytics. WoS links publications and researchers through citations and organized indexing in curated databases across every discipline. It uses cited reference searching to track past research and monitor current developments across around 100 years of indexed content, comprising 59 million records dating back as early as 1898³². WoS was originally created by the Institute for Scientific Information (ISI) and is now maintained by Clarivate Analytics. WoS enables the user to acquire, analyze, and disseminate database information in a timely manner; this is possible owing to a common vocabulary, called an ontology, for varied search terms and varied data. Furthermore, the search terms generated relate information across categories. The WoS platform lets the user search individually or through a combination of topic, title, author or author ID, editor, conference, language, journal title, digital object identifier (DOI), year published, organization, address, document type, funding agency, grant number, accession number, and PubMed ID³³. The Web of Science Core Collection, as illustrated in Fig. 3, consists of the following six online databases: Science Citation Index Expanded; Social Sciences Citation Index; Arts & Humanities Citation Index; Emerging Sources Citation Index; Book Citation Index; and Conference Proceedings Citation Index. Apart from the six citation indices listed, two additional chemistry databases, Index
Chemicus and Current Chemical Reactions permit the creation of structure drawings, thereby allowing users to locate chemical compounds and reactions.

A key feature of the Data Citation Index (DCI) on WoS is the ability to search directly through millions of records from hundreds of evaluated data repositories in the Sciences, Social Sciences, and Humanities³⁴. Each DCI record links directly to the repository so that users can quickly access the associated research data. Citations to data sets are indexed so that the user can measure their impact and track their influence. The Data Citation Index offers a single point of access to research data from repositories across disciplines throughout the world. In this index, descriptive records are generated for data objects and linked to literature articles in the Web of Science. As data citation practices increase, the resource aims to provide a distinct picture of the full impact of research output and to act as an important tool for data attribution and discovery.

WoS has the most advanced features for citation analysis. It allows calculation of the h-index, which is now extensively used to assess the quality of a scholar's output, and, in contrast to other databases, it supports multi-field citation searches. The newer WoS features provide searching by grant agency or grant number. It is also possible to export the records found in various formats to a file or to the web-based version of EndNote's personal bibliographic database. For subsequent import into another bibliographic database, it is appropriate to use the RIS structured export format, which is a recognized standard for these purposes. The operation of WoS is most easily learned with the help of video tutorials, which are available to international users.

Figure 3. The Web of Science Core Collection.
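
The h-index mentioned above is the largest number h such that an author has h publications with at least h citations each. A minimal sketch in Python, using made-up citation counts rather than data exported from any database (the function name `h_index` is chosen for illustration):

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Made-up citation counts for six publications of one author.
sample = [25, 8, 5, 3, 3, 0]
print(h_index(sample))
```

Because the value depends on which publications and citations a database indexes, the same author can have different h-indexes in Web of Science, Scopus, and Google Scholar.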

3.3. SciFinder

SciFinder was launched by Chemical Abstracts Service (CAS) in 1995 as a desktop application for searching the chemical literature³⁵. Today's application provides access to several databases produced by CAS, as well as to the freely available MEDLINE bibliographic database. SciFinder is a sophisticated search interface to six basic chemistry-related databases, five of which are produced by CAS itself. SciFinder Scholar is designed so that infrequent searchers can explore the chemical literature, thereby eliminating the need to learn the intricacies of searching CAS. SciFinder offers easy, convenient, and prompt access to CAS REGISTRY, the standard for substance information, which offers more substances than any other single-source tool, including organic and inorganic molecules, DNA, RNA, proteins, polymers, and Markush structures³⁶. On a daily basis, CAS scientists gather and analyze published scientific literature from across the globe, creating the best-quality and most up-to-date collection of scientific
information in the world. Covering advances in chemistry and related sciences over the last 150 years, the CAS content collection gives researchers and information professionals instant access to the trustworthy information required to catalyze innovation. SciFinder offers direct searching of the following databases:

3.3.1. CAplus

CAplus is the main Chemical Abstracts literature database, with over 23 million references. It is a bibliographic database containing data from the most important chemistry journals. The list of sources, the CAS Source Index (CASSI), is available free of charge; there the user can search by CODEN, ISBN, ISSN, and name or abbreviation for all sources used by CAS since 1907 [37]. The bibliographic records from this database are displayed in the SciFinder interface together with relevant links to other CAS databases and, in most cases, links to the full text of the document.

3.3.2. CAS REGISTRY

It is a substance database containing information about all the compounds that CAS has ever abstracted from the literature. It holds more than 27 million organic and inorganic substances and 57 million biosequences. Any previously unrecorded compound that is added to the database is assigned a new CAS Registry Number (CAS RN), a very widely used identifier of chemical compounds that often appears in chemical vendor catalogues [38]. For the most common chemicals, it is possible to look up the CAS RN by the name of the compound, or the name by the CAS RN, in a freely accessible web application operated by CAS, Common Chemistry [39]. If the user applies the forms and the structure editor in the SUBSTANCES section of the SciFinder interface, the results from the database are displayed, and in each record the user can find the relevant links to the other CAS databases, including the CAS REGISTERS. Every day about 15,000 new compounds are added to the database. CAS REGISTRY serves as a universal standard for chemists worldwide.

3.3.3. CASREACT

It is a chemical information database that provides access to over 10 million reactions from the journal literature and patents. Most of the reactions come from publications dating back to 1985, but records published as early as 1840 can also be found [40]. Records from this database are shown when the REACTIONS section of the SciFinder interface is used, which allows searching only by data entered in the structure editor. At present, the CASREACT database has more than 83 million records, and about 30,000 new reactions are added each day.

3.3.4. CHEMCATS

It is a chemical supplier database, which lists suppliers of commercially available chemicals worldwide. For each commercially available compound, a link to its vendor record in this database is available [41]. However, only suppliers who join the CAS CHEMCATS program can be found in it. Nevertheless, it is often possible to find a trusted supplier with a distinctly more favorable price than the usual suppliers. Naturally, the current price of a chemical is often quite different from what is reported in SciFinder.

3.3.5. CHEMLIST

It is a regulated chemicals database. It includes chemicals that appear on a list of regulated chemicals (toxic, hazardous, etc.). If such a compound is found in the CAS REGISTRY database, it is also contained in the CHEMLIST database. The appropriate links can be found in the compound record in the REGULATORY INFORMATION section [42]. Currently, this database contains more than 348,000 entries, and about 50 new substances are added each week, gathered from an extensive set of national and international regulatory lists and inventories.

4. Conclusions

The size of almost all chemical databases has increased manyfold in recent years, so search engines must become correspondingly more powerful. This review has outlined the usefulness of the databases for teaching, learning,
and research, as each chemical record retains the links to the original source of the material, thereby providing a form of micro-attribution, and these links let a database user trace information of particular interest to its source. Each of the commonly used chemical databases presented here has at least some overlap with the others, yet each appears to have its own “niche”. A user looking for variety should pay attention to each of them. These databases thus seem to be a valuable resource for the chemical community, as they offer a large collection of compounds, either with related sample availability or with a diverse and unique structure set. As all the investigated databases have developed over the years, the detailed results reported for them essentially represent a snapshot in time. The description given here may serve as a useful overview of some of the most important large chemical databases available. In PubChem, unique chemical structures are extracted from the Substance database and stored in the Compound database, which provides an aggregated view of the information for a given chemical structure. The COD database establishes a worldwide Internet-based collaborative platform committed to the collection and curation of structural knowledge. For PubMed, a general description was provided, including its content and unique characteristics. ChemSpider provides a variety of information on a given compound, including physical and chemical properties, molecular structure, synthetic methods, spectral data, and systematic nomenclature, for millions of compounds in a single Web site. The ZINC database provides 3D molecules in several formats compatible with most docking programs. Google Scholar helps to identify the collection of publications for a specific research topic. There is a high correlation between the WoS and Scopus databases, both of which allow queries to be searched and sorted by parameters such as first author, citation, and institution, with regard to impact factor and h-index. SciFinder meets its goal of effectively exploring the scientific literature, and the search results are mostly truly relevant and often astonishingly inclusive regardless of the level of complexity or syntax of the query. The database that ought to be used depends on the user and the desired information. Therefore, the user must investigate the up-to-date condition of the specific database before deciding on the acquisition and usage of any of the databases presented here, as there may be changes in scope, configuration, vendor, etc. Libraries wishing to subscribe to a database should base their choice on the needs of the library.

5. Acknowledgment

The author is thankful to Jubail Industrial College for providing institutional access to the commercial websites for downloading the research articles.

6. References

[1] Masic, I., Review of most important biomedical databases for searching of biomedical scientific literature, Donald School Journal of Ultrasound in Obstetrics and Gynecology 6 (4) (2012) 343-361. https://doi.org/10.5005/jp-journals-10009-1258.

[2] Walters, W. P., Stahl, M. T., Murcko, M. A., Virtual screening - an overview, Drug Discovery Today 3 (4) (1998) 160-178. https://doi.org/10.1016/S1359-6446(97)01163-X.

[3] Hunter, L., Cohen, K. B., Biomedical language processing: what's beyond PubMed? Molecular Cell 21 (5) (2006) 589-594. https://doi.org/10.1016/j.molcel.2006.02.012.

[4] Tenopir, C., Ro, J. S., Full Text Databases, Greenwood Press, Westport, 1990.

[5] Bar-Ilan, J., Which h-index? - A comparison of WoS, Scopus and Google Scholar, Scientometrics 74 (2008) 257-271. https://doi.org/10.1007/s11192-008-0216-y.

[6] Chang, K. C-C., He, B., Li, C., Patel, M., Zhang, Z., Structured databases on the web: Observations and implications, ACM SIGMOD Record 33 (3) (2004) 61-70. https://doi.org/10.1145/1031570.1031584.

[7] Dittmar, P. G., Stobaugh, R. E., Watson, C. E., The Chemical Abstracts Service Chemical Registry System. I. General Design, Journal of Chemical Information and Computer Sciences 16 (2) (1976) 111-121. https://doi.org/10.1021/ci60006a016.

[8] Wright, K., McDaid, C., Reporting of article retractions in bibliographic databases and online journals, Journal of the Medical Library Association 99 (2) (2011) 164-167. https://doi.org/10.3163/1536-5050.99.2.010.

[9] Paskin, N., Digital Object Identifier (DOI®) System, In: Encyclopedia of Library and Information Sciences, Bates, M. J., Maack, M. N., ed., CRC Press: Boca Raton, 3rd ed., 2010, Ch. 7. https://doi.org/10.1081/E-ELIS3-120044418.

[10] Conklin, D., Fortier, S., Glasgow, J., Knowledge discovery in molecular databases, IEEE Transactions on Knowledge and Data Engineering 5 (6) (1993) 985-987. https://doi.org/10.1109/69.250082.

[11] Ertl, P., Molecular structure input on the web, Journal of Cheminformatics 2 (1) (2010) 1-9. https://doi.org/10.1186/1758-2946-2-1.

[12] Bienfait, B., Ertl, P., JSME: a free molecule editor in JavaScript, Journal of Cheminformatics 5 (24) (2013) 1-6. https://doi.org/10.1186/1758-2946-5-24.

[13] Mendelsohn, L. D., ChemDraw 8 Ultra, Windows and Macintosh Versions, Journal of Chemical Information and Computer Sciences 44 (6) (2004) 2225-2226. https://doi.org/10.1021/ci040123t.

[14] Bharti, N., Leonard, M., Singh, S., Review and Comparison of the Search Effectiveness and User Interface of Three Major Online Chemical Databases, Journal of Chemical Education 93 (5) (2016) 852-863. https://doi.org/10.1021/acs.jchemed.5b00601.

[15] Šilhánek, J., Comparisons of the most important chemistry databases - Scifinder program and reaxys database system, Chemicke Listy 108 (1) (2014) 81-106. https://projekty.upce.cz/sites/default/files/groups/admins/luva3059/2014_01_83-90.pdf.

[16] Wang, Y., Xiao, J., Suzek, T. O., Zhang, J., Wang, J., Bryant, S. H., PubChem: a public information system for analyzing bioactivities of small molecules, Nucleic Acids Research 37 (2 Suppl) (2009) W623-W633. https://doi.org/10.1093/nar/gkp456.

[17] Kim, S., Thiessen, P. A., Bolton, E. E., Chen, J., Fu, G., Gindulyte, A., Han, L., He, J., He, S., Shoemaker, B. A., Wang, J., Yu, B., Zhang, J., Bryant, S. H., PubChem Substance and Compound databases, Nucleic Acids Research 44 (D1) (2016) 1202-1213. https://doi.org/10.1093/nar/gkv951.

[18] Gražulis, S., Daškevič, A., Merkys, A., Chateigner, D., Lutterotti, L., Quirós, M., Serebryanaya, N. R., Moeck, P., Downs, R. T., Le Bail, A., Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration, Nucleic Acids Research 40 (D1) (2012) D420-D427. https://doi.org/10.1093/nar/gkr900.

[19] Hall, S. R., Allen, F. H., Brown, I. D., The crystallographic information file (CIF): a new standard archive file for crystallography, Acta Crystallographica Section A A47 (1991) 655-685. https://doi.org/10.1107/S010876739101067X.

[20] Van Buskirk, N. E., The review article in MEDLINE: ambiguity of definition and implications for online searchers, Bulletin of the Medical Library Association 72 (4) (1984) 349-352. PMCID: PMC227511.

[21] Irwin, J. J., Shoichet, B. K., ZINC-a free database of commercially available compounds for virtual screening, Journal of Chemical Information and Modeling 45 (1) (2005) 177-182. https://doi.org/10.1021/ci049714+.

[22] Sterling, T., Irwin, J. J., ZINC 15 – Ligand Discovery for Everyone, Journal of Chemical Information and Modeling 55 (11) (2015) 2324-2337. https://doi.org/10.1021/acs.jcim.5b00559.

[23] Pence, H. E., Williams, A., ChemSpider: An Online Chemical Information Resource, Journal of Chemical Education 87 (11) (2010) 1123-1124. https://doi.org/10.1021/ed100697w.

[24] Hettne, K. M., Williams, A. J., van Mulligen, E. M., Kleinjans, J., Tkachenko, V., Kors, J. A., Erratum to: Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining, Journal of Cheminformatics 2 (4) (2010) 1-7. https://doi.org/10.1186/1758-2946-2-4.

[25] Williams, A., Tkachenko, V., The Royal Society of Chemistry and the delivery of chemistry data repositories for the community, Journal of Computer-Aided Molecular Design 28 (10) (2014) 1023-1030. https://doi.org/10.1007/s10822-014-9784-5.

[26] Zientek, L. R., Werner, J. M., Campuzano, M. V., Nimon, K., The Use of Google Scholar for Research and Research Dissemination, New Horizons in Adult Education & Human Resource Development 30 (1) (2018) 39-46. https://doi.org/10.1002/nha3.20209.

[27] Burnham, J. F., Scopus database: a review, Biomedical Digital Libraries 3 (1) (2006) 1-8. https://doi.org/10.1186/1742-5581-3-1.

[28] Bar-Ilan, J., Tale of Three Databases: The Implication of Coverage Demonstrated for a Sample Query, Frontiers in Research Metrics and Analytics 3 (6) (2018) 1-9. https://doi.org/10.3389/frma.2018.00006.

[29] Ball, R., Tunger, D., Science Indicators Revisited – Science Citation Index versus SCOPUS: A Bibliometric Comparison of Both Citation Databases, Information Services & Use 26 (4) (2006) 293-301. https://doi.org/10.3233/ISU-2006-26404.

[30] Gavel, Y., Iselid, L., Web of Science and Scopus: A journal title overlap study, Online Information Review 32 (1) (2008) 8-21. https://doi.org/10.1108/14684520810865958.

[31] Jacso, P., As we may search – Comparison of major features of the Web of Science, Scopus, and Google Scholar citation-based and citation-enhanced databases, Current Science 89 (9) (2005) 1537-1547. https://www.jstor.org/stable/24110924?seq=1#metadata_info_tab_contents.

[32] Bar-Ilan, J., Web of Science with the Conference Proceedings Citation Indexes: the case of computer science, Scientometrics 83 (3) (2010) 809-824. https://doi.org/10.1007/s11192-009-0145-4.

[33] Meho, L. I., Rogers, Y., Citation counting, citation ranking, and h-index of human-computer interaction researchers: A comparison of Scopus and Web of Science, Journal of the American Society for Information Science and Technology 59 (11) (2008) 1711-1726. https://doi.org/10.1002/asi.20874.

[34] Robinson-Garcia, N., Jiménez-Contreras, E., Torres-Salinas, D., Analyzing data citation practices using the data citation index, Journal of the Association for Information Science and Technology 67 (12) (2016) 2964-2975. https://doi.org/10.1002/asi.23529.

[35] Somerville, A. N., SciFinder Scholar (by Chemical Abstracts Service), Journal of Chemical Education 75 (8) (1998) 959-960. https://doi.org/10.1021/ed075p959.

[36] Cain, R., Schwall, K., Guiding your literature searching, Chemtech: the innovator's magazine 25 (8) (1995) 8-11. ISSN: 0009-2703.

[37] Baykoucheva, S., Comparison of the Contributions of CAPLUS and MEDLINE to the Performance of SciFinder in Retrieving the Drug Literature, Issues in Science and Technology Librarianship 66 (2011) 1-17. https://doi.org/10.5062/F42Z13FT.

[38] Fisanick, W., Cross, K. P., Rusinko III, A., Similarity searching on CAS Registry substances. 1. Global molecular property and generic atom triangle geometric searching, Journal of Chemical Information and Computer Sciences 32 (6) (1992) 664-674. https://doi.org/10.1021/ci00010a013.

[39] Fisanick, W., Mitchell, L. D., Scott, J. A., Stouw, G. G. V., Substructure Searching of Computer-Readable Chemical Abstracts Service Ninth Collective Index Chemical Nomenclature Files, Journal of Chemical Information and Computer Sciences 15 (2) (1975) 73-84. https://doi.org/10.1021/ci60002a003.

[40] Blake, J. E., Dana, R. C., CASREACT: more than a million reactions, Journal of Chemical Information and Computer Sciences 30 (4) (1990) 394-399. https://doi.org/10.1021/ci00068a008.

[41] Cavaller, V., Software review: SciFinder, International Journal of Competitive Intelligence, Strategic, Scientific and Technology Watch SciWatch Journal 1 (1) (2008) 15-17. https://hexalog.files.wordpress.com/2008/05/2-_-scifinder-english2.pdf.

[42] Murray-Rust, P., Chemistry for everyone, Nature 451 (2008) 648-651. https://doi.org/10.1038/451648a.
