Professional Documents
Culture Documents
Efficient Internet Searches For Chemists
Efficient Internet Searches For Chemists
Efficient Internet Searches For Chemists
DOI: 10.21767/2470-6973.100012
© Under License of Creative Commons Attribution 3.0 License | This article is available in: www.cheminformatics.imedpub.com/archive.php 1
2015
Chemical informatics
ISSN 2470-6973 Vol. 1 No. 2: 12
Google cannot index an Oracle [12], MySQL [13] database etc. Profiles
If the data are not in an html file, or are server side generated
asp/php etc. pages the data will not appear in the Google index A profile is a selection of databases that are relevant for specific
[14,15]. For instance AKosSamples is a MySQL database and you searches. If you want to buy a compound, you can choose to
need a special interface to search the database. This problem search only over databases that provide supplier information. In
does not exist in a federated search if access to the database is a federated search over the Internet it is yet impossible to use
provided. For examples as it is for AKosSamples in iScienceSearch. a logical “and”. If the original source can interpret a query like
“pyridine and carcinogenic”, you will get only answers where
The heading to this paragraph was “Why would you use pyridine is connected with carcinogenic. However, you cannot
iScienceSearch and not Google”. For chemical questions indeed draw a structure and type carcinogenic and expect to find only
it makes more sense to use iScienceSearch instead of Google. such structures that are carcinogenic. This would mean that the
In the following we compare iScienceSearch to databases. system needs to collect all answers from the Internet, builds a
Here it depends on your question if you start with a database local cache (database) and filters the search. A profile helps to
or iScienceSearch, or use both. For some searches a database is overcome this limitation. If you want to find LD50 ties, you search
the better choice. You can use Boolean logic in your searches and in the profile “Toxicity” and search only over databases that
restrict your searches to certain fields in the database. hopefully offer a LD50. However, an LD50 can always be mentioned
in a journal article. Then you should extend your search over the
Why would you use iScienceSearch and PubChem? profile “Literature”. Another strategy is to begin searching over
PubChem is a database and there are time delays, see below. “All Sources” and use the sort, group, and filter methods in the
No system can be comprehensive. Building a database with result page.
all suppliers is just too expensive. For instance PubChem has
155 suppliers, CHEMCATS has ca. 880 [9], eMolecules [16] ca. Additional features
140, ChemSpider [17] has in total 493 sources; ChemExper Some of the iScienceSearch tools fall in the category of predicting
[18] lists more than 1500 suppliers. Experience has shown that data. iScienceSearch shows links to experimental data where
iScienceSearch is the system of choice if you are searching for possible. Some features are convenient, like generating structure
suppliers of research chemicals, because with the exception of from text. Other tools are there to compare results of the different
CHEMCATS all these and 26 more directories of suppliers can be databases, to discover error and discrepancies.
searched in iScienceSearch in one go.
Example: Search for toxicity: Suppose we want to learn more
Why would you use iScienceSearch and SciFinder? about adverse effects of the structure shown in Figure 1. For a
comparison, we make a search in SciFinder and iScienceSearch.
The foremost reason to use iScienceSearch is cost. iScienceSearch
Neither SciFinder nor iScienceSearch find something under
is free. With the exception of CHEMCATS, the basis of the Chemical
toxicity (or adverse effects). In SciFinder, we look for biological
Abstract database are journals, patents, dissertations and other
studies and find 23 references. In iScienceSearch, we use the
high quality sources [19], but not other databases like ChEMBL
profile “Drug Info” and get an overview as to which database
[20] which collect also high quality data. It should be obvious that
contains information about this compound (Figure 2).
not everything is in SciFinder. A few examples are at the end of
the paper. There is one more reason why a chemist should also PubChem, ChEMBL, [21-25] DrugBank, [26] etc. have very detailed
search in iScienceSearch, it is the “Extended Search”. data, and very often a good overview of the results. Nobody
questions the usefulness of SciFinder as a literature search tool,
Extended search but you do not get an overview as to which database will provide
We chemist have solved the issue of similarity by using detailed information. In PubChem, you get a nice overview of
substructure, and similarity searches with chemical structures. It articles, and widgets display the results in iScienceSearch, see
is extremely limiting that many databases on the Internet cannot BioActivity window in Figures 3 and 4. You can select those
be searched by structure. In iScienceSearch we implemented the references first, where the compound is found to be active. In
extended search. This means when you draw a structure, or type ChEMBL you get pie charts that help you getting a fast overview
a chemical name, iScienceSearch searches in the background of the activities of a compound.
databases and finds concordances of structure, identification
numbers (i.e., CAS Registry Number or AKos Number), and Comparison of iScienceSearch with other
names. For Aspirin you will find about 200 different names, and databases
it would be too time consuming to do 200 extra searches in the
background. iScienceSearch limits the names to about 20 most No database is as up-to-date as a snapshot of the current status
important ones. In the background iScienceSearch searches for of the Internet. This means you will not find certain compounds.
instance by different InChIs, CAS Registry Number and names. Try to find in SciFinder the following structures. We made the
search on August 6, 2013, and July 23, 2015.
The result is that you start with a structure and get answers from
a database that can only be searched by CAS Registry Number Go to http://www.ncbi.nlm.nih.gov/pcsubstance?cmd=search
(see list of databases in the Table 1 for examples), or you start &term=all%5Bfilt, and try to find the latest compounds that are
with a name like maslinic acid and get perfectly correct results recorded in PubChem, and you will not find a link to it in Google.
where only the synonym crategolic acid appears. Even Google takes time to update its index.
from NCI/CADD group (http://cactus.nci.nih.gov/chemical/ features) the CAP (Chemical Activity Predictor) web service
structure). For predicting chemical properties, (see Additional provided by NCI/CADD group is used.
Summary by text. The extended search makes it possible to widen the query.
With a structure search you find answers in databases, which for
ScienceSearch provides one user interface to search many example can only be searched by CAS Registry Number or text.
databases on the Internet. The advantage is that one gets a quick
overview as to which source contains relevant information about iScienceSearch provides a short list of links with the numbers of
a compound. iScienceSearch is unique as an Internet search hits in each source. This makes it easy to pick the most relevant
engine, because it allows you to search by structure, and not only answers.
4 CAS (2013) "CAS REGISTRY and CAS Registry Number FAQs" [Online]. 22 ACD/Labs (2013) "ACD/Labs" [Online]. Available: http://www.
Available: http://www.cas.org/content/chemical-substances/faqs. acdlabs.com/home/.
5 Clove (2015) [Online]. Available: https://en.wikipedia.org/wiki/Clove. 23 ChemAxon (2013) "Chemicalize.org" [Online]. Available: http://
www.chemicalize.org/.
6 AKos GmbH (2013) “AKosSamples" [Online]. Available: http://www.
akosgmbh.de/AKosSamples/index.html. 24 Poroikov V, Filimonov D, Gloriozova T, Lagunin A (1998) "Prediction
of Biological Activity Spectra for Substances: in House Applications
7 NCI (2013) "PubMed" [Online]. Available: http://www.ncbi.nlm.nih. and Internet Feasibility" [Online]. Available: http://www.akosgmbh.
gov/pubmed. de/pass/PASS_Overview.htm.
8 PubChem -Substance Data Source Information (2013) [Online]. 25 ChemAxon (2015) "chemicalize.org" [Online]. Available: http://www.
Available: http://pubchem.ncbi.nlm.nih.gov/sources/sources.cgi. chemicalize.org/.
9 CAS (2013) "Chemical Suppliers - CHEMCATS - Find commercially 26 DrugBank (2015) [Online]. Available: http://www.drugbank.ca/.
available chemicals, pricing, and supplier contact information"
[Online]. Available: http://www.cas.org/content/chemical-suppliers. 27 PubChem Compound (2013) "PubChem Compound - ST50039568
- Compound Summary (CID 5327091) -keto form" [Online].
10 Hitchcock A (2005) "Google's BigTable" [Online]. Available: http://
Available: http://pubchem.ncbi.nlm.nih.gov/summary/summary.
andrewhitchcock.org/?post=214.
cgi?cid=5327091.
11 IUPAC (2013) "The IUPAC International Chemical Identifier (InChI)"
28 PubChem (2013) "Deposited Record (SID 26755036) -enol form"
[Online]. Available: http://www.iupac.org/home/publications/e-
15/08/2013. [Online]. Available: http://pubchem.ncbi.nlm.nih.gov/
resources/inchi.html.
summary/summary.cgi?sid=26755036&viewopt=Deposited.
12 Oracle (2015) [Online]. Available: http://www.oracle.com/index.html.
29 Elsevier (2013) "Reaxys: Chemistry Workflow Solution” [Online].
13 MySQL (2015) [Online]. Available: http://www.oracle.com/us/ Available: http://www.elsevier.com/online-tools/reaxys.
products/mysql/overview/index.html.
30 ACS (2013) "ACS Publications" [Online]. Available: http://pubs.acs.org/.
14 Crawling & Indexing (2015) [Online]. Available: http://www.google.
com/insidesearch/howsearchworks/crawling-indexing.html. 31 University of Konstanz "KonSearch" (2012) [Online]. Available:
[Accessed 24/7/2015]. http://www.ub.uni-konstanz.de/digitale-bibliothek/konsearch/.
15 Can Google crawl into a database? (2015) [Online]. Available: https:// 32 University of Heidelberg "HEIDI" (2013) [Online]. Available: http://
www.webmasterworld.com/google/3013128.htm. www.ub.uni-heidelberg.de/helios/kataloge/heidi.html.
16 eMolecules (2015) [Online]. Available: https://www.emolecules. 33 Wikipedia, the free encyclopedia (2013) [Online]. Available: http://
com/. en.wikipedia.org/wiki/Metasearch_engine.
17 ChemSpider (2013) "Data Sources" [Online]. Available: http://www. 34 Scilligence - Software for Life Science (2015) [Online]. Available:
chemspider.com/DataSources.aspx. http://www.scilligence.com/web/.