Professional Documents
Culture Documents
Chapter 19
Chapter 19
Introduction
The first Neural Network (NN) model goes back to 1943 when it
was proposed by Warren McCulloch (18981969), a neurophysiologist, and Walter Pitts (19231969), a mathematician. The first
attempt to use NN in computers took place in 1956 by a group of
IBM researchers and neuroscientists from McGill University in
Canada. Not only neurosciences have contributed to the development of Artificial Neural Networks (ANN). For instance, the
network unit known as a Perceptron was proposed in 1957 by
Frank Rosenblatt (19281971), a psychologist, and is still in use
due to its simplicity.
In addition, the development of computer science has followed the way of the mathematician John von Neumann. In 1947,
he designed a structure based on sequential processing of data and
instructions. This structure rigorously follows a sequence defined
by data stored in memory. Von Neumanns architecture is based on
the logic of procedures, normally used for partial solutions in some
problems to link the specific problem with the result. Although the
Hugh Cartwright (ed.), Artificial Neural Networks, Methods in Molecular Biology, vol. 1260,
DOI 10.1007/978-1-4939-2239-0_19, Springer Science+Business Media New York 2015
319
320
QSAR/QSPR
321
322
QSAR/QSPR
323
324
Methodology
Extraction is an important technique in ANN, even though this
has a cost in resources and effort.
In artificial intelligence, the term of explanation refers to an
explicit structure which can be used internally within, for example,
an ANN, to elucidate and understand information and then explain
the results obtained to the user. The information can be externalized with object hierarchy, semantic networks, frames, and so on.
This explanation also includes the steps of the reasoning process or
intermediate steps of the process as the trace of activity rules or test
structure. This can assist the user in checking the internal logic of
the system, a particularly helpful step if the explanations are very
coarse or limited.
In many systems, there are deficiencies in rule extraction
because it is very difficult to generate the explanatory structure of
the process, even though the helpful capacity and quality of explanation are important in use of a trained ANN. The quality of explanation refers to how direct the task of extraction from ANN is. For
example, Rule of trails may be used to capture an explanation, but
it has been shown that this approach is often too rigid or inflexible,
since these rules may contain references about internal calculations
and repetitions [6].
2.1
Software
QSAR/QSPR
325
QSAR/QSPR
For both financial and social reasons, the drug industry would like
to develop new drugs in much shorter times than is possible using
a purely experimental (lab-based) approach. A large number of
novel compounds are synthesized each year. The number of
molecules that can be obtained by organic synthesis is so high that
the probability of choosing randomly a molecule with the desired
biological activity is practically nil. In this regard, medicinal chemistry techniques that allow discovery and optimization of new
templates are used. However, a large fraction of these compounds
are not tested for physicochemical properties or biological activities, which still remain unknown owing to unavailability of the
compounds or possible risks of toxicity. A method able to predict,
within a realistic error boundary, the biological activities of untested
compounds is necessary to evaluate these molecular features in a
rapid and inexpensive approach.
Among these methods, computer aided design has found its
way to the rational design of new compounds; the most important
practice is known as Quantitative StructureActivity Relationship
(QSAR). In recent years, numerous quantitative structureactivity/property relationship QSAR/QSPR models have been introduced for calculating the biological activities from a variety of
numerical descriptors of chemical structures. These relationships
develop correlations between the descriptors of individual compounds and their biological activity/chemical properties. This
approach has proved especially productive when coupled with solid
optimizing tools that help to narrow down the field of potential
molecules to just the most promising candidates, namely, Genetic
Algorithms, Ant Colony Optimization, Support Vector Machines,
and Artificial Neural Networks.
An inevitable difficulty, when dealing with molecular descriptors, is that of collinearity, which frequently exists between independent variables. In the worst case, this creates a rigorous problem
in certain types of mathematical treatment such as matrix inversion
[13], though collinearity of molecular descriptors is not normally
so complete that matrix inversion will fail. A better predictive
model can be obtained by orthogonalization of the variables by
means of principal component analysis (PCA) and the subsequent
method is called principal component regression (PCR) [14].
In order to decrease the dimensionality of the independent variable space, a restricted number of principal components (PCs) are
used. For this reason, the choice of significant and informative PCs
is an important task in PCAbased calibration methods [15, 16].
Diverse methods have been proposed to select the important
PCs for calibration purposes. The simplest and most frequent one
is a topdown variable selection where the factors are ranked in the
326
order of decreasing eigenvalue. The factor with the highest eigenvalue is considered as the most important one; further factors are
then progressively introduced into the calibration model until no
significant improvement of the calibration model is obtained. In
another method, correlation ranking, the factors are ranked by their
correlation coefficient with the property to be correlated (i.e., biological activity) and selected by the procedure of eigenvalue ranking.
QSAR is a mathematical model that attempts to relate the
structure-derived features of a compound to its biological or physicochemical activity and it is used to understand drug action as well
as to design new compounds. Elaboration of QSAR models starts
with the collection of a representative set of molecules with the
desired biological activity, and a control group that does not possess the activity under study. This is followed by the selection of
molecular descriptors obtained by a mathematical logic that transforms the chemical and/or structural information about a molecule into a set of numbers. In favorable cases, the molecular
descriptors can then be related to the desired biological activity. In
order to validate the descriptors, the full data set is divided into a
training set, and a learning test set. In the learning process, there
are many ways of building a model, among which we can mention
Multiple Linear Regression (MLR), logistic regression, or machine
learning methods. The optimal model is obtained from the modeling parameters and features. Finally, this model is validated to
ensure that it will be useful (see Fig. 4) [17].
A QSAR model correlates, within congeneric molecules, many
biological activities such as inhibition constants, affinities of ligands
Data
Features
Training set
Testing set
Learning
Model
Validation
QSAR/QSPR
327
to their binding sites and rate constants, with either structural features or with atomic group or molecular properties such as solubility, lipophilicity, electronic effects, ionization, and stereochemistry.
The structural features were proposed by Free and Wilson [18]
and the use of molecular properties was proposed by Hansch and
Fujita [19], both published independently in 1964. QSAR has
been important in the integration of ADMET (absorption, distribution, metabolism, excretion, and toxicity) profiling and prediction, which is mostly dependent on a type of molecular descriptors
as described by the Lipinski Rule of 5. This procedure is applied to
improve our understanding of structureactivity links, to identify
hazardous compounds at low cost, and to reduce the number of
animals used for experimental toxicity testing. In the field of QSAR
and in the prediction of animal toxicity, various artificial intelligence techniques have been proposed and developed such as ANN
and statistical understanding of knowledge networks [2022].
There are different tools that have been used to predict the
activity of molecules that can present some biological activity; MLR,
nonlinear regression using ANN and molecular simulations from
molecular recognition (Docking) are also used in QSAR [23].
The calculation of chemical descriptors is an important tool for
QSAR applications, because it makes possible prediction of toxic
properties, harmful characteristics, and pharmacological biological
properties that a group of molecules could present. Many chemical
descriptors can be measured experimentally but this is cost and
time expensive, and therefore development of QSAR tools is useful
to replace or augment measurement.
Most QSAR/QSPR practice uses quantum-mechanical descriptors.
QSAR analysis uses a variety of descriptors including:
3.1 QSAR-ANN
Application
Topological,
Quantum chemical,
2D-autocorrelation,
AssemblY
328
329
QSAR/QSPR
MLR-M86A
Training (86 molecules)
Predicted (20 molecules)
ANN-M66B
Training (66 molecules)
Predicted (40 molecules)
NIH11167
NIH11179
NIH11100
NIH11179
NIH10945
NIH11168
NIH11178
NIH11168
NIH11178
NIH11154
log K m
exp
log K m
NIH11181
NIH11188
NIH11154
exp
NIH11068
NIH11072
0
1
1
NIH 11081
2
2
1
exp
log K m
1
exp
log K m
Fig. 6 Examples of data set calculated by Multiple Linear Regression (solid line), ideal line (dashed line), and
residual line limits (dotted line, cutoff = |1.0|). These were the best ANN calculated results [32]
330
QSAR/QSPR
331
332
Acknowledgement
AC Martnez-Olgun wishes to thank CONACyT for a graduate
scholarship. The English was kindly reviewed by Miss Dsire
Argott.
References
1. de la Fuente Aparicio Ma J, Calonge Cano T
(1999) Aplicaciones de las redes de neuronas
en supervisin, diagnosis y control de procesos. Universidad Simn Bolvar, CaracasVenezuela, p 11
2. Brgains JC (2007) Anlisis sntesis y control
de diagramas de radiacin generados por distribuciones continuas y discretas utilizando
algoritmos estadsticos y redes neuronales artificiales aplicaciones. Universidad de Santiago
de Compostela, Galicia-Spain, p 104
3. Flores Lpez R, Fernndez Fernndez JM
(2008) Las redes neuronales artificiales
Fundamentos tericos y aplicaciones prcticas.
Netbiblo, S.L., Spain, pp 1214, 17, 21
4. Pino Diez R, Gmez Gmez A, de Abajo
Martnez N (2001) Introduccin a la inteligencia artificial: Sistemas expertos, redes neuronales
artificiales y computacin evolutiva. Universidad
de Oviedo, Asturias-Spain, pp 27, 28
5. Douali L, Villemin D, Cherqaoui D (2003)
Neural networks: accurate nonlinear QSAR
6.
7.
8.
9.
10.
11.
12.
QSAR/QSPR
13. Vendrame R, Braga RS, Takahata Y et al
(1999) Structureactivity relationship studies
of carcinogenic activity of polycyclic aromatic
hydrocarbons using calculated molecular
descriptors with principal component analysis
and neural network methods. J Chem Inf
Comput Sci 39:10941104
14. Jolliffe IT (1986) Principal component analysis. Springer, New York, NY
15. Xie YL, Kalivas JH (1997) Evaluation of principal component selection methods to form a
global prediction model by principal component regression. Anal Chim Acta 348:1927
16. Depczynski U, Frost VJ, Molt K (2000)
Genetic algorithm applied to the selection of
factors in principal component regression.
Anal Chim Acta 420:217227
17. Dehmer M, Varmuza K, Bonchev D (2012)
Statistical modelling of molecular descriptors
in QSAR/QSPR. Wiley, Germany, p 2
18. Free SM, Wilson JA (1964) A mathematical
contribution to structure-activity studies. J
Med Chem 7:395399
19. Hansch C, Fujita T (1964) p-- Analysis. A
method for the correlation of biological activity and chemical structure. J Am Chem Soc
86:16161626
20. Hansch C, Leo A, Hoekman D (1995)
Exploring QSAR: fundamentals and applications in chemistry and biology. American
Chemical Society, Washington, DC
21. Hansch C, Leo A, Hoekman D (1995)
Exploring QSAR: volume 2: hydrophobic,
electronic, and steric constants. American
Chemical Society, Washington, DC
22. Helma C, Kramer S (2003) A survey of the
predictive toxicology challenge 20002001.
Bioinformatics 19:11791182
23. Waterbeemd H, Gifford E (2003) ADMET in
silico modeling: towards prediction paradise?
Nat Rev Drug Discov 2:192204
24. Karelson M (2000) Molecular descriptors in
QSAR/QSPR. Wiley-Interscience, New York,
USA
25. Ramrez-Galicia G, Garduo-Jurez R,
Hemmateenejad B et al (2007) QSAR study on
the relaxant agents from some Mexican medicinal plants and synthetic related organic compounds. Chem Biol Drug Des 70:143153
26. Marini F, Bucci R, Magr AL et al (2008)
Artificial neural networks in chemometrics:
history, examples and perspectives. Microchem
J 8:178185
27. Fernndez EF, Almonacid F, Sarmah N et al
(2014) A model based on artificial neuronal
network for the prediction of the maximum
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
333