Computational Toxicology: Methods and Protocols
Methods in Molecular Biology
Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK
Edited by
Orazio Nicolotti
Dipartimento di Farmacia-Scienze del Farmaco,
Università degli Studi di Bari Aldo Moro, Bari, Italy
This Humana Press imprint is published by the registered company Springer Science+Business Media, LLC part of
Springer Nature.
The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.
Dedication
Preface
I first heard about computational toxicology about ten years ago, when I was asked to apply the predictive modeling strategies typical of rational drug design to a different field. This was how I started this challenging journey. Moving toward this emerging field was quite natural, although the goals and objectives were rather different. At present, I would say that computational toxicology is a blend of knowledge embracing medicinal chemistry, pharmacology, organic chemistry, biochemistry, clinical and forensic medicine, and other attractive disciplines. I would also say that computing has moved toxicology from a mostly empirical level, based on disease-specific observational measures, to target-specific models aimed at properly predicting the risk/benefit ratio of chemicals.
This book comprises excellent contributions from colleagues working all over the world. Its structure reflects my personal experience, which began in medicinal chemistry. Chapters 1–4 provide a comprehensive view of molecular descriptors and QSAR, the fascinating root from which everything else stems. The molecular and data modeling methods needed to comply with both scientific and regulatory requirements are discussed in Chapters 5–10, while the relevance of computational toxicology in drug discovery is mostly highlighted in Chapters 11–19. The last part, comprising Chapters 20–27, explains how to predict some relevant human-health toxicology endpoints.
The book collects methods and protocols currently used in computational toxicology, sharing the vision of top scientists on building solid target-specific models. Last but not least, the ultimate aim of the book is to arouse the curiosity and interest of the reader.
Contents
Dedication
Preface
Contributors
Abstract
Molecular descriptors capture diverse parts of the structural information of molecules and they are the
support of many contemporary computer-assisted toxicological and chemical applications. After briefly
introducing some fundamental concepts of structure–activity applications (e.g., molecular descriptor
dimensionality, classical vs. fingerprint description, and activity landscapes), this chapter guides the readers
through a step-by-step explanation of molecular descriptors rationale and application. To this end, the
chapter illustrates a case study of a recently published application of molecular descriptors for modeling the
activity on cytochrome P450.
Key words Molecular descriptors, Molecular similarity, Chemical space, Mathematical chemistry,
QSAR
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_1, © Springer Science+Business Media, LLC, part of Springer Nature 2018
4 Francesca Grisoni et al.
$$P = f(x_1, x_2, \ldots, x_p) \tag{1}$$
Fig. 1 Principal steps of quantitative structure–activity relationship (QSAR) development and use: starting from
a set of molecules with annotated experimental properties (e.g., physicochemical, toxicological, and biologi-
cal), several types of molecular descriptors can be calculated. The obtained dataset (molecular descrip-
tors + experimental properties) is then used in the phase of information extraction, to obtain a reliable and
validated QSAR model. The model can be later applied to predict the properties of untested molecules, to
obtain mechanistic insights through the interpretation of the molecular descriptors, or to design novel
molecules
Molecular Descriptors for Structure–Activity Applications: A Hands-On Approach 7
(RMSE) and the model predictive ability (Q²) [21, 22] for
quantitative responses, and Sensitivity and Non-Error Rate
[23] for qualitative responses. It is important to keep in mind
that structure–activity models are reductionist, that is, they
capture only a portion of information relevant for the process
under investigation. In this framework, one additional factor
to consider when developing structure–activity models is the
so-called applicability domain (AD) [24, 25]. AD can be
defined as the region of the chemical space (or, more precisely,
of the descriptor space) where the predictions can be consid-
ered as reliable and the model assumptions are met. Molecules
falling outside the AD may be too dissimilar from the mole-
cules used to calibrate the model and, thus, they may be sub-
ject to different biological pathways, have diverse mechanisms
of action or be characterized by structural features not repre-
sented in the training data.
(d) Model application. The validated model can be then used for
several applications, such as to predict the properties of
untested molecules (e.g., [15, 26, 27]), to design new mole-
cules with desirable properties (e.g., [28, 29]) and/or to glean
useful insights into the relationships among the structural fea-
tures and the property of interest (e.g., [30–33]).
In addition to the QSAR/QSPR approach, the problem of
property estimation can also be addressed indirectly, that is, with-
out the need of mathematically expressing the relationship between
descriptors and property. This is done through the similarity prin-
ciple, according to which, molecules with similar descriptor values
will be likely to have similar bioactivities [34]. In toxicology and
related fields, this approach is often referred to as read-across [35],
while in other fields, such as drug discovery, it is referred to as simi-
larity search [36].
Since molecular descriptors are usually fast and inexpensive to
compute, QSAR and descriptor-based methods for toxicological
and ecotoxicological applications have been in the spotlight of
industries, researchers and regulatory agencies, especially as alter-
natives to animal testing [37]. In computational toxicology and
related fields, molecular descriptors have, for instance, been applied
for testing prioritization purposes [38], in vivo acute [39–41] and
chronic [42–44] toxicity prediction, as well as organ [45, 46] and
receptor-mediated toxicity modelling [27, 47, 48]. In addition,
descriptor-based approaches have become valuable tools for screen-
ing and designing efficient hits and pharmaceuticals [29, 49, 50],
protein–ligand interaction prediction [51–53], environmental fate
[15, 26, 54, 55] and risk assessment [56, 57], as well as for food
chemistry applications [58, 59].
1.1 Molecular Representation and Descriptor Dimensionality
The information captured by molecular descriptors can vary from simple bulk properties to complex three-dimensional definitions or substructure frequency. In particular, different levels of complexity (also known as “dimensionality”) can be used to represent any given molecule (Fig. 2), as follows:
(a) 0-Dimensional (0D). The simplest molecular representation
is the chemical formula, that is, the specification of the
chemical elements and their occurrence in a molecule. For
instance, the chemical formula of
2,3,7,8-Tetrachlorodibenzodioxin (a contaminant known
for its toxicity to humans and ecosystems [60, 61]) is
C12H4Cl4O2, which indicates the presence of 12 Carbon, 4
Hydrogen, 4 Chlorine, and 2 Oxygen atoms. This represen-
tation is independent of any knowledge about atom connec-
tivity and bond types. Hence, molecular descriptors obtained
from the chemical formula are referred to as 0D descriptors
and capture bulk properties. 0D descriptors are very simple
to compute and interpret, but they have a low information content
and a high degree of degeneracy, that is, they may have
equal values for different molecules. Some examples of 0D
descriptors are atom counts (e.g., number of carbon atoms),
molecular weight, and sum or average of atomic properties
(e.g., atomic van der Waals volumes).
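As a minimal sketch of the 0D descriptors described above, the following Python snippet derives atom counts, the total number of atoms, and the molecular weight from a chemical formula. The function names and the small table of rounded atomic masses are assumptions introduced for illustration; they are not part of the original protocol or of any descriptor software.

```python
import re

# Rounded standard atomic masses (g/mol); illustrative subset only.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "Cl": 35.45}

def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C12H4Cl4O2'.

    Parentheses, charges and isotopes are not handled in this sketch.
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def zero_d_descriptors(formula):
    """Compute a few 0D descriptors: atom counts, total atoms, molecular weight."""
    counts = parse_formula(formula)
    n_atoms = sum(counts.values())
    mol_weight = sum(ATOMIC_MASS[el] * n for el, n in counts.items())
    return {"counts": counts, "nAT": n_atoms, "MW": round(mol_weight, 2)}

# 2,3,7,8-Tetrachlorodibenzodioxin, the example used in the text
tcdd = zero_d_descriptors("C12H4Cl4O2")

# Degeneracy: ethanol and dimethyl ether share the formula C2H6O,
# so every 0D descriptor takes the same value for both isomers.
ethanol = zero_d_descriptors("C2H6O")
dimethyl_ether = zero_d_descriptors("C2H6O")
```

Because the formula carries no connectivity information, the two isomers in the last lines cannot be told apart, which is the degeneracy mentioned in the text.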
Fig. 2 Graphical example of different molecular representations of the same structure (ibuprofen, here depicted
as a 2D structure). The relationship between chosen dimensionality and information content/ease of calcula-
tion is also depicted
1.2 Classical Descriptors vs. Fingerprints
Descriptors are usually calculated from the chosen molecular representation. They can be chosen based on a priori knowledge and/or on their performance for the problem under analysis. Molecular descriptors can be grouped according to the rationale underlying their design, which influences their applicability to computational problems and the required modeling steps. In particular, molecular descriptors can be divided into classical molecular descriptors and binary fingerprints, as follows:
1. Classical molecular descriptors (MDs) are designed to encode
a precise structural/chemical feature (or a set of features of dif-
ferent complexity) into one, single number. Thus, each descrip-
tor can be used alone or in combination with other descriptors.
Classical descriptors can have different measurement scales:
they can be integers (e.g., number of double bonds and counts
of atom types), binary (e.g., presence/absence of a given sub-
stituent) or can have continuous values (e.g., molecular weight).
MDs may be subject to scaling, reduction and selection tech-
niques, as explained in the next paragraph. The majority of clas-
sical molecular descriptors are usually interpretable to a certain
extent, and, in some cases, they can be mapped back onto sets
of structural features (i.e., reversible decoding).
2. Binary fingerprints (FPs) give a complete representation of all
the structural fragments of a molecule in a binary form. Unlike
classical descriptors, fingerprints encode information about 2D
1.3 Descriptor Choice and Activity Landscapes
When dealing with a modeling campaign, one can hypothesize that not all of the structural features are relevant in determining the experimental property of interest. Likewise, not all of the MDs will be relevant for modeling purposes. Thus, the choice of molecular descriptors directly affects the outcome of the respective computer-aided project (e.g., [49, 63]). The choice of molecular descriptors has been shown to have a much greater influence on the prediction performance of QSAR models than the nature of the modeling technique [85, 86]. Thus, it is always crucial to determine the optimal set(s) of descriptors to best address the problem of interest. In addition to the choice of molecular descriptors, their processing and use also have a crucial impact on the corresponding computational models, for instance on how the similarity between molecules is expressed [49].
A relevant factor to consider when dealing with descriptor-
based applications is the so-called “activity landscape” [87–90].
The activity landscape can be thought of as the relationship between
the molecular descriptor space and the experimental activity space.
In other words, if the former is imagined as a set of geographical
coordinates of a map (i.e., latitude and longitude), the latter would
represent the height of the landscape at each point of the map
(Fig. 3). Activity landscapes depend on the nature of the endpoint/
assay of interest, the chemical space covered by the training com-
pounds, the density distribution of the compounds in these regions,
and, most importantly, on the nature of the molecular descriptors
used [90]. Molecular activity landscapes can be similar to gently
Fig. 3 Schematic representation of two activity landscapes [91], given two hypothetical descriptors (x1 and x2) and a given biological property (A): (a) “gently rolling hills” landscape, where small changes in the descriptor values correspond to small changes in the biological activity; (b) “rugged canyons” landscape, where small changes in the descriptor values correspond to drastic changes in the activity
2 Materials
2.2 Data
We chose the CYP3A4 dataset developed and curated by Nembri
et al. [30]. The data were retrieved from the publicly available CYP
bioactivity database of Veith et al. [102], which contains the
potency values of 17,143 drug-like compounds on five Cytochrome
P450 isoforms (3A4, 2D6, 2C9, 2C19, 1A2). The dataset was
retrieved from PubChem [103] (PubChem ID = AID: 1851). The
database provides the class of activity (active/inactive) for each
compound, identified by a SMILES (Simplified Molecular Input
Line Entry System, see Subheading 3.1.2) string.
Data were curated by: (1) removing the records without
SMILES and/or activity class; (2) removing duplicate struc-
tures with mismatching class activity; (3) removing discon-
nected structures. The CYP3A4 dataset is composed of 9122
molecules, 6385 of which (70%) were used as the training set to
build the models, while 2737 (30%) were used as the validation
2.3 Classification QSARs
In the guided example, after describing the molecular descriptors under analysis, we will show how they can be used for
structure–activity classification tasks. Classification commonly
refers to the application of statistical and mathematical tech-
niques to the problem of predicting the “label” of a given object
(in our case, a molecule). In the SAR context, the labels are
generally experimental values expressed in a categorical form,
such as “active” vs. “inactive” compounds, or “toxic” vs. “non-
toxic” compounds. Classification problems differ from the so-
called regression problems, where the value to be predicted is
numerical and continuous, such as the half maximal effective
concentration (EC50), or the octanol–water partitioning coef-
ficient (KOW).
In this chapter, two classification models for predicting the
activity towards CYP3A4 will be taken into account, namely a
decision-tree-based QSAR and a similarity-based QSAR. This sec-
tion will describe the rationale of each modeling approach, while
details on each single model along with the definition of the
selected molecular descriptors will be given in the Methods section
on a step-by-step basis.
2.3.1 Decision-Tree Based QSAR Model (CART)
This model was calculated by the Classification and Regression Tree (CART) approach [105]. CART is a machine-learning algorithm based on a recursive partitioning of data using one descriptor
at a time: at each univariate split, data are divided in two groups (as
homogeneous as possible) according to their descriptor values; the
splitting procedure is further applied to each group separately until
a stop criterion is fulfilled. The model is graphically represented as
a decision tree (Fig. 4): each node is a univariate split that parti-
tions the molecules in the following branches, while the leaves are
the predicted classes for the molecules that fall in them. In addition
to its simplicity and interpretability, the CART technique can deal with
nonlinear relationships between variables, which makes it particularly
well-suited for complex biological problems (e.g., [31, 106]). The
power of CART also lies in its ability to select the descriptors that
provide the best class separation automatically, by neglecting those
that are not relevant for the problem under analysis. However, vali-
dation protocols are fundamental to prune the classification tree
and avoid overfitting [107].
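The univariate split at the core of the procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used to build the published model: the Gini impurity is assumed as the homogeneity measure, candidate thresholds are taken as midpoints between consecutive descriptor values, and the toy data are hypothetical. Molecules with a descriptor value smaller than the threshold go to one branch and the remaining ones to the other.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    """Return (descriptor index, threshold, weighted impurity) of the best
    single-descriptor split, searching every descriptor and threshold."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] < thr]
            right = [y[i] for i in range(n) if X[i][j] >= thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, thr, score)
    return best

# Hypothetical toy data: four molecules, two descriptors, classes 1 and 2
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [1, 1, 2, 2]
split = best_split(X, y)
```

In a full tree, the same search would be repeated on each of the two resulting groups until a stop criterion is met, and the tree would then be pruned on the basis of validation results.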
2.3.2 Similarity-Based QSAR Model (KNN)
The second model relies on the similarity between molecules, according to the k-Nearest Neighbors (KNN) method [108]. For
a new molecule to be predicted, first a fixed number (k) of similar
Fig. 4 Exemplary scheme of a Classification and Regression Tree (CART) model with three nodes (rounded
rectangles) and four leaves (circles). Each node corresponds to a univariate split of data according to a selected
descriptor (MDi) and threshold value (thr). A new molecule is assigned the successive node/leaf according to
the comparison of its descriptor value with the node threshold (in this example, y [yes] indicates that the
descriptor value is smaller than the threshold, while n [no] indicates that it is equal to or larger than the node threshold).
Leaves correspond to the assigned class (e.g., class 1 and 2 in this example)
Fig. 5 Graphical example of the k-Nearest Neighbors approach (KNN). The model is based on two variables (x1
and x2) and k = 3. Given a target molecule with unknown class (a), the three closest molecules in the variable
space are selected (b) and their experimental classes are used to predict the unknown class of the target as a
majority vote (c). In this case, 2 out of 3 neighbors belong to the “orange” class, and, thus, the unknown molecule
will be assigned the “orange” class
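The majority vote illustrated in Fig. 5 can be sketched in a few lines of Python. The Euclidean distance, the function name, and the toy coordinates are assumptions made for illustration; this excerpt does not state which similarity measure the published KNN model uses, and that model relies on k = 14 neighbors (Table 1) rather than the k = 3 shown here.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the class of `query` as the majority class of its k nearest
    training molecules (Euclidean distance in descriptor space)."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-descriptor training set (x1, x2) with two classes
train_X = [(1.0, 1.0), (1.2, 0.8), (3.0, 3.0), (3.2, 2.9), (0.9, 1.4)]
train_y = ["orange", "orange", "blue", "blue", "blue"]

# Two of the three nearest neighbors are "orange", so the vote is "orange"
predicted = knn_predict(train_X, train_y, query=(1.0, 1.1), k=3)
```

In practice the descriptors would usually be scaled before computing distances, so that no single descriptor dominates the neighbor search.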
2.4 Model Performance Evaluation
The predictive ability of the two classification models was quantified using Sensitivity (Sn), Specificity (Sp), and Non-Error Rate (NER), which, in the two-class cases, are defined as follows:
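$$
\begin{aligned}
\mathrm{Sn\%} &= \frac{TP}{TP + FN} \times 100 \\
\mathrm{Sp\%} &= \frac{TN}{TN + FP} \times 100 \\
\mathrm{NER\%} &= \frac{\mathrm{Sn\%} + \mathrm{Sp\%}}{2}
\end{aligned}
\tag{2}
$$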
where TP, TN, FP and FN are the number of true positives (active
molecules correctly predicted), true negatives (inactive molecules
correctly predicted), false positives (inactive molecules predicted as
active) and false negatives (active molecules predicted as inactive).
All of the parameters range from 0 to 100%, the higher, the better.
Sn and Sp are class-based parameters, that is, they quantify the abil-
ity to correctly classify positive and negative compounds, respec-
tively. NER, which is the average between Sn and Sp, is a measure
of the global classification ability of a model. Sn, Sp and NER were
chosen as they are optimal parameters for both binary- and multi-
class classification tasks [109].
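Equation 2 translates directly into code. The sketch below is illustrative only: the function name is an assumption and the confusion-matrix counts are hypothetical, not taken from the CYP3A4 models.

```python
def classification_stats(tp, tn, fp, fn):
    """Return (Sn%, Sp%, NER%) for a two-class problem, following Eq. 2."""
    sn = 100.0 * tp / (tp + fn)   # share of active molecules correctly predicted
    sp = 100.0 * tn / (tn + fp)   # share of inactive molecules correctly predicted
    ner = (sn + sp) / 2.0         # average of the two class-based rates
    return sn, sp, ner

# Hypothetical counts: 100 active and 200 inactive molecules
sn, sp, ner = classification_stats(tp=80, tn=150, fp=50, fn=20)
```

Because NER averages the two class-based rates, it is not inflated by the larger class when the dataset is unbalanced, unlike the plain fraction of correct predictions.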
The classification statistics for the selected models are reported in Table 1. Both models correctly identify around 74% of the active compounds (Sn), with the CART model being slightly better. The KNN model, on the other hand, has a slightly better ability to correctly identify inactive compounds (Sp% = 79.8) than CART (Sp% = 75.1).
2.5 Example Molecules
Six molecules were chosen to illustrate the basic steps of descriptor calculation, processing and use. The molecules are reported in
Table 2 along with their identification label (ID), chemical name
and chemical representation (see Note 1).
2.6 Software and Requirements
Molecular descriptors were calculated using the software Dragon 7.0 [110], which can be used to reproduce the results. In addition,
this chapter will illustrate how to manually calculate them starting
from a 2D molecular representation. Besides this, several free alter-
Table 1
Classification performance of the selected QSAR models on the training data, in terms of NER, Sn and Sp. Number of molecular descriptors (p) and of nodes/neighbors (k) are also reported. Statistics have been calculated on the 6385 molecules used to calibrate the models (“Fitting”), on the test set of 2737 molecules (“Validation”) and on all of the 9122 molecules (“All data”) of the original dataset

| Model | p | k | Fitting NER% | Fitting Sn% | Fitting Sp% | Validation NER% | Validation Sn% | Validation Sp% | All data NER% | All data Sn% | All data Sp% |
|-------|---|----|------|------|------|------|------|------|------|------|------|
| CART  | 3 | 4  | 74.6 | 74.7 | 74.6 | 73.4 | 73.4 | 73.4 | 74.7 | 74.3 | 75.1 |
| KNN   | 2 | 14 | 76.5 | 73.7 | 79.3 | 74.9 | 71.0 | 79.3 | 76.4 | 72.9 | 79.8 |
Table 2
List of the molecules used in the guided example, annotated for their ID, name and 2D representation

| ID   | Name |
|------|------|
| Mol1 | N-[4-(2,5-dimethylphenyl)-5-methyl-1,3-thiazol-2-yl]-3-ethyl-5-methyl-1,2-oxazol-4-carboxamide |
| Mol2 | 1,1′-(1,3-propanediyldisulfonyl)bis(4-methylbenzene) |
| Mol3 | 2-oximino-3-butanone |
| Mol4 | 2-{[2-Hydroxy-3-(4-methylphenoxy)propyl]carbamoyl}benzoic acid |
2.7 Supplementary Material
The training and example data (i.e., SMILES, descriptor values, experimental classes) are provided free of any charge as both
spreadsheet (.xlsx) and MATLAB (.mat) files. They can be down-
loaded from Milano Chemometrics and QSAR Research Group
website at the following link: http://michem.disat.unimib.it/
chm/download/mol_desc_example.htm. The Excel-format data-
set can be found under the name “CYP3A4_dataset.xlsx” (Table 3).
Table 3
Example portion of the file “CYP3A4_dataset.xlsx”, sheet “Training set”
The first sheet (“Training set”) contains the list of the molecules
that were used in the original work to develop the models. Each
row corresponds to a molecule, with an assigned ID, a structure
representation (SMILES, see Subheading 3.1.2), the experimental
class (annotated both as a string and a numeric code), along with
the values of the selected molecular descriptors. The second sheet
(“Example molecules”) contains the molecules used in the step-
by-step guided example. It is organized as the previous sheet (i.e.,
IDs, SMILES and descriptor values), except for the experimental
class, which is not reported because it is unknown.
The MATLAB file “CYP3A4_dataset.mat” contains several
items (Fig. 6):
(a) class: the experimental classes (1 = active, 2 = inactive);
(b) ID: the molecule IDs, which are the same as those provided in
the Excel file;
(c) SMILES: the SMILES strings of the molecules;
(d) X_CART: the molecular descriptors for the CART model;
(e) X_KNN: the molecular descriptors for the KNN model;
(f) lab_CART and lab_KNN: contain the descriptor labels of X_
CART and X_KNN, respectively.
All of the item rows have a row-to-row correspondence, that
is, each row corresponds univocally to the same molecule.
3 Methods
3.1.1 Molecular Graph
A graph is the fundamental mathematical object of graph theory
[115], which can be adapted to the description of molecules. The
so-called “molecular graph” is a topological representation of a
chemical compound; it is usually denoted as follows:
$$G = (V, E) \tag{3}$$
where V is a set of vertices that correspond to the molecule atoms
and E is a set of elements representing the binary relationships
between pairs of vertices; unordered vertex pairs are called edges,
which correspond to the bonds between atom pairs. If two vertices
occur as an unordered pair more than once, they define a multiple
edge. A molecular graph obtained by excluding all the hydrogen
atoms is called an H-depleted molecular graph (Fig. 7a), while a
Fig. 7 Examples of the generation of different types of graphs for Mol3: (a)
H-depleted graph, (b) H-filled graph, and (c) H-depleted multigraph
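As a minimal sketch of the notation G = (V, E), the H-depleted graph and multigraph of Mol3 can be written as plain Python data structures. The atom numbering follows the order of the SMILES string CC(=O)C(=NO)C, which is assumed here to match the vertex labels used for this molecule elsewhere in the chapter; the variable and function names are illustrative.

```python
# Heavy atoms of Mol3 (2-oximino-3-butanone), numbered in SMILES order
atoms = {1: "C", 2: "C", 3: "O", 4: "C", 5: "N", 6: "O", 7: "C"}

# Bonds as (atom i, atom j, bond order)
bonds = [(1, 2, 1), (2, 3, 2), (2, 4, 1), (4, 5, 2), (5, 6, 1), (4, 7, 1)]

# H-depleted graph: vertices are atoms, edges are unordered atom pairs
V = set(atoms)
E = {frozenset((i, j)) for i, j, _ in bonds}

# H-depleted multigraph: each edge is counted as many times as its bond order
bond_order = {frozenset((i, j)): order for i, j, order in bonds}
n_multigraph_edges = sum(bond_order.values())

def vertex_degree(v):
    """Number of edges of the simple graph that contain vertex v."""
    return sum(1 for edge in E if v in edge)

degrees = {v: vertex_degree(v) for v in sorted(V)}
```

Storing edges as `frozenset` objects makes the pairs unordered, which mirrors the definition of an edge given above.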
Fig. 8 Examples of the generation of SMILES from a given molecular representation for Mol3 (a) and Mol5 (b).
Grey numbers represent the order used for generating the SMILES string
3.1.2 Simplified Molecular Input Line Entry System (SMILES)
One of the most widely used graph-based (2D) representations is the Simplified Molecular Input Line Entry System (SMILES). SMILES is a chemical notation language specifically designed for computer use by chemists [116]. According to its rationale, the chemical is first represented as a molecular graph, which is then converted to a linear notation by specifying atom types and connectivity, as well as other chemical information through predefined rules, as follows (Fig. 8):
(a) Atoms are represented by their atomic symbols, with the possibility to omit H;
(b) Single, double, triple and aromatic bonds can be represented with the following symbols: “-”, “=”, “#”, and “:”, respectively. Single bonds can be omitted;
(c) Branches are specified by enclosures in parentheses;
(d) Cyclic structures are represented by breaking one single or aromatic bond in each ring and starting from one of the ring atoms. Ring “opening”/“closure” bonds are then indicated by a digit immediately following the atomic symbol at each ring closure (Fig. 8b). Aromaticity on carbon atoms can be written with lower-case letters or by alternating single and double bonds (Kekulé notation). For instance, benzene can be written both as “c1ccccc1” and as “C1=CC=CC=C1”;
(e) The configuration around double bonds can be specified using the symbols “/” and “\”. For instance, E- and Z-1,2-difluoroethene can be written as F/C=C/F and F/C=C\F, respectively. Additionally, tetrahedral centers are often indicated using “@” (or “@@”), following the atomic symbol of the chiral atom. “@” indicates that
Table 4
ID, 2D structure and corresponding SMILES strings for the example molecules. The red dot in the structural representation indicates the starting atom used to generate the SMILES

| ID   | SMILES |
|------|--------|
| Mol1 | CCc1noc(C)c1C(=O)Nc2nc(c(C)s2)c3cc(C)ccc3C |
| Mol2 | Cc1ccc(cc1)S(=O)(=O)CCCS(=O)(=O)c2ccc(C)cc2 |
| Mol3 | CC(=O)C(=NO)C |
| Mol4 | Cc1ccc(OCC(O)CNC(=O)c2ccccc2C(=O)O)cc1 |
| Mol5 | Brc1cccc(OC(=O)c2cccnc2)c1 |
| Mol6 | OC(=O)c1ccccc1C2c3ccccc3Oc4ccccc24 |
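Because atoms appear in a SMILES string as their atomic symbols, with aromatic atoms in lower case, the heavy-atom composition of the example molecules can be read off with a small tokenizer. The sketch below is a simplification made for illustration: it only handles bracket-free SMILES over the usual organic subset, such as those in Table 4, and is not a general SMILES parser. The pattern and function names are assumptions.

```python
import re
from collections import Counter

# Two-letter halogens come first so that "Cl" is not read as "C" plus "l".
# Upper case: aliphatic atoms; lower case: aromatic atoms.
ATOM_TOKEN = re.compile(r"Br|Cl|[BCNOPSFI]|[bcnops]")

def heavy_atom_counts(smiles):
    """Count non-hydrogen atoms per element in a bracket-free SMILES string."""
    return Counter(tok.capitalize() for tok in ATOM_TOKEN.findall(smiles))

mol3 = heavy_atom_counts("CC(=O)C(=NO)C")                  # Mol3
mol5 = heavy_atom_counts("Brc1cccc(OC(=O)c2cccnc2)c1")     # Mol5
```

Bond symbols, parentheses, and ring-closure digits are simply skipped by the pattern, so the result depends only on the atoms listed in the string. For real work, a cheminformatics toolkit with a full SMILES parser is the safer choice.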
Table 5
Molecular descriptors for the six example molecules under analysis
3.2 Descriptor Calculation
This section will define the selected descriptors and provide the users with a step-by-step guide on how to calculate them manually starting from a 2D representation of the molecule (e.g.,
molecular graph). Where necessary, some of the example mole-
cules will be used and the descriptor values for all of them will
also be provided.
Among the many software packages available for molecular descriptor calculation, we chose Dragon 7 [110], which is regarded as a benchmark and can compute more than 5000 descriptors, from 0D to 3D. However, the descriptors can be calculated with other software (see Note 2) or manually, and the considerations drawn here always apply.
The models include in total eight descriptors, namely: nBM,
nBO, C%, SIC3, ATSC4p, ATSC6i, nROH, and NaasC. Their
numerical values for the example molecules are collected in Table 5.
The MDs can be divided into four logical blocks, according to
their chemical meaning and the molecular representation they
derive from, namely (1) constitutional indices (nBM, nBO, C%,
and nROH), (2) autocorrelation descriptors (ATSC4p and
ATSC6i), (3) indices of neighborhood symmetry (SIC3), and (4)
atom-type E-state indices (NaasC). The descriptor theory, the
calculation formulas, and some numerical examples will be pre-
sented according to this logical division.
3.2.1 Constitutional Descriptors
Constitutional descriptors are the simplest descriptors. They reflect the chemical composition of a compound, without encoding any
Table 6
Calculation example of the indices of neighborhood symmetry for 2-oximino-3-butanone (Mol3). Equivalent vertices, their probabilities, and IC and SIC descriptor values, from order 0 to order 3. The process of vertex partitioning can be found in Fig. 10

| Order (m) | Equivalent vertices | Probability (p) | Descriptor values |
|-----------|---------------------|-----------------|-------------------|
| 0 | [C1, C2, C4, C7]; [O3, O6]; [N5]; [H8, H9, H10, H11, H12, H13, H14] | 4/14; 2/14; 1/14; 7/14 | IC0 = 1.689; SIC0 = 0.444 |
| 1 | [C1, C7]; [C2]; [C4]; [O3]; [O6]; [N5]; [H11]; [H8, H9, H10, H12, H13, H14] | 2/14; 1/14; 1/14; 1/14; 1/14; 1/14; 1/14; 6/14 | IC1 = 2.557; SIC1 = 0.672 |
| 2 | [C1]; [C2]; [C4]; [C7]; [O3]; [O6]; [N5]; [H11]; [H8, H9, H10, H12, H13, H14] | 1/14; 1/14; 1/14; 1/14; 1/14; 1/14; 1/14; 1/14; 6/14 | IC2 = 2.700; SIC2 = 0.709 |
| 3 | [C1]; [C2]; [C4]; [C7]; [O3]; [O6]; [N5]; [H11]; [H8, H9, H10]; [H12, H13, H14] | 1/14; 1/14; 1/14; 1/14; 1/14; 1/14; 1/14; 1/14; 3/14; 3/14 | IC3 = 3.128; SIC3 = 0.822 |
3.2.2 Autocorrelation Descriptors
Autocorrelation indices are based on the concept of spatial autocorrelation, which is a measure of the degree to which a spatial
phenomenon is correlated to itself in space, or, in other words, the
degree to which the observed value of a variable at one region
depends on values of the same variable at neighboring regions. The
spatial pattern of a property distribution is defined by the arrange-
vertices, that is, the number of bonds along the shortest path con-
necting two atoms. Moreau and Broto [122–124] were the first to apply an autocorrelation function to the molecular graph to measure the distribution of atomic properties on the molecule topology.
The final vector of autocorrelation functions at different lags (k)
was defined by the authors as autocorrelation of a topological
structure (ATS). ATS at a given k (ATSk) can be calculated as
follows:
$$ATS_k = \sum_{i=1}^{nAT-1} \sum_{j=i+1}^{nAT} w_i \cdot w_j \cdot \delta(d_{ij}; k) \tag{5}$$
where w is any atomic property, nAT is the number of atoms in a molecule, k is the lag, and d_ij is the topological distance between the ith and jth atoms; δ(d_ij; k) is a Kronecker delta function, equal to 1 if d_ij = k and zero otherwise.
The centered Broto–Moreau autocorrelations are calculated by replacing atomic properties (w) with their centered values (w′), which are derived by subtracting the average property value of the molecule (w̄) from each w value:
$$ATSC_{kw} = \sum_{i=1}^{nAT-1} \sum_{j=i+1}^{nAT} |(w_i - \bar{w})| \cdot |(w_j - \bar{w})| \cdot \delta(d_{ij}; k) = \sum_{i=1}^{nAT-1} \sum_{j=i+1}^{nAT} w'_i \cdot w'_j \cdot \delta(d_{ij}; k) \tag{6}$$
Hollas [125] demonstrated that autocorrelation descriptors are all uncorrelated only if the properties are centered, which makes the centered descriptors more suitable for the subsequent statistical analysis.
The selected autocorrelation descriptors for this example are
ATSC4p and ATSC6i. They are the centered Broto–Moreau auto-
correlation descriptors, respectively of lag 4 weighted by polariz-
ability (p) and of lag 6 weighted by ionization potential (i). A
calculation example is reported in Fig. 9 for 2-oximino-3-butanone
(Mol3). The molecule is represented by the H-filled molecular
graph (Fig. 9a), which shows the chemical identity and the sequen-
tial identification number of the vertices. The topological distance
matrix (Fig. 9b) can be used to find the pairs of atoms that enter
the summation. To calculate the centered Broto–Moreau autocor-
relation of lag 4 (ATSC4p) with polarizability as the atomic weight-
ings (Fig. 9c), one has to select the atom pairs with distance 4
(highlighted in red boldface into the matrix) and add the products
of the corresponding polarizability values (Fig. 9d).
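The procedure just described (compute topological distances, select the atom pairs at the chosen lag, sum the products of their weightings) can be sketched as follows. This is an illustration under stated assumptions, not Dragon's implementation: the function names are hypothetical, the H-filled graph of Mol3 is numbered in SMILES order with hydrogens appended, and unit weights are used so that no atomic property values are invented. With real polarizability or ionization potential values in `weights`, the same functions would give the weighted descriptors. The centered variant follows the absolute-value form of Eq. 6 as printed.

```python
from collections import deque

def topological_distances(n_atoms, bonds):
    """All-pairs shortest path lengths (in bonds) via breadth-first search."""
    adj = {i: [] for i in range(1, n_atoms + 1)}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    dist = {}
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[source] = seen
    return dist

def ats(weights, dist, lag, centered=False):
    """Broto-Moreau autocorrelation at a given lag (Eq. 5); with centered=True,
    weights are mean-centered and absolute values are multiplied (Eq. 6)."""
    atoms = sorted(weights)
    mean = sum(weights.values()) / len(weights) if centered else 0.0
    total = 0.0
    for pos, i in enumerate(atoms):
        for j in atoms[pos + 1:]:          # each unordered pair once (j > i)
            if dist[i].get(j) == lag:      # delta(d_ij; k)
                wi, wj = weights[i] - mean, weights[j] - mean
                total += abs(wi) * abs(wj) if centered else wi * wj
    return total

# H-filled graph of Mol3: heavy atoms 1-7, hydrogens 8-14
bonds = [(1, 2), (2, 3), (2, 4), (4, 5), (5, 6), (4, 7),
         (1, 8), (1, 9), (1, 10), (6, 11), (7, 12), (7, 13), (7, 14)]
dist = topological_distances(14, bonds)

# Unit weights: the result is simply the number of atom pairs at that lag
unit = {i: 1.0 for i in range(1, 15)}
pairs_lag4 = ats(unit, dist, lag=4)
pairs_lag6 = ats(unit, dist, lag=6)
```

Running the uncentered function with unit weights is a convenient check that the right atom pairs enter the summation before real atomic properties are plugged in.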
Fig. 9 Calculation of the centered Broto–Moreau autocorrelation for 2-oximino-3-butanone (Mol3): (a) repre-
sentation of the molecule as an H-filled molecular graph; (b) topological distance matrix, atom-pairs with a
topological distance of 4 and of 6 are highlighted in red and blue boldface, respectively; (c) atomic weightings,
where w refers to carbon-scaled atomic weightings (polarizability [p] and ionization potential [i]) and w’ to
centered atomic weightings on the molecule mean; (d) example of descriptor calculation (ATSC4p and ATSC6i),
using the obtained w’ values
Fig. 10 Neighborhood symmetry for 2-oximino-3-butanone (Mol3). Illustrative process of vertex partitioning
into equivalence classes considering a neighborhood of order 0 to 3. Note that for the sake of simplicity, the
unitary equivalence classes (i.e., classes comprised of only one vertex), which appear in one level, are not
repeated in the subsequent level of the partition process. The calculated class probabilities, along with the
corresponding IC and SIC values, are reported in Table 6
the path and the same chemical element and vertex degree of the
involved vertices. In other words, two atoms vi and vj are equiva-
lent at the mth level if it is possible to obtain two equal sub-
structures (in terms of connected atoms and bonds, through paths
of maximum m bonds), one rooted in vi and one in vj,
respectively (Fig. 10).
Once the equivalence classes of the vertices have been deter-
mined from the H-filled multigraph, for each mth order (usually
m ranges from 0 to 5), the neighborhood Information Content
(ICm) is calculated by Shannon’s entropy formula as follows:
$$ IC_m = -\sum_{g=1}^{G} \frac{A_g}{nAT} \cdot \log_2 \frac{A_g}{nAT} = -\sum_{g=1}^{G} p_g \cdot \log_2 p_g \tag{7} $$
where the summation goes over the G atom equivalence classes, Ag
is the cardinality of the gth equivalence class (i.e., the number of
atoms grouped in the same class), nAT is the total number of ver-
tices (i.e., atoms) and pg can be thought of as the probability of
randomly selecting a vertex of the gth class. ICm represents a mea-
sure of structural complexity per vertex. The larger the number of
equivalence classes at a given mth order, the higher the ICm value.
Its normalized counterpart is called structural information content
(SICm) and is calculated as follows:
$$ SIC_m = \frac{IC_m}{\log_2 nAT} \tag{8} $$
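Equations 7 and 8 only require the cardinalities of the equivalence classes. The following Python sketch illustrates them; the class sizes used are hypothetical and do not correspond to the partition of Mol3 shown in Fig. 10.

```python
import math


def information_content(class_sizes):
    """Neighborhood information content IC_m (Eq. 7) from the class cardinalities A_g."""
    n_at = sum(class_sizes)  # total number of vertices (atoms)
    return -sum((a / n_at) * math.log2(a / n_at) for a in class_sizes)


def structural_information_content(class_sizes):
    """Structural information content SIC_m (Eq. 8): IC_m normalized by log2(nAT)."""
    n_at = sum(class_sizes)
    return information_content(class_sizes) / math.log2(n_at)


# Hypothetical partition of 8 vertices into three equivalence classes
example_classes = [4, 2, 2]
print(information_content(example_classes))             # IC
print(structural_information_content(example_classes))  # SIC

# When every vertex forms its own class, SIC reaches its maximum of 1
all_unitary = [1] * 8
print(structural_information_content(all_unitary))
```

The second case mirrors the situation in which all vertices are topologically distinct at a given order, which is when SIC equals 1.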
3.2.4 Atom-Type E-State Indices
Atom-type electrotopological state (or E-state) indices are molecular
descriptors that encode topological and electronic information
related to particular atom-types in the molecule. They
combine structural information about (1) the electron accessibility
associated with each atom-type, (2) the presence or absence of
a given atom-type and (3) counts of the atoms of a given atom-type
[128, 129].
To compute atom-type E-state indices, the first step is to
identify precise atom-types in the molecule, according to three
factors:
(a) The atom identity, based on the atomic number Z;
(b) The valence state indicator (VSI), which can be calculated as
the sum between the vertex degree and the valence vertex
degree (VSI = δv + δ);
(c) The aromatic indicator (IAR), which is equal to 1 if the atom
belongs to an aromatic system, while 0 otherwise.
Once the atoms in the molecule are assigned their specific
atom-types, two different atom-type E-state indices can be com-
puted: the atom-type E-state sums and the atom-type E-state
counts, as described below.
The atom-type E-state sums are calculated by adding the
electrotopological state (Si) of all the atoms of the same atom-type
in the molecule. The electrotopological state (or E-state) is an
atomic index encoding information related to the electronic and
topological state of the atoms in the molecule [130]. It is calcu-
lated from the H-depleted molecular graph as follows:
$$ S_i = I_i + \Delta I_i = I_i + \sum_{j=1}^{V} \frac{I_i - I_j}{(d_{ij} + 1)^k} \tag{9} $$
where Ii is the intrinsic state of the ith atom and ΔIi is the field
effect on the ith atom calculated as perturbation of the intrinsic
state of ith atom by all other non-H atoms in the molecule (V); dij
is the topological distance between the ith and the jth atoms. The
exponent k is a parameter to modify the influence of distant or
nearby atoms for particular studies. Usually, k = 2. The intrinsic
state Ii of the ith atom is calculated as follows:
$$ I_i = \frac{(2 / L_i)^2 \cdot \delta_i^v + 1}{\delta_i} \tag{10} $$
where Li is the principal quantum number (i.e., 2 for C, N, O, F
atoms, 3 for Si, S, Cl, etc.), while δi and δiv are the simple and the
valence vertex degrees, respectively. For any atom, the intrinsic
state can be thought of as the ratio of π and lone pair electrons over
the count of the σ bonds in the molecular graph. Therefore, the
intrinsic state reflects the possible partitioning of non-σ electrons
influence along the paths, starting from the considered atom. The
smaller the partitioning of the electron influence, the more avail-
able are the valence electrons for intermolecular interactions.
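As a minimal sketch of Eqs. 9 and 10, the Python code below computes the intrinsic states and the E-states for ethanol (CH3–CH2–OH). Ethanol is chosen here purely for illustration and is not one of the chapter's example molecules; the vertex degrees and the distance matrix of its H-depleted graph are entered by hand, and the function names are hypothetical.

```python
def intrinsic_state(quantum_number, delta, delta_v):
    """Intrinsic state I_i (Eq. 10) from L_i, the simple and the valence vertex degree."""
    return ((2.0 / quantum_number) ** 2 * delta_v + 1.0) / delta


def e_states(intrinsic, dist, k=2):
    """Electrotopological states S_i (Eq. 9): intrinsic state plus field effect."""
    n = len(intrinsic)
    return [intrinsic[i] + sum((intrinsic[i] - intrinsic[j]) / (dist[i][j] + 1) ** k
                               for j in range(n) if j != i)
            for i in range(n)]


# H-depleted graph of ethanol: (label, L, delta, delta_v) for each non-H atom
atoms = [("CH3", 2, 1, 1), ("CH2", 2, 2, 2), ("OH", 2, 1, 5)]
# Topological distance matrix of the chain C-C-O
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]

intrinsic = [intrinsic_state(L, d, dv) for _, L, d, dv in atoms]
states = e_states(intrinsic, dist)
for (label, *_), i_val, s_val in zip(atoms, intrinsic, states):
    print(label, i_val, round(s_val, 4))
```

Because every pairwise perturbation term enters twice with opposite signs, the E-states of a molecule sum to the same total as its intrinsic states; the field effect only redistributes it among the atoms.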
The atom-type E-state counts are also based on the assignment
of the atoms of the molecule to the given atom-types. However,
unlike the atom-type E-state sums, the atoms of the same atom-type in a
molecule are simply counted [131].
The symbol of each atom-type E-state index is composed of
three parts. The first part is “S” or “N”, depending on whether the
E-states of the atoms of the same type are summed up or simply
counted. The second part is a string representing the bond-types
associated with the atom (“s”, “d”, “t”, “a” for single, double,
triple, and aromatic bonds, respectively). The third part is the symbol
identifying the chemical element and any bonded
hydrogens, such as CH3, CH2, and F.
In the case of Mol5, as illustrated in Fig. 11, seven different atom-types
are identified according to the mentioned criteria (i.e., atomic
number [Z], valence state indicator [VSI] and aromatic indicator [IAR]
of non-H atoms). The atom-types are –Br, --C(–)--, --CH--, –O–,
=C(–)–, =O, --N--, where the letter indicates the chemical element
and the symbols “–”, “=” and “--” indicate single, double and aromatic
bonds, respectively. Each atom-type has a corresponding
atom-type E-state count (Fig. 11), namely: NsBr = 1, NaasC = 3,
NaaCH = 8, NssO = 1, NdssC = 1, NdO = 1, and NaaN = 1. To
obtain the atom-type E-state sums, the electrotopological states Si of
the atoms of the same type are instead summed up; thus, for instance,
the descriptor SaasC = 1.8006 derives from the summation of the
Fig. 11 H-depleted molecular graph of 3-Bromophenyl nicotinate (Mol5) and the atomic indices for atom-type
assignment and E-state descriptor calculation. The atom-type was represented as the chemical element plus
the specification of its bond(s), i.e., single (−), double (=) and aromatic (--) bonds
Fig. 12 Graphical representation of the CART model. Rounded rectangles denote univariate splits (i.e., nodes)
according to the selected descriptors and thresholds, while round boxes (i.e., leaves) denote the assigned class.
Yes (y) and no (n) indicate whether or not the condition specified at each node is satisfied by the molecule to
be predicted
E-states (Si) of atoms 2, 6, and 10. In our guided exercise, the descriptor
involved in the KNN model is NaasC; its values for the six example
molecules under analysis are collected in Table 5.
3.3 Predictions with the CART Model
The CART model used in this chapter is depicted in Fig. 12. It
comprises three constitutional descriptors (nBM, nBO, and
nROH) and four nodes (Table 5).
The model can be written in pseudo-code as follows:
if nBM_i < 13: predicted_class_i = "inactive"
else:
    if nROH_i < 1:
        if nBM_i > 16:
            if nBO_i < 25: predicted_class_i = "inactive"
            else: predicted_class_i = "active"
        else: predicted_class_i = "active"
    else: predicted_class_i = "inactive"
where i denotes the ith compound, nBM_i, nBO_i, and nROH_i are its
descriptor values, and predicted_class_i is its predicted activity.
The advantage of CART models lies in their being easily imple-
mented and manually applicable, if necessary. In a MATLAB
environment, for instance, this can be done as follows:
% X: table of descriptor values with the columns nBM, nROH, and nBO
n = size(X,1);                 % number of molecules to be predicted
predicted_class = zeros(n,1);  % pre-allocation (1 = active, 2 = inactive)
for i = 1:n                    % runs over the molecules
    if X.nBM(i) < 13           % node 1
        predicted_class(i,1) = 2;
    elseif X.nROH(i) >= 1      % node 2
        predicted_class(i,1) = 2;
    elseif X.nBM(i) > 16       % node 3
        predicted_class(i,1) = 1;
    elseif X.nBO(i) < 25       % node 4
        predicted_class(i,1) = 2;
    else
        predicted_class(i,1) = 1;
    end
end
Fig. 13 Application of the CART model to the example molecules (as indicated by boldface titles), along with
their descriptor values, the predicted class and the set of nodes, branches and leaves used for the prediction.
In analogy with Fig. 12, y [yes] indicates that the condition specified in the node is satisfied, while n [no] that
it is not satisfied
3.4 Predictions with the KNN Model
The KNN model under analysis uses six molecular descriptors,
namely C%, nROH, SIC3, ATSC4p, ATSC6i, and NaasC, which
have been described in Subheading 3.2, and a number of neighbors
(k) equal to 14. For any new molecule to be predicted, the
neighbors, which are the most similar compounds, are identified
on the basis of their molecular descriptors, after the appropriate
3.4.1 Data Scaling
The scaling of molecular descriptors has a crucial influence on the
outcome of many modeling techniques, especially on those based
on molecular similarity/diversity analysis. In fact, when dealing
with molecular descriptors expressed in different measuring units
(e.g., molecular weight vs. number of carbon atoms), the scaling is
necessary to have comparable descriptor ranges and to avoid biased
distance/similarity calculations [132]. The values of the relevant
molecular descriptors in a model change according to the chosen
scaling function, and, as recently shown, the data scaling can have
a major influence on the modeling output of several types of
descriptors [49].
The KNN model was trained on range scaled data, that is, each
descriptor was transformed into a new numerical scale with mini-
mum and maximum values equal to 0 and 1, respectively. This is
achieved as follows:
$$ x'_{ij} = \frac{x_{ij} - min_j}{max_j - min_j} \tag{11} $$
where xij is the original-scale value of the ith molecule for the jth
descriptor, while minj and maxj are the minimum and maximum
value of the jth descriptor in the training set, respectively.
Using MATLAB programming language, the range scaling on
the training data can be performed as follows:
X_scal = (X - min(X))./(max(X)-min(X));
where X is the original data matrix, in which each row is a molecule
and the columns correspond to the selected molecular descriptors,
and X_scal is the range-scaled data matrix. The effect of the data
scaling on the descriptor comparability can be then investigated
using a boxplot (Fig. 14), as follows:
figure;
subplot(1,2,1); boxplot(X); title 'Original data'
subplot(1,2,2); boxplot(X_scal); title 'Scaled data'
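The same range scaling (Eq. 11) can be sketched in Python for a single molecule. The sketch below uses the original descriptor values of Mol1 and the training set minimum and maximum values reported in Table 7; the function name `range_scale` is a hypothetical helper.

```python
def range_scale(values, train_min, train_max):
    """Range scaling (Eq. 11) of one molecule, using the training set minima and maxima."""
    return [(x - lo) / (hi - lo) for x, lo, hi in zip(values, train_min, train_max)]


descriptors = ["C%", "SIC3", "ATSC4p", "ATSC6i", "nROH", "NaasC"]
train_min = [0.0, 0.370, 0.000, 0.000, 0, 0]      # training set minima (Table 7)
train_max = [62.0, 1.000, 68.590, 7.386, 12, 20]  # training set maxima (Table 7)
mol1 = [41.3, 0.867, 9.042, 1.306, 0, 9]          # original values of Mol1 (Table 7)

mol1_scaled = range_scale(mol1, train_min, train_max)
for name, value in zip(descriptors, mol1_scaled):
    print(name, round(value, 3))
```

Note that a new molecule is scaled with the minimum and maximum values of the training set, not with its own; its scaled values can therefore fall outside the 0 to 1 interval if it lies outside the training range.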
Fig. 14 Effect of the range scaling procedure on the measurement scales of the selected molecular descriptors:
(a) original data, (b) range-scaled data. Boxplots show median, 1st and 3rd quartiles (solid lines), and
minimum/maximum values (asterisks). Grey dots represent the underlying descriptor values for the training
set molecules
Table 7
Original and range-scaled KNN descriptor values for the example molecules, along with minimum and maximum training set values
       Original values                              Range-scaled values
ID     C%     SIC3    ATSC4p   ATSC6i  nROH  NaasC   C%     SIC3    ATSC4p  ATSC6i  nROH   NaasC
Mol1   41.3   0.867    9.042   1.306    0     9      0.666  0.789   0.132   0.177   0.000  0.45
Mol2   39.5   0.676   12.844   0.880    0     4      0.637  0.486   0.187   0.119   0.000  0.20
Mol3   28.6   0.822    0.936   0.008    1     0      0.461  0.717   0.014   0.001   0.083  0.00
Mol4   41.9   0.902    7.733   0.829    2     4      0.676  0.845   0.113   0.112   0.167  0.20
Mol5   50.0   1.000    4.169   0.295    0     3      0.806  1.000   0.061   0.041   0.000  0.15
Mol6   54.1   0.808    8.346   1.045    1     6      0.871  0.695   0.122   0.142   0.083  0.30
min*    0.0   0.370    0.000   0.000    0     0      0.000  0.000   0.000   0.000   0.000  0.00
max*   62.0   1.000   68.590   7.386   12    20      1.000  1.000   1.000   1.000   1.000  1.00
Table 8
Scaled descriptor values for the two molecules (Mol1 and Tr1) that are used in the calculation example
of Euclidean distance (see text)
$$ d_{it} = \sqrt{\sum_{j=1}^{p} (x_{ij} - x_{tj})^2} \tag{13} $$
where dit is the Euclidean distance between molecules i and t, p is
the number of molecular descriptors, while xij and xtj are the values
of the jth descriptor for the molecules i and t, respectively. Whenever
one wants to apply the KNN model, it is necessary to calculate the
distance of any new molecule from all of the training set molecules.
Note that, for the reasons explained above, the data used to compute
the distance must be scaled.
Using the KNN molecular descriptors, the Euclidean distance
between any ith new molecule to be predicted and any tth training
molecule can be obtained as follows:
$$ d_{it} = \sqrt{(C\%_i - C\%_t)^2 + (SIC3_i - SIC3_t)^2 + (ATSC4p_i - ATSC4p_t)^2 + (ATSC6i_i - ATSC6i_t)^2 + (nROH_i - nROH_t)^2 + (NaasC_i - NaasC_t)^2} $$
For instance, for Mol1 and the training molecule Tr1 (Table 8):
$$ d_{it} = \sqrt{(0.666 - 0.440)^2 + (0.789 - 0.000)^2 + (0.132 - 0.050)^2 + (0.177 - 0.083)^2 + (0.00 - 0.00)^2 + (0.45 - 0.15)^2} $$
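The worked example above can be checked with a short Python sketch of Eq. 13. The scaled values of Mol1 and Tr1 are those appearing in the calculation; the function name `euclidean_distance` is a hypothetical helper.

```python
import math


def euclidean_distance(x_i, x_t):
    """Euclidean distance (Eq. 13) between two molecules described by scaled descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_t)))


# Scaled values in the order C%, SIC3, ATSC4p, ATSC6i, nROH, NaasC
mol1_scaled = [0.666, 0.789, 0.132, 0.177, 0.000, 0.45]
tr1_scaled = [0.440, 0.000, 0.050, 0.083, 0.000, 0.15]

d_mol1_tr1 = euclidean_distance(mol1_scaled, tr1_scaled)
print(round(d_mol1_tr1, 3))
```

With these values the distance is approximately 0.883, and it is dominated by the difference in SIC3.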
3.4.3 Nearest-Neighbor-Based Prediction
According to the KNN strategy, once the descriptors have been
used to compute the distances between any new molecule and all
of the training set compounds, the k closest neighbors are identi-
fied, along with their experimental class. The neighbors’ experi-
mental class is then used to predict the class of the new molecule,
using a majority voting criterion, that is, by selecting the class that
is the most frequent amongst the neighbors. In the case of Mol1,
for instance, 9 out of 14 neighbors are active and, thus, the mole-
cule was predicted as active by the KNN model (Table 9).
In MATLAB, this can be carried out as follows:
% selects the neighbors according to D
[D_sort,ind_sort] = sort(D,2);  % sorts each row of D in ascending order
D_sort = D_sort(:,1:k);         % distances of the k neighbors
ind_sort = ind_sort(:,1:k);     % numerical identifiers of the k neighbors
class_neigh = class(ind_sort);  % classes of the k neighbors
% counts the frequency of each class amongst the k neighbors
G = max(class);                 % number of classes
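The neighbor selection and the majority voting can also be sketched in Python, as follows. The distances and classes below are made-up toy values, not training set data, and `knn_predict` is a hypothetical helper. The chapter does not state how ties are handled; in this sketch a tie is resolved in favor of the class encountered first among the sorted neighbors, which is an assumption.

```python
from collections import Counter


def knn_predict(distances, train_classes, k):
    """Majority vote among the k training molecules closest to the new molecule."""
    # Indices of the training molecules sorted by increasing distance
    order = sorted(range(len(distances)), key=lambda t: distances[t])
    neighbor_classes = [train_classes[t] for t in order[:k]]
    votes = Counter(neighbor_classes)  # class frequencies amongst the neighbors
    predicted, _ = votes.most_common(1)[0]
    return predicted, votes


# Toy example: distances of one new molecule from five training molecules
toy_distances = [0.9, 0.2, 0.5, 0.1, 0.7]
toy_classes = ["inactive", "active", "inactive", "active", "inactive"]

prediction, votes = knn_predict(toy_distances, toy_classes, k=3)
print(prediction, dict(votes))
```

With k = 14, as in the chapter's model, the same function would reproduce the logic described for Mol1: 9 active neighbors against 5 inactive ones give the prediction "active".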
Table 9
The 14 training neighbors of Mol1, sorted according to their distance (d ), along with their 2D depiction
and the experimental class. Since 9 out of 14 neighboring molecules are active (64%), Mol1 is predicted
as active by the KNN model
Fig. 15 Bar plot of the relative frequencies of the neighbors’ classes for all of the example molecules. Bold
numbers represent the example molecules (from Mol1 to Mol6), while bold titles (“Active” or “Inactive”)
represent the final prediction according to the KNN model
Fig. 16 List of the two training molecules most similar to Mol5, according to the type of molecular descriptors
used (Euclidean distance). Descriptor values and the corresponding distance (d) are reported for each conceptual
group of descriptors. When more than one descriptor was used, the data were range scaled as explained in the
text (Eq. 11). Note that, when only one descriptor was considered (i.e., NaasC and SIC3), more than two training
molecules had a distance equal to 0 from Mol5, but, for the sake of simplicity, only two randomly selected
molecules are shown
3.5 Bringing it all Together
As shown in the previous sections, molecular descriptors grasp
diverse characteristics of the molecular structure. This is reflected in the
information captured by structure–activity relationship models.
To better understand the effect of the chosen molecular
descriptors on the perceived similarity between molecules, the
Euclidean distance between one example molecule (Mol5) and all of
the training compounds was computed on each conceptual group
of molecular descriptors separately. Figure 16 depicts the two most
similar training molecules to Mol5 according to the chosen set of
descriptors.
For instance, since constitutional indices reflect only the infor-
mation about the chemical composition and the single atom con-
nectivity, the identified neighbors share with Mol5 the same
number of multiple bonds (nBM = 13) and the same percentage of
carbon atoms (C% = 50.0), as well as similar number of non-
hydrogen bonds (nBO = 17 and 16). As no higher-level informa-
Fig. 17 First two components of the Principal Component Analysis (PCA) performed on constitutional descrip-
tors (a, c) and autocorrelation descriptors (b, d). EV indicates the percentage of variance explained by each
component. (a) score plot obtained using constitutional descriptors; (b) score plot obtained using autocorrela-
tion descriptors; (c) loading plot of constitutional descriptors; (d) loading plot of autocorrelation descriptors.
Compounds are colored according to their activity against CYP3A4 (red = active, grey = inactive)
combines the original variables (in our case, the molecular descrip-
tors) to obtain new orthogonal variables, termed principal compo-
nents (PCs). PCs are determined in such a way that the first PC
explains the largest data variance, the second one (orthogonal to
the first) the second largest variance, and so on, up to a number of
components equal to the number of starting variables. The objects
(molecules) can then be projected into the new PC space. Thereby,
one can comprehend the linear relationships among the original
variables, the objects, and the PCs, through: (1) the scores, which
are the object (i.e., molecule) coordinates in the PC space, and (2)
the loadings, which represent the contribution of each variable
(descriptor) to each PC. The MATLAB code for performing the
PCA can be found in Note 5. The PCA was performed on consti-
tutional (i.e., nROH, nBO, nBM, and C%) and autocorrelation
descriptors (i.e., ATSC6i and ATSC4p) separately (Fig. 17).
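For two variables, a PCA can be written in closed form, which makes the roles of scores, loadings, and explained variance easy to follow. The Python sketch below is not the MATLAB code of Note 5: it assumes mean-centered (not autoscaled) data and, purely for illustration, uses the range-scaled ATSC4p and ATSC6i values of the six example molecules (Table 7) instead of the training set. The function name `pca_two_variables` is hypothetical.

```python
import math


def pca_two_variables(data):
    """PCA of two variables via the eigen-decomposition of the 2x2 covariance matrix."""
    n = len(data)
    mean_1 = sum(row[0] for row in data) / n
    mean_2 = sum(row[1] for row in data) / n
    centered = [(row[0] - mean_1, row[1] - mean_2) for row in data]
    var_1 = sum(x * x for x, _ in centered) / (n - 1)
    var_2 = sum(y * y for _, y in centered) / (n - 1)
    cov_12 = sum(x * y for x, y in centered) / (n - 1)
    # Closed-form eigenvalues of [[var_1, cov_12], [cov_12, var_2]], largest first
    half_trace = (var_1 + var_2) / 2.0
    root = math.sqrt(((var_1 - var_2) / 2.0) ** 2 + cov_12 ** 2)
    eigenvalues = [half_trace + root, half_trace - root]
    loadings = []
    for index, eigenvalue in enumerate(eigenvalues):
        if abs(cov_12) > 1e-12:
            vector = (cov_12, eigenvalue - var_1)  # eigenvector for this eigenvalue
        elif (index == 0) == (var_1 >= var_2):
            vector = (1.0, 0.0)                    # uncorrelated variables: PCs are the axes
        else:
            vector = (0.0, 1.0)
        norm = math.hypot(vector[0], vector[1])
        loadings.append((vector[0] / norm, vector[1] / norm))
    # Scores: coordinates of each object in the PC space
    scores = [[x * l[0] + y * l[1] for l in loadings] for x, y in centered]
    explained = [ev / sum(eigenvalues) for ev in eigenvalues]
    return scores, loadings, explained


# Range-scaled (ATSC4p, ATSC6i) values of Mol1 to Mol6 (Table 7)
example_data = [(0.132, 0.177), (0.187, 0.119), (0.014, 0.001),
                (0.113, 0.112), (0.061, 0.041), (0.122, 0.142)]
scores, loadings, explained = pca_two_variables(example_data)
print("explained variance:", explained)
print("loadings:", loadings)
```

Since only two variables enter the analysis, the two explained-variance fractions always add up to 1.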
The space determined by the first two principal components
explains 83% and 100% of the total variance for constitutional
and autocorrelation descriptors, respectively. In the case of
autocorrelation descriptors, 100% of the variance is explained, as
expected, because only two initial variables are used, which can be
combined into at most two PCs. As can be seen, the
molecular information captured by the two molecular descriptor
types is very different, in terms of both the spatial distribution of the
compounds and the relative positioning of active and inactive compounds. For
instance, in the chemical space defined by constitutional descriptors
(Fig. 17a), active compounds are more clustered in the center
of the PC space than inactive compounds, the latter having smaller
scores on PC1 and larger scores on PC2. On the contrary, when
autocorrelation descriptors are used, this separation is less visible,
and only a few inactive compounds are isolated, with very high scores
on PC1 and very small scores on PC2.
The information encoded by molecular descriptors can be leveraged
to gain insights into the structure–activity relationships.
In the case of PCA, for instance, the distribution of the compounds
can be interpreted using the loading plots (Fig. 17c and d): the
loadings are the linear coefficients used to compute the
PCs from the original variable values. High loadings (in absolute
value) indicate that the descriptor contributes strongly to
a given PC, while loadings close to 0 indicate that the
variable is not relevant for that PC.
In the case of constitutional descriptors (Fig. 17c), for instance,
we can observe that nBM, C% and nBO have positive loadings on
PC1: this means that compounds with positive scores on PC1 will
be characterized by a high number of multiple bonds (nBM), a
high percentage of carbon atoms (C%) and a high number of non-
hydrogen bonds (nBO). As nROH and nBO have positive load-
ings on PC2, compounds with high PC2 scores will have a higher
number of hydroxyl groups and of non-hydrogen bonds than
compounds with low scores on PC2. This information, combined
4 Notes
References
1. Schultz TW, Cronin MTD, Walker JD, Aptula AO (2003) Quantitative structure–activity relationships (QSARs) in toxicology: a historical perspective. J Mol Struct THEOCHEM 622:1–22
2. McKinney JD, Richard A, Waller C, Newman MC, Gerberick F (2000) The practice of structure activity relationships (SAR) in toxicology. Toxicol Sci 56:8–17
3. Johnson MA, Maggiora GM (1990) Concepts and applications of molecular similarity. Wiley, New York
4. Crum-Brown A, Fraser T (1868) On the connection between chemical constitution and physiological action. Part 1. On the physiological action of the ammonium bases, derived from Strychia, Brucia, Thebaia, Codeia, Morphia and Nicotia. Trans R Soc Edinb 25:151–203
5. Hansch C, Maloney PP, Fujita T, Muir RM (1962) Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature 194:178–180
6. Richardson B (1869) Physiological research on alcohols. Med Times Gazzette 703:706
7. Richet M (1893) Note sur le rapport entre la toxicité et les propriétés physiques des corps. Compt Rend Soc Biol Paris 45:775–776
8. Wiener H (1947) Influence of interatomic forces on paraffin properties. J Chem Phys 15:766–766
9. Platt JR (1947) Influence of neighbor bonds on additive bond properties in paraffins. J Chem Phys 15:419–420
10. Todeschini R, Consonni V (2009) Molecular descriptors for chemoinformatics, vol 2. Wiley-VCH Verlag GmbH, Weinheim, Germany
11. Todeschini R, Consonni V, Gramatica P (2009) Chemometrics in QSAR. In: Comprehensive Chemometrics. Elsevier, Oxford, pp 129–172
12. Fourches D, Muratov E, Tropsha A (2010) Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research. J Chem Inf Model 50:1189–1204
13. Furusjö E, Svenson A, Rahmberg M, Andersson M (2006) The importance of outlier detection and training set selection for reliable environmental QSAR predictions. Chemosphere 63:99–108
14. Mansouri K, Grulke CM, Richard AM, Judson RS, Williams AJ (2016) An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling. SAR QSAR Environ Res 27:911–937
15. Grisoni F, Consonni V, Villa S, Vighi M, Todeschini R (2015) QSAR models for bioconcentration: is the increase in the complexity justified by more accurate predictions? Chemosphere 127:171–179
16. Goldberg DE, Holland JH (1988) Genetic algorithms and machine learning. Mach Learn 3:95–99
17. Grisoni F, Cassotti M, Todeschini R (2014) Reshaped sequential replacement for variable selection in QSPR: comparison with other reference methods. J Chemom 28:249–259
18. Cassotti M, Grisoni F, Todeschini R (2014) Reshaped sequential replacement algorithm: an efficient approach to variable selection. Chemom Intell Lab Syst 133:136–148
19. Shen Q, Jiang J-H, Jiao C-X, Shen G, Yu R-Q (2004) Modified particle swarm optimization algorithm for variable selection in MLR and PLS modeling: QSAR studies of antagonism of angiotensin II antagonists. Eur J Pharm Sci 22:145–152
20. Derksen S, Keselman HJ (1992) Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br J Math Stat Psychol 45:265–282
21. Cramer RD, Bunce JD, Patterson DE, Frank IE (1988) Crossvalidation, bootstrapping, and partial least squares compared with multiple regression in conventional QSAR studies. Quant Struct Act Relat 7:18–25
22. Todeschini R, Ballabio D, Grisoni F (2016) Beware of unreliable Q2! A comparative study of regression metrics for predictivity assessment of QSAR models. J Chem Inf Model 56(10):1905–1913
23. Sokolova M, Lapalme G (2009) A systematic analysis of performance measures for classification tasks. Inf Process Manag 45:427–437
24. Sahigara F, Mansouri K, Ballabio D, Mauri A, Consonni V, Todeschini R (2012) Comparison of different approaches to define the applicability domain of QSAR models. Molecules 17:4791–4810
25. Dragos H, Gilles M, Alexandre V (2009) Predicting the predictability: a unified approach to the applicability domain problem of QSAR models. J Chem Inf Model 49:1762–1776
26. Sabljic A (2001) QSAR models for estimating properties of persistent organic pollutants required in evaluation of their environmental fate and risk. Chemosphere 43:363–375
27. Novič M, Vračko M (2010) QSAR models for reproductive toxicity and endocrine disruption activity. Molecules 15:1987–1999
28. Miyao T, Arakawa M, Funatsu K (2010) Exhaustive structure generation for inverse-QSPR/QSAR. Mol Inform 29:111–125
29. Munteanu RC, Fernandez-Blanco E, Seoane AJ, Izquierdo-Novo P, Angel Rodriguez-Fernandez J, Maria Prieto-Gonzalez J, Rabunal RJ, Pazos A (2010) Drug discovery and design for complex diseases through QSAR computational methods. Curr Pharm Des 16:2640–2655
30. Nembri S, Grisoni F, Consonni V, Todeschini R (2016) In silico prediction of cytochrome P450-drug interaction: QSARs for CYP3A4 and CYP2C9. Int J Mol Sci 17:914
31. Grisoni F, Consonni V, Vighi M, Villa S, Todeschini R (2016) Investigating the mechanisms of bioconcentration through QSAR classification trees. Environ Int 88:198–205
32. Cramer RD, Patterson DE, Bunce JD (1988) Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110:5959–5967
33. Marrero Ponce Y (2004) Total and local (atom and atom type) molecular quadratic indices: significance interpretation, comparison to other molecular descriptors, and QSPR/QSAR applications. Bioorg Med Chem 12:6351–6369
34. Bender A, Glen CR (2004) Molecular similarity: a key technique in molecular informatics. Org Biomol Chem 2:3204–3218
35. Patlewicz G, Ball N, Booth ED, Hulzebos E, Zvinavashe E, Hennes C (2013) Use of category approaches, read-across and (Q)SAR: general considerations. Regul Toxicol Pharmacol 67:1–12
36. Schneider G, Neidhart W, Giller T, Schmid G (1999) “Scaffold-hopping” by topological pharmacophore search: a contribution to virtual screening. Angew Chem Int Ed 38:2894–2896
37. Höfer T, Gerner I, Gundert-Remy U, Liebsch M, Schulte A, Spielmann H, Vogel R, Wettig K (2004) Animal testing and alternative approaches for the human health risk assessment under the proposed new European chemicals regulation. Arch Toxicol 78:549–564
38. Mansouri K, Abdelaziz A, Rybacka A et al (2016) CERAPP: collaborative estrogen receptor activity prediction project. Environ Health Perspect 124(7):1023–1033. https://doi.org/10.1289/ehp.1510267
39. Sedykh A, Zhu H, Tang H, Zhang L, Richard A, Rusyn I, Tropsha A (2011) Use of in vitro HTS-derived concentration–response data as biological descriptors improves the accuracy of QSAR models of in vivo toxicity. Environ Health Perspect 119:364–370
40. Cassotti M, Ballabio D, Todeschini R, Consonni V (2015) A similarity-based QSAR model for predicting acute toxicity towards the fathead minnow (Pimephales promelas). SAR QSAR Environ Res 26:217–243
41. Belanger SE, Brill JL, Rawlings JM, Price BB (2016) Development of acute toxicity quantitative structure activity relationships (QSAR) and their use in linear alkylbenzene sulfonate species sensitivity distributions. Chemosphere 155:18–27
42. Wang C, Lu GH, Li YM (2005) QSARs for the chronic toxicity of halogenated benzenes to bacteria in natural waters. Bull Environ Contam Toxicol 75:102–108
43. Fan D, Liu J, Wang L, Yang X, Zhang S, Zhang Y, Shi L (2016) Development of quantitative structure–activity relationship models for predicting chronic toxicity of substituted benzenes to daphnia magna. Bull Environ Contam Toxicol 96:664–670
44. Austin TJ, Eadsforth CV (2014) Development of a chronic fish toxicity model for predicting sub-lethal NOEC values for non-polar narcotics. SAR QSAR Environ Res 25:147–160
45. Schöning V, Hammann F, Peinl M, Drewe J (2017) Identification of any structure-specific hepatotoxic potential of different pyrrolizidine alkaloids using random forest and artificial neural network. Toxicol Sci
molecules by molecular transforms and its application to structure-spectra correlations and studies of biological activity. J Chem Inf Comput Sci 36:334–344
70. Rybinska A, Sosnowska A, Barycki M, Puzyn T (2016) Geometry optimization method versus predictive ability in QSPR modeling for ionic liquids. J Comput Aided Mol Des 30:165–176
71. Nicklaus MC, Wang S, Driscoll JS, Milne GWA (1995) Conformational changes of small molecules binding to proteins. Bioorg Med Chem 3:411–428
72. Klebe G, Abraham U, Mietzner T (1994) Molecular similarity indices in a comparative analysis (CoMSIA) of drug molecules to correlate and predict their biological activity. J Med Chem 37:4130–4146
73. Hopfinger AJ, Wang S, Tokarski JS, Jin B, Albuquerque M, Madhav PJ, Duraiswami C (1997) Construction of 3D-QSAR models using the 4D-QSAR analysis formalism. J Am Chem Soc 119:10509–10524
74. Andrade CH, Pasqualoto KFM, Ferreira EI, Hopfinger AJ (2010) 4D-QSAR: perspectives in drug design. Mol Basel Switz 15:3281–3294
75. Vedani A, McMasters DR, Dobler M (2000) Multi-conformational ligand representation in 4D-QSAR: reducing the bias associated with ligand alignment. Quant Struct Act Relat 19:149–161
76. Vedani A, Briem H, Dobler M, Dollinger H, McMasters DR (2000) Multiple-conformation and protonation-state representation in 4D-QSAR: the Neurokinin-1 receptor system. J Med Chem 43:4416–4427
77. Vedani A, Dobler M (2002) 5D-QSAR: the key for simulating induced fit? J Med Chem 45:2139–2149
78. Vedani A, Dobler M, Lill MA (2005) Combining protein modeling and 6D-QSAR. Simulating the binding of structurally diverse ligands to the estrogen receptor. J Med Chem 48:3700–3703
79. Willett P (2006) Similarity-based virtual screening using 2D fingerprints. Drug Discov Today 11:1046–1053
80. Cassotti M, Grisoni F, Nembri S, Todeschini R (2016) Application of the weighted power-weakness ratio (wPWR) as a fusion rule in ligand–based virtual screening. MATCH Comm Math Comp Chem 76:359–376
81. Ewing T, Baber JC, Feher M (2006) Novel 2D fingerprints for ligand-based virtual screening. J Chem Inf Model 46:2423–2431
82. Watson P (2008) Naïve bayes classification using 2D pharmacophore feature triplet vectors. J Chem Inf Model 48:166–178
83. Klon AE, Diller DJ (2007) Library fingerprints: a novel approach to the screening of virtual libraries. J Chem Inf Model 47:1354–1365
84. Geppert H, Bajorath J (2010) Advances in 2D fingerprint similarity searching. Expert Opin Drug Discov 5:529–542
85. Tetko IV, Sushko I, Pandey AK, Zhu H, Tropsha A, Papa E, Oberg T, Todeschini R, Fourches D, Varnek A (2008) Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection. J Chem Inf Model 48:1733–1746
86. Zhu H, Tropsha A, Fourches D, Varnek A, Papa E, Gramatica P, Oberg T, Dao P, Cherkasov A, Tetko IV (2008) Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis. J Chem Inf Model 48:766–784
87. Guha R (2011) The ups and downs of structure-activity landscapes. Methods Mol Biol 672:101–117
88. Bajorath J, Peltason L, Wawer M, Guha R, Lajiness MS, Van Drie JH (2009) Navigating structure–activity landscapes. Drug Discov Today 14:698–705
89. Wassermann AM, Wawer M, Bajorath J (2010) Activity landscape representations for structure–activity relationship analysis. J Med Chem 53:8209–8223
90. Maggiora GM (2006) On outliers and activity cliffs: why QSAR often disappoints. J Chem Inf Model 46:1535–1535
91. Eckert H, Bajorath J (2007) Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches. Drug Discov Today 12:225–233
92. Hu Y, Bajorath J (2012) Extending the activity cliff concept: structural categorization of activity cliffs and systematic identification of different types of cliffs in the ChEMBL database. J Chem Inf Model 52:1806–1811
93. Cruz-Monteagudo M, Medina-Franco JL, Pérez-Castillo Y, Nicolotti O, Cordeiro MNDS, Borges F (2014) Activity cliffs in drug discovery: Dr Jekyll or Mr Hyde? Drug Discov Today 19:1069–1080
94. Guha R, Jurs PC (2004) Development of QSAR models to predict and interpret the
52 Francesca Grisoni et al.
124. Moreau G, Turpin C (1996) Use of similarity est neighbours approach to assess the applica-
analysis to reduce large molecular libraries to bility domain of a QSAR model for reliable
smaller sets of representative molecules: predictions. J Cheminform 5:27
Informatique et analyse. I. Analysis 136. Dimitrov S, Dimitrova G, Pavlov T, Dimitrova
24:M17–M21 N, Patlewicz G, Niemela J, Mekenyan O
125. Hollas B (2002) Correlation properties of the (2005) A stepwise approach for defining the
autocorrelation descriptor for molecules. applicability domain of SAR and QSAR mod-
MATCH–Commun math. Comput Chem els. J Chem Inf Model 45:839–849
45:27 137. Jolliffe IT (1986) Principal component analy-
126. Magnuson V, Harriss D, Basak S (1983)
sis and factor analysis. In: Principal compo-
Topological indices based on neighborhood nent analysis. Springer, New York, NY,
symmetry: chemical and biological applica- pp 115–128
tions. In: Chemical applications of topology 138. Marvin Sketch 5.1.11 ChemAxon, (2013).
and graph theory. Elsevier, Amsterdam, http://www.chemaxon.com
pp 178–191 139.
NCI/CADD Group, (2013) Chemical
127. Roy A, Basak S, Harriss D, Magnuson V
Identifier Resolver. Available at: http://cac-
(1984) Neighborhood complexities and sym- tus.nci.nih.gov/chemical/structure
metry of chemical graphs and their biological 140.
Dalby A, Nourse JG, Hounshell WD,
applications. Pergamon Press, New York Gushurst AK, Grier DL, Leland BA, Laufer
128. Hall LH, Kier LB, Brown BB (1995)
J (1992) Description of several chemical
Molecular similarity based on novel atom- structure file formats used by computer pro-
type electrotopological state indices. J Chem grams developed at molecular design limited.
Inf Comput Sci 35:1074–1080 J Chem Inf Comput Sci 32:244–255
129. Hall LH, Kier LB (1995) Electrotopological 141.
RDKit: Open-source cheminformatics;
state indices for atom types: a novel combina- http://www.rdkit.org
tion of electronic, topological, and valence 142. Steinbeck C, Han Y, Kuhn S, Horlacher O,
state information. J Chem Inf Comput Sci Luttmann E, Willighagen E (2003) The
35:1039–1045 chemistry development kit (CDK): an open-
130. Kier LB, Hall LH (1990) An electrotopolog- source java library for chemo- and bioinfor-
ical-state index for atoms in molecules. Pharm matics. J Chem Inf Comput Sci
Res 7:801–807 43:493–500
131. Butina D (2004) Performance of kier-hall
143. Steinbeck C, Hoppe C, Kuhn S, Floris M,
E-state descriptors in quantitative structure Guha R, Willighagen EL (2006) Recent
activity relationship (QSAR) studies of multi- developments of the chemistry development
functional molecules. Molecules kit (CDK)-an open-source java library for
9:1004–1009 chemo-and bioinformatics. Curr Pharm Des
132. Todeschini R, Ballabio D, Consonni V (2015) 12:2111–2120
Distances and other dissimilarity measures in 144. Chemical Computing Group Inc., (2013)
chemometrics. In: Encyclopedia of analytical Molecular operating environment (MOE).
chemistry. John Wiley & Sons Ltd, Hoboken 1010 Sherbooke St West Suite 910 Montr.
133. Todeschini R, Ballabio D, Consonni V,
QC Can. H3A 2R7 2014
Grisoni F (2016) A new concept of higher- 145. Hong H, Xie Q, Ge W, Qian F, Fang H, Shi
order similarity and the role of distance/simi- L, Su Z, Perkins R, Tong W (2008) Mold2,
larity measures in local classification methods. molecular descriptors from 2D structures for
Chemom Intell Lab Syst 157:50–57 chemoinformatics and toxicoinformatics.
134. Cassotti M, Ballabio D, Consonni V, Mauri J Chem Inf Model 48:1337–1344
A, Tetko IV, Todeschini R (2014) Prediction 146. SciPy.org—SciPy.org. https://www.scipy.
of acute aquatic toxicity toward Daphnia org/. Accessed 5 Sep 2017
magna by using the GA-kNN method. Altern 147. Ballabio D (2015) A MATLAB toolbox for
Lab Anim 42:31–41 principal component analysis and unsuper-
135.
Sahigara F, Ballabio D, Todeschini R, vised exploration of data structure. Chemom
Consonni V (2013) Defining a novel k-near- Intell Lab Syst 149:1–9
Chapter 2
Abstract
The OECD QSAR Toolbox is software designed to make pragmatic qualitative and
quantitative structure–activity relationship methods-based predictions of
toxicity, including read-across, available to the user in a comprehensible and
transparent manner. The Toolbox provides information on chemicals in
structure-searchable, standardized files that are associated with chemical and
toxicity data to ensure that proper structural analogs can be identified. This
chapter describes the advantages of the Toolbox, its aims, approach, and
workflow, and reviews its history. Additionally, key functional elements of its
use are explained and features new to Version 4.1 are reported. Lastly, the
further development of the Toolbox, likely needed to transform it into a more
comprehensive Chemical Management System, is considered.
Key words OECD QSAR Toolbox, Chemical category, Data gap filling, Adverse outcome pathways,
Weight of evidence
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_2, © Springer Science+Business Media, LLC, part of Springer Nature 2018
56 Terry W. Schultz et al.
2 Advantages
2.1 Toolbox Governance
One of the little-known advantages of the Toolbox is the structure of its
governance. From the beginning, the establishment of guiding principles and
workflow, as well as the development of Toolbox modules, was under the
direction of an administrative hierarchy. While the day-to-day development was
under the direction of the primary contractor, the Laboratory of Mathematical
Chemistry (http://oasis-lmc.org/), and various sub-contractors, the overall
governance was through the OECD. The OECD, via the Environmental Health and
Safety Division of the Environment Directorate, supervised the Toolbox on a
monthly basis. The decision-making body was the OECD QSAR Toolbox Management
Group, which consisted of representatives of the OECD member countries. While
this group was formed largely of regulators familiar with (Q)SAR, it also
included representatives of industry and nongovernmental organizations.
Financial support was provided,
The OECD QSAR Toolbox Starts Its Second Decade 57
2.2 Toolbox Training
Another key feature contributing to the success of the Toolbox is the
continuous development and expansion of training material
(http://oasis-lmc.org/products/software/toolbox/toolbox-support.aspx). New
training exercises in the form of slide shows continue to be developed.
Additionally, LMC provides on-site training sessions, as well as regular
courses (http://oasis-lmc.org/products/software/toolbox/toolbox/training.aspx).
2.3 Docking to Other Software
A third advantage of the Toolbox is its ability to dock to other software
platforms. The docking of other software platforms to the Toolbox allows the
user to make predictions via the docked software while using the knowledge
from the Toolbox. The key element of docking is that the knowledge in the
Toolbox, represented by the profilers, can be used to search for analogs in the
training sets of the docked software and their models. This capability is
important for providing reliable analogs and for establishing a
weight-of-evidence.
The ECOSAR models for predicting aquatic toxicity and the models of the Danish
EPA QSAR Database are docked to the Toolbox. In addition, the OASIS software
platforms TIMES and Catalogic are capable of being docked to the Toolbox, and
predictions from the respective models (TIMES model for predicting Skin
sensitization, AMES Mutagenicity, Catalogic 301C model, etc.) can be made
within the Toolbox. Importantly, the predictions from the docked platforms can
be reported via the Toolbox. Specific information unique to the docked software
(such as metabolic maps, quantitative distribution of metabolites, the effect
of meta-
3.1 The Aims of the Toolbox
The long-term goal of the tool is to group organic substances into chemical
categories for apical outcomes of regulatory interest and to use data from
tested category members to fill data gaps for untested category members. To be
useful in a regulatory setting, the Toolbox needed to provide all the
information necessary to ensure, as far as possible, that the Toolbox user
would actually use the prediction(s) coming from it as part of their
regulatory assessment. To enhance the likelihood of acceptance, it was critical
that the Toolbox first gets the chemistry correct, second gets the biology
correct, and third, when appropriate, adds statistical assurance. The problem
is that, while some regulatory endpoints, such as acute aquatic toxicity, are
amenable to data gap filling by classic QSAR modeling (e.g., trend analysis;
Y = aX + b), many of the more critical endpoints (e.g., human health effects)
are not amenable to such QSAR prediction.
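The trend-analysis relationship Y = aX + b mentioned above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit over the tested category members; the descriptor values, potencies, and function names below are hypothetical and are not taken from the Toolbox.

```python
from statistics import mean

def fit_trend(x, y):
    """Ordinary least-squares fit of Y = a*X + b; returns (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all X values are identical; slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b

def predict(a, b, x_target):
    """Fill the data gap for a target with descriptor value x_target."""
    return a * x_target + b

# Hypothetical tested category members: descriptor X and measured potency Y.
source_x = [1.0, 2.0, 3.0, 4.0]
source_y = [1.5, 2.5, 3.5, 4.5]

a, b = fit_trend(source_x, source_y)
y_target = predict(a, b, 2.5)  # untested member inside the descriptor range
```

Keeping the target inside the descriptor range of the tested members, as in this example, avoids extrapolating the trend beyond the data that support it.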
While decades of work have led to the development of a variety of QSARs using
various descriptors and various modeling approaches, most of these models fail
to achieve regulatory use. This lack of acceptance is often a reflection of
putting statistics ahead of chemistry and biology. The net result is typically
a “black box” prediction that lacks transparency and mechanistic
understanding. One answer to this dilemma, the one employed in the Toolbox, is
the category approach and read-across.
3.2 The Category Approach
The category approach is the basis for the Toolbox predictions [3]. In the
OECD chemical category approach, a number of chemicals are grouped based on
their similarity. Available experimental results from one or more members of
the category, the source substances, are used to fill data gaps for other
members of the category, the target substances.
According to the OECD guidance [3], similarity is context dependent: not only
similarity in chemical structure and physicochemical properties, but also
similarity of mechanism of interaction with different biomolecular targets
(e.g., proteins, DNA), as well as toxicokinetic and toxicodynamic properties,
should be considered [4].
(b) Identifies other substances that have the same features, and
(c) Uses existing experimental data to fill the data gap.
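The grouping and data gap filling steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the Toolbox implementation: the feature labels, substance names, and values are hypothetical, category membership is reduced to an exact match of profiler outcomes, and the read-across value is taken as the mean of the source values, which is only one of several possible choices.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

@dataclass(frozen=True)
class Substance:
    name: str
    features: frozenset            # profiler outcomes (hypothetical labels)
    value: Optional[float] = None  # experimental result; None marks a data gap

def build_category(target, inventory):
    """Source substances: same features as the target and a measured value."""
    return [s for s in inventory
            if s.name != target.name
            and s.features == target.features
            and s.value is not None]

def read_across(target, inventory):
    """Fill the target's data gap from the source substances."""
    sources = build_category(target, inventory)
    if not sources:
        raise ValueError("no source substance shares the target's features")
    return mean(s.value for s in sources), [s.name for s in sources]

# Hypothetical inventory: S3 has a different alert, S4 has no data.
target = Substance("T", frozenset({"alert_A"}))
inventory = [
    Substance("S1", frozenset({"alert_A"}), 2.0),
    Substance("S2", frozenset({"alert_A"}), 4.0),
    Substance("S3", frozenset({"alert_B"}), 9.0),
    Substance("S4", frozenset({"alert_A"}), None),
]
prediction, used = read_across(target, inventory)
```

Returning the names of the source substances alongside the prediction keeps the result traceable to the data it was derived from.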
Six modules (i.e., Chemical Input, Profiling, Endpoints, Category Definition,
Filling Data Gaps, and Report) [5] guide the user through a logical workflow
based on the category approach. Guidance suggests these modules be employed in
a sequential workflow [6]. The “Chemical Input” module provides the user with
several means of entering the target chemical. Since all subsequent functions
are based on chemical structure, the goal of this module is to make sure the
molecular structure assigned to the target chemical is the correct one. The
“Profiling” module electronically retrieves relevant information in the form
of automated alerts on the target compound. The “Endpoints” module
electronically retrieves experimental results for regulatory endpoints (e.g.,
data on environmental fate, ecotoxicity, in vitro or in vivo mammalian
toxicity). These data are stored in the Toolbox. This data gathering can be
executed in a global fashion (i.e., collecting all data for all endpoints) or,
more often, on a more narrowly defined basis (e.g., collecting data for a
single or limited number of endpoints). The “Category Definition” module
provides the user with several methods of grouping chemicals into a
toxicologically meaningful category. It is key that the category includes the
target molecule and at least one source substance. As previously pointed out,
this is the critical step in the workflow; several options are available in
the Toolbox to assist the user in refining the category definition via
subcategorization [5]. The “Filling Data Gaps” module provides the user with
three options for making an endpoint-specific prediction for the target
chemical; these options, in increasing order of complexity, are read-across,
trend analysis, and the use of QSAR models. The “Report” module provides the
user with several means of downloading a written audit trail of the sequence
of Toolbox functions the user performed in arriving at the prediction.
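The sequential use of the six modules, ending in an audit trail, can be pictured with the short sketch below. It is purely illustrative: the Toolbox is a graphical application, and none of the function names, dictionary keys, or placeholder values here correspond to its actual interface.

```python
# Each step takes the current state and returns an updated copy.
# All step bodies are placeholders standing in for the real modules.
def chemical_input(state):
    return {**state, "structure": state["query"]}

def profiling(state):
    return {**state, "alerts": ["hypothetical-alert"]}

def endpoints(state):
    return {**state, "data": {"analog-1": 2.0, "analog-2": 4.0}}

def category_definition(state):
    return {**state, "category": sorted(state["data"])}

def filling_data_gaps(state):
    values = [state["data"][name] for name in state["category"]]
    return {**state, "prediction": sum(values) / len(values)}

def report(state):
    return {**state, "report": " -> ".join(state["trail"])}

WORKFLOW = [
    ("Chemical Input", chemical_input),
    ("Profiling", profiling),
    ("Endpoints", endpoints),
    ("Category Definition", category_definition),
    ("Filling Data Gaps", filling_data_gaps),
    ("Report", report),
]

def run(query):
    """Apply the modules in order, recording each one in the audit trail."""
    state = {"query": query, "trail": []}
    for name, step in WORKFLOW:
        state = step({**state, "trail": state["trail"] + [name]})
    return state

result = run("hypothetical target chemical")
```

Recording the module name before each step runs means the final report lists every function that contributed to the prediction, in the order it was applied.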
Further elaborations of the workflow of the Toolbox, including examples, have
been presented elsewhere [1, 5, 6]
(http://oasis-lmc.org/products/software/toolbox/toolbox-support.aspx).
(a) Developing “OECD Principles for the Validation, for
Regulatory Purposes, of (Q)SAR Models.”
(b) Garnering the experiences of member countries in applying
(Q)SAR.
(c) Developing a prototype (Q)SAR Application Toolbox.
The establishment of the “OECD Principles for the Validation, for Regulatory
Purposes, of (Q)SAR Models” [7, 8] was largely built on the discussions and
conclusions of the 2002 workshop held in Setubal, Portugal, and organized by
the European Chemical Industry Council and the International Council of
Chemical Associations [9]. A report on the regulatory uses and applications in
OECD Member countries of (Q)SAR models in the assessment of new and existing
chemicals was released in 2006 [10].
In June 2005, the OECD Member countries endorsed the plan for Phase 1 of
Toolbox development. The idea for Phase 1 of the Toolbox, inspired by Gilman
Veith, relied on the concept of chemical similarity [11]. The Phase 1 work
(2006–2008), done in conjunction with the European Commission, focused largely
on proof of concept. The objective of Phase 1 was to develop a working
prototype designed to integrate, in a single in silico platform, knowledge and
data, as well as “profilers” for grouping chemicals. The initial knowledge and
data were donated by the member countries or provided by the contractor. The
profilers, along with the in silico platform, were developed by the Laboratory
of Mathematical Chemistry (LMC). Briefly, using the previously derived OASIS
platform, LMC implemented the agreed-upon workflow.
Phase 1 was completed with the release of Version 1.0 of the Toolbox in
October 2008. Key to the member countries agreeing to its release were several
case studies demonstrating that, within mechanistically consistent categories
developed by the Toolbox, existing experimental data could be used to fill data
gaps for other chemicals in the same category. These examples were designed to
show the Toolbox Management Group how the Toolbox’s chemical profilers and
integrated toxicology databases can be used to make transparent predictions.
The case studies focused on regulatory endpoints where high-quality databases
and a basic understanding of the mechanisms of action leading to the apical
outcome existed. Particularly useful were the case studies demonstrating the
following:
(a) The skin sensitization potential of a chemical by read-across
based on protein-binding similarity.
(b) The Ames mutagenicity potential of a chemical by read-across
based on DNA-binding similarity.
(c) The acute aquatic toxicity to fish and Daphnia, based on the
correlation between toxic potency and hydrophobicity for the
narcotic mode of action (i.e., trend analysis).
5.2 Adverse Outcome Pathways
An AOP delineates the documented, plausible and testable processes by which a
chemical induces a molecular perturbation and the associated biological
responses at the sub-cellular, cellular, tissue, organ, whole animal, and/or,
when appropriate, population levels of observation [17, 18]. As such, an AOP
depicts existing knowledge concerning the association between the extremes of
a toxic sequence, a Molecular Initiating Event (MIE) and an Adverse Outcome
(AO). These two extremes are coupled by a succession of Key Events (KEs) and,
where possible, the relationships between the KEs (KERs). An AOP is typically
represented by moving from one KE to another, as compensatory mechanisms and
feedback loops are overcome [19]. The KEs are by design limited to a few
measurable and toxicologically relevant events that are fundamental to the
progression of biological occurrences leading to the AO. An AOP is not
expected to postulate a comprehensive description of every aspect of the
chemistry and biology; rather it focuses on the crucial steps along the
sequence [18].
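The MIE, KE, KER, and AO vocabulary can be made concrete with a small data structure. The sketch below assumes a simple linear chain of events, which is a simplification, and uses the estrogen receptor example from this section; the class names and the level labels attached to each event are illustrative assumptions rather than part of any published AOP.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Event:
    name: str
    level: str  # level of biological organization (illustrative label)

@dataclass
class AOP:
    mie: Event               # Molecular Initiating Event
    key_events: List[Event]  # intermediate Key Events, in order
    adverse_outcome: Event   # Adverse Outcome

    def sequence(self) -> List[Event]:
        """Events ordered from the MIE to the AO."""
        return [self.mie, *self.key_events, self.adverse_outcome]

    def relationships(self) -> List[Tuple[Event, Event]]:
        """Key Event Relationships as adjacent (upstream, downstream) pairs."""
        seq = self.sequence()
        return list(zip(seq, seq[1:]))

# Estrogen receptor example; the single intermediate KE and all level
# labels are assumptions made for illustration.
er_aop = AOP(
    mie=Event("estrogen receptor binding", "molecular"),
    key_events=[Event("vitellogenin production", "tissue")],
    adverse_outcome=Event("reproductive impairment in fish", "whole animal"),
)

for upstream, downstream in er_aop.relationships():
    print(f"{upstream.name} -> {downstream.name}")
```

Storing only a few ordered events mirrors the point made above that an AOP records the crucial steps rather than every aspect of the chemistry and biology.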
Since an AOP provides a means of recording and formalizing
toxicity pathway information, it has the capability of assisting in
developing a chemical category. For example, the estrogen recep-
tor (ER) binding AOP links ER binding to reproductive impair-
ment in fish [20]. This AOP is diagrammed in Fig. 1.
In the above example, experimental binding potency in the
Rainbow trout estrogen-receptor binding assay is linked to experi-
mental vitellogenin production in the Rainbow trout liver slice assay.
These data, when combined, lead to a rule-based expert system [21]
Fig. 2 Key nodes of the AOP for protein-binding leading to skin sensitization
5.3 Reducing Uncertainty Through Weight of Evidence (WoE)
There are uncertainties inherent to the current in vivo toxicity test
paradigm. While some of these uncertainties are known, others have yet to be
identified or characterized. The level of comfort with these uncertainties has
been bolstered by nearly 40 years of experience with the methods and the data
which they provide.
The rise in the development of new alternative test systems provides
additional information that, when used appropriately, provides additional WoE.
There will be uncertainties in the alternative
5.4 Summary of Key Elements
In summary, much of the more recent Toolbox development has been aimed at
improving acceptance of data gap filling by read-across (see Subheading 5.1).
It is generally acknowledged that one increases acceptance of a read-across
prediction by decreasing the uncertainty surrounding its preparation. The
latter is attained by increasing transparency and establishing better
mechanistic probability (see Subheading 5.2), as well as assessing the
uncertainty, including the plausible mode of action, and WoE (see Subheading
5.3). Acceptance of a Toolbox prediction is driven by:
(a) Quality and quantity of the apical endpoint data,
(b) Confidence (e.g., adequacy and reliability) associated with the
underlying similarity hypothesis, and
(c) Good relevant supporting information, including data from
appropriate in vitro and in chemico methods.
Earlier versions of the Toolbox relied on simple profilers to establish a
chemical category for reading across for an apical endpoint. Within the
context of an AOP, such a progression typically jumped from the MIE to the AO
with little regard to what happened in between (Fig. 3). However, by adding
toxicity pathway information and appropriate information and data for relevant
intermediate events, a more robust category could be established (Fig. 4),
specifically one that reduces uncertainty and adds WoE.
future will most likely include toxicity testing in the form of a bat-
tery of assays which are performed in silico, in chemico, and in
vitro, with a reduced need for in vivo testing. The long-term vision
will only be realized by incremental advances in both scientific
knowledge and regulatory acceptance. Moreover, incremental
integration of SAR and new test methods to augment and subse-
quently replace existing in vivo toxicity testing requirements also
will facilitate public acceptance of these alternative approaches. It
must be stressed that this long-term vision of the future will not be
realized for decades to come. By combining state-of-the-art
approaches in a transparent and scientifically defensible manner,
Toolbox-aided assessments will be compatible with the future
vision of toxicity testing and assessment.
Toxicology, especially in the regulatory sense, must be looked
at as both a science and an art. Predictive toxicology makes use of
data-gathering and observational processes to develop hypotheses
and models that can be used to make informed predictions about
adverse effects of chemicals for which there is little available experi-
mental data. The future of regulatory toxicology, as well as the
Toolbox, lies in enhancing the science and improving predictive
capacity.
Strengthening the Toolbox in the future will depend on broadening the
information that is used to develop an understanding of the toxicological
profile of a chemical. The
Toolbox must continue to seek new means of deriving toxicologi-
cal information from existing and future data and to infer toxico-
logical potential and potency based on chemical properties and
similarities to other chemicals. The Toolbox must continue to
advance the use of both in vitro and in vivo data; however, the
Toolbox will, in the near-term at least, continue to target only
those in vivo endpoints that have relevance to the toxicity profile of
the target chemical or chemical category. To this end, new in silico,
in chemico, and in vitro methods will be mapped to the Toolbox
and used as categorizing or prioritizing tools.
As noted above, AOPs are focused on developing an under-
standing of the underlying biological response that results in a
regulatory endpoint. Since one goal of the Toolbox is to base pre-
dictive toxicology on such understanding, AOPs will be important
to further Toolbox development. While simple AOPs have been
mapped to the Toolbox, future AOPs will be complex and highly
integrated. A complete mechanistic understanding of the biologi-
cal responses underlying most toxicity outcomes is arguably many
years in the future. However, work is underway at the OECD
Cooperative Chemicals Assessment Programme to develop this
kind of biological systems-level understanding. In order to develop
an understanding of the key responses that trigger an adverse out-
come, it will continue to be necessary to understand how an organ-
ism functions at all biological levels of organization and how these
Acknowledgments
References
1. Diderich R (2010) Tools for category formation and read-across overview of the OECD (Q)SAR application toolbox. In: Cronin MTD, Madden JC (eds) In Silico toxicology: principles and applications. RSC Publishing, Cambridge, pp 385–407
2. Schultz TW, Dimitrova G, Dimitrov S, Mekenyan OG (2016) The adverse outcome pathway for skin sensitisation: moving closer to replacing animal testing. Altern Lab Anim 44:1–8
3. Organisation for Economic Co-operation and Development (OECD) 2007. Guidance on grouping of chemicals, OECD environmental health and safety series on testing and assessment no. 80. ENV/JM/MONO(2007)28
4. Organisation for Economic Co-operation and Development (OECD) 2014. Guidance on grouping of chemicals. 2nd edn, OECD environmental health and safety series on testing and assessment no. 194. ENV/JM/MONO(2014)4
5. Organisation for Economic Co-operation and Development (OECD) 2009. Guidance document for using the OECD (Q)SAR application toolbox to develop chemical categories according to the OECD guidance on grouping of chemicals, OECD environmental health and safety series on testing and assessment no. 102. ENV/JM/MONO(2009)
6. Dimitrov SD, Diderich R, Sobanski T, Pavlov TS, Chankov GV, Chapkanov AS, Karakolev YH, Temelkov SG, Vasilev RA, Gerova KD, Kuseva CD, Todorova ND, Mehmed AM, Rasenberg M, Mekenyan OG (2016) QSAR toolbox – workflow and major functionalities. SAR QSAR Environ Res 27:203–219
7. Organisation for Economic Co-operation and Development (OECD) 2004. The Report from the expert group on (quantitative) structure–activity relationships (Q)SARs on the principles for the validation of (Q)SARs, ENV/JM/TG(2004)27/REV, Organisation for Economic Cooperation and Development, Paris, FR
8. Organisation for Economic Co-operation and Development (OECD) 2007. Guidance document on the validation of (quantitative) structure–activity relationships (Q)SARs models, OECD environmental health and safety series on testing and assessment no. 69. ENV/JM/MONO(2007)2
9. Jaworska JS, Comber M, Auer C, Van Leeuwen CJ (2003) Summary of a workshop on regulatory acceptance of (Q)SARs for human health and environmental endpoints. Environ Health Perspect 111:1358–1360
10. Organisation for Economic Co-operation and Development (OECD) 2006. Report on the regulatory uses and applications in OECD member countries of (Q)SAR models in the assessment of new and existing chemicals, OECD environmental health and safety series on testing and assessment no. 58. ENV/JM/MONO(2006)25
11. Bradbury SP, Russom CL, Schmieder PK, Henry TR, Schultz TW, Diderich R, Auer CM (2014) Advancing computational toxicology in a regulatory setting: a selected review of the accomplishments of Gilman D. Veith (1944–2013). Appl In Vitro Toxicol 1:11–20
12. Sakuratani Y, Zhang HQ, Nishikawa S, Yamazaki K, Yamada T, Yamada J, Gerova K, Chankov G, Mekenyan O, Hayashi M (2013) Hazard evaluation support system (HESS) for predicting repeated dose toxicity using toxicological categories. SAR QSAR Environ Res 24:351–363
13. Patlewicz G, Helman G, Pradeep P, Shah I (2017) Navigating through the minefield of read-across tools: a review of in silico tools for grouping. Comput Toxicol 3:1–18
14. Schultz TW, Amcoff P, Berggren E, Gautier F, Klaric M, Knight DJ, Mahony C, Schwarz M, White A, Cronin MTD (2015) A strategy for structuring and reporting a read-across prediction of toxicity. Regul Toxicol Pharmacol 72:586–601
15. Organisation for Economic Co-operation and Development (OECD) 2014. Guidance on grouping of chemicals, 2nd edn, OECD environmental health and safety series on testing and assessment no. 194. ENV/JM/MONO(2014)4
16. Schultz TW, Cronin MTD (2017) Lessons learned from read-across case studies for repeated-dose toxicity. Regul Toxicol Pharmacol 88:185–191
17. Tollefsen KE, Scholz S, Cronin MT, Edwards SW, de Knecht J, Crofton K, Garcia-Reyero N, Hartung T, Worth A, Patlewicz G (2014) Applying adverse outcome pathways (AOPs) to support integrated approaches to testing and assessment (IATA). Regul Toxicol Pharmacol 70:629–640
18. Organisation for Economic Co-operation and Development (OECD) 2016. Guidance document for the use of adverse outcome pathways in developing integrated approach to testing and assessment (IATA). OECD environmental health and safety series on testing & assessment no. 260. ENV/JM/MONO(2016)67
19. Organisation for Economic Co-operation and Development (OECD) 2009. Report of the expert consultation to evaluate an estrogen receptor binding affinity model for hazard identification. OECD environmental health and safety series on testing and assessment no. 111, ENV/JM/MONO(2009)33
20. Schmieder PK, Kolanczyk RC, Hornung MW, Tapper MA, Denny JS, Sheedy BR, Aladjov H (2014) A rule-based expert system for chemical prioritization using effects-based chemical categories. SAR QSAR Environ Res 25:253–287
21. Organisation for Economic Co-operation and Development (OECD) 2011. Report of the workshop on using mechanistic information in forming chemical categories, 8–10 December 2010, Crystal City, VA, USA. OECD environmental health and safety series on testing and assessment no. 138, ENV/JM/MONO(2011)8
22. Organisation for Economic Co-operation and Development (OECD) 2012. The adverse outcome pathway for skin sensitisation initiated by covalent binding to proteins. Part 1: scientific evidence. OECD environmental health and safety series on testing and assessment no. 168, ENV/JM/MONO(2012)10/PART 1
23. Organisation for Economic Co-operation and Development (OECD) 2016. Guidance document on the reporting of defined approaches to be used within integrated approaches to testing and assessment. OECD environmental health and safety series on testing & assessment no. 255. ENV/JM/MONO(2016)28
24. Schultz TW, Przybylak KR, Richarz AN, Bradbury SP, Cronin MTD (2017) Read-across for 90-day rat oral repeated-dose toxicity for selected 2-alkyl-1-alkanols: a case study. Comput Toxicol 2:28–38
25. Organisation for Economic Co-operation and Development (OECD) 2012. The adverse outcome pathway for skin sensitisation initiated by covalent binding to proteins. Part 2: use of the AOP to develop chemical categories and integrated assessment and testing approaches. OECD environmental health and safety series on testing and assessment no. 168, ENV/JM/MONO(2012)10/PART 2
26. Organisation for Economic Co-operation and Development (OECD) 2016. Case study on the use of integrated approaches for testing and assessment for repeated dose toxicity of substituted diphenylamines (SDPA). OECD environmental health and safety series on testing and assessment no. 252, ENV/JM/MONO(2016)50
27. Organisation for Economic Co-operation and Development (OECD) 2016. Case study on the use of an integrated approach to testing and assessment for hepatotoxicity of allyl esters. OECD environmental health and safety series on testing and assessment no. 253, ENV/JM/MONO(2016)51
28. Balls M, Amcoff P, Bremer S, Casati S, Coecke S, Clothier R, Combes R, Corvi R, Curren R, Eskes C, Fentem J, Gribaldo L, Halder M, Hartung T, Hoffmann S, Schechtman L, Scott L, Spielmann H, Stokes W, Tice R, Wagner D, Zuang V (2006) The principles of weight of evidence validation of test methods and testing strategies: the report and recommendations of ECVAM workshop 58. Altern Lab Anim 34:603–620
29. Ellison CM, Madden JC, Judson P, Cronin MTD (2010) Using in silico tools in a weight of evidence approach to aid toxicological assessment. Mol Inform 29:97–110
30. Borgert CJ, Mihairch EM, Ortego LS, Bentley KS, Holmes CM, Levine SL, Becker RA (2011) Hypothesis-driven weight of evidence framework for evaluating data within the US EPA's endocrine disruptor screening program. Regul Toxicol Pharmacol 61:185–191
31. Schultz TW, Przybylak KR, Richarz A-N, Mellor CL, Escher SE, Bradbury SP, Cronin MTD (2017) Read-across for 90-day rat oral repeated-dose toxicity for selected n-alkanols: a case study. Comput Toxicol 2:12–19
Chapter 3
Abstract
QSAR (quantitative structure–activity relationship) is a method for predicting
the physical and biological properties of small molecules; it is now widely
used in companies and public services. However, like any scientific method, it
faces growing demands, especially considering its possible role in assessing
the safety of new chemicals. Posing the question of whether QSAR is a way not
only to exploit available knowledge but also to build new knowledge, we briefly
review QSAR history, thus searching for a QSAR epistemology. We consider the
three pillars on which QSAR stands: biological data, chemical knowledge, and
modeling algorithms. Most of the time we assume that biological data are a true
picture of the world (as they result from good experimental practice) and that
chemical knowledge is scientifically true; so if a QSAR is not working, we
blame the modeling. This opens the way to look at the role of modeling in
developing scientific theories and in producing knowledge. QSAR is a mature
technology; however, debate is still active on many topics, in particular
about the acceptability of the models and how they are explained. After an
excursus on inductive reasoning, we relate the QSAR methodology to open debates
in the philosophy of science.
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_3, © Springer Science+Business Media, LLC, part of Springer Nature 2018
80 Giuseppina Gini
Today, there are two main streams in the development and use
of QSAR: QSAR as a method for molecular design, and QSAR as
a method for evaluating properties. The first stream uses QSAR
together with computational chemistry tools and docking methods,
and is aimed at designing new products with desired properties,
in particular drugs. The second stream is aimed at screening
molecular structures to predict properties relevant for humans
and nature.
We will not consider the tools necessary for molecular design,
but will concentrate only on the aspects common to both streams,
namely the building of predictive models and the meaning of these
models.
2.1 Biological Data
Biological data used in QSAR originate from laboratory testing on
living systems (animal or plant). Many models use principles similar
to those of QSARs to predict physicochemical properties (such as boiling
point or logP) and are usually called QSPRs (quantitative
structure–property relationships). In the following, we focus only on
biological data, since the effects of chemical substances on living
systems are the final interest of regulatory bodies and industrial
developers.
While chemical and physical knowledge is often enough to
understand the physicochemical properties of molecules,
biological effects are hard to model and are affected by larger
variability. The main reason is individual variability: it is impossible
to find two perfectly identical instances of any biological
entity, and this affects the reproducibility of the experiments. If a
substance has different effects on different individuals, how is it
possible to generalize the results to a whole population? The practice
in toxicology is to standardize the experimental results, moving from
individual responses to an average over a population. The purpose of
biological endpoints such as LD50 or IC50 is exactly that of removing
individual variability from the experimental results.
A more basic discussion about how to use biological
observations to do science was a hot topic in the philosophy
of science in the last century. How can we explain the biological
phenomena that generate the data? Reductionist and antireductionist
positions in biology appeared [17]. Pure reductionists contend
that the only biological explanation to seek is at the level of
2.3 Modeling Algorithms
Modeling is the last, but not least, pillar of QSAR. Roughly, there
are two main streams for making models: data modeling and
algorithmic modeling.
Data modeling is the stream commonly developed by statisticians:
from the data analysis they postulate the kind of relation
between data and response, and use a large set of mathematical
tools to derive the model.
Algorithmic modeling has been developed more recently,
starting in the mid-1980s when powerful new algorithms for fitting
data became available. The community using these tools
aimed at building predictive models for data sets so difficult that
3.1 Success Stories
Classical QSARs are based on the Hansch hypothesis that three
main molecular properties are required to explain variations in a set
of congeneric compounds: electronic, hydrophobic, and steric
properties. The intent of those initial QSARs was not predictive
but explanatory, as a way to increase the understanding of the
biochemical properties under consideration. The initial QSAR equations
linearly related the three descriptors to the logarithm
1 Titus Lucretius Carus, Of The Nature of Things, http://www.gutenberg.org/ebooks/785?msg=welcome_stranger
2 http://www.antares-life.eu/index.php?sec=modellist
QSAR: What Else? 87
3.2 Pitfalls
Since modern QSARs have a large domain of applicability and use a
large set of molecular properties (often implicitly contained in
molecular descriptors), there is no longer a common structural core.
They are no longer local models of a specific “mechanism.” The
molecular descriptors are not easily connected to the biological
interactions of ligand and protein, so QSAR models using those
descriptors cannot say directly which biochemical property is
relevant to explain the effect. This situation is partially mitigated by the
use of fingerprint descriptors, which conceptually can be interpreted
as fragments relevant to some biological process.
In 2003 Cronin and Schultz warned about bad practices
in QSAR. They wrote: “At the end of the day, however,
QSARs are predictive techniques based on the relationship, for a
series of chemicals, between some form of biological activity and
some measure(s) of physico-chemical or structural properties. As
such, there are a number of limitations to the use and application
of QSARs. It is the concern of the authors that these are often not
appreciated, or may be forgotten by the developers of QSAR”
[27]. Considering the three pillars of QSARs, they indicate the
most common pitfalls in:
1. Biology: experimental errors, reproducibility of data;
2. Chemistry: errors in descriptors;
3. Statistical analysis: overfitting, using unnecessarily complex
models.
A decade later, Cherkasov et al. [12] listed twenty-one problems
that can make it hard to develop or to accept QSARs in practice.
We group them in five categories:
(a) Data curating: Failure to take account of data heterogeneity—
Use of inappropriate end point units.
(b) Data preprocessing: Use of confounded descriptors—Use of
noninterpretable descriptors—Errors in descriptor values.
(c) Model construction: Overfitting of data—Use of excessive
numbers of descriptors in a QSAR—Incorrect calculation—
Lack of descriptor auto-scaling—Misuse or misrepresentation
of statistics—No consideration of distribution of residuals—
Replication of chemicals in a data set.
(d) Model validation: Inadequate training and/or test set selec-
tion—Inadequate QSAR model validation—Inadequate or
missing statistic measures.
(e) Model usability and delivery: Poor transferability of QSARs-
Inadequate or undefined applicability domainC—
models. There are many valid answers, and validation can be carried
out with a plethora of methods. We will explore this point in
Subheading 5.
The fifth requisite is about model interpretation. A mechanistic
interpretation strictly means that, as in a mechanical system,
the predictors (or most of them) used by the model can be run
in a simulation exercise to show that their values really activate a
process and produce the observed results. This property of the
models can seldom be demonstrated, for various reasons. First,
there are tens or hundreds of molecular descriptors that
are correlated with shape, physical properties, steric properties, and
so on. This correlation is a coarse interpretation of the “mechanism,”
if any, involved. Second, the hidden variables may be more
important than the predictors. Third, different good-quality models
can use completely different descriptors, suggesting that
any kind of mechanistic interpretation has more didactic value
than true interpretative value. The problem of interpreting
a predictive model is the subject of philosophical speculation, and
will be discussed in Subheading 5.
3.3 Trends
The trends in QSAR that we have seen so far involve changes of focus:
1. From laboratory use to regulatory use.
2. From a chemical family to the chemical space.
3. From linear to nonlinear models.
4. From single model to ensemble models.
Other trends appear and disappear, and one of them is related
to the right representation level. A recurrent issue is whether to
consider the whole molecule or focus on the presence of specific
fragments. The whole molecule is represented through molecular
descriptors, while fragments are represented as simple strings or
bits. Moreover, does the effect depend on the whole structure
(including energy and chemical properties arising from the 3D
shape) or on the presence of specific functional groups, usually
called structural alerts (SA)?
In the development of predictive models, SAs are used as a
transparent way to decide the possible hazards of a chemical. Many
widely used systems rely on rules that check for SAs. SAs were
created from human experience, after observing in many cases that the
presence of a given structure was associated with the effect. Most
of them are accepted because a full set of plausible chemical
transformations and its binding to a receptor have been identified. The role of
functional subgroups in the design of drugs is of primary importance.
The drawback of their use for regulatory purposes is that often
the association observed is not a causal relationship but only a
statistical one. In some cases the alert is positively
correlated with the effect in about 50% of the observations. In general,
using SAs alone tends to overestimate the positive effect, and has to
4 QSAR as Induction
4.1 Induction
The online Oxford English Dictionary defines induction as: “The
process of inferring a general law or principle from the observation
of particular instances (opposed to deduction).”
The problem of induction is the question of whether inductive
reasoning leads to knowledge. Inductive methods are essential in
science (as well as in everyday life), but they come at a cost.
Induction seems to lack justification; it either generalizes
the properties of a class of objects from a large number
of observations of particular instances, or presupposes that a
sequence of events in the future will occur as it always has in the
past. For example, after observing that “all birds we have
seen fly,” the induction “all birds fly” holds only until we
observe that ostriches cannot fly.
A principle found by induction cannot be proved deductively.
That induction is opposed to deduction (as stated in the Oxford
Dictionary) is not always right. However, deductive logic is demonstrable:
the premises of an argument constructed according to the
rules of logic imply the argument’s conclusion. For induction
there is no complete theory to distinguish good from bad
inductions.
David Hume is considered the father of inductive reasoning.
Hume was interested in how we make causal connections, an
argument central in his project of developing the empirical science
of human nature and belief. Hume divided all reasoning into
4.2 The Role of Statistics
The Merriam-Webster dictionary defines statistics as: “a branch of
mathematics dealing with the collection, analysis, interpretation,
and presentation of masses of numerical data.” Statistical analysis
infers properties about a population, which is assumed to be larger
than the observed data set.
Statistics is a well-developed tool used to derive information
from multiple observations. As such it is connected with induction.
Statistical inference is the process of deducing properties of an
underlying distribution by analyzing data.
Since all empirical scientific research uses statistics, the philoso-
phy of statistics is of key importance to the philosophy of science,
and is connected with the problem of induction, which concerns
how to justify the inferences that extrapolate from data to predic-
tions and general facts.
In 1953 Rudner wrote: “Since no hypothesis is ever completely
verified, in accepting a hypothesis the scientist must make the deci-
sion that the evidence is sufficiently strong or that the probability is
sufficiently high to warrant the acceptance of the hypothesis” [32].
“Sufficiently” is the key word. Consider making experiments
with a known error probability (the probability of rejecting a true
hypothesis or of accepting a false hypothesis). Here the problem of
induction is in reality a problem of decision, and the acceptability
of the results is in practice an optimization problem.
In the 1970s, statistical predictive models flourished.
Most of them were linear models, in particular
proper models that used weights of the features (obtained so as
to optimize the model properties), or improper models using random
weights (or unit weights). These systems aimed at improving
3 https://plato.stanford.edu/entries/induction-problem/
4.3 The Role of Probability
The problems considered by probability and statistics are inverse to
each other. In probability theory we consider some underlying
process, which has some uncertainty modeled by random variables,
and we figure out what happens. In statistics we observe something
that has happened, and try to figure out what underlying
process would explain the observations.
The connection between induction and probability is given by
the Bayes rule. In an inductive inference problem there are data
D = d1, d2, … dn, and a set of hypotheses H = h1, h2, … hm. The
problem is to find out which of the hypotheses is the true one
that explains the data, or is the most plausible one. The Bayes
formula $P(h \mid D) = P(D \mid h)\,P(h) / P(D)$ computes the probability of
each hypothesis h given the data D (in the formula, $P(h \mid D)$),
using the inverse probability of observing such data given the
hypothesis (in the formula, $P(D \mid h)$). $P(D)$ does not depend on h and
acts as a normalizing constant, leaving the computational burden on
defining $P(D \mid h)$ and on assigning the prior probabilities $P(h)$. Conceptually, it is
hard to assign such $P(h)$ probabilities before observing the data D.
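As a minimal sketch of the Bayes rule described above, the following Python fragment computes the posterior probability of each hypothesis in a small hypothesis set. The two hypotheses, their priors, and the likelihood values are invented purely for illustration; the function name `posterior` is likewise hypothetical.

```python
def posterior(priors, likelihoods):
    """Return P(h|D) for each hypothesis h via the Bayes rule.

    priors:      dict mapping h -> P(h)
    likelihoods: dict mapping h -> P(D|h)
    """
    # P(D) is the sum over all hypotheses of P(D|h) * P(h); it does not depend on h.
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}


# Hypothetical numbers: two competing hypotheses about a compound.
priors = {"toxic": 0.2, "nontoxic": 0.8}
likelihoods = {"toxic": 0.9, "nontoxic": 0.1}  # P(observed data | h)
post = posterior(priors, likelihoods)
print(post)
```

The sketch makes the conceptual difficulty visible: the result depends directly on the priors, which have to be chosen before any data are seen.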
Solomonoff [35] proposed a universal prior distribution that
unifies probability and uncertainty, thus answering the question:
given data about an unknown phenomenon, how to rate different
hypotheses and thereby select the hypothesis that best explains that
phenomenon? And how to use this hypothesis to predict new data
and measure the likelihood of that prediction being the right one?
Solomonoff’s answer is algorithmic probability theory
(APT). APT integrates philosophical principles with mathematics.
4.4 From Models to Theory to Knowledge
People think that using statistics is a declaration of ignorance: we
do not have deterministic knowledge, so we have to use statistics.
This debate has been active for about a century in atomic physics,
with the so-called Copenhagen interpretation, accepted by Bohr
and Heisenberg, and rejected by Planck and Einstein. The
Copenhagen interpretation states that physical systems generally
do not have definite properties prior to being measured, and quantum
mechanics can only predict the probabilities that measurements
will produce certain results; the act of measurement affects
the system, causing the set of probabilities to reduce to only one of
the possible values immediately after the measurement.
Even though models born to explain the property of atoms
cannot be exported to other fields of science, many of the consid-
erations advanced by the Copenhagen interpretation were indeed
taken from Hume.
Scientific models represent a phenomenon that is interesting in
science, as for instance the Bohr model of the atom. But we could
represent the same subject matter in different ways. Any model,
and QSAR too, has two different representational functions: it can
model a target or a theory. In the first case the model can be a rep-
resentation of a selected part of the world, the target system. In the
second case the model represents a theory, in the sense that it
interprets the laws of that theory. These two notions are not mutu-
ally exclusive in scientific models. Usually the knowledge about the
model can be translated into knowledge about the target system.
Models of data, introduced by Suppes [36], are an idealized version of
the data obtained from immediate observation, the raw data. After
the steps of data reduction and curve fitting, what we get is a model
of data. Models of data play a crucial role in confirming theories
because it is the model of data, and not the raw data, that we compare
to a theoretical prediction.
4 https://plato.stanford.edu/entries/justep-intext/
on a test set, a set of data not used in any way in the process of
building the model. The implicit assumption is that both the training
and the test sets are taken from the same population. To avoid
the bias of choosing a bad test set, the process should be repeated
several times with randomly selected test and training sets, and the
errors averaged.
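The repeated random split described above can be sketched as follows. This is only an illustration under stated assumptions: the data set is invented, and `fit_mean` is a deliberately trivial stand-in for a real QSAR model so that the example stays self-contained.

```python
import random


def fit_mean(train):
    """Toy 'model' (assumption): always predict the mean training response."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def repeated_holdout(data, fit, n_repeats=20, test_fraction=0.2, seed=0):
    """Average the test MSE over several random training/test splits."""
    rng = random.Random(seed)
    n_test = max(1, int(len(data) * test_fraction))
    errors = []
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = fit(train)  # the test set plays no role in fitting
        mse = sum((model(x) - y) ** 2 for x, y in test) / len(test)
        errors.append(mse)
    return sum(errors) / len(errors)


# Hypothetical data set of (descriptor, activity) pairs.
data = [(x, 2.0 * x + 1.0) for x in range(20)]
avg_mse = repeated_holdout(data, fit_mean)
print(avg_mse)
```

Fixing the seed makes the estimate reproducible; changing it shows how much a single split can vary.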
Other validation methods include cross-validation, in which the
random sampling of the test data should guarantee that each
class is properly represented. In cross-validation we fix a number of
folds, or partitions of the data. In the case of tenfold
cross-validation, the dataset is split into 10 equal partitions; 9 are used for
training and 1 for testing, and the procedure is repeated ten times so
that each instance is used exactly once for testing. Finally, the ten errors
are averaged to obtain the overall error estimate. The choice of 10
is supported by practical and theoretical evidence, but other numbers, such as 5, can
work as well. Cross-validation provides a method of high accuracy
and low variability in estimating the model performance.
Leave-one-out is n-fold cross-validation where n is the number of
instances in the dataset. The procedure is interesting
since it uses a large training set and no random split is
involved. However, it can give artificial error rates, since the test set
contains only one element.
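A minimal sketch of k-fold cross-validation, with leave-one-out as the special case in which the number of folds equals the number of instances. For brevity the folds are assigned deterministically, without the random or class-balanced sampling mentioned above; the data and the trivial `fit_mean` model are assumptions made for the example.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k folds whose sizes differ by at most one."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds


def cross_validate(data, fit, k):
    """Return the mean of the k per-fold test MSEs."""
    fold_errors = []
    for test_idx in k_fold_indices(len(data), k):
        held_out = set(test_idx)
        train = [d for i, d in enumerate(data) if i not in held_out]
        test = [data[i] for i in test_idx]
        model = fit(train)
        fold_errors.append(sum((model(x) - y) ** 2 for x, y in test) / len(test))
    return sum(fold_errors) / k


def fit_mean(train):
    """Toy 'model' (assumption): always predict the mean training response."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


data = [(x, float(x)) for x in range(20)]  # hypothetical data
tenfold_error = cross_validate(data, fit_mean, k=10)
loo_error = cross_validate(data, fit_mean, k=len(data))  # leave-one-out
print(tenfold_error, loo_error)
```

Every instance lands in exactly one fold, so each is used once for testing and k-1 times for training.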
When plenty of data are available (as in big data applications
on web data), it is straightforward to use many of them for training
and the remaining ones for testing. The results on the test set are
considered to be a true picture of the predictive capability of the
model. But there is a trade-off: the larger the training set, the better
the classifier; the larger the testing set, the better the error
estimate on future new items.
A common practice in QSAR is to use an external test set [43].
However, both statistical and experimental analyses show that the
external test set is not always the best choice. Gütlein et al. [44]
report a large evaluation of cross-validation and external test set
validation using a big dataset of 500 compounds and a small dataset of 100
compounds; the authors consistently found that cross-validation
gives a more pessimistic view of the model performance than the external
test set. Moreover, since data are often scarce in QSAR,
cross-validation can be the best evaluation choice for small data sets.
The bootstrap is another estimation method, based on sampling
with replacement. The idea is that the same instance can be drawn more than once,
so that sampling a data set of n instances yields a new data set
of n instances in which some elements are missing and some are
repeated. The bootstrap procedure is repeated many times with
different resamples and the errors are averaged. This can be
another good way to measure the error in small datasets.
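The resampling step can be sketched in a few lines. The text does not say on which instances the error is measured; this sketch assumes the common choice of testing on the instances that were not drawn in a given round. The data and the trivial `fit_mean` model are again invented for illustration.

```python
import random


def fit_mean(train):
    """Toy 'model' (assumption): always predict the mean training response."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def bootstrap_error(data, fit, n_rounds=100, seed=0):
    """Average MSE on the left-out instances over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    errors = []
    for _ in range(n_rounds):
        # Draw n indices with replacement: some repeat, some never appear.
        drawn = [rng.randrange(n) for _ in range(n)]
        train = [data[i] for i in drawn]
        left_out = [data[i] for i in set(range(n)) - set(drawn)]
        if not left_out:  # rare case: every instance was drawn
            continue
        model = fit(train)
        errors.append(sum((model(x) - y) ** 2 for x, y in left_out) / len(left_out))
    return sum(errors) / len(errors)


data = [(x, 2.0 * x + 1.0) for x in range(20)]  # hypothetical data
boot_mse = bootstrap_error(data, fit_mean)
print(boot_mse)
```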
The coefficient of determination, called R2, indicates the proportion
of the variance in the dependent variable that is predictable
from the independent variable(s), and is usually adopted in
regression QSARs. Its value normally ranges between 0 and 1, and the values
computed on the training set and on the test sets should be similar and as near as
possible to 1. Several other measures are available, such as
the commonly used mean squared error (MSE) and the mean absolute
error (MAE).
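The three measures just named can be written out directly from their standard definitions. The observed and predicted values below are invented for the example.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted activities.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

With this definition R2 can become negative when a model predicts worse than the mean of the observed values, which can happen on a test set.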
Another useful and often neglected way to assess the value of a
classifier is the ROC (receiver operating characteristic) curve. The
ROC curve plots, on the vertical axis, the fraction of the positives in the
test set that are classified as positive (true positive rate) and, on the
horizontal axis, the fraction of the negatives that are wrongly classified
as positive (false positive rate); the same applies to the training
set. The result is a line that starts at (0, 0) and ends at (1, 1). The
curve of a perfect classifier immediately reaches (0, 1) and is then
constant up to (1, 1), while good classifiers approach that ideal curve.
Different classifiers have different ROC curves that can be compared
and interpreted.
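A minimal sketch of how the points of a ROC curve can be obtained from classifier scores, by lowering the decision threshold one instance at a time. It assumes that higher scores mean "more likely positive" and that no two scores are tied; the labels and scores are invented.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs from (0, 0) to (1, 1).

    labels: 1 for positive, 0 for negative.
    scores: classifier outputs, higher meaning 'more likely positive'.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sort by decreasing score and lower the decision threshold step by step.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical classifier that ranks every positive above every negative.
labels = [1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.1]
points = roc_points(labels, scores)
print(points)
```

Because this toy classifier separates the classes perfectly, its curve passes through (0, 1), the corner reached by the ideal classifier.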
ROC graphs have been extended to regression models; REC
(Regression Error Characteristic) curves use a different range of
values on the x-axis, giving an effective representation that is seldom
used in QSAR [45]. See Fig. 2 for an example REC curve comparing
three models: the best model, which corresponds to the lowest
squared residual, is characterized by the upper curve.
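As an illustrative sketch, a REC curve can be computed as the fraction of predictions whose error stays within a given tolerance. The squared residual is used as the error here, matching the axes of the figure; the data and the tolerance grid are invented.

```python
def rec_curve(y_true, y_pred, tolerances):
    """For each tolerance, return the fraction of squared residuals within it."""
    residuals = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return [sum(r <= tol for r in residuals) / len(residuals) for tol in tolerances]


# Hypothetical observed and predicted activities.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
tolerances = [0.0, 0.1, 0.25, 1.0]
accuracy = rec_curve(y_true, y_pred, tolerances)
print(list(zip(tolerances, accuracy)))
```

A model whose curve rises faster reaches a given accuracy at a smaller tolerance, which is why the upper curve identifies the better model.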
Sometimes we are interested in comparing different training
schemes on the same dataset. In this case it is not completely correct
to compare their errors as obtained with any of the aforementioned methods;
it is necessary to apply statistical tests, such as
Student’s t-test. Often in the QSAR literature only the R2 values are
reported; they are biased, since on the training set the coefficient of determination
never decreases as independent variables are added.
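One way to apply a t-test in this setting, sketched here as an assumption rather than as the procedure the chapter prescribes, is the paired form: both schemes are evaluated on the same folds and the test is run on the per-fold differences. The error values below are invented.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """t statistic for the mean per-fold error difference between two schemes."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation
    return mean_d / (sd_d / math.sqrt(len(diffs)))


# Hypothetical per-fold errors of two training schemes on the same five folds.
errors_a = [0.30, 0.28, 0.35, 0.31, 0.29]
errors_b = [0.25, 0.27, 0.30, 0.28, 0.26]
t_stat = paired_t_statistic(errors_a, errors_b)
print(t_stat)
```

The statistic is then compared with Student's t distribution with n - 1 degrees of freedom, where n is the number of folds, to decide whether the difference is significant.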
[Fig. 2: REC curves of three models. Legend: mean (0.47838), NNgdx4 (0.35436), NNbr3 (0.32655). x-axis: squared residual (0 to 2.5); y-axis: accuracy (0 to 0.9).]
6 Conclusions
References
1. Hansch C, Maloney PP, Fujita T, Muir RM (1962) Correlation of biological activity of phenoxyacetic acids with hammett substituent constants and partition coefficients. Nature 194:178–180
2. Hansch C, Fujita T (1964) p-σ-π analysis. A method for the correlation of biological activity and chemical structure. J Am Chem Soc 86:1616–1626
3. Hansch C (1969) Quantitative approach to biochemical structure-activity relationships. Acc Chem Res 2:232–239
4. Free SM, Wilson JW (1964) A mathematical contribution to structure-activity studies. J Med Chem 7:395–399
5. Kier LB, Hall LH, Murray WJ, Randić M (1975) Molecular connectivity I: relationship to non specific local anesthesia. J Pharm Sci 64:1971–1974
6. Hall LH, Kier LB (1995) Electrotopological state indices for atom types: a novel combination of electronic, topological, and valence state information. J Chem Inf Comput Sci 35:1039–1045
7. Connolly ML (1985) Computation of molecular volume. J Am Chem Soc 107:1118–1124
8. Karelson K, Lobanov VS, Katritzky AR (1996) Quantum-chemical descriptors in QSAR/QSPR studies. Chem Rev 96:1027–1044
9. Wold S, Sjostrom M, Eriksson L (2001) PLS-regression: a basic tool of chemometrics. Chemom Intell Lab Syst 58:109–130
10. Rogers D, Hopfinger AJ (1994) Application of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships. J Chem Inf Comput Sci 34:854–866
11. Li L, Hu J, Ho Y-S (2014) Global performance and trend of QSAR/QSPR research: a bibliometric analysis. Mol Inform 33:655–668
12. Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin MTD et al (2014) QSAR modeling: where have you been? Where are you going to? J Med Chem 57:4977–5010
13. Cramer RD, Patterson DE, Bunce JD (1988) Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110:5959–5967
14. Golbraikh A, Tropsha A (2002) Beware of q2! J Mol Graph 20(4):269–276
15. Gramatica P (2007) Principles of QSAR models validation: internal and external. QSAR Comb Sci 26:694–701
16. OECD principles for the validation, for regulatory purposes, of (quantitative) structure-activity relationship models. Organization for economic co-operation and development (2004) http://www.oecd.org/env/ehs/risk-assessment/37849783.pdf
17. José Ayala F, Dobzhansky T (eds) (1974) Studies in the philosophy of biology: reduction and related problems. University of California Press, California
18. Popper KR (1974) Scientific reduction and the essential incompleteness of all science. In: Ayala FJ, Dobzhansky T (eds) Studies in the philosophy of biology. Palgrave, London
19. Schummer J (1999) Coping with the growth of chemical knowledge: challenges for chemistry documentation, education, and working chemists. Educación Química 10:92–101
20. Gòmez Bombarelli R, Duvenaud DK, Hernàndez Lobato JM, Aguilera-Iparraguirre J, Hirzel TD, Adams RP, Aspuru-Guzik A (2016) Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, Washington, DC
21. Gini G, Ferrari T, Cattaneo D, Golbamaki Bakhtyari N, Manganaro A, Benfenati E (2013) Automatic knowledge extraction from chemical structures: the case of mutagenicity prediction. SAR and QSAR Environ Res 24:365–383
22. Brieman L (2001) Statistical modeling: the two cultures (with comment and a rejoinder by the author). Stat Sci 16:199–231
23. Rissanen J (1978) Modeling by shortest data description. Automatica 14:465–658
24. Wolpert D (1996) The lack of a priori distinctions between learning algorithms. Neural Comput 8:1341–1390
25. Benfenati E, Gini G, Hoffmann S, Luttik R (2010) Comparing in vivo, in vitro, in Silico methods and integrated strategies for chemical assessment: problems and prospects. ATLA 38:153–166
26. Benfenati E, Gonella Diaza R, Cassano A, Pardoe S, Gini G, Mays C et al (2011) The acceptance of in silico models for REACH. Requirements, barriers, and perspectives. Chem Cent J 5:58
27. Cronin MTD, Schultz W (2003) Pitfalls in QSAR. J Mol Struct (THEOCHEM) 622:39–51
28. Alves V, Muratov E, Capuzzi S, Politi R, Low Y, Braga R et al (2016) Alarms about structural alerts. Green Chem 18:4348–4360
29. Ferrari T, Gini G (2010) An open source multistep model to predict mutagenicity from statistic analysis and relevant structural alerts. Chem Cent J 4(Suppl 1):S2. (online http://www.journal.chemistrycentral.com/content/4/S1/S2)
30. Gini G, Franchi AM, Manganaro A, Golbamaki A, Benfenati E (2014) ToxRead: a tool to assist in read across and its use to assess mutagenicity of chemicals. SAR QSAR Environ Res 25:999–1011
31. Benfenati E, Roncaglioni A, Petoumenaou M, Cappelli C, Gini G (2015) Integrating QSAR and read across for environmental assessment. SAR QSAR Environ Res 26:605–618
32. Rudner R (1953) The scientist qua scientist makes value judgments. Philos Sci 20:1–6
33. Lovie AD, Lovie P (1986) The flat maximum effect and linear scoring models for prediction. J Forecast 5:159–168
34. Trout JD, Bishop M (2002) 50 years of successful predictive modeling should be enough: lessons for philosophy of science. Philos Sci 69(S3):S197–S208
35. Solomonoff RJ (1964) A formal theory of inductive inference: parts 1 and 2. Inf Control 7:1–22, 224–254
36. Suppes P (1962) Models of data. In Studies in the methodology and foundations of science. Selected Papers from 1951 to 1969, Dordrecht, Reidel. pp. 24–35
37. Hodges W (1997) A shorter model theory. Cambridge University Press, Cambridge
38. Bailer-Jones DM (2003) When scientific models represent. Int Stud Philos Sci 17:59–74
39. Giere R (1988) Explaining science: a cognitive approach. University of Chicago Press, Chicago
40. Cartwright N (1983) How the laws of physics lie. Clarendon Press, Oxford
41. Hempel CG, Oppenheim P (1948) Studies in the logic of explanation. Philos Sci 15:135–175
42. Witten H, Frank E (2000) Data mining: practical machine learning tools and techniques with java implementations. Morgan Kaufmann Publishers, London
43. Benfenati E, Crètien JR, Gini G, Piclin N, Pintore M, Roncaglioni A (2007) Validation of the models. In: Benfenati E (ed) Quantitative structure-activity relationships (QSAR) for pesticides regulatory purposes. Elsevier, Amsterdam, pp 185–200
44. Gütlein M, Helma C, Karwath A, Kramer S (2013) A large-scale empirical evaluation of cross-validation and external test set validation in (Q)SAR. Mol Inform 32:516–528
45. Bi J, Bennett K P (2003) Regression error characteristic curves. Procs of the Twentieth international conference on machine learning (ICML-2003), Washington DC
46. Vapnik VN (1995) The nature of statistical learning theory. Springer-Verlag, Berlin
47. Polishchuk PG (2017) Interpretation of QSAR models: past, present and future. J Chem Inf Model 57(11):2618–2639
48. Hartung T (2017) Food for thought. Opinion versus evidence for the need to move away from animal testing. ALTEX 34:193–200
49. Ulanowicz RE (2009) A third window: natural life beyond Newton and Darwin. Templeton Foundation Press, West Conshohocken
Chapter 4
Abstract
REACH is a regulation of the European Union adopted to improve the safe use of chemicals with regard
to human health and the environment. The safe use of chemicals can be achieved only if the hazard and
the exposure of the substances are well characterized. Testing on animals has been traditionally the main
tool for hazard assessment. For ethical and economic reasons, alternative ways of testing that do not use
laboratory animals have been developed by different parties (regulatory agencies, researchers, industry)
over the recent decades, and their proper use in hazard assessment is encouraged under REACH. In this
chapter, we describe how (Q)SAR models and predictions are included into REACH and their adequate
use promoted by the European Chemicals Agency (ECHA).
1 Introduction
*The views and opinions expressed in this chapter represent exclusively the personal ideas of the authors and do not
represent the official position of the Agency.
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_4, © Springer Science+Business Media, LLC, part of Springer Nature 2018
108 Toni Alasuvanto et al.
Fig. 1 Relative proportions of the options used by registrants to cover REACH information requirements. In this
figure, omitted studies include both REACH Annexes VII–X Column 2 and Annex XI adaptations
3.1 Substance Characterization
The vast majority of (Q)SAR models require the input of a SMILES
(i.e., a structure) to predict a result. Substances registered under
REACH can be of different types: monoconstituents, multiconstituents,
and substances with unknown or variable composition,
complex reaction products, or biological materials (UVCB).
Monoconstituents are substances where the main constituent has a
concentration ≥80%, and the remaining part of the composition is
impurities and additives. Multiconstituents are substances consisting
of several main constituents, each of them at a concentration
between 10% and 80%. UVCBs are substances where the composition
can be variable or difficult to predict, and some constituents
are unknown. The first aspect worth taking into account is therefore the
concept of substance type and composition in REACH.
While for monoconstituent substances the selection of the
input structure might be simple (and yet the potential toxicity, or lack of
toxicity, of impurities and additives needs to be discussed), the
choice of the input for multiconstituents and UVCBs is not trivial.
One way to address multiconstituents is to predict the different
structures individually, and then select the highest toxicity among
the predicted values for further hazard and risk assessment
calculations. Such an approach does not take into account possible
synergies in the toxicological mode of action of the components,
and therefore needs to be justified. For UVCBs, the situation is
even more complex, because some of the structures may even be
unknown. In this case, the input may include one or more representative
structures supposed to cover the range of expected toxicity
of the UVCB substance [7].
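The worst-case selection for a multiconstituent substance can be sketched as follows. The SMILES strings, the predicted values, and the function name are hypothetical; the sketch also assumes an endpoint such as LC50, where a lower value means higher toxicity, and it ignores possible synergies exactly as the approach in the text does.

```python
def worst_case_prediction(predictions, lower_is_more_toxic=True):
    """Pick the constituent whose predicted value indicates the highest toxicity.

    predictions: dict mapping a constituent (e.g., its SMILES) to a predicted value.
    """
    pick = min if lower_is_more_toxic else max
    return pick(predictions.items(), key=lambda item: item[1])


# Hypothetical predicted LC50 values (mg/L) for three constituents.
predicted_lc50 = {"CCO": 120.0, "CCCCO": 45.0, "CCCCCCO": 8.5}
constituent, value = worst_case_prediction(predicted_lc50)
print(constituent, value)
```

The direction flag matters: for endpoints reported as a potency or a probability of effect, the highest value rather than the lowest would be the worst case.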
(Q)SARs as Adaptations to REACH Information Requirements 113
3.2 Results Are Derived from (Q)SAR Models Whose Scientific Validity Has Been Established
The validity of (Q)SAR models for regulatory purposes can
be assessed according to the principles set by the OECD [8]. The
OECD document states that “To facilitate the consideration of a
(Q)SAR model for regulatory purposes, it should be associated with
the following information:
1. a defined endpoint,
2. an unambiguous algorithm,
3. a defined domain of applicability,
4. appropriate measures of goodness-of-fit, robustness and
predictivity,
5. a mechanistic interpretation, if possible.”
Principle 1 ensures that there is clarity on the endpoint being
predicted and the experimental system being modeled. The rele-
vance of the endpoint will be analyzed when discussing the third
REACH requirement—adequacy of the results.
Principle 2 is set to ensure transparency and reproducibility of
the results, which are essential for regulatory acceptance.
Principle 3 is needed to be able to assess whether the chemical
under investigation falls within the applicability domain of the
model. No definition of applicability domain for (Q)SARs is accepted
worldwide, so model developers define the applicability
domain differently. The assessment of the reliability of a prediction
derives from the concept of applicability domain and will
be further discussed when dealing with the second REACH
requirement: the substance falls within the applicability domain.
Principle 4 gives the possibility to estimate uncertainties and
error probabilities associated with the predictions. This is very
important in the regulatory field and for risk assessment. The level
of uncertainty that can be accepted depends on the purpose of the
prediction (higher for screening, lower for definitive
considerations).
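As a small illustration of Principle 4, the sketch below computes two familiar goodness-of-fit statistics, the coefficient of determination (R²) and the root-mean-square error (RMSE). These statistics are chosen here only as common examples; the quoted principle asks for "appropriate measures" without naming specific ones, and the observed and predicted values are hypothetical.

```python
import math

# Minimal sketch: two common goodness-of-fit statistics for a regression model.
# The data are hypothetical and serve only to exercise the functions.

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / len(observed))


observed = [1.0, 2.0, 3.0, 4.0]    # hypothetical experimental values
predicted = [1.1, 1.9, 3.2, 3.8]   # hypothetical model predictions

print(f"R2 = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

Computed on the training set, such statistics describe goodness-of-fit; computed on compounds left out of training, the same formulas give an estimate of predictivity.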
Principle 5 is optional in the OECD principles; however, expe-
rience indicates that there is less confidence in predictions without
a mechanistic rationale, which are therefore rarely accepted.
It is important to note that the validity of the model used to
obtain the prediction is only the first REACH requirement for hav-
ing an acceptable prediction.
3.3 The Substance Falls Within the Applicability Domain of the (Q)SAR Model
The concept of applicability domain is linked to the assessment of
the reliability of the prediction. Some points can be
addressed to evaluate the reliability of a specific prediction, and
these points apply regardless of how the applicability
domain is defined. Results derived from interpolation have lower uncertainty
than results of extrapolation. Therefore, it is important that the
substance under investigation falls within the descriptor ranges
and within the mechanistic and metabolic domains defined by the
molecules in the training set of the model. Along the same lines, it can also
be checked that all the structural fragments of the target chemical
are represented in the training set of the model. Furthermore,
predictions are more reliable when the model has been trained with
structures similar enough to the target, and these analogues are well
predicted. Finally, some types of substances (e.g., salts) may need
specific considerations because some models have difficulties
handling them.
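Two of the checks mentioned above, descriptor ranges and fragment coverage, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the descriptor names, fragment labels, and values are hypothetical, and a simple bounding-box check is only one of many ways in which model developers define an applicability domain.

```python
# Minimal sketch of two applicability-domain checks (hypothetical data).

def descriptor_ranges(training_set):
    """Compute (min, max) for each descriptor over the training set.

    training_set: list of dicts mapping descriptor name -> numeric value.
    """
    names = training_set[0].keys()
    return {
        n: (min(m[n] for m in training_set), max(m[n] for m in training_set))
        for n in names
    }


def out_of_range(target, ranges):
    """Return the descriptors for which the target would require extrapolation."""
    return [n for n, (lo, hi) in ranges.items() if not lo <= target[n] <= hi]


def unseen_fragments(target_fragments, training_fragments):
    """Return target fragments that never occur in the training set."""
    return set(target_fragments) - set(training_fragments)


# Hypothetical training-set descriptors and fragments.
training = [
    {"logP": 1.2, "MW": 150.0},
    {"logP": 3.4, "MW": 310.0},
    {"logP": 2.1, "MW": 220.0},
]
training_frags = {"C-C", "C-O", "C=O", "C-N"}

# Hypothetical target chemical.
target = {"logP": 4.0, "MW": 250.0}
target_frags = {"C-C", "C-Cl"}

ranges = descriptor_ranges(training)
print("Outside descriptor ranges:", out_of_range(target, ranges))
print("Unseen fragments:", sorted(unseen_fragments(target_frags, training_frags)))
```

An empty result from both checks would support, but not prove, that the prediction is an interpolation; similarity to well-predicted analogues still needs to be examined separately.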
3.4 Results Are Adequate for the Purpose of Classification and Labeling and/or Risk Assessment
Once the model is considered valid and the prediction within its
applicability domain, the adequacy of the prediction (i.e., its
relevance for the purpose for which it is being used) needs to be verified. The
adequacy of the prediction is assessed by comparing it to the
regulatory requirements, in terms of both endpoint and results. First,
the type and quality of information obtained with the prediction
have to be at the same level as those of the test. A subchronic
toxicity study on rats would give information on the target organs, types
of effects, and the doses at which these effects are observed.
(Q)SAR models available today cannot reliably provide such a wealth
of information and therefore should not be used on their own to
replace standard information requirements for “high-tier”
endpoints when it comes to definitive hazard assessment. Along the same
lines, a classification model predicting positive or negative outcomes
cannot replace a test providing quantitative information on the
toxic dose. Moreover, the model should be developed using
high-quality training data from validated tests, comparable to the
standard requirement. For example, the Ames test should be performed on five
bacterial strains in the absence and presence of metabolic activation,
so a model trained on data from four bacterial strains cannot
replace the standard test. If the standard requirement foresees a
test with an exposure to the substance of 28 days, the training set
data cannot refer to 14-day studies. Likewise, a model predicting
biodegradation half-life (typically measured in simulation tests) should also
provide the structure of the degradation products to replace the
standard test.
4 Conclusions
References
1. Chemicals Legislation–European Commission. https://ec.europa.eu/growth/sectors/chemicals/legislation_en. Accessed 29 Nov 2017
2. Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals–REACH Legislation on ECHA’s website. https://echa.europa.eu/regulations/reach/legislation. Accessed 29 Nov 2017
3. European Chemicals Agency The use of alternatives to testing on animals for the REACH Regulation. https://echa.europa.eu/documents/10162/13639/alternatives_test_animals_2017_en.pdf. Accessed 29 Nov 2017
4. European Chemicals Agency (2017) Non-animal approaches - regulatory applicability of non-animal approaches under the REACH, CLP and Biocidal Products Regulations. https://echa.europa.eu/documents/10162/22931011/non_animal_approcches_en.pdf/87ebb68f-2038-f597-fc33-f4003e9e7d7d. Accessed 2 Dec 2017
5. European Chemicals Agency (2016) Practical Guide How to use and report (Q)SARs. https://echa.europa.eu/documents/10162/13655/pg_report_qsars_en.pdf/407dff11-aa4a-4eef-a1ce-9300f8460099. Accessed 29 Nov 2017
6. European Chemicals Agency (2008) Guidance on information requirements and chemical safety assessment. Chapter R.6: QSARs and grouping of chemicals. https://echa.europa.eu/documents/10162/13632/information_requirements_r6_en.pdf. Accessed 29 Nov 2017
7. Dimitrov SD, Georgieva DG, Pavlov TS et al (2015) UVCB substances: methodology for structural description and application to fate and hazard assessment. Environ Toxicol Chem 34:2450–2462. https://doi.org/10.1002/etc.3100
8. OECD Principles for the Validation, for Regulatory Purposes, of (Quantitative) Structure-Activity Relationship Models. https://www.oecd.org/chemicalsafety/risk-assessment/37849783.pdf. Accessed 29 Nov 2017
Part II
Abstract
Various methods of machine learning, supervised and unsupervised, linear and nonlinear, classification and
regression, in combination with various types of molecular descriptors, both “handcrafted” and “data-
driven,” are considered in the context of their use in computational toxicology. The use of multiple linear
regression, variants of naïve Bayes classifier, k-nearest neighbors, support vector machine, decision trees,
ensemble learning, random forest, several types of neural networks, and deep learning is the focus of atten-
tion of this review. The role of fragment descriptors, graph mining, and graph kernels is highlighted. The
application of unsupervised methods, such as Kohonen’s self-organizing maps and related approaches,
which allow for combining predictions with data analysis and visualization, is also considered. The neces-
sity of applying a wide range of machine learning methods in computational toxicology is underlined.
Key words Computational toxicology, Machine learning, Support vector machines, Random forest,
Neural networks, Deep learning
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_5, © Springer Science+Business Media, LLC, part of Springer Nature 2018
120 Igor I. Baskin
2 Methods
2.1 Molecular Descriptors
The first stage in the construction of any structure–property model
is the selection and calculation of molecular descriptors, i.e.,
numerical characteristics used to represent chemical structures [14]. The
most commonly used descriptors in computational toxicology are
various modifications of fragment descriptors [15–17], which
indicate the presence of certain fragments (substructures) in molecular
structures. Fragment descriptors give chemists a natural way
to describe structure–property/activity relationships in
terms of the presence or absence of certain structural fragments.
This, in turn, makes it possible to interpret structure–property/activity
models visually by tinting atoms with colors indicating their
contribution to activity [18]. Other types of descriptors often
used in computational toxicology include physicochemical
and quantum-chemical descriptors. For example, the physicochemical descriptor
logP evaluates the lipophilicity of the molecule, which is very
important for assessing the fate of xenobiotics in different tissues, whereas
the quantum-chemical descriptor E_LUMO, the energy of the lowest
unoccupied molecular orbital, assesses the ease of metabolic
activation of xenobiotics through reduction reactions with antioxidants.
Electronic descriptors representing local atomic properties (partial
charges, residual electronegativity, effective polarizability, etc.),
connectivity descriptors (topological indices), and shape descriptors
(molecular volume, surface areas, etc.) are also actively used for
building QSAR models for toxicity endpoints.
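The idea behind fragment descriptors can be illustrated with a toy example that counts the simplest possible fragments, pairs of bonded atoms, in a molecular graph. This is only a sketch under stated assumptions: the graph encoding, the function name, and the fragment vocabulary are hypothetical, and practical work would normally rely on a cheminformatics toolkit and richer fragment types.

```python
from collections import Counter

# Toy sketch: counting two-atom fragments as fragment descriptors.
# The graph encoding and the vocabulary are illustrative assumptions.

def bond_fragment_counts(atoms, bonds):
    """Count two-atom fragments (element-bond-element) in a molecular graph.

    atoms: list of element symbols, indexed by atom number.
    bonds: list of (i, j, order) tuples, with order written as "-", "=" or "#".
    """
    counts = Counter()
    for i, j, order in bonds:
        a, b = sorted((atoms[i], atoms[j]))  # canonical order: C-O, never O-C
        counts[f"{a}{order}{b}"] += 1
    return counts


# Acetic acid, CH3-C(=O)-OH, heavy atoms only (hydrogens omitted).
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, "-"), (1, 2, "="), (1, 3, "-")]

counts = bond_fragment_counts(atoms, bonds)

# A fixed fragment vocabulary turns the counts into a descriptor vector;
# fragments absent from the molecule contribute 0.
vocabulary = ["C-C", "C-O", "C=O", "C-N"]
descriptor_vector = [counts[f] for f in vocabulary]
print(descriptor_vector)
```

Replacing each count by 1 or 0 gives the presence/absence form of the descriptors mentioned in the text.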
2.2 Machine Learning Methods
Most machine learning methods belong to the very wide category
of supervised methods, which build models by analyzing
training sets containing the structures of chemical compounds
represented by descriptors together with information about their
physicochemical properties or biological activities [12]. Such models allow
the properties/activities of any new molecule to be predicted from a
set of descriptors computed for it. Classes of activity (toxicity) are
predicted with classification models, whereas regression models are
used for predicting numeric property/activity values.
Machine Learning Methods in Computational Toxicology 121
2.2.1 Supervised Linear Methods
Multiple linear regression. The simplest and still very popular
supervised linear method is multiple linear regression (MLR) [19],
which produces linear models of the following form:

$$\mathrm{Activity} = w_0 + w_1 \times x_1 + w_2 \times x_2 + \ldots, \tag{1}$$

where Activity is the activity value, x_i is the value of the i-th
descriptor, w_i is the corresponding regression coefficient, and w_0 is the
constant (offset) term. Regression coefficients in such models can be
interpreted as contributions of the corresponding descriptors to
the value of activity. Many linear models of this type have been
constructed for various endpoints in toxicology. A typical example is
the following model reported in [20] for predicting the
logarithm of the his+ revertant number, ln(N_his+), in the Ames Salmonella
histidine revertant assay for substituted polycyclic compounds
containing a biphenyl substructure:

$$\ln(N_{\mathrm{his+}}) = -0.52 - 1.77 \times E_{\mathrm{LUMO}} - 0.22 \times \log P + 0.90\,N_{\text{para-nitro}}, \tag{2}$$

where E_LUMO is the energy of the lowest unoccupied molecular
orbital (quantum-chemical descriptor), logP is the lipophilicity
(physicochemical descriptor), and N_para-nitro is the number of para-nitro
groups in biphenyls.
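To make the use of such a model concrete, the sketch below evaluates the general form (1) and the specific model (2) for one set of descriptor values. The coefficients are those quoted in Eq. (2); the descriptor values are hypothetical, the function names are illustrative, and the units of E_LUMO are not stated in the text, so the input is assumed to be in whatever units the original model used.

```python
# Minimal sketch: evaluating a multiple linear regression (MLR) model.
# Descriptor values below are hypothetical and chosen only for illustration.

def mlr_predict(w0, weights, descriptors):
    """General form of Eq. (1): Activity = w0 + sum_i w_i * x_i."""
    if len(weights) != len(descriptors):
        raise ValueError("one weight is required per descriptor")
    return w0 + sum(w * x for w, x in zip(weights, descriptors))


def predict_ln_n_his(e_lumo, log_p, n_para_nitro):
    """Eq. (2): ln(N_his+) from E_LUMO, logP and the number of para-nitro groups."""
    return mlr_predict(-0.52, [-1.77, -0.22, 0.90], [e_lumo, log_p, n_para_nitro])


# Hypothetical compound: E_LUMO = -1.0, logP = 3.0, one para-nitro group.
value = predict_ln_n_his(e_lumo=-1.0, log_p=3.0, n_para_nitro=1)
print(f"Predicted ln(N_his+) = {value:.2f}")
```

The signs of the coefficients can be read directly from the call: in this model a lower E_LUMO and each additional para-nitro group raise the predicted value, while higher lipophilicity lowers it.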
Multiple linear regression equations of type (1) are used for
predicting continuous toxicity metrics, such as LD50, LC50, and
EC50. On the other hand, for dichotomous toxicity metrics (active
or inactive, 1 or 0), e.g., for mutagenicity, carcinogenicity,
embryotoxicity, teratogenicity, and biodegradability, either eq. (3) or (4) is
used. They can be obtained using machine learning methods for
classification, such as linear discriminant analysis.

$$P_{\mathrm{Active}} = \sigma(w_0 + w_1 \times x_1 + w_2 \times x_2 + \ldots), \qquad \sigma(y) = \frac{1}{1 + \exp(-y)} \tag{3}$$
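Equation (3) maps the same linear combination of descriptors onto a probability of activity through the logistic (sigmoid) function. A minimal sketch follows; the weights and descriptor values are hypothetical, and the 0.5 decision threshold is an assumption made for illustration rather than something prescribed by the text.

```python
import math

# Minimal sketch of Eq. (3): probability of activity from a linear score.
# Weights, descriptor values and the 0.5 threshold are illustrative assumptions.

def sigmoid(y):
    """Logistic function: 1 / (1 + exp(-y)). No overflow guarding for large -y."""
    return 1.0 / (1.0 + math.exp(-y))


def p_active(w0, weights, descriptors):
    """P_Active = sigmoid(w0 + sum_i w_i * x_i)."""
    score = w0 + sum(w * x for w, x in zip(weights, descriptors))
    return sigmoid(score)


# Hypothetical two-descriptor classifier.
w0, weights = -1.0, [0.8, -0.3]
descriptors = [2.5, 1.0]

prob = p_active(w0, weights, descriptors)
label = "active" if prob >= 0.5 else "inactive"
print(f"P(active) = {prob:.3f} -> {label}")
```

A linear score of zero corresponds to a probability of exactly 0.5, so the sign of the score alone determines the predicted class under this threshold.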
2.2.3 Artificial Neural Networks and Deep Learning
Artificial neural networks (ANNs) are a broad category of machine
learning methods based on a simplified simulation of the
operation of human brain cells called neurons [95, 96]. There are three
types of neurons: (1) input neurons receiving input signals from
outside, (2) output neurons that form output signals, and (3)
hidden neurons serving for intermediate computations. In
structure–property/activity modeling, input neurons correspond to
molecular descriptors, while output neurons correspond to the
predicted properties/activities. After training by adjusting the
weights of interneural connections, an ANN is able to predict
the values of the properties/activities of chemical compounds
represented by molecular descriptors. Since the early 1990s,
ANNs have been actively used in all areas of
structure–property/activity modeling; see the comprehensive reviews [97–99].
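The roles of input, hidden, and output neurons can be sketched as a single forward pass through a small network. This is an illustration under stated assumptions: the sigmoid activation of the hidden neurons, the linear output neuron, and all weights and descriptor values are hypothetical choices, and the training step that adjusts the weights is not shown.

```python
import math

# Minimal sketch: forward pass of a network with one hidden layer.
# Activation functions, weights and inputs are illustrative assumptions.

def sigmoid(y):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-y))


def forward(descriptors, hidden_weights, hidden_biases, output_weights, output_bias):
    """Propagate descriptor values from input neurons to a single output neuron.

    hidden_weights: one list of input weights per hidden neuron.
    output_weights: one weight per hidden neuron.
    """
    hidden = [
        sigmoid(bias + sum(w * x for w, x in zip(ws, descriptors)))
        for ws, bias in zip(hidden_weights, hidden_biases)
    ]
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))


# Two input neurons (descriptors), two hidden neurons, one output neuron.
descriptors = [1.5, -2.0]
hidden_weights = [[0.4, -0.1], [-0.3, 0.2]]
hidden_biases = [0.0, 0.1]
output_weights = [1.2, -0.7]
output_bias = 0.5

prediction = forward(descriptors, hidden_weights, hidden_biases,
                     output_weights, output_bias)
print(f"Predicted property value: {prediction:.3f}")
```

Training would repeatedly compare such predictions with known property values and adjust the weights to reduce the error.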
Backpropagation neural networks. The most widely used type
(architecture) of ANNs is the multilayer feed-forward neural network,
also known as the backpropagation neural network (BPNN). In
BPNNs, all neurons are organized into three layers, and information
flows from the (first) layer of input neurons to the
(second) layer of hidden neurons, and from there to the (third)
layer of output neurons. First applications of BPNNs in computa-
3 Conclusions and Outlook
References
1. Barratt MD, Rodford RA (2001) The computational prediction of toxicity. Curr Opin Chem Biol 5:383–388
2. Kavlock RJ, Ankley G, Blancato J, Breen M, Conolly R, Dix D, Houck K, Hubal E, Judson R, Rabinowitz J, Richard A, Setzer RW, Shah I, Villeneuve D, Weber E (2008) Computational toxicology—a state of the science mini review. Toxicol Sci 103:14–27
3. Muster W, Breidenbach A, Fischer H, Kirchner S, Müller L, Pähler A (2008) Computational toxicology in drug development. Drug Discov Today 13:303–310
4. Valerio LG (2009) In silico toxicology for the pharmaceutical sciences. Toxicol Appl Pharmacol 241:356–370
5. Nigsch F, Macaluso NJM, Mitchell JBO, Zmuidinavicius D (2009) Computational toxicology: an overview of the sources of data and of modelling methods. Expert Opin Drug Metab Toxicol 5:1–14
6. Merlot C (2010) Computational toxicology—a tool for early safety evaluation. Drug Discov Today 15:16–22
7. Raunio H (2011) In silico toxicology – non-testing methods. Front Pharmacol 2:33
8. Sun HM, Xia MH, Austin CP, Huang RL (2012) Paradigm shift in toxicity testing and modeling. AAPS J 14:473–480
9. Reisfeld B, Mayeno AN (2012) What is computational toxicology? In: Reisfeld B, Mayeno AN (eds) Computational toxicology, vol I. Humana Press, Totowa, NJ, pp 3–7
10. Knudsen T, Martin M, Chandler K, Kleinstreuer N, Judson R, Sipes N (2013) Predictive models and computational toxicology. In: Barrow PC (ed) Teratogenicity testing: methods and protocols. Humana Press, Totowa, NJ, pp 343–374. https://doi.org/10.1007/978-1-62703-131-8_26
11. Ekins S (2014) Progress in computational toxicology. J Pharmacol Toxicol Methods 69:115–140
12. Varnek A, Baskin I (2012) Machine learning methods for property prediction in chemoinformatics: quo vadis? J Chem Inf Model 52:1413–1437
13. Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz'min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A (2015) QSAR modeling: where have you been? Where are you going to? J Med Chem 57:4977–5010
14. Todeschini R, Consonni V (2009) Molecular descriptors for chemoinformatics. In: Methods and principles in medicinal chemistry, vol 41. Wiley-VCH, Weinheim
15. Baskin I, Varnek A (2008) Fragment descriptors in SAR/QSAR/QSPR studies, molecular similarity analysis and in virtual screening. In: Varnek A, Tropsha A (eds) Chemoinformatics approaches to virtual screening. RSC Publisher, Cambridge, pp 1–43
16. Baskin I, Varnek A (2008) Building a chemical space based on fragment descriptors.
Comb Chem High Throughput Screen 11:661–668
17. Varnek A, Fourches D, Hoonakker F, Solov'ev V (2005) Substructural fragments: an universal language to encode reactions, molecular and supramolecular structures. J Comput Aided Mol Des 19:693–703
18. Marcou G, Horvath D, Solov'ev V, Arrault A, Vayer P, Varnek A (2012) Interpretability of SAR/QSAR models of any complexity by atomic contributions. Mol Inform 31:639–642
19. Draper NR, Smith H (1998) Applied regression analysis, 3rd edn. John Wiley, New York
20. Lyubimova IK, Abilev SK, Gal'berstam NM, Baskin II, Palyulin VA, Zefirov NS (2001) Computer-aided prediction of the mutagenic activity of substituted polycyclic compounds. Biol Bull 28:139–145
21. Enslein K, Gombar VK, Blake BW (1994) Use of SAR in computer-assisted prediction of carcinogenicity and mutagenicity of chemicals by the TOPKAT program. Mutat Res 305:47–61
22. Klopman G (1984) Artificial intelligence approach to structure-activity studies. Computer automated structure evaluation of biological activity of organic molecules. J Am Chem Soc 106:7315–7321
23. Rosenkranz HS, Klopman G (1988) CASE, the computer-automated structure evaluation system, as an alternative to extensive animal testing. Toxicol Ind Health 4:533–540
24. Klopman G (1992) MULTICASE. 1. A hierarchical computer automated structure evaluation program. Quant Struct-Act Relat 11(2):176–184. https://doi.org/10.1002/qsar.19920110208
25. Klopman G (1998) The MultiCASE program II. Baseline activity identification algorithm (BAIA). J Chem Inf Comput Sci 38:78–81
26. Klopman G (1996) The META-CASETOX system. In: Puijnenburg WJGM, Damborsky J (eds) Biodegradability prediction. Springer, Berlin, pp 27–40
27. Matthews EJ, Contrera JF (1998) A new highly specific method for predicting the carcinogenic potential of pharmaceuticals in rodents using enhanced MCASE QSAR-ES software. Regul Toxicol Pharmacol 28:242–264
28. Klopman G, Chakravarti SK, Harris N, Ivanov J, Saiakhov RD (2003) In-silico screening of high production volume chemicals for mutagenicity using the MCASE QSAR expert system. SAR QSAR Environ Res 14:165–180
29. Klopman G, Chakravarti SK, Zhu H, Ivanov JM, Saiakhov RD (2004) ESP: a method to predict toxicity and pharmacological properties of chemicals using multiple MCASE databases. J Chem Inf Comput Sci 44:704–715
30. Klopman G, Ivanov J, Saiakhov R, Chakravarti S (2005) MC4PC–an artificial intelligence approach to the discovery of structure toxic activity relationships (STAR). In: Helma C (ed) Predictive toxicology. CRC Press, Boca Raton, pp 423–457
31. Carhart RE, Smith DH, Venkataraghavan R (1985) Atom pairs as molecular features in structure-activity studies: definition and applications. J Chem Inf Comput Sci 2:64–73
32. Xiao Y, Qiao Y, Zhang J, Lin S, Zhang W (1997) A method for substructure search by atom-centered multilayer code. J Chem Inf Comput Sci 37:701–704
33. Glen RC, Bender A, Arnby CH, Carlsson L, Boyer S, Smith J (2006) Circular fingerprints: flexible molecular descriptors with applications from physical chemistry to ADME. IDrugs 9:199–204
34. Filimonov D, Poroikov V, Borodina Y, Gloriozova T (1999) Chemical similarity assessment through multilevel neighborhoods of atoms: definition and comparison with the other descriptors. J Chem Inf Comput Sci 39:666–670
35. Hassan M, Brown RD, Varma-O'Brien S, Rogers D (2006) Cheminformatics analysis and learning in a data pipelining environment. Mol Divers 10(3):283–299
36. Metz JT, Huth JR, Hajduk PJ (2007) Enhancement of chemical rules for predicting compound reactivity towards protein thiol groups. J Comput Aided Mol Des 21:139–144
37. Langdon SR, Mulgrew J, Paolini GV, van Hoorn WP (2010) Predicting cytotoxicity from heterogeneous data sources with Bayesian learning. J Cheminform 2:11
38. Xia X, Maliski EG, Gallant P, Rogers D (2004) Classification of kinase inhibitors using a Bayesian model. J Med Chem 47:4463–4470
39. Liew CY, Lim YC, Yap CW (2011) Mixed learning algorithms and features ensemble in hepatotoxicity prediction. J Comput Aided Mol Des 25:855
40. Poroikov VV, Filimonov DA, Borodina YV, Lagunin AA, Kos A (2000) Robustness of
biological activity spectra predicting by computer program PASS for noncongeneric sets of chemical compounds. J Chem Inf Comput Sci 4:1349–1355
41. Lagunin AA, Dearden JC, Filimonov DA, Poroikov VV (2005) Computer-aided rodent carcinogenicity prediction. Mutat Res 586:138–146
42. Borodina Y, Sadym A, Filimonov D, Blinova V, Dmitriev A, Poroikov V (2003) Predicting biotransformation potential from molecular structure. J Chem Inf Comput Sci 43:1636–1646
43. Borodina Y, Rudik A, Filimonov D, Kharchevnikova N, Dmitriev A, Blinova V, Poroikov V (2004) A new statistical approach to predicting aromatic hydroxylation sites. Comparison with model-based approaches. J Chem Inf Comput Sci 44:1998–2009
44. Rudik AV, Dmitriev AV, Lagunin AA, Filimonov DA, Poroikov VV (2014) Metabolism site prediction based on xenobiotic structural formulas and PASS prediction algorithm. J Chem Inf Model 54:498–507
45. Rudik A, Dmitriev A, Lagunin A, Filimonov D, Poroikov V (2015) SOMP: web server for in silico prediction of sites of metabolism for drug-like compounds. Bioinformatics 31:2046–2048
46. Rudik AV, Dmitriev AV, Lagunin AA, Filimonov DA, Poroikov VV (2016) Prediction of reacting atoms for the major biotransformation reactions of organic xenobiotics. J Cheminform 8:68
47. Rudik AV, Bezhentsev VM, Dmitriev AV, Druzhilovskiy DS, Lagunin AA, Filimonov DA, Poroikov VV (2017) MetaTox: web application for predicting structure and toxicity of xenobiotics' metabolites. J Chem Inf Model 57:638–642
48. Saigo H, Tsuda K (2010) Graph mining in chemoinformatics. In: Lodhi H, Yamanishi Y (eds) Chemoinformatics and advanced machine learning perspectives: complex computational methods and collaborative techniques. IGI Global, Hershey, PA, pp 95–128
49. Saigo H, Kadowaki T, Tsuda K (2006) A linear programming approach for molecular QSAR analysis. Paper presented at the International Workshop on Mining and Learning with Graphs 2006, Berlin
50. Zheng W, Tropsha A (2000) Novel variable selection quantitative structure-property relationship approach based on the k-nearest-neighbor principle. J Chem Inf Comput Sci 40:185–194
51. Rodgers AD, Zhu H, Fourches D, Rusyn I, Tropsha A (2010) Modeling liver-related adverse effects of drugs using k nearest neighbor quantitative structure-activity relationship method. Chem Res Toxicol 23:724–732
52. Vapnik V (1998) Statistical learning theory. Wiley-Interscience, New York
53. Vapnik VN (1995) The nature of statistical learning theory. Springer, Berlin
54. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
55. Czermiński R, Yasri A, Hartsough D (2001) Use of support vector machine in pattern classification: application to QSAR studies. Mol Inform 20:227–240
56. Khandelwal A, Krasowski MD, Reschly EJ, Sinz MW, Swaan PW, Ekins S (2008) Machine learning methods and docking for predicting human pregnane X receptor activation. Chem Res Toxicol 21:1457–1467
57. Fourches D, Barnes JC, Day NC, Bradley P, Reed JZ, Tropsha A (2010) Cheminformatics analysis of assertions mined from literature that describe drug-induced liver injury in different species. Chem Res Toxicol 23:171–183
58. Artemenko NV, Baskin II, Palyulin VA, Zefirov NS (2001) Prediction of physical properties of organic compounds using artificial neural networks within the substructure approach. Dokl Chem 381:317–320
59. Artemenko NV, Baskin II, Palyulin VA, Zefirov NS (2003) Artificial neural network and fragmental approach in prediction of physicochemical properties of organic compounds. Russ Chem Bull 52:20–29
60. Zhokhova NI, Baskin II, Palyulin VA, Zefirov AN, Zefirov NS (2007) Fragmental descriptors with labeled atoms and their application in QSAR/QSPR studies. Dokl Chem 417:282–284
61. Sushko I, Novotarskyi S, Korner R, Pandey AK, Cherkasov A, Li J, Gramatica P, Hansen K, Schroeter T, Muller KR, Xi L, Liu H, Yao X, Oberg T, Hormozdiari F, Dao P, Sahinalp C, Todeschini R, Polishchuk P, Artemenko A, Kuz'min V, Martin TM, Young DM, Fourches D, Muratov E, Tropsha A, Baskin I, Horvath D, Marcou G, Muller C, Varnek A, Prokopenko VV, Tetko IV (2010) Applicability domains for classification problems: benchmarking of distance to models for Ames mutagenicity set. J Chem Inf Model 50:2094–2111
62. Ralaivola L, Swamidass SJ, Saigo H, Baldi P (2005) Graph kernels for chemical informatics. Neural Netw 18:1093–1110
63. Rupp M, Schneider G (2010) Graph kernels for molecular similarity. Mol Inform 29:266–273
64. Kashima H, Tsuda K, Inokuchi A (2003) Marginalized kernels between labeled graphs. In: Proceedings, twentieth international conference on machine learning, vol 1. AAAI Press, Washington D.C., pp 321–328
65. Menchetti S, Costa F, Frasconi P (2005) Weighted decomposition kernels. In: Proceedings of the 22nd international conference on Machine learning. ACM, pp 585–592
66. Swamidass SJ, Chen J, Phung P, Ralaivola L, Baldi P (2005) Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics 21:I359–I368
67. Mahé P, Ueda N, Akutsu T, Perret J-L, Vert J-P (2005) Graph kernels for molecular structure-activity relationship analysis with support vector machines. J Chem Inf Model 45:939–951
68. Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. Chapman & Hall/CRC, Wadsworth, California
69. Cheng A, Dixon SL (2003) In silico models for the prediction of dose-dependent human hepatotoxicity. J Comput Aided Mol Des 17:811–823
70. Susnow RG, Dixon SL (2003) Use of robust classification techniques for the prediction of human cytochrome P450 2D6 inhibition. J Chem Inf Comput Sci 43:1308–1315
71. Feng J, Lurati L, Ouyang H, Robinson T, Wang Y, Yuan S, Young SS (2003) Predictive toxicology: benchmarking molecular descriptors and statistical methods. J Chem Inf Comput Sci 43:1463–1470
72. Cramer GM, Ford RA, Hall RL (1976) Estimation of toxic hazard—a decision tree approach. Food Cosmet Toxicol 16:255–276
73. Verhaar HJM, van Leeuwen CJ, Hermens JLM (1992) Classifying environmental pollutants. Chemosphere 25:471–491
74. Walker JD, Gerner I, Hulzebos E, Schlegel K (2005) The skin irritation corrosion rules estimation tool (SICRET). QSAR Comb Sci 24:378–384
75. Gerner I, Liebsch M, Spielmann H (2005) Assessment of the eye irritating properties of chemicals by applying alternatives to the Draize rabbit eye test: the use of QSARs and in vitro tests for the classification of eye irritation. Altern Lab Anim 33:215–237
76. Benigni R, Bossa C (2008) Predictivity and reliability of QSAR models: the case of mutagens and carcinogens. Toxicol Mech Methods 18:137–147
77. Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Professional, New York
78. DeLisle RK, Dixon SL (2004) Induction of decision trees via evolutionary programming. J Chem Inf Comput Sci 44:862–870
79. Dietterich TG (2002) Ensemble learning. In: Arbib M (ed) The handbook of brain theory and neural networks. MIT Press, Cambridge, pp 405–408
80. Svetnik V, Wang T, Tong C, Liaw A, Sheridan RP, Song Q (2005) Boosting: an ensemble learning tool for compound classification and QSAR modeling. J Chem Inf Model 45:786–799
81. Baskin II, Marcou G, Horvath D, Varnek A (2017) Bagging and boosting of classification models. In: Tutorials in chemoinformatics. John Wiley & Sons, Ltd, Hoboken, pp 241–247
82. Baskin II, Marcou G, Horvath D, Varnek A (2017) Bagging and boosting of regression models. In: Tutorials in chemoinformatics. John Wiley & Sons, Ltd, Hoboken, pp 249–255
83. Baskin II, Marcou G, Horvath D, Varnek A (2017) Random subspaces and random forest. In: Tutorials in chemoinformatics. John Wiley & Sons, Ltd, Hoboken, pp 263–269
84. Baskin II, Marcou G, Horvath D, Varnek A (2017) Stacking. In: Tutorials in chemoinformatics. John Wiley & Sons, Ltd, Hoboken, pp 271–278
85. Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
86. Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal 20:832–844
87. Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378
88. Breiman L (1996) Stacked regressions. Mach Learn 24:49–64
89. Breiman L (2001) Random forests. Mach Learn 45:5–32
90. Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP (2003) Random forest: a classification and regression tool for
compound classification and QSAR modeling. J Chem Inf Comput Sci 43:1947–1958
91. Li S, Fedorowicz A, Singh H, Soderholm SC (2005) Application of the random forest method in studies of local lymph node assay based skin sensitization data. J Chem Inf Model 45:952–964
92. Zhang Q-Y, Aires-de-Sousa J (2007) Random forest prediction of mutagenicity from empirical physicochemical descriptors. J Chem Inf Model 47:1–8
93. Polishchuk PG, Muratov EN, Artemenko AG, Kolumbin OG, Muratov NN, Kuz'min VE (2009) Application of random forest approach to QSAR prediction of aquatic toxicity. J Chem Inf Model 49:2481–2488
94. Vasanthanathan P, Taboureau O, Oostenbrink C, Vermeulen NPE, Olsen L, Jorgensen FS (2009) Classification of cytochrome P450 1A2 inhibitors and noninhibitors by machine learning techniques. Drug Metab Dispos 37:658–664
95. Rumelhart DE, McClelland JL (1986) Parallel distributed processing, vol 1,2. MIT Press, Cambridge, MA
96. Gasteiger J, Zupan J (1993) Neural networks in chemistry. Angew Chem Int Ed Engl 105:503–527
97. Halberstam NM, Baskin II, Palyulin VA, Zefirov NS (2003) Neural networks as a method for elucidating structure-property relationships for organic compounds. Russ Chem Rev 72:629–649
98. Baskin II, Palyulin VA, Zefirov NS (2008) Neural networks in building QSAR models. Methods Mol Biol 458:137–158
99. Baskin II, Winkler D, Tetko IV (2016) A renaissance of neural networks in drug discovery. Expert Opin Drug Discov 11:785–795
100. Villemin D, Cherqaoui D, Mesbah A (1994) Predicting carcinogenicity of polycyclic aromatic hydrocarbons from back-propagation neural network. J Chem Inf Comput Sci 34:1288–1293
101. Xu L, Ball JW, Dixon SL, Jurs PC (1994) Quantitative structure-activity relationships for toxicity of phenols using regression analysis and computational neural networks. Environ Toxicol Chem 13:841–851
102. Devillers J, Bintein S, Domine D, Karcher W (1995) A general QSAR model for predicting the toxicity of organic chemicals to luminescent bacteria (Microtox test). SAR QSAR Environ Res 4:29–38
103. Molnar L, Keseru GM, Papp A, Lorincz Z, Ambrus G, Darvas F (2006) A neural network based classification scheme for cytotoxicity predictions: validation on 30,000 compounds. Bioorg Med Chem Lett 16(4):1037–1039
104. Hatrik S, Zahradnik P (1996) Neural network approach to the prediction of the toxicity of benzothiazolium salts from molecular structure. J Chem Inf Comput Sci 36:992–995
105. Zakarya D, Larfaoui EM, Boulaamail A, Lakhlifi T (1996) Analysis of structure-toxicity relationships for a series of amide herbicides using statistical methods and neural network. SAR QSAR Environ Res 5:269–279
106. Eldred DV, Jurs PC (1999) Prediction of acute mammalian toxicity of organophosphorus pesticide compounds from molecular structure. SAR QSAR Environ Res 10:75–99
107. Devillers J, Flatin J (2000) A general QSAR model for predicting the acute toxicity of pesticides to Oncorhynchus mykiss. SAR QSAR Environ Res 1:25–43
108. Devillers J (2001) A general QSAR model for predicting the acute toxicity of pesticides to Lepomis macrochirus. SAR QSAR Environ Res 11:397–417
109. Devillers J, Pham-Delegue MH, Decourtye A, Budzinski H, Cluzeau S, Maurin G (2002) Structure-toxicity modeling of pesticides to honey bees. SAR QSAR Environ Res 13:641–648
110. Kaiser KLE (2003) The use of neural networks in QSARs for acute aquatic toxicological endpoints. J Mol Struct (THEOCHEM) 622:85–95
111. Zakarya D, Boulaamail A, Larfaoui EM, Lakhlifi T (1997) QSARs for toxicity of DDT-type analogs using neural network. SAR QSAR Environ Res 6:183–203
112. Eldred DV, Weikel CL, Jurs PC, Kaiser KLE (1999) Prediction of fathead minnow acute toxicity of organic compounds from molecular structure. Chem Res Toxicol 12:670–678
113. Martin TM, Young DM (2001) Prediction of the acute toxicity (96-h LC50) of organic compounds to the fathead minnow (Pimephales promelas) using a group contribution method. Chem Res Toxicol 14:1378–1385
114. Moore DRJ, Breton RL, MacDonald DB (2003) A comparison of model performance for six quantitative structure-activity relationship packages that predict acute toxicity to fish. Environ Toxicol Chem 22:1799–1809
138 Igor I. Baskin
115.
Garg A, Bhat KL, Bock CW (2002) 128. Novotarskyi S, Abdelaziz A, Sushko Y, Körner
Mutagenicity of aminoazobenzene dyes and R, Vogt J, Tetko IV (2016) ToxCast EPA
related structures: a QSAR/QPAR investiga- in vitro to in vivo challenge: insight into the
tion. Dyes Pigments 55:35–52 rank-I model. Chem Res Toxicol
116. Shoji R (2005) The potential performance of 29:768–775
artificial neural networks in QSTRs for pre- 129. Abdelaziz A, Spahn-Langguth H, Schramm
dicting ecotoxicity of environmental pollut- K-W, Tetko IV (2016) Consensus modeling
ants. Curr Comput Aided Drug Des for HTS assays using in silico descriptors cal-
1:65–72 culates the best balanced accuracy in Tox21
117. Dearden JC, Rowe PH (2015) Use of artifi- challenge. Front Environ Sci 4. https://doi.
cial neural networks in the QSAR prediction org/10.3389/fenvs.2016.00002
of physicochemical properties and toxicities 130. Sushko I, Novotarskyi S, Körner R, Pandey
for REACH legislation. Methods Mol Biol AK, Rupp M, Teetz W, Brandmaier S,
1260:65–88 Abdelaziz A, Prokopenko VV, Tanchuk VY,
118. Tetko IV, Livingstone DJ, Luik AI (1995) Todeschini R, Varnek A, Marcou G, Ertl P,
Neural network studies. 1. Comparison of Potemkin V, Grishina M, Gasteiger J,
overfitting and overtraining. J Chem Inf Schwab C, Baskin II, Palyulin VA,
Comput Sci 35:826–833 Radchenko EV, Welsh WJ, Kholodovych V,
119. Tikhonov AN, Arsenin VA (1977) Solution of Chekmarev D, Cherkasov A, Aires-De-
ill-posed problems. Winston & Sons, Sousa J, Zhang QY, Bender A, Nigsch F,
Washington Patiny L, Williams A, Tkachenko V, Tetko
IV (2011) Online chemical modeling envi-
120. Winkler DA, Burden FR (2004) Bayesian
ronment (OCHEM): web platform for data
neural nets for modeling in drug discovery. storage, model development and publishing
Drug Discov Today: BIOSILICO of chemical information. J Comput Aided
2:104–111 Mol Des 25:533–554
121. Burden F, Winkler D (2008) Bayesian regu- 131. LeCun Y, Bengio Y, Hinton G (2015) Deep
larization of neural networks. Methods Mol learning. Nature 521:436–444
Biol 458:25–44
132. Bengio Y (2009) Learning deep architectures
122. Burden FR, Ford MG, Whitley DC, Winkler for AI. Found Trends Mach Learn 2:1–127
DA (2000) Use of automatic relevance deter-
mination in QSAR studies using Bayesian 1
33. Gawehn E, Hiss JA, Schneider G (2016)
neural networks. J Chem Inf Comput Sci Deep learning in drug discovery. Mol Inform
40:1423–1430 35:3–14
123. Burden FR, Winkler DA (2000) A quantita- 134. Goh GB, Hodas NO, Vishnu A (2017) Deep
tive structure-activity relationships model for learning for computational chemistry. J Comp
the acute toxicity of substituted benzenes to Chem 38:1291–1307
Tetrahymena pyriformis using Bayesian- 1
35. Ekins S (2016) The next era: deep learning in
regularized neural networks. Chem Res pharmaceutical research. Pharm Res
Toxicol 13:436–440 33:2594–2603
124.
Cronin MTD, Schultz TW (2001) 136. Mayr A, Klambauer G, Unterthiner T,
Development of quantitative structure- Hochreiter S (2016) DeepTox: toxicity pre-
activity relationships for the toxicity of aro- diction using deep learning. Front Environ
matic compounds to tetrahymena pyriformis: Sci 3:80
comparative assessment of the methodolo- 137. Bengio Y, Courville A, Vincent P (2013)
gies. Chem Res Toxicol 14:1284–1295 Representation learning: a review and new
125. Polley MJ, Burden FR, Winkler DA (2005) perspectives. Pattern Anal Mach Intell IEEE
Predictive human intestinal absorption QSAR Trans 35:1798–1828
models using Bayesian regularized neural net- 138. Kohonen T (2001) Self-organizing maps.
works. Aust J Chem 58:859–863 Springer, Berlin Heidelberg
126. Epa VC, Burden FR, Tassa C, Weissleder R, 139. Anzali S, Barnickel G, Krug M, Sadowski J,
Shaw S, Winkler DA (2012) Modeling bio- Wagener M, Gasteiger J, Polanski J (1996)
logical activities of nanoparticles. Nano Lett The comparison of geometric and electronic
12:5808–5812 properties of molecular surfaces by neural
127. Tetko IV (2002) Neural network studies. 4. networks: application to the analysis of
Introduction to associative neural networks. corticosteroid-binding globulin activity of
J Chem Inf Comput Sci 42:717–728 steroids. J Comput Aided Mol Des
10:521–534
Machine Learning Methods in Computational Toxicology 139
140. Hecht-Nielsen R (1987) Counterpropagation 147. Gaspar HA, Baskin II, Marcou G, Horvath
networks. Appl Opt 26:4979–4984 D, Varnek A (2015) GTM-based QSAR mod-
141. Vracko M (1997) A study of structure-
els and their applicability domains. Mol
carcinogenic potency relationship with artifi- Inform 34:348–356
cial neural networks. The using of descriptors 148. Gaspar HA, Baskin II, Marcou G, Horvath
related to geometrical and electronic struc- D, Varnek A (2015) Stargate GTM: bridging
tures. J Chem Inf Comput Sci descriptor and activity spaces. J Chem Inf
37:1037–1043 Model 55:2403–2410
142.
Mazzatorta P, Vracko M, Jezierska A, 149. Gaspar HA, Baskin II, Varnek A (2016)
Benfenati E (2003) Modeling toxicity by Visualization of a multidimensional descrip-
using supervised Kohonen neural networks. tor space. In: Frontiers in molecular design
J Chem Inf Comput Sci 43:485–492 and chemical information science–Herman
143. Spycher S, Pellegrini E, Gasteiger J (2005) Skolnik Award Symposium 2015: Jürgen
Use of structure descriptors to discriminate Bajorath, vol 1222. ACS Symposium Series,
between modes of toxic action of phenols. vol 1222. American Chemical Society,
J Chem Inf Model 45:200–208 pp. 243–267
144. Bishop CM, Svensén M, Williams CKI (1998) 150. Gaspar HA, Sidorov P, Horvath D, Baskin II,
GTM: the generative topographic mapping. Marcou G, Varnek A (2016) Generative top-
Neural Comput 10:215–234 ographic mapping approach to chemical space
145. Kireeva N, Baskin II, Gaspar HA, Horvath D, analysis. In: Frontiers in molecular design and
Marcou G, Varnek A (2012) Generative top- chemical information science–Herman
ographic mapping (GTM): universal tool for Skolnik Award Symposium 2015: Jürgen
data visualization, structure-activity modeling Bajorath, vol 1222. ACS Symposium Series,
and dataset comparison. Mol Inform vol 1222. American Chemical Society,
31:301–312 pp. 211–241
146. Gaspar HA, Baskin II, Marcou G, Horvath 151. Kireeva N, Kuznetsov SL, Bykov AA, Tsivadze
D, Varnek A (2015) Chemical data visualiza- AY (2012) Towards in silico identification of
tion and analysis with incremental generative the human ether-a-go-go-related gene chan-
topographic mapping: big data challenge. nel blockers: discriminative vs. generative
J Chem Inf Mod 55:84–94 classification models. SAR QSAR Environ Res
24:103–117
Chapter 6
Abstract
In the context of human safety assessment through quantitative structure–activity relationship (QSAR)
modeling, the concept of applicability domain (AD) has an enormous role to play. The Organisation for
Economic Co-operation and Development (OECD) recommends, as principle 3 of its principles for QSAR model
validation, that a predictive QSAR model have “a defined domain of applicability.” The study of AD
allows estimating the uncertainty in the prediction for a particular molecule based on how similar it is to
the training compounds used in model development. AD is currently an active research topic, and many
methods have been designed to estimate the competence of a model and the confidence in its outcome for a
given prediction task. Characterization of the interpolation space is therefore central to defining the AD.
The reported AD methods are diverse and were built on different hypotheses and algorithms. This
multiplicity of methodologies confuses end users and makes the comparison of the AD for different models a
complex issue to address. In this chapter we summarize the important concepts of AD, including details of
the available methods to compute the AD along with their thresholds and criteria for estimating AD through
training set interpolation in the descriptor space. The ideas of transparent domain and decision domain are
also discussed. To help readers determine the AD in their projects, practical examples together with
available open source software tools are provided.
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_6, © Springer Science+Business Media, LLC, part of Springer Nature 2018
142 Supratik Kar et al.
the user about the quantity, quality and relevance of the information
available to the model to perform the prediction task. If this information is good enough, the resulting
prediction is expected to be more reliable than when the input information is poor. Reliability thus
captures the relevance and quality of the information available to the model for a given prediction.
Decidability: Once a prediction has been judged valid and reliable, one can consider its actual outcome.
How assertive the conclusion drawn from the prediction can be depends on the weight of evidence that
supports this conclusion. Decidability is directly related to the error estimate of an individual
prediction, and it measures how much one can believe the conclusion derived from a prediction. It is
imperative not to confuse reliability with decidability. The former defines how much one can trust the
prediction itself, whereas the latter captures how much one can trust the conclusion derived from this
prediction. The idea and steps behind the decision domain are shown in Fig. 1.
4.1 Range Based Approaches in the Descriptor Space
4.1.1 Bounding Box
Hypothesis: Modeled descriptors are assumed to follow a uniform distribution, defining an n-dimensional
hyperrectangle built from the maximum and minimum values of each descriptor, with sides parallel to the
coordinate axes.
Threshold: The threshold is based on the highest and lowest values of the X variables (descriptors of the
QSAR model) and the Y variable (response) of the training set. Any test molecule that falls outside these
ranges is considered out of the AD, and its prediction is less reliable [15]. For the “bounding box,” the
zone of AD is the smallest axis-aligned rectangular box that encloses all the data points, as presented in
Fig. 3.
Flaws: (a) The method encloses substantial empty space when data points are nonuniformly distributed;
(b) as only descriptor ranges are employed to determine the AD space, empty regions in the interpolation
space cannot be identified; (c) correlation among descriptors cannot be taken into account [15].
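The bounding-box check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fit_bounding_box` and `in_bounding_box` and the toy descriptor values are hypothetical, and only the descriptor (X) ranges are checked.

```python
def fit_bounding_box(train_X):
    """Return the (min, max) range of each descriptor in the training set."""
    columns = list(zip(*train_X))  # transpose rows (compounds) into columns
    return [(min(col), max(col)) for col in columns]


def in_bounding_box(x, ranges):
    """True if every descriptor of x lies within its training range."""
    return all(lo <= value <= hi for value, (lo, hi) in zip(x, ranges))


# Hypothetical training descriptors (rows = compounds, columns = descriptors)
train_X = [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]]
ranges = fit_bounding_box(train_X)

inside = in_bounding_box([2.5, 2.5], ranges)   # both descriptors in range
outside = in_bounding_box([2.5, 4.5], ranges)  # second descriptor too high
```

Any query inside the box passes, including queries that fall in empty regions of the box, which is the weakness listed under flaws (a) and (b).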
4.1.2 PCA Bounding Box
Hypothesis: Principal components (PCs) transform the initial data into a new orthogonal coordinate system
by rotation of the axes, which helps correct for correlations among descriptors. The newly formed axes,
the PCs, are oriented along the directions of maximum variance of the total dataset. The points between
the lowest and highest values of each PC define an M-dimensional hyper-rectangle (M is the number of
significant components) with sides parallel to the PCs [18, 19].
Threshold: The AD in PC space is reported in Fig. 4, where the training set is represented by the biggest
circle. The predictions of query compounds within the circle are considered reliable. Query molecules
located outside the model space are expected to be less reliably predicted.
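A possible sketch of the PCA bounding box, assuming NumPy is available. The helper names `fit_pca_box` and `in_pca_box` and the toy data are hypothetical; the PCs are obtained here from a singular value decomposition of the mean-centered training matrix, and the hyper-rectangle of the hypothesis (not the circle of Fig. 4) is used as the boundary.

```python
import numpy as np


def fit_pca_box(train_X, n_components):
    """Fit PCs on the training set and return the score range on each PC."""
    mean = train_X.mean(axis=0)
    centered = train_X - mean
    # Rows of vt are the principal axes (right singular vectors)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components]
    scores = centered @ axes.T
    return mean, axes, scores.min(axis=0), scores.max(axis=0)


def in_pca_box(x, mean, axes, lo, hi):
    """True if the PC scores of x fall inside the training score ranges."""
    score = (np.asarray(x) - mean) @ axes.T
    return bool(np.all((score >= lo) & (score <= hi)))


# Hypothetical, strongly correlated two-descriptor training set
train_X = np.array([[0.0, 0.0], [1.0, 1.1], [2.0, 1.9], [3.0, 3.0], [4.0, 4.0]])
mean, axes, lo, hi = fit_pca_box(train_X, n_components=2)

on_trend = in_pca_box([2.5, 2.5], mean, axes, lo, hi)   # follows the correlation
off_trend = in_pca_box([0.5, 3.5], mean, axes, lo, hi)  # breaks the correlation
```

The second query lies inside the plain descriptor ranges (0 to 4 on both axes) but outside the PCA box, which shows how the rotation accounts for correlated descriptors.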
4.3.1 Leverage Approach
Hypothesis: The leverage (h) of a molecule in the variable space is computed from the HAT matrix as:
$$H = X \left( X^{T} X \right)^{-1} X^{T} \tag{1}$$
In Eq. (1), X is the descriptor matrix of the training set and H is an [n × n] matrix that orthogonally
projects vectors into the space spanned by the columns of X [3, 10]; the leverage of the ith molecule is
the ith diagonal element of H. The AD space of the model is defined as a squared area within the ±3 band
for standardized residuals (σ), and the leverage threshold is defined as h* = 3(p + 1)/n, where p is the
number of descriptors and n is the number of training molecules. The leverage values (h) are calculated
for each compound and plotted (X-axis) vs. cross-validated standardized residuals (σ) (Y-axis); the
resulting plot is known as the Williams plot.
Threshold criteria: The Williams plot (Fig. 7) reveals response outliers and training compounds that are
structurally very influential in determining model parameters. Predictions for high-leverage chemicals in
the prediction set are extrapolations and could be less reliable.
4.3.2 Euclidean Distance
Hypothesis: The Euclidean method calculates the distance from a particular point to every other point in
the data set. A distance score, d_ij, for two different compounds X_i and X_j can be measured by the
Euclidean distance norm, expressed by the following equation:
$$d_{ij} = \sqrt{\sum_{k=1}^{m} \left( x_{ik} - x_{jk} \right)^{2}} \tag{2}$$
The mean distance of one sample to the remaining ones is calculated as follows:
$$\bar{d}_{i} = \frac{\sum_{j=1}^{n} d_{ij}}{n - 1} \tag{3}$$
where i = 1, 2, …, n.
The mean distances are then normalized to the interval from zero to one. The method is appropriate only
for statistically independent variables [22].
Threshold criteria: The mean normalized distances are measured for both training and test molecules. The
boundary area created by the normalized mean distance scores of the training set is defined as the zone of
AD for test molecules. If a test compound resides inside the domain covered by the training set, the
molecule is inside the AD; otherwise it is not. An example of a Euclidean distance plot is reported in
Fig. 8.
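A minimal Python sketch of the Euclidean distance procedure. The function names and toy data are hypothetical. Two points are assumptions: the query compounds use the same divisor (n − 1) as the training compounds, which cancels out in the min-max normalization, and a query is treated as inside the AD when its normalized mean distance lies between 0 and 1.

```python
import math


def euclidean(a, b):
    """Eq. (2): Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_distances(rows, train_X):
    """Eq. (3): summed distance to the training compounds divided by n - 1."""
    n = len(train_X)
    return [sum(euclidean(r, t) for t in train_X) / (n - 1) for r in rows]


def normalize(values, lo, hi):
    """Min-max scaling using the training-set minimum and maximum."""
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical one-descriptor data set
train_X = [[0.0], [2.0], [6.0]]
query_X = [[4.0], [10.0]]

train_mean = mean_distances(train_X, train_X)
lo, hi = min(train_mean), max(train_mean)
train_norm = normalize(train_mean, lo, hi)
query_norm = normalize(mean_distances(query_X, train_X), lo, hi)
inside_ad = [0.0 <= v <= 1.0 for v in query_norm]
```

Because the scaling uses the training minimum and maximum, a query can receive a normalized value above 1, as for test compound 2 in Table 1 (1.01).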
Applicability Domain: A Step Toward Confident Predictions and Decidability for QSAR… 151
4.3.3 Mahalanobis Distance
Hypothesis: The Mahalanobis distance method calculates the distance of an observation from the mean values
of the independent variables, without considering the effect on the predicted value. It provides a simple
and distinctive approach for identifying outliers. The Mahalanobis distance is unique because it
automatically takes into account the correlation between descriptor axes [23].
Threshold criteria: Observations with values much higher than those of the remaining ones may be considered
to be outside the domain.
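As an illustration, the Mahalanobis distance can be computed with NumPy from the training mean and the inverse of the training covariance matrix. The function name, the toy data, and the use of the largest training distance as a cut-off are assumptions; the text only states that values much higher than the rest indicate compounds outside the domain.

```python
import numpy as np


def mahalanobis(rows, train_X):
    """Mahalanobis distance of each row from the training-set centroid."""
    mean = train_X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(train_X, rowvar=False))
    diff = np.atleast_2d(rows) - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))


# Hypothetical, strongly correlated two-descriptor training set
train_X = np.array([[0.0, 0.0], [1.0, 1.1], [2.0, 1.9], [3.0, 3.0], [4.0, 4.0]])

d_train = mahalanobis(train_X, train_X)
cutoff = d_train.max()  # assumed cut-off: largest training distance

# Both queries are equally far from the centroid in Euclidean terms
d_query = mahalanobis([[3.5, 3.5], [3.5, 0.5]], train_X)
outside = d_query > cutoff
```

The first query follows the correlation between the two descriptors and stays below the cut-off; the second breaks it and receives a much larger distance, although both are at the same Euclidean distance from the centroid.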
4.3.4 City Block Distance
City-block distance is the summed absolute difference across dimensions and is computed from the following
equation:
$$d(x, y) = \sum_{i=1}^{n} \left| x_{i} - y_{i} \right| \tag{4}$$
It examines the absolute differences between the coordinates of a pair of objects (x_i and y_i) and assumes
a triangular distribution. This method is valuable for discrete descriptors. It is used only for training
sets that are uniformly distributed with respect to count-based descriptors or fragment mapping counts [22].
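For count-based descriptors, Eq. (4) reduces to a one-line function. The name `city_block` and the fragment counts below are hypothetical.

```python
def city_block(x, y):
    """Eq. (4): sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical fragment-count descriptors for two compounds
counts_a = [2, 0, 1, 3]
counts_b = [1, 3, 1, 0]
distance = city_block(counts_a, counts_b)  # 1 + 3 + 0 + 3
```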
4.3.5 Hotelling T² Test
Hypothesis: The Hotelling T² is a multivariate Student’s t-test and is proportional to the leverage and
Mahalanobis distance measures. Like the leverage approach, it presumes a normal data distribution, and it
is used to estimate the statistical significance of the difference in the means of two or more variables
between two groups. Hotelling T² corrects for collinear descriptors through the use of the covariance
4.3.6 k-Nearest Neighbors Approach
Hypothesis: The approach is based on a similarity search for a new chemical entity with respect to the space
generated by the training set compounds. Similarity is evaluated as the distance of a query molecule from
the nearest training compound, or as its distances from the k nearest neighbors in the training set.
Similarity to the training set molecules is therefore what allows this method to give a query compound a
trustworthy prediction [24].
Threshold criteria: If the calculated distance values of test or query compounds are within the user-defined
threshold set from the training molecules, then the predictions for these molecules are reliable. A
k-nearest neighbors plot is presented in Fig. 9.
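A rough Python sketch of a k-nearest neighbors AD check. The function name `knn_score`, the toy data, the choice k = 2, and the threshold (the largest score observed among the training compounds) are all illustrative assumptions; the text leaves the threshold to the user.

```python
import math


def knn_score(x, train_X, k, skip_index=None):
    """Mean Euclidean distance from x to its k nearest training compounds."""
    dists = sorted(
        math.dist(x, t) for i, t in enumerate(train_X) if i != skip_index
    )
    return sum(dists[:k]) / k


# Hypothetical one-descriptor training set
train_X = [[0.0], [1.0], [2.0], [3.0]]
k = 2

# Score each training compound against the other training compounds
train_scores = [knn_score(t, train_X, k, skip_index=i) for i, t in enumerate(train_X)]
threshold = max(train_scores)  # assumed user-defined threshold

query_scores = [knn_score(q, train_X, k) for q in ([1.5], [10.0])]
inside_ad = [score <= threshold for score in query_scores]
```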
4.3.7 DModX (Distance to the Model in X-Space)
Hypothesis: The method was developed by Wold et al. [5] and is usually applied to partial least squares
(PLS) regression models. The fundamental hypothesis is that the residuals of Y and X are of diagnostic
value for the quality of the model. As there are number
4.3.8 Tanimoto Similarity
The Tanimoto index measures the similarity between two compounds based on the number of common molecular
fragments [26]. To calculate the Tanimoto similarity, all unique fragments of
$$\mathrm{TANIMOTO}(J, K) = \frac{\sum_{i=1}^{N} x_{J,i} \cdot x_{K,i}}{\sum_{i=1}^{N} x_{J,i} \cdot x_{J,i} + \sum_{i=1}^{N} x_{K,i} \cdot x_{K,i} - \sum_{i=1}^{N} x_{J,i} \cdot x_{K,i}} \tag{5}$$
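Eq. (5) translates directly into Python. The function name `tanimoto` and the fragment vectors are hypothetical; x_J and x_K are taken to be fragment occurrence vectors of equal length N.

```python
def tanimoto(x_j, x_k):
    """Eq. (5): Tanimoto similarity of two fragment vectors."""
    dot_jk = sum(a * b for a, b in zip(x_j, x_k))
    dot_jj = sum(a * a for a in x_j)
    dot_kk = sum(b * b for b in x_k)
    # Undefined (division by zero) if both vectors contain only zeros
    return dot_jk / (dot_jj + dot_kk - dot_jk)


# Hypothetical fragment vectors (1 = fragment present, 0 = absent)
frag_j = [1, 1, 0, 1]
frag_k = [1, 0, 0, 1]
similarity = tanimoto(frag_j, frag_k)  # 2 / (3 + 2 - 2)
```

For binary vectors this is the number of shared fragments divided by the number of fragments present in either compound, so identical compounds give a similarity of 1.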
4.3.9 Standard Deviation of the Ensemble Predictions (STD)
Theory: The principle is simple: if different models give significantly different predictions for a
particular molecule, then the prediction for this compound is more likely to be unreliable. The sample
standard deviation can be used as an estimator of model uncertainty [26].
For example, if Y(J) = {y_i(J), i = 1 ... N} is the set of predictions for a molecule J given by a set of
N trained models, the corresponding distance to model STD can be defined as:
$$\mathrm{STD}(J) = \sqrt{\frac{\sum_{i=1}^{N} \left( y_{i}(J) - \bar{y} \right)^{2}}{N - 1}} \tag{6}$$
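A small Python sketch of the STD distance to model, assuming the sample standard deviation (N − 1 in the denominator) named in the text. The function name and the prediction values are hypothetical.

```python
import math


def ensemble_std(predictions):
    """Sample standard deviation of the predictions of N models for one molecule."""
    n = len(predictions)
    mean = sum(predictions) / n
    return math.sqrt(sum((y - mean) ** 2 for y in predictions) / (n - 1))


# Hypothetical predictions from three models for two molecules
agree = ensemble_std([2.0, 2.1, 1.9])     # models agree: small STD
disagree = ensemble_std([1.0, 3.0, 5.0])  # models disagree: large STD
```

A molecule with a large STD would be treated as further from the model and its prediction as less reliable.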
In Eq. (7), y(T_i) and y(J) are the vectors of the ensemble’s predictions for the training set compound T_i
and the target compound J, corr is the Spearman rank correlation coefficient between the two vectors, and N
is the number of compounds in the training set.
Threshold criteria: A low value of CORREL indicates that, for target compound J, there is a compound T in
the training set for which the predictions of the ensemble of models are highly correlated. If a compound T
has the same descriptors as the target compound J, then the model predictions will be identical for both
molecules and the resulting CORREL(J) will be 0. Compounds having high correlation coefficient values are
considered to be “closer to the model” [29, 30].
$$f_{p} = f_{i} + (q - j)\left( f_{j+1} - f_{j} \right) \tag{9}$$
4.6.1 Standardization Technique
Theory: The approach was proposed by Roy et al. [32]. For an ideal (normal) data distribution, 99.7% of the
population lies within the range mean ± 3 standard deviations (SD). Thus, mean ± 3SD represents the zone to
which the majority of the training compounds belong. Any molecule outside this zone is different from the
rest of the compounds. Thus, after a descriptor column is standardized using the mean and standard
deviation of the training set compounds only, if the standardized value for descriptor i of compound k
(S_ki) is more than 3, then the compound is an X-outlier (if in the training set) or outside the AD (if in
the test set) with respect to descriptor i.
Algorithm and methodology:
$$S_{ki} = \frac{\left| X_{ki} - \bar{X}_{i} \right|}{\sigma_{X_{i}}} \tag{10}$$
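The rule stated above can be sketched in Python as follows. Only the simple criterion from the text (a standardized value above 3 for any descriptor) is implemented; any further steps of the published algorithm are not shown in this section and are not reproduced here. The function names, the toy data, and the use of the sample standard deviation are assumptions.

```python
import statistics


def fit_standardizer(train_X):
    """Mean and standard deviation of each descriptor in the training set."""
    columns = list(zip(*train_X))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # sample SD (assumption)
    return means, sds


def standardized_values(x, means, sds):
    """Eq. (10) for every descriptor of one compound."""
    # A constant descriptor column (SD of zero) would need separate handling
    return [abs(v - m) / s for v, m, s in zip(x, means, sds)]


def inside_ad(x, means, sds, cutoff=3.0):
    """True if no standardized descriptor value exceeds the cutoff."""
    return max(standardized_values(x, means, sds)) <= cutoff


# Hypothetical two-descriptor training set
train_X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
means, sds = fit_standardizer(train_X)

# First query is close to the training mean; second has an extreme descriptor
results = [inside_ad(q, means, sds) for q in ([3.5, 35.0], [3.0, 90.0])]
```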
4.6.3 Decision Trees and Decision Forests
The AD space is identified based on the consensus prediction of Decision Trees (DT) and Decision Forests
(DF). The idea is to minimize overfitting by combining DTs while keeping the individual DTs as different
from one another as possible. Predictions from all the combined DTs are averaged to obtain the prediction
confidence for a particular compound, while domain extrapolation gives the prediction precision for a
compound outside the training space [34, 35].
4.6.4 Kernel-Based Applicability Domain
Most machine learning approaches for QSAR rely on a vectorial representation of the compounds. Thus, the AD
is expressed as a subspace of the vector space with one dimension for each descriptor used. However, this
vectorial view cannot be straightforwardly applied to kernel-based techniques like support vector machines
(SVM). These methods rely on an implicit feature space that is defined only by the applied kernel
similarity and has unknown dimensionality. Therefore, the domain of applicability of a kernel-based model
has to be defined by means of the kernel. This also makes it possible to use structured similarity measures
like the Optimal Assignment Kernel and its extension, instead of a numerical encoding. Kernel density
estimation can be used to incorporate additional information into a trained model. This information can be
obtained from a weighted average kernel similarity of a predicted molecule to the training data set. The
weights can be derived either from the knowledge contained in the learned model or from methods that
describe the feature space structure using the kernel [36].
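The weighted average kernel similarity can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: a Gaussian (RBF) kernel on numerical descriptors stands in for the structured kernels mentioned above, the weights default to uniform values rather than being taken from a trained model, and the names `rbf_kernel` and `kernel_ad_score` are hypothetical.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel similarity between two descriptor vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_ad_score(x, train_X, weights=None, gamma=1.0):
    """Weighted average kernel similarity of x to the training compounds."""
    if weights is None:
        weights = [1.0] * len(train_X)  # uniform weights (assumption)
    total = sum(w * rbf_kernel(x, t, gamma) for w, t in zip(weights, train_X))
    return total / sum(weights)


# Hypothetical two-descriptor training set
train_X = [[0.0, 0.0], [1.0, 0.0]]
near = kernel_ad_score([0.0, 0.0], train_X)  # identical to a training compound
far = kernel_ad_score([5.0, 0.0], train_X)   # far from every training compound
```

A score close to zero indicates that the query is dissimilar to all training compounds under the chosen kernel; the cut-off separating reliable from unreliable predictions would have to be chosen by the user.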
Fig. 12 Commonly employed open source software for defining AD of a QSAR model
5.1 Cheminformatics Tools by DTC Lab
Professor Kunal Roy and his group have developed open source AD software available at the following links:
http://teqip.jdvu.ac.in/QSAR_Tools/ and http://dtclab.webs.com/software-tools. Users can employ three
different approaches for the AD study.
(a) AD (using standardization approach) v1.0: A tool to find test/query molecules that are outside the AD
of the developed QSAR model; it also detects outliers among the training set compounds by the
standardization technique.
(b) Euclidean AD 1.0: Used to ensure that the compounds of the test set are representative of the training
set. It is based on distance scores calculated by the Euclidean distance norm.
(c) AD-MDI GUI 1.2: The Applicability domain-Model Disturbance Index (AD-MDI) program is a tool to define
the AD of unknown samples.
5.2 QSARINS
Professor Paola Gramatica and her group have developed the QSARINS software, with which users can develop
and validate QSAR models and perform an AD study using the leverage-based technique. The software is
available from http://www.qsar.it/. Along with the leverage-based AD evaluation, users can obtain a
Williams plot showing the critical HAT value and the zone of applicability for the developed model.
5.3 Domain Manager
The Domain Manager (presently version 1.06) is a software tool developed by the Laboratory of Mathematical
Chemistry, University “Prof. Assen Zlatarov,” Bourgas, Bulgaria, obtainable from
http://oasis-lmc.org/products/software/domain-manager.aspx. The tool can be used for automatic extraction
of features from the training set compounds used for QSAR modeling. These features are encoded as layers in
the final model domain, ordered in the following way:
Parameter ranges: The ranges of the molecular parameters for compounds in the training set, that is, the
descriptor space of the training set;
Structural domain: Defines the structural similarity between molecules that are correctly predicted by the
model. The structural neighborhood of atom-centered fragments is employed to determine this similarity;
Mechanistic domain: Combines the consistency of definite reactive groups hypothesized to cause the effect
with the domain of explanatory variables determining the parametric requirements for functional groups to
elicit their reactivity;
Metabolism domain: Accounts for the reliability of simulated metabolism (metabolites, pathways, and maps) if
metabolic activation of chemicals is part of the QSAR model.
Table 1
An arbitrary example representing calculation of AD for a regression-based QSAR model employing methods like bounding box, leverage method,
Euclidean distance based method, and standardization approach
ID   Y* (Observed)   X1*    X2*   X3*    Y (Calculated)   h^a    Standardized residual   Distance score^b   Mean distance^c   Normalized mean distance
Training set
1    3.50            1.64   2     1.64   3.45             0.41   1.63                    16.174             1.348             0.841
3    3.21            1.62   3     1.10   2.74             0.14   0.48                    8.708              0.726             0.014
5    2.87            2.12   3     1.37   2.68             0.39   1.56                    10.072             0.839             0.165
7    2.68            2.01   3     1.35   2.68             0.22   0.83                    9.422              0.785             0.093
9    2.27            1.85   3     1.23   2.71             0.17   0.61                    8.709              0.726             0.014
10   2.22            1.91   4     0.97   1.98             0.32   1.26                    12.334             1.028             0.416
11   2.18            1.36   4     0.68   2.05             0.36   1.41                    13.068             1.089             0.497
13   1.79            1.52   4     0.78   2.02             0.29   1.14                    12.136             1.011             0.394
15   3.47            1.93   2     1.94   3.41             0.66   2.70                    17.605             1.467             1
16   2.53            1.68   3     1.12   2.70             0.16   0.56                    8.582              0.715             0
17   2.81            1.12   3     0.74   2.85             0.65   2.67                    12.139             1.012             0.394
19   1.96            1.61   4     0.79   1.91             0.22   0.83                    12.019             1.002             0.381
Test set
2    3.22            1.96   2     1.92   3.40             0.38   1.49                    17.698             1.475             1.01
4    2.82            1.24   3     0.83   2.82             0.32   1.23                    11.151             0.929             0.285
6    2.86            1.80   3     1.19   2.73             0.15   0.51                    8.65               0.721             0.007
8    2.52            1.63   3     1.07   2.75             0.17   0.60                    8.761              0.73              0.02
12   2.12            1.98   4     0.98   1.98             0.27   1.03                    12.672             1.056             0.453
14   3.56            1.91   2     1.88   3.40             0.36   1.41                    17.373             1.448             0.974
18   2.21            1.93   3     1.93   2.25             0.81   3.33                    13.526             1.127             0.548
*Y is the dependent variable; X1, X2, and X3 are the modeled descriptors; h is the leverage value; h* is 1
a Calculated from Eq. 1; b calculated from Eq. 2; c calculated from Eq. 3
Fig. 13 (a) Euclidean distance plot, (b) Williams plot, (c) Training set outliers and test set AD information from
standardization technique employing dependent and independent variable values provided in Table 1
7 Future Direction
8 Conclusion
Acknowledgments
References
1. Roy K, Kar S, Das RN (2015) Understanding the basics of QSAR for applications in pharmaceutical sciences and risk assessment. Academic Press, San Diego, CA, USA
2. Roy K, Kar S (2015) Importance of applicability domain of QSAR models. In: Roy K (ed) Quantitative structure-activity relationships in drug design, predictive toxicology, and risk assessment. IGI Global, Hershey PA, USA, pp 180–211
3. Gadaleta D, Mangiatordi GF, Catto M, Carotti A, Nicolotti O (2016) Applicability domain for QSAR models: where theory meets reality. Int J Quant Struct Prop Relat J 1:45–63
4. Mathea M, Klingspohn W, Baumann K (2016) Chemoinformatic classification methods and their applicability domain. Mol Inform 35:160–180
5. Wold S, Sjostrom M, Eriksson L (2001) PLS-regression: a basic tool of chemometrics. Chemom Intell Lab Syst 58:109–130
6. Netzeva TI, Worth AP, Aldenberg T, Benigni R, Cronin MTD, Gramatica P et al (2005) Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships. Altern Lab Anim 33:155–173
7. Golbraikh A, Tropsha A (2002) Beware of q2! J Mol Graph Model 20:269–276
8. OECD, Principles for the validation of (Q)SARs (2004). http://www.oecd.org/dataoecd/33/37/37849783.pdf (Accessed 20 May, 2017)
9. Jaworska JS, Comber M, Auer C, Van Leeuwen CJ (2003) Summary of a workshop on regulatory acceptance of (Q)SARs for human health and environmental endpoints. Environ Health Perspect 111:1358–1360
10. Gramatica P (2007) Principles of QSAR models validation: internal and external. QSAR Comb Sci 26:694–701
11. Weaver S, Paul Gleeson M (2008) The importance of the domain of applicability in QSAR modeling. J Mol Graph Model 26:1315–1326
12. Roy K, Kar S, Das RN (2015) A primer on QSAR/QSPR modeling: fundamental concepts (SpringerBriefs in Molecular Science). Springer, Berlin
13. Roy K, Kar S (2015) How to judge predictive quality of classification and regression based QSAR models? In: Haq ZU, Madura J (eds) Frontiers of computational chemistry. Bentham, Sharjah, pp 71–120
14. Hanser T, Barber C, Marchaland JF, Werner S (2016) Applicability domain: towards a more formal definition. SAR QSAR Environ Res 27:865–881
15. Jaworska J, Nikolova-Jeliazkova N, Aldenberg T (2005) QSAR applicability domain estimation by projection of the training set descriptor space: a review. Altern Lab Anim 33:445–459
16. Stanforth RW, Kolossov E, Mirkin B (2007) A measure of domain of applicability for QSAR modeling based on intelligent K-means clustering. QSAR Comb Sci 26:837–844
17. Guha R, Jurs PC (2005) Determining the validity of a QSAR model - a classification approach. J Chem Inf Model 45:65–73
18. Nikolova-Jeliazkova N, Jaworska J (2005) An approach to determining applicability domain for QSAR group contribution models: an analysis of SRC KOWWIN. Altern Lab Anim 33:461–470
19. Worth AP, Bassan A, Gallegos A, Netzeva TI, Patlewicz G, Pavan M et al (2005) The characterisation of (quantitative) structure-activity relationships: preliminary guidance. ECB Report EUR 21866 EN, European Commission, Joint Research Centre; Ispra, Italy, pp. 95
20. Topkat OPS (2000). U.S. Patent 6, 036, 349
21. Preparata FP, Shamos MI (1991) In: Preparata FP, Shamos MI (eds) Computational geometry: an introduction. Springer-Verlag, New York
22. Jaworska JS, Nikolova-Jeliazkova N, Aldenberg T (2004) Review of methods for applicability domain estimation. Report, The European Commission-Joint Research Centre, Ispra, Italy
23. Hair JF Jr, Anderson RE, Tatham RL, Black WC (2005) Multivariate data analysis. Pearson Education, Singapore
24. Sheridan R, Feuston RP, Maiorov VN, Kearsley S (2004) Similarity to molecules in the training set is a good discriminator for prediction accuracy in QSAR. J Chem Inform Comput Sci 44:1912–1928
25. SIMCA-P 10.0. (2002) info@umetrics.com, UMETRICS, Umea, Sweden, www.umetrics.com
26. Tetko IV, Sushko I, Pandey AK, Zhu H, Tropsha A, Papa E et al (2008) Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection. J Chem Inform Comput Sci 48:1733–1746
27. Manallack DT, Tehan BG, Gancia E, Hudson BD, Ford MG, Livingstone DJ et al (2003) A consensus neural network-based technique for discriminating soluble and poorly soluble compounds. J Chem Inform Comput Sci 43:674–679
28. Tetko IV (2008) Associative neural network. Methods Mol Biol 458:185–202
29. Tetko IV, Tanchuk VY (2002) Application of associative neural networks for prediction of lipophilicity in ALOGPS 2.1 program. J Chem Inform Comput Sci 42:1136–1145
30. Chen JJ, Tsai CA, Young JF, Kodell RL (2005) Classification ensembles for unbalanced class sizes in predictive toxicology. SAR QSAR Environ Res 16:517–529
31. Jouan-Rimbaud D, Bouveresse E, Massart DL, de Noord OE (1999) Detection of prediction outliers and inliers in multivariate calibration. Analytica Chimica Acta 388:283–301
32. Roy K, Kar S, Ambure P (2015) On a simple approach for determining applicability domain of QSAR models. Chemom Intell Lab Syst 145:22–29
33. Dimitrov S, Dimitrova G, Pavlov T, Dimitrova N, Patlewicz G, Niemela J et al (2005) Stepwise approach for defining the applicability domain of SAR and QSAR models. J Chem Inform Model 45:839–849
34. Tong W, Hong H, Fang H, Xie Q, Perkins R (2003) Decision forest: combining the predictions of multiple independent decision tree models. J Chem Inform Comput Sci 43:525–531
35. Tong W, Hong H, Xie Q, Xie L, Fang H, Perkins R (2004) Assessing QSAR limitations–a regulatory perspective. Curr Comput Aided Drug Des 1:195–205
36. Fechner N, Jahn A, Hinselmann G, Zell A (2009) Atomic local neighborhood flexibility incorporation into a structured similarity measure for QSAR. J Chem Inform Model 49:549–560
37. Mirkin B (2005) Clustering for data mining: a data recovery approach. Chapman & Hall/CRC, London
38. Smellie A (2004) Accelerated K-means clustering in metric spaces. J Chem Inform Comput Sci 44:1929–1935
Chapter 7
Abstract
The concept of chemical similarity has many applications in several fields of cheminformatics. One com-
mon use of chemical similarity measurements, based on the principle that similar molecules have similar
properties, is in the context of the read-across approach, where estimates of a specific endpoint for a chemi-
cal are obtained starting from experimental data available from highly similar compounds.
This chapter reports an implementation of chemical similarity and the analysis of multiple combina-
tions of binary fingerprints and similarity metrics in the context of the read-across technique.
This analysis demonstrates that the classical similarity measurements can be improved with a general-
izable model of similarity. The approach presented here has been implemented in two open-source soft-
ware tools for computational toxicology (CAESAR and VEGA).
Key words Chemical similarity, QSAR, Toxicity prediction, Similarity searching, Read-across
1 Introduction
Animal models have been used for a long time as classical tools for
toxicity testing. However, in vivo animal tests are mainly limited by
ethical considerations and financial burden. Therefore, when possible,
alternative testing methods for estimating the toxicity of
chemicals, such as computational methods, are preferred. In silico
toxicology refers to the class of computational methods used to
predict the toxicity of chemicals; such methods are intended not to
replace but to complement classical toxicity tests in different appli-
cations, such as toxicity prediction and prioritization of chemicals,
and in drug design for minimizing late-stage failures. Computational
methods have the unique advantage of returning an early toxicity
estimate even before the synthesis of chemicals [1].
Such non-testing data can be generated by three main
approaches: (1) grouping approaches, which include read-across
and chemical category formation; (2) structure–activity relation-
ship (SAR) and quantitative SAR (QSAR, a term used in the fol-
lowing text to imply both); and (3) expert systems. The development
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_7, © Springer Science+Business Media, LLC, part of Springer Nature 2018
172 Matteo Floris and Stefania Olla
1.3 Similarity Coefficients
Similarity coefficients are used to calculate the similarity between
a reference and a target fingerprint. Several binary similarity
coefficients are available; a comprehensive and up-to-date list has been
compiled by Todeschini et al. [18]. Remarkably, Todeschini et al.
listed 51 similarity coefficients for binary variables extracted from
the literature and compared them using both simulated and real data.
They also identified five pairs and one triplet of completely
correlated coefficients, so that seven coefficients are redundant and
44 different coefficients remain.
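As an illustration of how a binary similarity coefficient works, the following minimal sketch derives the four contingency counts (a, b, c, d) from two bit vectors and evaluates two classical coefficients, simple matching and Jaccard–Tanimoto. The function names and the list-of-bits representation are assumptions made for this example; they are not taken from the CDK or from the authors' software.

```python
def contingency_counts(fp_a, fp_b):
    """Return (a, b, c, d) for two equal-length binary fingerprints.

    a: bits set in both, b: only in A, c: only in B, d: set in neither.
    """
    if len(fp_a) != len(fp_b) or len(fp_a) == 0:
        raise ValueError("fingerprints must be non-empty and of equal length")
    a = sum(1 for x, y in zip(fp_a, fp_b) if x and y)
    b = sum(1 for x, y in zip(fp_a, fp_b) if x and not y)
    c = sum(1 for x, y in zip(fp_a, fp_b) if not x and y)
    d = sum(1 for x, y in zip(fp_a, fp_b) if not x and not y)
    return a, b, c, d


def simple_matching(fp_a, fp_b):
    """Simple matching coefficient: (a + d) / (a + b + c + d)."""
    a, b, c, d = contingency_counts(fp_a, fp_b)
    return (a + d) / (a + b + c + d)


def jaccard_tanimoto(fp_a, fp_b):
    """Jaccard-Tanimoto coefficient: a / (a + b + c)."""
    a, b, c, _ = contingency_counts(fp_a, fp_b)
    # Two all-zero fingerprints are treated as identical (a convention).
    return a / (a + b + c) if (a + b + c) else 1.0


if __name__ == "__main__":
    reference = [1, 1, 0, 0, 1, 0, 1, 0]
    target = [1, 0, 0, 1, 1, 0, 0, 0]
    print(simple_matching(reference, target))
    print(jaccard_tanimoto(reference, target))
```

Simple matching rewards shared absences (d) as well as shared presences, whereas Jaccard–Tanimoto ignores d, so the two can rank the same pair of molecules quite differently on sparse fingerprints.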
Similarity-based methods have been successfully applied to
various cheminformatics problems, such as the prediction
of biological targets [19], of the likelihood of therapeutic indications
[20], or of side effects [21]. In particular, state-of-the-art
machine learning approaches, such as k-nearest neighbors (kNN),
naïve Bayes models, support vector machines (SVM), random
forests (RF), or ensembles of different methods, can predict
toxicity endpoints on the basis of different chemical
similarity algorithms. This concept has been applied to the prediction
of various toxicological endpoints in different research areas,
such as oral toxicity [22] and persistence in the sediment
compartment [23].
2 Materials and Methods
2.1 Fingerprints
The performance of nine different fingerprint algorithms has been
evaluated. All of them are implemented in the CDK libraries. While
they fall under the generic definition of fingerprints, some of them
can be classified as structural keys and not as hashing-based
fingerprints. More specifically, the fingerprints considered here are the
following:
Molecular Similarity in Computational Toxicology 175
2.3 Similarity Coefficients
Two sets of similarity coefficients were implemented and tested,
respectively, with the chosen fingerprints (binary coefficients) and with the
descriptor-based keys (nonbinary coefficients). The 44 binary
coefficients come from the work of Todeschini et al. [18]; the six
nonbinary coefficients come from the work of
Holliday [33]. All the coefficients have been implemented in an
in-house Java software module.
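For the nonbinary case, a sketch of one such coefficient may help. The example below implements the Bray–Curtis similarity in its common form for non-negative count vectors, 1 − Σ|x − y| / Σ(x + y). The exact variant implemented by the authors is not given in the text, so this form, the function name, and the toy descriptor vectors are assumptions.

```python
def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity between two non-negative descriptor vectors.

    Returns a value in [0, 1]; 1 means identical vectors.
    """
    if len(x) != len(y):
        raise ValueError("descriptor vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("Bray-Curtis is defined here for non-negative values")
    total = sum(x) + sum(y)
    if total == 0:
        # Both vectors are all zeros: treat them as identical (a convention).
        return 1.0
    return 1.0 - sum(abs(p - q) for p, q in zip(x, y)) / total


if __name__ == "__main__":
    # Hypothetical constitutional keys: counts of C, N, O, and ring atoms.
    mol_a = [6, 0, 2, 1]
    mol_b = [5, 1, 2, 0]
    print(bray_curtis_similarity(mol_a, mol_b))
```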
2.4 Similarity Index
In order to combine the fingerprint with the descriptor-based
keys, a generic scheme for the similarity index SI is defined as
follows:
$$
SI(A,B) = \left[S_b(FP_a, FP_b)\right]^{W_{fp}} \times \left[S_{nb}(CD_a, CD_b)\right]^{W_{cd}} \times \left[S_{nb}(HD_a, HD_b)\right]^{W_{hd}} \times \left[S_{nb}(FG_a, FG_b)\right]^{W_{fg}}
$$
where A and B are two molecules to be compared, while FPa, CDa,
HDa, FGa, FPb, CDb, HDb, and FGb are, respectively, the
Fingerprint, the Constitutional Descriptors, the Heteroatom
Descriptors, and the Functional Groups keys as defined before,
calculated for the two molecules A and B; Sb (Xa,Xb) is the result
of the application of a binary similarity coefficient to two
fingerprints Xa and Xb, where the resulting values are in the interval
[0,1]; Snb (Xa,Xb) is the result of the application of a nonbinary
similarity coefficient to two descriptor-based keys Xa and Xb,
where the resulting values are in the interval [0,1]; Wfp, Wcd,
Whd, and Wfg are the relative weights of the four contributions,
under the condition that their sum is equal to 1.
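The combination scheme can be sketched in a few lines. The example below assumes that the weights act as exponents on the four similarity terms (a weighted geometric combination) and that the four similarities have already been computed; the function name and argument order are hypothetical.

```python
import math


def similarity_index(s_fp, s_cd, s_hd, s_fg, w_fp, w_cd, w_hd, w_fg):
    """Combine four block similarities in [0, 1] into a single SI value.

    s_fp: binary-fingerprint similarity; s_cd, s_hd, s_fg: nonbinary
    similarities of the constitutional, heteroatom, and functional-group keys.
    The weights must sum to 1 and are used as exponents (assumption).
    """
    sims = (s_fp, s_cd, s_hd, s_fg)
    weights = (w_fp, w_cd, w_hd, w_fg)
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    if any(not 0.0 <= s <= 1.0 for s in sims):
        raise ValueError("similarities must lie in [0, 1]")
    return math.prod(s ** w for s, w in zip(sims, weights))


if __name__ == "__main__":
    # Illustrative similarity values; weights in the order FP, CD, HD, FG.
    print(similarity_index(0.8, 0.9, 1.0, 0.7, 0.4, 0.35, 0.1, 0.15))
```

With this form, a block with similarity 0 and a nonzero weight drives the whole index to 0, and four equal block similarities return that same value regardless of the weights.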
2.5 Datasets and Read-Across Model
Two publicly available datasets were extracted from the VEGA
project [27]: the bioconcentration factor in fish (BCF) dataset, composed of
473 compounds with corresponding experimental BCF values, and
the water–octanol partition coefficient (logP) dataset, composed of
10,005 compounds, each with an experimental logP value. The
choice of these two different datasets is aimed at finding an optimal
setting for the SI on different kinds of data, thus implementing a
“generic” idea of chemical similarity. Indeed, this experiment is
based on an endpoint with relevance for toxicity (BCF) and on a
physicochemical property (logP) with several applications, using
two datasets of markedly different size (BCF: 860 molecules; logP:
10,005 molecules).
To test the performance of the proposed
Similarity Index with different settings, an in-house Java
implementation of a simple read-across-based prediction model has been
used, in which BCF or logP is predicted for a query compound by
finding the three most similar compounds in the dataset according
to the SI and then calculating the mean of their three experimental
values, weighted by their SI values.
In this procedure, the leave-one-out strategy was adopted for
cross-validation. Iteratively, one molecule at a time was left out of
the dataset to be predicted using the read-across approach on the
remaining molecules.
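The read-across prediction and its leave-one-out validation can be sketched as follows. This is a minimal illustration, not the authors' Java implementation: molecules are represented by arbitrary objects, `similarity` is any function returning a value in [0, 1] (the SI in the chapter), and all names are hypothetical.

```python
def read_across_predict(query, references, similarity, k=3):
    """Predict a property as the similarity-weighted mean of the k nearest
    reference compounds. `references` is a list of (molecule, value) pairs."""
    scored = sorted(
        ((similarity(query, mol), value) for mol, value in references),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    weight_sum = sum(s for s, _ in scored)
    if weight_sum == 0:
        # No informative neighbour: fall back to the unweighted mean.
        return sum(v for _, v in scored) / len(scored)
    return sum(s * v for s, v in scored) / weight_sum


def leave_one_out(dataset, similarity, k=3):
    """Predict each compound from all the others; returns the predictions."""
    predictions = []
    for i, (mol, _) in enumerate(dataset):
        remaining = dataset[:i] + dataset[i + 1:]
        predictions.append(read_across_predict(mol, remaining, similarity, k))
    return predictions


if __name__ == "__main__":
    # Toy data: each "molecule" is a single number, similarity decays with distance.
    toy_similarity = lambda a, b: 1.0 / (1.0 + abs(a - b))
    toy_dataset = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (10.0, 9.0)]
    print(leave_one_out(toy_dataset, toy_similarity))
```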
Finally, as the model described above is analogous to
a regression model, the coefficient of determination
(R²) and the root mean square error (RMSE) were calculated over all the
predictions of the dataset to quantify the quality of the
model, which is in turn a measure of the goodness of the SI.
2.6 Evaluation Process
A combinatorial strategy was applied to test all the possible
combinations (N = 400) of the different settings (similarity coefficient, binary
fingerprints, nonbinary descriptors, weighting scheme). The best
combinations of these settings were identified on the basis
of R² and RMSE.
A second analysis was then performed using the selected
fingerprint/coefficient pair and a set of combinations of the
weights for the SI contributions and of the nonbinary similarity
coefficients for the descriptor keys. The batch process generated a
total of ca. 7200 combinations of weights and coefficients.
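A batch of weight settings of this kind can be generated by enumerating all weight vectors on a regular grid that sum to 1. The sketch below is only illustrative: the grid step (0.05) and the requirement that every weight be strictly positive are assumptions, and the resulting number of settings is not meant to reproduce the ca. 7200 combinations reported above.

```python
from itertools import product


def weight_grid(n_blocks=4, steps=20):
    """Yield tuples of n_blocks positive weights, each a multiple of 1/steps,
    that sum to 1. Integer units are used internally to avoid rounding drift."""
    for units in product(range(1, steps), repeat=n_blocks - 1):
        last = steps - sum(units)
        if last >= 1:
            yield tuple(u / steps for u in units) + (last / steps,)


if __name__ == "__main__":
    # Hypothetical labels standing in for the six nonbinary coefficients.
    coefficients = ["nb%d" % i for i in range(1, 7)]
    settings = [(w, c) for w in weight_grid() for c in coefficients]
    print(len(settings))
```

Each `(weights, coefficient)` pair would then be evaluated with the read-across model and ranked by the chosen quality measures.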
3 Notes
The fingerprints found in the ten best solutions are the Extended
Fingerprints, the PubChem structural keys, and the Default Fingerprint.
It is not surprising that the Extended fingerprints yield
better results than the Default ones, as they are the same as the Default
fingerprints but with some extra bits encoding information about rings. The
best coefficients found in combination with the fingerprints are 37
(Maxwell–Pilliner), 34 (Cohen), 18 (Rogot–Goldberg), 42 (CT4),
13 (Sokal–Sneath), and 1 (simple matching).
In the second step, having identified the Extended fingerprints and
coefficient no. 37 (Maxwell–Pilliner) as the best solution,
about 7200 combinations of weights and nonbinary similarity
coefficients were analysed. All the ten best solutions use
coefficient no. 3 (Bray–Curtis) for comparing the
nonbinary descriptor keys, and all ten have a similar distribution of the
weight values. In the best solution, the fingerprint block represents
the most important contribution (weight of 0.4), followed by the
Constitutional Descriptors block (0.35), the Functional Groups
Descriptors block (0.15), and the Heteroatoms Descriptors block
(0.1). This result can be interpreted as follows. The SI is mainly
constituted by the classical fingerprint-based comparison, strongly
corrected with constitutional information such as the number (and
type) of atoms and the number (and type) of bonds; this part of the SI
can be considered the core contribution to its generalizability.
A smaller contribution of the functional group and
heteroatom descriptors is then sufficient to extend the information
embedded in the fingerprint and constitutional descriptor blocks; this
part essentially explains the “fine chemical differences” within the
dataset.
References
1. Madan AK, Bajaj S, Dureja H (2013) Classification models for safe drug molecules. In: Reisfeld B, Mayeno AN (eds) Computational toxicology, vol 930. Humana Press, New York, pp 99–124
2. Read-Across Assessment Framework (RAAF), accessed Sept 2017
3. http://www.chemcomp.com/
4. https://www.schrodinger.com/
5. http://accelrys.com/products/discovery-studio/
6. http://www.talete.mi.it/, accessed Sept 2017
7. https://www.mn-am.com/products/adrianacode
8. https://www.ncbi.nlm.nih.gov/pubmed/29086040, cdk.sf.net
9. http://www.yapcwsoft.com/dd/padeldescriptor/
10. https://www.ncbi.nlm.nih.gov/pubmed/26664458
11. http://www.molgen.de/
12. http://www.edusoft-lc.com/molconn/
13. Cereto-Massagué A, Ojeda MJ, Valls C, Mulero M, Garcia-Vallvé S, Pujadas G (2015) Molecular fingerprint similarity search in virtual screening. Methods 71:58–63
14. Daylight Chemical Information Systems Inc., http://www.daylight.com/
15. Tripos Inc., http://www.tripos.com
16. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E (2003) The chemistry development kit (CDK): an open-source java library for chemo- and bioinformatics. J Chem Inf Comput Sci 43:493–500
17. Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen EL (2006) Recent developments of the chemistry development kit (CDK): an open-source java library for chemo- and bioinformatics. Curr Pharm Des 12(17):2111–2120
18. Todeschini R, Consonni V, Xiang H, Holliday J, Buscema M, Willett P (2012) Similarity coefficients for binary chemoinformatics data: overview and extended comparison using simulated and real data sets. J Chem Inf Model 52:2884–2901
19. Campillos M, Kuhn M, Gavin A-C, Jensen LJ, Bork P (2008) Drug target identification using side-effect similarity. Science 321:263–266
20. Nickel J, Gohlke B-O, Erehman J, Banerjee P, Rong WW, Goede A, Dunkel M, Preissner R (2014) SuperPred: update on drug classification and target prediction. Nucleic Acids Res 42:W26–W31
21. Lounkine E, Keiser MJ, Whitebread S, Mikhailov S, Hamon J, Jenkins JL, Lavan P, Weber E, Doak AK, Côté S, Shoichet BK, Urban L (2012) Large-scale prediction and testing of drug activity on side-effect targets. Nature 486:361–367
22. Drwal MN, Banerjee P, Dunkel M, Wettig MR, Preissner R (2014) ProTox: a web server for the in silico prediction of rodent oral toxicity. Nucleic Acids Res 42:W53–W58
23. Manganaro A, Pizzo F, Lombardo A, Pogliaghi A, Benfenati E (2016) Predicting persistence in the sediment compartment with a new automatic software based on the k-nearest neighbor (k-NN) algorithm. Chemosphere 144:1624–1630
Abstract
Molecular docking is an in silico method widely applied in drug discovery programs to predict the binding
mode of a given molecule interacting with a specific biological target. This computational technique is
today emerging also in the field of predictive toxicology for regulatory purposes, being for instance suc-
cessfully applied to develop classification models for the prediction of the endocrine disruptor potential of
chemicals. Herein, we describe the protocol for adapting molecular docking to the purposes of predictive
toxicology.
Key words Molecular docking, Classification model, Predictive toxicology, Endocrine potential,
Applicability domain
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_8, © Springer Science+Business Media, LLC, part of Springer Nature 2018
182 Daniela Trisciuzzi et al.
2 Materials
2.1 Training Set
1. Retrieve a suitable training data set. We used a curated
collection of 1689 chemicals (hereafter referred to as
EPA-DB) having high quality experimental data (androgenic
Molecular Docking for Predictive Toxicology 183
2.2 Validation Set(s)
1. Build appropriate validation sets (VS) to challenge the performance
of the model(s). Data can be extracted from commercial and public
sources. An example is the DUD-E (Directory of
Useful Decoys, Enhanced) database [18], a free, web-accessible
repository comprising decoys and active chemicals
for 102 targets, including AR. To properly validate
a classification model, it is advisable to build at least three
equally sized VS (namely, VS1, VS2, and VS3) comprising the
same number of active (hazard) compounds. Accordingly, we
employed three VS obtained by extracting data from DUD-E,
each comprising 2590 randomly selected decoys and the same
pool of 259 compounds (10% of the set size) experimentally
proved to bind AR (the interested reader is referred to
http://dude.docking.org/targets/ for additional details).
2. Make sure that 3D information is available for all the chemicals in
the VS (see Note 1).
3. Identify and remove duplicate compounds (same entry in both
the VS and the TS).
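The assembly of the validation sets can be sketched as below. The sketch assumes that compounds are identified by unique string identifiers, that each set draws its decoys independently from the same decoy pool, and that training-set entries are removed before sampling; these choices, the function name, and the parameters are illustrative and are not taken from the authors' workflow.

```python
import random


def build_validation_sets(actives, decoys, n_sets=3, n_decoys=2590,
                          training_ids=(), seed=0):
    """Return n_sets validation sets as lists of (compound_id, label) pairs.

    Every set shares the same actives (label 1) and receives its own random
    sample of n_decoys decoys (label 0). Compounds that also occur in the
    training set are discarded first.
    """
    rng = random.Random(seed)  # fixed seed keeps the sets reproducible
    excluded = set(training_ids)
    kept_actives = [a for a in actives if a not in excluded]
    decoy_pool = [d for d in decoys if d not in excluded]
    if len(decoy_pool) < n_decoys:
        raise ValueError("not enough decoys after removing duplicates")
    validation_sets = []
    for _ in range(n_sets):
        sampled = rng.sample(decoy_pool, n_decoys)
        validation_sets.append(
            [(a, 1) for a in kept_actives] + [(d, 0) for d in sampled]
        )
    return validation_sets


if __name__ == "__main__":
    toy_actives = ["A%d" % i for i in range(5)]
    toy_decoys = ["D%d" % i for i in range(200)]
    sets = build_validation_sets(toy_actives, toy_decoys, n_decoys=50)
    print([len(vs) for vs in sets])
```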
2.3 Protein Target Coordinates
1. Retrieve the structural information of the protein target (e.g.,
X-ray crystal structures of AR). Nowadays, several public
repositories containing experimentally
determined protein structures are available (see Note 2).
It is advisable to select, if possible, protein structure(s)
supported by high-quality data, with a resolution better than
2.0 Å and, more importantly, a structurally unambiguous
Fig. 1 Example of slider chart relative to an X-ray crystal structure provided with high-quality indicators.
Reproduced from RCSB PDB (www.rcsb.org) of PDB ID 1CBS with the permission from (Kleywegt GJ, Bergfors
T, Senn H, Le Motte P, Gsell B, Shudo K, Jones TA (1994) Crystal structures of cellular retinoic acid binding
proteins I and II in complex with all-trans-retinoic acid and a synthetic retinoid. Struct Lond Engl 1993
2:1241–1258)
2.4 Docking Software
1. Select the docking software to be employed. For example, the free
AutoDock suite of automated docking tools [19] as well as the licensed
Grid-based Ligand Docking with Energetics (Glide) [20] and
Genetic Optimization for Ligand Docking (Gold) [21] are
popular programs for drug discovery that have recently been exploited also in
the toxicological context (see refs. [11, 22] and Note 5). In
particular, we used Gold v.5.2.
2. To save time, docking software with an efficient
parallel implementation should be preferred.
3 Methods
3.1 TS Data Handling
Before carrying out docking simulations, the TS must be properly
handled in two different phases.
3.3 Ligand Preparation
1. Generate all the tautomers and ionization states at a pH value
of 7.0 ± 2.0. We processed all the chemicals belonging to TS
and VS using LigPrep (Schrödinger Suite 2016-3) [24].
3.4 Target Structures Preparation
1. Pretreat the selected protein target structures by adding missing
hydrogen atoms, filling missing side chains and loops (see Note
7), and assigning bond and metal orders.
2. Delete water molecules distant from HET groups (see Note 8)
and retain water molecules in the binding site that have a functional
role.
3. Optimize the hydrogen bond network by reorienting hydroxyl
and thiol groups, amide groups of asparagine and glutamine,
and the imidazole ring in histidine.
4. Predict optimal protonation states of histidine, aspartic acid,
and glutamic acid residues at a pH value of 7.0 ± 2.0.
3.5 Grid Generation
1. Generate a grid defining the binding site of the target proteins.
Depending on the selected docking software (see Subheading
2.4), the grid can be set as a sphere (Gold [21]) or as a cube
(Glide [20]) centered on the centroid of the cognate ligand or
of some selected residues. Usually, it is possible to specify the
resolution of the grid as the number of points in each dimension
of the 3D space, or as the spacing between points
(AutoDock [19]). Grid size should be carefully set in order to
minimize, in a reasonable computational time, the number of
undocked compounds. In our study, we used a spherical grid
having a radius of 10 Å centered on the center of mass of the
cognate ligands for all the docking simulations.
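The geometric definition of such a spherical binding site can be illustrated independently of any docking program. The sketch below computes the center of mass of a cognate ligand and tests which points fall within a 10 Å sphere around it; the atom representation, the small mass table, and the function names are assumptions for this example and do not correspond to the Gold interface.

```python
import math

# Standard atomic masses for a few common elements (extend as needed).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def center_of_mass(atoms):
    """atoms: list of (element, x, y, z) tuples with coordinates in angstroms."""
    total_mass = sum(ATOMIC_MASS[element] for element, _, _, _ in atoms)
    return tuple(
        sum(ATOMIC_MASS[atom[0]] * atom[axis] for atom in atoms) / total_mass
        for axis in (1, 2, 3)
    )


def in_binding_site(point, center, radius=10.0):
    """True if the (x, y, z) point lies inside the sphere around center."""
    return math.dist(point, center) <= radius


if __name__ == "__main__":
    ligand = [("C", 0.0, 0.0, 0.0), ("C", 2.0, 0.0, 0.0), ("O", 1.0, 1.5, 0.0)]
    center = center_of_mass(ligand)
    print(center, in_binding_site((1.0, 0.0, 9.0), center))
```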
3.6 Docking Protocol Calibration
1. Challenge the reliability of the simulation protocol by
performing preliminary docking simulations of the cognate
ligands, if present. In particular, for each selected target
structure (see Subheading 2.3), compare the Cartesian coordinates
of the pose of the cognate ligands resulting from the docking
simulation with the experimental ones by computing RMSD
values (see Note 9). For the 9 AR crystal structures considered,
we computed RMSD values ranging from 0.436 Å to 2.069 Å.
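A redocking check of this kind reduces to an RMSD between two coordinate sets. The sketch below assumes that the docked and experimental poses list the same atoms in the same order and share the receptor coordinate frame, so that no superposition or symmetry correction is applied; the function name is hypothetical.

```python
import math


def pose_rmsd(docked, reference):
    """RMSD (in angstroms) between two poses given as lists of (x, y, z)
    tuples, with atoms in matching order."""
    if len(docked) != len(reference) or not docked:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(docked, reference))
    return math.sqrt(squared / len(docked))


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    redocked = [(0.2, 0.1, 0.0), (1.6, 0.1, 0.1), (1.4, 1.7, 0.0)]
    print(round(pose_rmsd(redocked, crystal), 3))
```

Ligands with symmetric groups (for example, a flipped phenyl ring) can give an inflated value with this naive atom-by-atom pairing, which is why dedicated tools apply symmetry-aware matching.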
3.8 Model Selection
1. Measure the goodness of the derived classification models by
computing several statistical parameters. First, we calculated
the negative predictive value (NPV) and the positive predictive
value (PPV) (see Note 12) for each SE threshold. Notably, in
the case of an unbalanced TS, ad hoc statistical metrics are
recommended (see Note 13 for methodological details). Accordingly,
we computed the positive and negative likelihood ratios (+LR and −LR)
and the balanced classification rate (BCR) to select the best
performing docking-based classification model. It should be noted
that the performance of a classification model conceived for
toxicological purposes chiefly depends on its ability to
minimize the rate of false negatives (FNs), e.g., a low −LR at an SE
threshold equal to 0.75. Depending on the target structure employed,
we computed −LR values ranging from 0.38 to 0.48, thus
indicating, for all the developed models, an optimal capability to
minimize FNs. Based on the computed statistical parameters (see
Fig. 2), we selected one AR structure (PDB code 2PNU [26])
as that providing the best performing classification model.
Fig. 2 Summary of the 9 classification models ranking obtained by the three statistical metrics (NPV, −LR and
BCR) computed at SE = 0.75. (NPV Negative Predictive Value, −LR Negative likelihood ratio, BCR Balanced
Classification Rate). Each model is specifically indicated as the PDB entry employed for docking simulation.
Reprinted (adapted) with permission from (Trisciuzzi D, Alberga D, Mansouri K, Judson RS, Novellino E,
Mangiatordi GF, Nicolotti O (2017) Predictive structure-based toxicology approaches to assess the androgenic
potential of chemicals. J Chem Inf Model. doi: https://doi.org/10.1021/acs.jcim.7b00420). Copyright 2017
American Chemical Society
Fig. 3 Projection of both TS and VS into the top two PCs obtained from the 162 descriptors computed for each
compound of the EPA-DB. The outer polygon (dashed line) takes into account all the chemicals in the EPA-DB
(black circles), while the inner polygon (solid line) retains 95% of them. Chemicals of the VS (red circles)
outside the inner 95% polygon are flagged as outside the AD. Reprinted (adapted) with permission from (Trisciuzzi
D, Alberga D, Mansouri K, Judson RS, Novellino E, Mangiatordi GF, Nicolotti O (2017) Predictive structure-based
toxicology approaches to assess the androgenic potential of chemicals. J Chem Inf Model.
doi: https://doi.org/10.1021/acs.jcim.7b00420). Copyright 2017 American Chemical Society
4 Notes
$$
B_i = 8\pi^2 U_i^2
$$
where U_i² stands for the mean square displacement of atom i from
its rest position. As U rises, the B-factor increases and the
contribution of the atom to the scattering decreases [31, 32]. In the
PDB file, the value in the temperature-factor field (columns 61–66 of
each ATOM or HETATM record) represents the B-factor for
each atom in the structure. NMR models provide no information
in the temperature value fields of PDB files. B-values normally
range from 15 to 30 Å² and are often higher than 30 Å² for atoms
in flexible regions (e.g., side-chain atoms are expected to exhibit
more degrees of freedom). In other words, very high B-factors
flag atoms whose positions are poorly defined in the model, reflecting
local disorder or thermal motion.
(b) The R-factor is a metric used to assess how well the
crystallographic model fits the original X-ray diffraction
data. It is defined as follows:
$$
R = \frac{\sum_{h,k,l} \bigl|\, |F_{obs}(h,k,l)| - |F_{calc}(h,k,l)| \,\bigr|}{\sum_{h,k,l} |F_{obs}(h,k,l)|}
$$
where h, k, l are the indices of the reciprocal lattice points of the crystal
structure. In this equation, |Fobs (h, k, l)| and |Fcalc (h, k, l)| are the
structure factor amplitudes derived
from the intensity of a reflection measured in the diffraction
pattern and from the intensity of the same reflection calculated from the
model, respectively [33]. A different version of the
R-factor is the free R-factor (Rfree). Whereas the R-factor measures how
well the model predicts the whole data set that produced the
model, Rfree evaluates how well the atomic model predicts a small
“test set” (about 5–10% of the observed intensities) of diffraction
data that were not included in the building and refinement
process. Low values of, and small differences between, the R and Rfree
factors indicate a high predictive power of the crystallographic
model [32, 33].
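Both quality indicators lend themselves to a short numerical illustration. The sketch below reads B-factors from PDB-format lines, assuming the standard fixed-column layout in which the temperature factor occupies columns 61–66 of ATOM and HETATM records, inverts B = 8π²U² to obtain the root mean square displacement, and evaluates the R-factor from lists of observed and calculated structure factor amplitudes. No scale factor between observed and calculated amplitudes is applied, and the function names are hypothetical.

```python
import math


def parse_b_factors(pdb_lines):
    """Return (atom_name, residue_name, b_factor) for each ATOM/HETATM line."""
    records = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            atom_name = line[12:16].strip()    # columns 13-16
            residue_name = line[17:20].strip() # columns 18-20
            b_factor = float(line[60:66])      # columns 61-66
            records.append((atom_name, residue_name, b_factor))
    return records


def rms_displacement(b_factor):
    """Root mean square displacement (angstroms) from B = 8 * pi^2 * U^2."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over matching reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


def flag_flexible_atoms(records, threshold=30.0):
    """Atoms whose B-factor exceeds the threshold (30 square angstroms here)."""
    return [r for r in records if r[2] > threshold]
```

Computing `r_factor` on the reflections used in refinement gives R, whereas applying the same function to a held-out subset of reflections gives the analogue of Rfree.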
5. Each docking algorithm adds different penalties to the final
docking score. For instance, if a compound has been processed
(see Subheading 3.3) using Epik, a tool available from the
Schrödinger Suite, ionization or tautomeric state penalties are
added to the docking score for adopting higher-energy states
[34]. Compounds without this information are not penalized
and will therefore show better scores.
6. Fig. 4 shows an example of the in-house Python script used in our study to
compute, based on the obtained docking-based ranking, the
statistical parameters SE, SP, PPV, and NPV.
7. If the protein has residues in the binding site with atom types
that are mis-assigned, overlapped or having alternate positions,
the user should carefully inspect this region.
Fig. 4 Example of an in-house Python script employed to assess the performance of the obtained docking-
based classification models
$$
\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \delta_i^2}
$$
where δi is the distance between atom i and either a
reference structure or the mean position of the N equivalent atoms.
The RMSD is computed for the backbone heavy atoms (i.e., C, O,
N, and C-alpha) or considering only C-alpha atoms.
If high RMSD values (>2.5 Å) are returned by the
docking protocol calibration (see Subheading 3.6), it is
advisable to modify the docking protocol (i.e., increase the
accuracy at the expense of the computational time) or, alternatively,
to discard the corresponding target structure [35].
10. A confusion matrix includes information about experimental
and predicted matches and mismatches returned by each
classification system (see Table 1) [36].
In particular, the confusion matrix takes into account the
number of:
(a) true positives (TPs), i.e., the number of experimental
positive cases that are correctly identified;
(b) true negatives (TNs), i.e., the number of experimental
negative cases that are classified correctly;
(c) false positives (FPs), i.e., the number of experimental
negative cases that are incorrectly classified as positive;
(d) false negatives (FNs), i.e., the number of experimental
positive cases that are incorrectly classified as negative.
The correctly classified proportions of binders (hazard
compounds) and nonbinders (safe compounds) represent
the Sensitivity (SE) and Specificity (SP), respectively. They
are defined as follows:
$$
SE = \frac{TP}{TP + FN}
$$
and
$$
SP = \frac{TN}{TN + FP}
$$
Table 1
Synoptic view of a confusion matrix

                         Experimental class
                         P                 N
Predicted class    P     True positive     False positive
                   N     False negative    True negative
and
$$
NPV = \frac{TN}{TN + FN}
$$
$$
+LR = \frac{SE}{1 - SP}
$$
and
$$
-LR = \frac{1 - SE}{SP}
$$
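The statistical parameters defined in these Notes can be computed directly from the four cells of the confusion matrix. The sketch below is not the in-house script of Fig. 4; it is a minimal illustration with hypothetical function names, in which class labels are encoded as 1 (binder) and 0 (nonbinder) and PPV is taken in its standard form TP/(TP + FP).

```python
def confusion_counts(experimental, predicted):
    """Return (tp, fp, tn, fn) from two equal-length lists of 0/1 labels."""
    if len(experimental) != len(predicted):
        raise ValueError("label lists must have the same length")
    tp = sum(1 for e, p in zip(experimental, predicted) if e == 1 and p == 1)
    fp = sum(1 for e, p in zip(experimental, predicted) if e == 0 and p == 1)
    tn = sum(1 for e, p in zip(experimental, predicted) if e == 0 and p == 0)
    fn = sum(1 for e, p in zip(experimental, predicted) if e == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Compute SE, SP, PPV, NPV, +LR, and -LR from confusion-matrix counts.

    Assumes at least one case in each experimental and predicted class.
    """
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {
        "SE": se,
        "SP": sp,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        # Likelihood ratios are unbounded when the denominator vanishes.
        "+LR": se / (1 - sp) if sp < 1 else float("inf"),
        "-LR": (1 - se) / sp if sp > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Illustrative counts for an unbalanced set (40 binders, 160 nonbinders).
    for name, value in classification_metrics(tp=30, fp=20, tn=140, fn=10).items():
        print(name, round(value, 3))
```

A low −LR means that a negative prediction substantially lowers the odds that a compound is a true binder, which is the behaviour sought when false negatives are the main concern.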
Acknowledgments
References
1. Kitchen DB, Decornez H, Furr JR, Bajorath J (2004) Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov 3:935–949
2. Shoichet BK (2004) Virtual screening of chemical libraries. Nature 432:862–865
3. Meng XY, Zhang HX, Mezei M, Cui M (2011) Molecular docking: a powerful approach for structure-based drug discovery. Curr Comput Aided Drug Des 7:146–157
4. Wang G, Zhu W (2016) Molecular docking for drug discovery and development: a widely used approach but far from perfect. Future Med Chem 8:1707–1710
5. Liantonio A, Imbrici P, Camerino GM et al (2016) Kidney CLC-K chloride channels inhibitors: structure-based studies and efficacy in hypertension and associated CLC-K polymorphisms. J Hypertens 34:981–992
6. Nicolotti O, Benfenati E, Carotti A, Gadaleta D, Gissi A, Mangiatordi GF, Novellino E (2014) REACH and in silico methods: an attractive opportunity for medicinal chemists. Drug Discov Today 19:1757–1768
7. Merlot C (2010) Computational toxicology: a tool for early safety evaluation. Drug Discov Today 15:16–22
8. Kavlock R, Dix D (2010) Computational toxicology as implemented by the U.S. EPA: providing high throughput decision support tools for screening and assessing chemical exposure, hazard and risk. J Toxicol Environ Health B Crit Rev 13:197–217
9. Gissi A, Mangiatordi GF, Sobański T, Netzeva T, Nicolotti O (2017) Non-test methods for REACH legislation. Comprehensive Medicinal Chemistry 3rd ed, Volume 1
10. Gissi A, Gadaleta D, Floris M, Olla S, Carotti A, Novellino E, Benfenati E, Nicolotti O (2014) An alternative QSAR-based approach for predicting the bioconcentration factor for regulatory purposes. ALTEX 31:23–36
11. Trisciuzzi D, Alberga D, Mansouri K et al (2015) Docking-based classification models for exploratory toxicology studies on high-quality estrogenic experimental data. Future Med Chem 7:1921–1936
12. Gadaleta D, Mangiatordi GF, Catto M, Carotti A, Nicolotti O (2016) Applicability domain for QSAR models: where theory meets reality. Int J Quant Struct-Prop Relatsh IJQSPR 1:45–63
13. Mansouri K, Abdelaziz A, Rybacka A et al (2016) CERAPP: collaborative estrogen receptor activity prediction project. Environ Health Perspect 124:1023–1033
14. Kamel M, Kleinstreuer N, Watt E, Harris J, Judson R (2017) CoMPARA: collaborative modeling project for androgen receptor activity conference: SOT meeting 56th annual meeting and ToxExpo. doi: https://doi.org/10.13140/rg.2.2.16791.78241
15. Trisciuzzi D, Alberga D, Mansouri K, Judson RS, Novellino E, Mangiatordi GF, Nicolotti O (2017) Predictive structure-based toxicology approaches to assess the androgenic potential of chemicals. J Chem Inf Model 57:2874–2884
16. Klimisch HJ, Andreae M, Tillmann U (1997) A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul Toxicol Pharmacol RTP 25:1–5
17. Kleinstreuer NC, Ceger P, Watt ED et al (2017) Development and validation of a computational model for androgen receptor activity. Chem Res Toxicol 30:946–964
18. Mysinger MM, Carchia M, Irwin JJ, Shoichet BK (2012) Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J Med Chem 55:6582–6594
19. Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS, Olson AJ (2009) AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J Comput Chem 30:2785–2791
20. Friesner RA, Banks JL, Murphy RB et al (2004) Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 47:1739–1749
21. Jones G, Willett P, Glen RC, Leach AR, Taylor R (1997) Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267:727–748
22. Kolšek K, Mavri J, Sollner Dolenc M, Gobec S, Turk S (2014) Endocrine disruptome: an open source prediction tool for assessing endocrine disruption potential through nuclear receptor binding. J Chem Inf Model 54:1254–1267
23. Lyne PD (2002) Structure-based virtual screening: an overview. Drug Discov Today 7:1047–1055
24. Schrödinger Release 2016–3: LigPrep, Schrödinger, LLC, New York, NY, 2016
25. Schrödinger Suite 2016–3 Protein Preparation Wizard; Epik, Schrödinger, LLC, New York, NY, 2016; Impact, Schrödinger, LLC, New York, NY, 2016; Prime, Schrödinger, LLC, New York, NY, 2016
26. Cantin L, Faucher F, Couture JF et al (2007) Structural characterization of the human androgen receptor ligand-binding domain complexed with EM5744, a rationally designed steroidal ligand bearing a bulky chain directed toward helix 12. J Biol Chem 282:30910–30919
27. Sahigara F, Mansouri K, Ballabio D, Mauri A, Consonni V, Todeschini R (2012) Comparison of different approaches to define the applicability domain of QSAR models. Mol Basel Switz 17:4791–4810
28. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28:235–242
29. Berman H, Henrick K, Nakamura H (2003) Announcing the worldwide protein data bank. Nat Struct Biol 10:980
30. Kinjo AR, Suzuki H, Yamashita R et al (2012) Protein Data Bank Japan (PDBj): maintaining a structural data archive and resource description framework format. Nucleic Acids Res 40:D453–D460
31. Trueblood KN, Bürgi H-B, Burzlaff H, Dunitz JD, Gramaccioli CM, Schulz HH, Shmueli U, Abrahams SC (1996) Atomic displacement parameter nomenclature. Report of a subcommittee on atomic displacement parameter nomenclature. Acta Crystallogr A 52:770–781
32. Rupp B (2007) Biomolecular crystallography: principles, practice, and application to structural biology. Garland Science, Taylor and Francis Group, New York
33. Brünger AT (1992) Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature 355:472–475
34. Greenwood JR, Calkins D, Sullivan AP, Shelley multi-objective optimization algorithm. BMC
JC (2010) Towards the comprehensive, rapid, Bioinformatics 10:58
and accurate prediction of the favorable tauto- 41. Powers DM (2011) Evaluation: from preci-
meric states of drug-like molecules in aqueous sion, recall and F-measure to ROC, informed-
solution. J Comput Aided Mol Des 24: ness, markedness and correlation. J Mach
591–604 Learn Technol 2:37–63
35. Wilantho A, Tongsima S, Jenwitheesuk E 42. Matthews BW (1975) Comparison of the pre-
(2008) Pre-docking filter for protein and ligand dicted and observed secondary structure of T4
3D structures. Bioinformation 3:189–193 phage lysozyme. Biochim Biophys Acta
36. Provost F, Kohavi R (1998) Guest editors’ 405:442–451
introduction: on applied research in machine 43. Truchon JF, Bayly CI (2007) Evaluating vir-
learning. Mach Learn 30:127–132 tual screening methods: good and bad metrics
37. Triballeau N, Acher F, Brabet I, Pin JP, for the “early recognition” problem. J Chem
Bertrand HO (2005) Virtual screening work- Inf Model 47:488–508
flow development guided by the “receiver 44. Sokolova M, Lapalme G (2009) A systematic
operating characteristic” curve approach. analysis of performance measures for classifica-
Application to high-throughput docking on tion tasks. Inf Process Manage 45:427–437
metabotropic glutamate receptor subtype 4. 45. Sánchez-Rodríguez A, Pérez-Castillo Y,
J Med Chem 48:2534–2547 Schürer SC, Nicolotti O, Mangiatordi GF,
38. Youden WJ (1950) Index for rating diagnostic Borges F, Cordeiro MNDS, Tejera E, Medina-
tests. Cancer 3:32–35 Franco JL, Cruz-Monteagudo M (2017) From
39. Schisterman EF, Perkins NJ, Liu A, Bondell H flamingo dance to (desirable) drug discovery: a
(2005) Optimal cut-point and its corresponding nature-inspired approach. Drug Discov Today
Youden index to discriminate individuals using 22:1489–1502
pooled blood samples. Epidemiology 16:73–81 46. Nembri S, Grisoni F, Consonni V, Todeschini
40. Li H, Zhang H, Zheng M, Luo J, Kang L, Liu R (2016) In silico prediction of cytochrome
X, Wang X, Jiang H (2009) An effective dock- P450-drug interaction: QSARs for CYP3A4
ing strategy for virtual screening based on and CYP2C9. Int J Mol Sci 17:914–933
Chapter 9
Abstract
Nontesting methods (NTM) have proved to be a valuable resource for the risk assessment of chemical
substances. They are particularly useful when the information provided by different sources is integrated
to increase the confidence in the final result. This integration can sometimes be difficult because different
methods can lead to conflicting results, and because clear guidance for integrating information from
different sources was not available until recently. In this chapter, we present and discuss the recently
published EFSA guidance on integrating and weighting evidence for scientific assessment. Moreover,
a practical example of applying these integration principles to evidence from different in silico
models is shown for the assessment of the bioconcentration factor (BCF). This example demonstrates
the suitability and effectiveness of in silico methods for risk assessment, and serves as a practical
guide for end-users performing similar analyses on potentially hazardous chemicals.
Key words Nontesting methods, Weight of evidence, BCF, QSAR models, Read-across
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_9, © Springer Science+Business Media, LLC, part of Springer Nature 2018
200 Anna Lombardo et al.
3 Case Study
Table 1
Summary of the results obtained by nontesting methods

Programs used       Models used                                               Results, log units (L/kg)           Applicability domain
EPISuite™ v.4.1     BCF, Meylan model                                         2.137 (137.2 L/kg ww)               Manually checked
                    BCF/BAF, Arnot–Gobas, upper trophic level model           2.516/2.516 (328/328.3 L/kg ww)
                    BCF/BAF, Arnot–Gobas, mid trophic level model             2.442/2.445 (276.6/278.4 L/kg ww)
                    BCF/BAF, Arnot–Gobas, lower trophic level model           2.409/2.418 (256.2/261.7 L/kg ww)
                    BCF/BAF, Arnot–Gobas, upper trophic level, kM = 0 model   2.766/3.000 (583.1/1001 L/kg ww)
T.E.S.T. v. 4.2.1   CONSENSUS method                                          2.36 (230.04 L/kg)                  Internally checked
2012 U.S. EPA       Hierarchical clustering method                            2.31 (202.85 L/kg); PI* = 0.39
                    Single model method                                       2.30 (199.95 L/kg); PI* = 1.24
                    Group contribution method                                 2.46 (288.98 L/kg); PI* = 2.62
                    FDA                                                       2.29 (196.35 L/kg); PI* = 1.19
                    Nearest neighbor method                                   2.45 (279.89 L/kg)
VEGA platform       BCF model (CAESAR) 2.1.14                                 1.61 (40 L/kg)                      0.85 ADI
v.1.1.3             BCF model (Meylan) 1.0.3                                  2.14 (137 L/kg)                     1 ADI
                    BCF (KNN/read-across) 1.1.0                               2.47                                0.7 ADI

*PI = Prediction interval
for both the entire training set and for the most similar compounds.
Analyzing each prediction, it is possible to observe that the Single
model method, the Group contribution method, and the FDA
method give predictions of low reliability because of their wide
prediction intervals (1.18/3.42, 1.15/3.77, and 1.70/2.89, respectively).
The Hierarchical clustering method gives a reliable prediction (with a
prediction interval of 2.11/2.50). The MAE values confirm this tendency
(data not shown). In any case, all the MAE values are below or equal to the
experimental variability (which ranges from 0.42 to 0.75 [11, 12]).
Looking at the list of the most similar compounds of the training set,
we can see that the most similar compound (S1, CAS 117-18-0) is very
close to the target: it has only one additional chlorine atom. It has an
experimental log BCF value of 3.25. The second one (S2, CAS 609-89-2)
has an added hydroxyl group; therefore, it is probably more soluble
Criteria and Application on the Use of Nontesting Methods within a Weight of Evidence… 205
Fig. 5 Prediction for the similar compounds of the external set of T.E.S.T.
The second model (see Fig. 9), the BCF model (Meylan) 1.0.3,
places the target compound inside the Applicability Domain of the
model. Indeed, the two most similar compounds (see Fig. 10) are
very similar to the target (with three chlorine atoms and one nitro
group attached to a benzene ring). Moreover, they are well predicted
and concordant with the prediction for the target compound
(of about 2 l.u.). Note that this is the same model as in
EPISuite, but in this case the AD is checked by the VEGA platform,
and more information is provided, such as the similar compounds.
The third model, the BCF (KNN/read-across) 1.1.0, places the
target compound outside the Applicability Domain of the model
(see Fig. 11). The reason is the wrong prediction for the second most
similar compound (CAS 117-18-0), with a prediction error above
1 l.u. This model uses the four most similar compounds (see Fig. 12).
The first one (CAS 18708-70-8) is similar to the target substance
(three chlorine atoms and a nitro group attached to the benzene ring),
the second one (CAS 117-18-0) has an additional chlorine atom, the
third one (CAS 99-54-7) has one chlorine atom fewer, and the fourth one
(CAS 99-30-9) has one chlorine atom fewer and an additional secondary
amine. As with the similar compounds of the T.E.S.T. model, the molecules
with four chlorine atoms have a higher log BCF value, whereas
those with two chlorine atoms have a lower value.
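A read-across estimate of this kind can be sketched as a similarity-weighted average over the nearest neighbours. The snippet below is a minimal illustration and not the VEGA algorithm: the function name, the weighting scheme, and all similarity and log BCF values are hypothetical placeholders chosen only to show the calculation.

```python
from typing import List, Tuple


def read_across_log_bcf(neighbours: List[Tuple[float, float]]) -> float:
    """Similarity-weighted mean of the neighbours' experimental log BCF.

    Each neighbour is (similarity, experimental_log_bcf); similarity is
    assumed to lie in (0, 1]. This weighting is an illustrative choice.
    """
    total_weight = sum(sim for sim, _ in neighbours)
    if total_weight <= 0:
        raise ValueError("at least one neighbour with positive similarity is required")
    return sum(sim * value for sim, value in neighbours) / total_weight


# Hypothetical neighbours (similarity, experimental log BCF); not taken from the chapter.
example_neighbours = [(0.95, 2.7), (0.90, 3.2), (0.85, 2.0), (0.80, 1.9)]
estimate = read_across_log_bcf(example_neighbours)
print(f"read-across log BCF estimate: {estimate:.2f}")
```

With such a scheme, a single neighbour whose experimental value lies far from the others pulls the estimate toward itself in proportion to its similarity, which is one reason the individual neighbours are worth inspecting.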
The BCF CAESAR and Meylan models in VEGA also offer
another analysis tool: scatter plots of log BCF vs
MlogP (see Fig. 13). MlogP is a log Kow calculated by the model. For
each model, a scatter plot with all the compounds of the training set
(Fig. 13a and c) and one with the three most similar compounds
(Fig. 13b and d) are shown. If the target compound is in the cloud,
as in the first plot, it is similar to the compounds of the
training set (which are represented by their experimental values). The
second plot allows the user to verify whether there is a trend between the
three most similar compounds and the target. It shows both the
experimental (circles) and predicted (black dots) values of the similar
substances (the size of the circle represents the similarity).
Figure 13a indicates that the target compound is inside the cloud for
the CAESAR model, but borderline, whereas Fig. 13c indicates that it is
inside the cloud of the Meylan model. This is confirmed by Fig. 13b and d:
in the first one, the target has a higher MlogP than the similar
compounds. Considering that the log BCF increases as the MlogP
increases, this plot may indicate an underestimation of the log BCF
for the target compound: all the similar compounds have experimental
values (the open, white circles in the figure) higher than the
predicted values (the black dots). Extrapolating these experimental
values, or adding the prediction error, we can assume that the
correct value is about 2.2. In Fig. 13d all similar compounds and the
target have the same MlogP, and log BCF ranges from 1.84 to 2.47.
Fig. 8 List of the most similar compounds found in the BCF CAESAR model
Figure 14 shows an additional analysis: the uncertainty assessment.
In this assessment, a safety margin is added to the predicted
value. The safety margin is calculated on the basis of the ADI and
the threshold considered (3.3 or 3.7 l.u.). In this case, with both
thresholds the values remain well below 3.3 l.u.
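The comparison against a threshold can be sketched as follows. This is a minimal illustration under stated assumptions: the chapter does not give the formula used to derive the safety margin from the ADI, so the margin is passed in as a plain number, and the value used here is a hypothetical placeholder. The thresholds of 3.3 and 3.7 l.u. are those quoted in the text; on a linear scale they correspond to BCF values of about 2000 and 5000 L/kg.

```python
THRESHOLDS_LOG_BCF = {"3.3 l.u.": 3.3, "3.7 l.u.": 3.7}


def exceeds_threshold(predicted_log_bcf: float, safety_margin: float, threshold: float) -> bool:
    """Return True if the prediction plus its safety margin reaches the threshold."""
    return predicted_log_bcf + safety_margin >= threshold


predicted = 2.14        # log BCF from the Meylan model in VEGA (Table 1)
assumed_margin = 0.5    # hypothetical safety margin, for illustration only

for label, threshold in THRESHOLDS_LOG_BCF.items():
    flagged = exceeds_threshold(predicted, assumed_margin, threshold)
    print(f"{label}: conservative value {predicted + assumed_margin:.2f}, exceeds = {flagged}")

# The same thresholds expressed as BCF in L/kg (10 ** log BCF).
thresholds_l_per_kg = {label: round(10 ** t) for label, t in THRESHOLDS_LOG_BCF.items()}
print(thresholds_l_per_kg)
```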
ToxRead gives details about the most similar compounds (their
number is selected by the user), the rules identified (with a description
and the list of the compounds in which each rule appears), and
the interpolation chart (logP vs log BCF). Figure 15 shows the output
with the three most similar molecules. ToxRead found two rules:
the nitro aromatic and the acceptor atoms for H-bonds (N, O, F).
Fig. 10 List of the most similar compounds found in the BCF Meylan model
The three most similar compounds have a nitro group and two,
three, or four chlorine atoms attached to the benzene ring. The
same rules are present in all of them. In the interpolation chart we
can see that the most similar compound (CAS 18708-70-8), which
differs only in the position of the chlorine atoms, has a very similar logP
value and an experimental log BCF value of 2.72. The other two
chemicals have either lower logP and log BCF or higher logP and log
BCF. The log BCF value of the target molecule should therefore lie between
1.92 and 3.26 and should be very similar to that of CAS
18708-70-8.
Figure 16 shows the behavior of a larger number of similar
compounds (10 in this case). The interpolation chart again shows
a linear trend: the higher the logP, the higher the log
BCF. This confirms the conclusion obtained from the analysis
with three similar compounds.
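The interpolation read from such a chart can be sketched as an ordinary least-squares line through the (logP, log BCF) points of the similar compounds, evaluated at the logP of the target. The sketch below only illustrates that idea and is not the ToxRead implementation: the three log BCF values are those quoted in the text (1.92, 2.72, and 3.26), whereas the logP values, including that of the target, are hypothetical placeholders because the chapter does not report them.

```python
from typing import List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (logP, experimental log BCF); the logP values are hypothetical placeholders.
similar_compounds = [(3.0, 1.92), (3.6, 2.72), (4.2, 3.26)]
target_log_p = 3.6  # hypothetical

slope, intercept = fit_line(similar_compounds)
interpolated = slope * target_log_p + intercept
print(f"slope = {slope:.3f}, interpolated log BCF = {interpolated:.2f}")
```

A positive slope reproduces the trend described in the text; with only three points the fit is of course a rough guide rather than a prediction model.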
Fig. 12 List of the most similar compounds found in the BCF KNN/read-across model
Fig. 13 Scatter plots of log BCF vs MLogP for the CAESAR model (a, b) and for the Meylan model (c, d)
4 Conclusion
In conclusion, all the QSAR models used agree and predict the
target compound as nonbioaccumulative, with a log BCF of
about 2.5 and values ranging from 1.61 to 2.766. The lowest value is
clearly an underestimate, as discussed (Fig. 13). Since the
majority of the QSAR models tend, in this case, to
underestimate the similar compounds, the most probable value
is about 2.7–2.8 l.u. The same conclusion can also be reached
using the ToxRead software, which shows a similar compound
with a very similar logP and a log BCF of 2.72.
Here we show how in silico methods and, in general, NTMs
are a valuable resource for the risk assessment of substances when
multiple pieces of evidence from multiple tools are available.
Evidence can be integrated and compared to obtain a
highly confident assessment when it comes from different tools
yet shows concordance.
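The range quoted in this conclusion can be reproduced directly from the log BCF values listed in Table 1. The short sketch below collects those predictions (BCF values only, not BAF), reports their minimum, maximum, and median, and checks them against the 3.3 l.u. threshold used in the uncertainty assessment. The dictionary layout and variable names are illustrative choices, not part of any of the tools.

```python
from statistics import median

# log BCF predictions as reported in Table 1 (BCF values only).
predictions = {
    "EPISuite Meylan": 2.137,
    "EPISuite Arnot-Gobas upper trophic": 2.516,
    "EPISuite Arnot-Gobas mid trophic": 2.442,
    "EPISuite Arnot-Gobas lower trophic": 2.409,
    "EPISuite Arnot-Gobas upper trophic, kM = 0": 2.766,
    "T.E.S.T. consensus": 2.36,
    "T.E.S.T. hierarchical clustering": 2.31,
    "T.E.S.T. single model": 2.30,
    "T.E.S.T. group contribution": 2.46,
    "T.E.S.T. FDA": 2.29,
    "T.E.S.T. nearest neighbor": 2.45,
    "VEGA CAESAR": 1.61,
    "VEGA Meylan": 2.14,
    "VEGA KNN/read-across": 2.47,
}

THRESHOLD = 3.3  # log BCF threshold (l.u.) considered in the chapter

lowest = min(predictions.values())
highest = max(predictions.values())
mid = median(predictions.values())
all_below = all(value < THRESHOLD for value in predictions.values())

print(f"range: {lowest} to {highest}, median: {mid:.2f}")
print(f"all predictions below {THRESHOLD} l.u.: {all_below}")
```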
Acknowledgments
Fig. 15 ToxRead output with three similar compounds (lower part of the figure), the rules (on the right), the
interpolation chart with logP (upper part of the figure, in the middle) and the graph with the overview of the
similar compounds and rules (upper, left)
References
1. Benfenati E, Pardoe S, Martin T, Gonella Diaza R, Lombardo A, Manganaro A, Gissi A (2013)
Using toxicological evidence from QSAR models in practice. ALTEX 30:19–40
2. Benfenati E, Belli M, Borges T, Casimiro E, Cester J, Fernandez A, Gini G, Honma M,
Kinzl M, Knauf R, Manganaro A, Mombelli E, Petoumenou MI, Paparella M, Paris P, Raitano G
(2016) Results of a round-robin exercise on read-across. SAR QSAR Environ Res 27:371–384
3. Benfenati E, Roncaglioni A, Petoumenou MI, Cappelli CI, Gini G (2015) Integrating QSAR
and read-across for environmental assessment. SAR QSAR Environ Res 26:605–618
4. Cappelli CI, Benfenati E, Cester J (2015) Evaluation of QSAR models for predicting
the partition coefficient (logP) of chemicals under the REACH regulation. Environ Res
143:26–32
5. Cappelli CI, Cassano A, Golbamaki A, Moggio Y, Lombardo A, Colafranceschi M, Benfenati E
(2015) Assessment of in silico models for acute aquatic toxicity towards fish under REACH
regulation. SAR QSAR Environ Res 26:977–999
6. Diaza RG, Manganelli S, Esposito A, Roncaglioni A, Manganaro A, Benfenati E (2015)
Comparison of in silico tools for evaluating rat oral acute toxicity. SAR QSAR
Environ Res 26:1–27
7. Bakhtyari NG, Raitano G, Benfenati E, Martin T, Young D (2013) Comparison of in
silico models for prediction of mutagenicity. J Environ Sci Health Part C Environ Carcinog
Ecotoxicol Rev 31:45–66
8. Milan C, Schifanella O, Roncaglioni A, Benfenati E (2011) Comparison and possible
use of in silico tools for carcinogenicity within REACH legislation. J Environ Sci Health
Part C Environ Carcinog Ecotoxicol Rev 29:300–323
9. Scientific Committee EFSA, Hardy A, Benford D, Halldorsson T, Jeger MJ, Knutsen HK,
More S, Naegeli H, Noteborn H, Ockleford C, Ricci A, Rychen G, Schlatter JR, Silano V,
Solecki R, Turck D, Benfenati E, Chaudhry QM, Craig P, Frampton G, Greiner M, Hart A,
Hogstrand C, Lambre C, Luttik R, Makowski D, Siani A, Wahlstroem H, Aguilera J, Dorne
J-L, Fernandez Dumont A, Hempen M, Valtuena Martínez S, Martino L, Smeraldi C,
Terron A, Georgiadis N, Younes M (2017) Guidance on the use of the weight of evidence
approach in scientific assessments. EFSA J 15. https://doi.org/10.2903/j.efsa.2017.4971
10. Marzo M, Roncaglioni A, Kulkarni S, Barton-Maclaren TS, Benfenati E (2016) In Silico
model for developmental toxicity: how to use QSAR models and interpret their results.
Methods Mol Biol 1425:139–161
11. Lombardo A, Roncaglioni A, Boriani E, Milan C, Benfenati E (2010) Assessment and
validation of the CAESAR predictive model for bioconcentration factor (BCF) in fish. Chem
Cent J 4:S1
12. Dimitrov S, Dimitrova N, Parkerton T, Comber M, Bonnell M, Mekenyan O (2005)
Base-line model for identifying the bioaccumulation potential of chemicals. SAR QSAR
Environ Res 16:531–554
Chapter 10
Abstract
Uncertainties can be defined as the gaps in knowledge, data sets, and/or methodologies that
can exert an unwanted influence on the outcome of a risk assessment. In principle, uncertainties are
unavoidable; thus, a transparent description and weighing of the relevant uncertainties should be a
necessary component of risk assessment. Examples are provided of uncertainty analysis in recent opinions
of the European Food Safety Authority concerning additives, pesticides, and contaminants. Whereas it is
difficult to quantify the impact of each specific uncertainty on the outcome, it should be possible to
quantify the combined effect of the identified uncertainties; also, a stepwise approach may be envisaged,
focusing on those issues where a detailed appraisal of uncertainties is needed. More generally,
consideration of uncertainty and its sources meets the general requirement for transparency in
scientific assessment.
Key words Food safety, Contaminant, Pesticide, Additive, Exposure, Point of departure, Adverse
outcome pathway, Benchmark dose
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_10, © Springer Science+Business Media, LLC, part of Springer Nature 2018
220 Alberto Mantovani
cells (the primary site of the AO) to their mature counterparts; the
rarity of the disease itself is a source of uncertainty, as it limits the
power of human studies investigating the biology of the disease.
For infant leukemia, the in utero disruption of DNA topoisomerase II
is the molecular initiating event; conversely, the main specific
uncertainty for childhood leukemia is the lack of an identified
molecular initiating event, although this is considered to occur in
utero. Pesticides authorized in Europe do not show genotoxic
effects in regulatory studies; on the other hand, some studies in the
open literature suggest genotoxicity by using different biomarkers,
and some epidemiological studies on agricultural workers exposed
to pesticides have reported DNA damage. These uncertainties and
inconsistencies warrant further research to delineate whether and
how specific pesticides may interact with DNA in stem cells in
utero and produce genetic lesions.
General aspects of uncertainty concerned the three major areas
explored during the development of the opinion: epidemiological
studies, experimental studies, and AOP development.
● Epidemiological studies: uncertainties include the definition of
outcome. In particular, epidemiological studies grouped infant
and childhood leukemia, which in fact have different pathogeneses,
leading to significant data heterogeneity. The main uncertainty
in epidemiological studies is the exposure estimate, from
the standpoints of both the generic definition of the substances of
concern (e.g., “insecticides,” which include substances with
largely different toxicological profiles) and the lack of quantitative
information concerning internal exposure. Furthermore,
in a realistic field scenario, humans are exposed to several substances
and coformulants contained in pesticide products, representing
an additional source of uncertainty. Finally, the
limited knowledge about the etiology of multifactorial
human diseases, which involve different phenotypes and
genetic and environmental factors, is an uncertainty
directly relevant to AOP development.
● Experimental studies: for most pesticides, the bulk of experimental
studies is provided by regulatory studies, which usually
do not provide information on upstream events. For some
human diseases, including infant and childhood leukemias,
predictive in vivo models are not available.
● AOP development: the most relevant uncertainties concerned this
process. Most of the current empirical support for molecular
initiating events and upstream key events is derived from in
vitro assays, which use ad hoc nonstandardized models; the
availability of such models and their characteristics may affect
the full description and characterization of the AOP. The differences
in the exposure (time, concentrations, and route of
administration) between in vitro models and in vivo scenarios
226 Alberto Mantovani
References
1. FAO (Food and Agriculture Organization of 11. Mensinga TT, Speijers GJA, Meulenbelt
the United Nations) and WHO (World Health J (2003) Health implications of exposure
Organization) (2009). Principles and methods to environmental nitrogenous compounds.
for the risk assessment of chemicals in food. Toxicol Rev 22:41–51
Environmental Health Criteria 240, http:// 12. European Food Safety Authority (2007)
www.who.int/foodsafety/publications/chem- Scientific opinion of the scientific committee
ical-food/en/ (last visit September 2017) related to uncertainties in dietary exposure
2. Dong Z, Liu Y, Duan L, Bekele D, Naidu R assessment. EFSA J 5:438–492
(2015) Uncertainties in human health risk 13. IARC - International Agency for Research on
assessment of environmental contaminants: a Cancer (2010) Ingested nitrate and nitrite, and
review and perspective. Environ Int 85:120–132 cyanobacterial peptide toxins. IARC Monogr
3. Aiassa E, Higgins JP, Frampton GK, Greiner Eval Carcinog Risks Hum 94:v–vii, 1–412
M, Afonso A, Amzal B, Deeks J, Dorne JL, 14. Organisation for Economic Co-operation
Glanville J, Lövei GL, Nienstedt K, O'connor and Development - OECD (2013) Guidance
AM, Pullin AS, Rajić A, Verloo D (2015) document on developing and assessing adverse
Applicability and feasibility of systematic review outcome pathways. OECD, Paris. ENV/JM/
for performing evidence-based risk assessment MONO 6, p 45
in food and feed safety. Crit Rev Food Sci Nutr 15. Mantovani A (2017) Some considerations on
55:1026–1034 adverse outcome pathways: how to use them
4. European Food Safety Authority (2017) in toxicological risk assessment. Adjacent
Guidance on Uncertainty in EFSA Scientific Government August 2017, pp .130–131.
Assessment. Revised Draft for Internal Testing. http://edition.pagesuite-professional.co.uk/
https://www.efsa.europa.eu/sites/default/fil html5/reader/production/default.aspx?
es/160321DraftGDUncertaintyInScientificAs pubname=&edid=f8abf7ad-bc1b-4a51-b42b-
sessment.pdf (last visit, January 2018) d7b202a02971 (last visit, September 2017)
5. European Food Safety Authority (2014) 16. Fujita KA, Ostaszewski M, Matsuoka Y, Ghosh
Guidance on expert knowledge elicitation in S, Glaab E, Trefois C, Crespo I, Perumal TM,
food and feed safety risk assessment. EFSA Jurkowski W, Antony PM, Diederich N, Buttini
J 12:3734. [278 pp] M, Kodama A, Satagopam VP, Eifes S, Del Sol
6. European Food Safety Authority, Scientific A, Schneider R, Kitano H, Balling R (2014)
Committee (2009) Guidance of the scientific Integrating pathways of Parkinson’s disease in
committee on a request from EFSA on the use a molecular interaction map. Mol Neurobiol
of the benchmark dose approach in risk assess- 49:88–102
ment. EFSA J 1150:1–72 17. Pelkonen O, Terron A, Hernandez AF,
7. European Food Safety Authority, Scientific Menendez P, Bennekou SH, EFSA WG EPI1
Committee (2017) Update: guidance on the and its other members (Angeli K, Fritsche E,
use of the benchmark dose approach in risk Leist M, Mantovani A, Price A, Viviani B)
assessment. EFSA J 15:4658–5079 (2017) Chemical exposure and infant leukae-
8. Frazzoli C, Robouch P, Caroli S (2010) mia: development of an adverse outcome path-
Analytical accuracy for trace elements in food: way (AOP) for aetiology and risk assessment
a graphical approach to support uncertainty research. Arch Toxicol 91:2763–2780
analysis in assessing dietary exposure. Toxicol 18. Leist M, Ghallab A, Graepel R, Marchan R,
Environ Chem 92:641–654 Hassan R, Bennekou SH, Limonciel A, Vinken
9. European Food Safety Authority, Panel on M, Schildknecht S, Waldmann T, Danen E, van
Food Additives and Nutrient Sources added Ravenzwaay B, Kamp H, Gardner I, Godoy
to Food (2017) Scientific opinion on the re- P, Bois FY, Braeuning A, Reif R, Oesch F,
evaluation of potassium nitrite (E 249) and Drasdo D, Höhme S, Schwarz M, Hartung T,
sodium nitrite (E 250) as food additives. EFSA Braunbeck T, Beltman J, Vrieling H, Sanz F,
J 15:4786–4943 Forsby A, Gadaleta D, Fisher C, Kelm J, Fluri
10. European Food Safety Authority, Pamel on D, Ecker G, Zdrazil B, Terron A, Jennings
Plant Protection Producys and their Residues P, van der Burg B, Dooley S, Meijer AH,
(2017) Scientific opinion on the investigation Willighagen E, Martens M, Evelo C, Mombelli
into experimental toxicological properties of E, Taboureau O, Mantovani A, Hardy B, Koch
plant protection products having a potential B, Escher S, van Thriel C, Cadenas C, Kroese
link to Parkinson’s disease and childhood leu- D, van de Water B, Hengstler JG (2017)
kaemia. EFSA J 15:4691–5016 Adverse outcome pathways: opportunities,
Characterization and Management of Uncertainties in Toxicological Risk Assessment… 229
Part III
Abstract
The use of computational toxicology methods within drug discovery began in the early 2000s with
applications such as predicting bacterial mutagenicity and hERG inhibition. The field has been continuously
expanding ever since and the tasks at hand have become more complex. These approaches are now
strategically integrated into the risk assessment process, as a complement to in vitro and in vivo methods.
Today, computational toxicology can be used in every phase of drug discovery and development, from profiling
large libraries early on, to predicting off-target effects in the mid-discovery phase, to assessing potential
mutagenic impurities in development and degradants as part of life-cycle management. This chapter
provides an overview of the field and describes the application of computational toxicology throughout the
entire discovery and development process.
Key words Computational toxicology, Drug discovery, Hit identification, Lead identification, Lead
optimization
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_11, © Springer Science+Business Media, LLC, part of Springer Nature 2018
234 Catrin Hasselgren and Glenn J. Myatt
Target identification Hit identification Lead optimization Preclinical Clinical trial Market
Manufacture
3 Hit Identification
4.1 Structural Alerts and Warning Systems
Once in vitro active compounds are identified, it is desirable to
develop series around the hit molecules and to computationally
profile the series for hazard identification purposes. This is preferably
done before synthesis to give an idea of the liabilities of each
series or, in the ideal case of having multiple series to pursue,
of which one might be more favorable. This type of profiling is
typically done using QSAR methods or by matching against
structural alerts and most commonly involves predicting potential
in vitro effects. Structural alerts for toxicology are used in
many different phases of drug discovery and development. They
were first introduced for genotoxicity-related carcinogenicity by
Ashby and Tennant [13], and their initial alerts have since been
extended, modified, and refined for genotoxicity by several groups
[14–18]. Structural alerts are suitable for any application where
there is knowledge of the relationship between a particular substructure
and a certain biological effect (in this case, toxicity).
It is preferable that the alerts are based on a mechanistic understanding
relating the structure to the effects, but sometimes they
are based purely on an unconfirmed hypothesis. It is
important that alerts are defined as specifically as possible, so that
applying them does not flag very generic substructures and generate
too many false positive predictions. In addition to
genotoxicity, reactive metabolite formation [19–21] and skin
sensitization [22, 23] (mainly for occupational toxicity) are two
endpoints where structural alerts are often used.
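One simple way to quantify how specific an alert is within a data set is the fraction of flagged compounds that are experimentally positive (the positive predictive value). The sketch below illustrates this calculation in plain Python; the record layout, the function name, and the example compounds are hypothetical and are not taken from any of the cited alert collections.

```python
from typing import List, NamedTuple, Optional


class Compound(NamedTuple):
    name: str
    alert_fired: bool              # did the structural alert match this compound?
    experimentally_positive: bool  # outcome of the corresponding assay


def alert_precision(compounds: List[Compound]) -> Optional[float]:
    """Fraction of alert-flagged compounds that are experimentally positive.

    Returns None when the alert fires on no compound, because the
    fraction is undefined in that case.
    """
    flagged = [c for c in compounds if c.alert_fired]
    if not flagged:
        return None
    true_positives = sum(c.experimentally_positive for c in flagged)
    return true_positives / len(flagged)


# Hypothetical series with made-up outcomes, for illustration only.
series = [
    Compound("analog A", True, True),
    Compound("analog B", True, False),
    Compound("analog C", True, True),
    Compound("analog D", False, False),
    Compound("analog E", True, False),
]
precision = alert_precision(series)
print(f"alert precision in this series: {precision:.2f}")
```

A low value for a given series would suggest that the alert, as defined, is too generic there and may need a more specific substructure definition.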
Structural alerts should be used as hazard identification flags
only and require an assessment of the relevance of the alert within
the compound in which it occurs. They can be used on their
own but are better used in a weight-of-evidence scenario together
with, for example, a QSAR prediction for the same endpoint and
the assessment of structural analogs with available experimental
data (read-across) to understand better how predictive the alert is in
a specific chemical series. This type of tool can be used very early in
lead identification to profile compounds in a series, or across series,
but is also useful in lead optimization to check new experimental
5 Development
6.1 Summary and Conclusion
This chapter provides an overview of the use of different types of
computational toxicology methodologies associated with the
drug discovery and development process, including their use in
the early stages of discovery to make strategic decisions concerning
research direction, and their use in development to support
safety assessment, regulatory submissions, and manufacture of
the drug product. The chapter also reviews some of the current
challenges and identifies a number of recent initiatives to overcome
them, including how to increase the quality and
reproducibility of computational toxicology by incorporating
good “in silico toxicology practice” documented as part of generally
agreed protocols.
Computational Toxicology and Drug Discovery 243
References
1. DiMasi JA, Grabowski HG, Hansen RW (2016) Innovation in the pharmaceutical industry:
new estimates of R&D costs. J Health Econ 47:20–33
2. Kola I, Landis J (2004) Can the pharmaceutical industry reduce attrition rates? Nat Rev
Drug Discov 3:711–716
3. Cook D, Brown D, Alexander R, March R, Morgan P, Satterthwaite G, Pangalos MN
(2014) Lessons learned from the fate of AstraZeneca’s drug pipeline: a five-dimensional
framework. Nat Rev Drug Discov 13:419–431
4. Harpaz R, Callahan A, Tamang S, Low Y, Odgers D, Finlayson S, Jung K, LePendu P,
Shah NH (2014) Text mining for adverse drug events: the promise, challenges, and state of the
art. Drug Saf 37:777–790
5. Administration FaD FDA Adverse Event Reporting System (FAERS) (2017).
https://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/default.htm
6. Myatt GJ, Beilke LD, Cross KP (2017) 4.09 – In silico tools and their application.
In: Chackalamannil S, Rotella D, Ward SE (eds) Comprehensive medicinal chemistry
III. Elsevier, Oxford, pp 156–176
7. Mitchell JBO (2014) Machine learning methods in chemoinformatics. Wiley Interdiscip Rev
Comput Mol Sci 4:468–481
8. Hasselgren C, Muthas D, Ahlberg E, Andersson S, Carlsson L, Noeske T, Stålring J, Boyer S
(2013) Chemoinformatics and beyond. In: Chemoinformatics for drug discovery. Wiley,
Hoboken, pp 267–290
9. Luker T, Alcaraz L, Chohan KK, Blomberg N, Brown D, Butlin R, Elebring T, Griffin MA,
Guile S, St-Gallay S, Swahn B-M, Swallow S, Waring M, Wenlock M, Leeson P (2011)
Strategies to improve in vivo toxicology outcomes for basic candidate drug. Molecules
21:5673–5682
10. Hughes JD, Blagg J, Price DA, Bailey S, Decrescenzo GA, Devraj RV, Ellsworth E,
Fobian YM, Gibbs ME, Gilles RW, Greene N, Huang E, Krieger-Burke T, Loesel J, Wager T,
Whiteley L, Zhang Y (2008) Physiochemical drug properties associated with in vivo
toxicological outcomes. Bioorg Med Chem Lett 18:4872–4875
11. Muthas D, Boyer S, Hasselgren C (2013) A critical assessment of modeling
safety-related drug attrition. Med Chem Commun
12. Waring MJ, Arrowsmith J, Leach AR, Leeson PD, Mandrell S, Owen RM, Pairaudeau G,
Pennie WD, Pickett SD, Wang J, Wallace O, Weir A (2015) An analysis of the attrition of
drug candidates from four major pharmaceutical companies. Nat Rev Drug Discov
14:475–486
13. Ashby J, Tennant RW (1988) Chemical structure, Salmonella mutagenicity and extent of
carcinogenicity as indicators of genotoxic carcinogenesis among 222 chemicals tested in
rodents by the U.S. NCI/NTP. Mutat Res 204:17–115
14. Tennant RW, Ashby J (1991) Classification according to chemical structure, mutagenicity
to Salmonella and level of carcinogenicity of a further 39 chemicals tested for
carcinogenicity by the U.S. National Toxicology Program. Mutat Res 257:209–227
15. Kazius J, McGuire R, Bursi R (2005) Derivation and validation of toxicophores for
mutagenicity prediction. J Med Chem 48:312–320
16. Benigni R, Bossa C (2008) Structure alerts for carcinogenicity, and the Salmonella assay
system: a novel insight through the chemical relational databases technology. Mutat Res
659:248–261
17. Romualdo B, Cecilia B (2006) Structural alerts of mutagens and carcinogens. Curr Comput
Aided Drug Des 2:169–176
18. Ahlberg E, Amberg A, Beilke LD, Bower D, Cross KP, Custer L, Ford KA, Van Gompel
J, Harvey J, Honma M, Jolly R, Joossens E, Kemper RA, Kenyon M, Kruhlak N, Kuhnke
L, Leavitt P, Naven R, Neilan C, Quigley DP, Shuey D, Spirkl HP, Stavitskaya L, Teasdale
A, White A, Wichard J, Zwickl C, Myatt GJ (2016) Extending (Q)SARs to incorporate
proprietary knowledge for regulatory purposes: a case study using aromatic amine
mutagenicity. Regul Toxicol Pharmacol 77:1–12
19. Kalgutkar AS, Dalvie D, Obach RS, Smith DA (2012) Pathways of reactive metabolite
formation with toxicophores/-structural alerts. In: Reactive drug metabolites. Wiley-VCH
Verlag GmbH & Co, KGaA, pp 93–129
20. Stepan AF, Walker DP, Bauman J, Price DA, Baillie TA, Kalgutkar AS, Aleo MD (2011)
Structural alert/reactive metabolite concept as applied in medicinal chemistry to mitigate the
risk of idiosyncratic drug toxicity: a perspective based on the critical examination of trends
in the top 200 drugs marketed in the United
4:1058–1065 States. Chem Res Toxicol 24:1345–1410
244 Catrin Hasselgren and Glenn J. Myatt
21. Blagg J, Abraham DJ (2003) Structural alerts kinase inhibitors. Acta Oncol (Stockholm,
for toxicity. In: Burger's medicinal chemistry Sweden) 48(7):964–970
and drug discovery. Wiley, Hoboken 33. Johnson M, Lajiness M, Maggiora G (1989)
22. Patlewicz G, Aptula AO, Roberts DW, Uriarte Molecular similarity: a basis for designing
E (2008) A minireview of available skin sensiti- drug screening programs. Prog Clin Biol Res
zation (Q)SARs/expert systems. QSAR Comb 291:167–171
Sci 27:60–76 34. Remez N, Garcia-Serna R, Vidal D, Mestres
23. Verheyen GR, Braeken E, Van Deun K, Van J (2016) The in vitro pharmacological pro-
Miert S (2017) Evaluation of in silico tools file of drugs as a proxy indicator of potential
to predict the skin sensitization potential of in vivo organ toxicities. Chem Res Toxicol
chemicals. SAR QSAR Environ Res 28:59–73 29:637–648
24. Bowes J, Brown AJ, Hamon J, Jarolimek W, 35. Schmidt F, Matter H, Hessler G, Czich A
Sridhar A, Waldron G, Whitebread S (2012) (2014) Predictive in silico off-target profil-
Reducing safety-related drug attrition: the use ing in drug discovery. Future Med Chem
of in vitro pharmacological profiling. Nat Rev 6:295–317
Drug Discov 11:909–922 36. Muthas D, Boyer S (2013) Exploiting pharma-
25. Whitebread S, Dumotier B, Armstrong D, cological similarity to identify safety concerns –
Fekete A, Chen S, Hartmann A, Muller PY, listen to what the data tells you. Mol Inform
Urban L (2016) Secondary pharmacology: 32:37–45
screening and interpretation of off-target activ- 37. Mulliner D, Schmidt F, Stolte M, Spirkl HP,
ities - focus on translation. Drug Discov Today Czich A, Amberg A (2016) Computational
21:1232–1242 models for human and animal hepatotoxic-
26. Zhou Z, Gong Q, Epstein ML, January ity with a global application scope. Chem Res
CT (1998) HERG channel dysfunction in Toxicol 29:757–767
human long QT syndrome. Intracellular 38. Cross KP, Hasselgren C, Myatt GJ (2017)
transport and functional defects. J Biol Chem Integrated in silico methods for predicting
273:21061–21066 human hepatotoxicity. Poster presented at the
27. Recanatini M, Poluzzi E, Masetti M, Cavalli society of toxicology, Baltimore, USA
A, De Ponti F (2005) QT prolongation 39. ICH (2014) M7 assessment and control of
through hERG K(+) channel blockade: current DNA reactive (mutagenic) impurities in phar-
knowledge and strategies for the early predic- maceuticals to limit potential carcinogenic
tion during drug development. Med Res Rev risk. http://www.ich.org/fileadmin/Public_
25:133–166 Web_Site/ICH_Pr oducts/Guidelines/
28. Viskin S (1999) Long QT syndromes and tor- Multidisciplinary/M7/M7_R1_Addendum_
sade de pointes. Lancet (London, England) Step_4_31Mar2017.pdf
354:1625–1633 40. ICH (2015) Addendum to ICH m7: assess-
29. Olaharski AJ, Gonzaludo N, Bitter H, ment and control of DNA reactive (mutagenic)
Goldstein D, Kirchner S, Uppal H, Kolaja K impurities in pharmaceuticals to limit poten-
(2009) Identification of a kinase profile that tial carcinogenic risk. http://www.ich.org/
predicts chromosome damage induced by small fileadmin/Public_Web_Site/ICH_Products/
molecule kinase inhibitors. PLoS Comput Biol Guidelines/Multidisciplinary/M7/M7_R1_
5:e1000446 Addendum_Step_4_31Mar2017.pdf
30. Kirchner S (2012) Kinases as antitargets in geno- 41. OECD (2017) OECD guidelines for the test-
toxicity. In: Polypharmacology in drug discovery. ing of chemicals. http://www.oecd.org/
John Wiley & Sons, Inc., Hoboken, pp 63–81 chemicalsafety/testing/oecdguidelinesforth-
31. Force T, Kolaja KL (2011) Cardiotoxicity of etestingofchemicals.htm
kinase inhibitors: the prediction and translation 42. Bower DC, Cross KP, Hasselgren C, Miller S,
of preclinical models to clinical outcomes. Nat Myatt GJ, Quigley PD (2017) In silico toxicol-
Rev Drug Discov 10:111–126 ogy protocols and software platforms. Poster
32. Orphanos GS, Ioannidis GN, Ardavanis AG presented at the EuroTox, Bratislava, Slovakia
(2009) Cardiotoxicity induced by tyrosine 43. http://www.etoxproject.eu/ (2017)
Chapter 12
Abstract
With a view to introducing the concept of pharmacological space and its potential applications in
investigating and predicting the toxic mechanisms of xenobiotics, this opening chapter describes the
logical relations between conformational behavior, physicochemical properties, and binding spaces,
which are seen as the three key elements composing the pharmacological space. While the concept of
conformational space is routinely used to encode molecular flexibility, the concepts of property
spaces and, particularly, of binding spaces are more innovative. Indeed, their descriptors can find
fruitful applications (a) in describing the dynamic adaptability a given ligand experiences when
inserted into a specific environment, and (b) in parameterizing the flexibility a ligand retains when
bound to a biological target. Overall, these descriptors can conveniently account for the often
disregarded entropic factors, and as such they prove successful when inserted in ligand- or
structure-based predictive models. Notably, although binding space parameters can clearly be derived
from MD simulations, the chapter illustrates how docking calculations, despite their static nature,
can evaluate ligand flexibility by analyzing several poses for each ligand. Such an approach, which
represents the founding core of the binding space concept, can find various applications in which the
related descriptors markedly enhance the statistical performance of the resulting predictive models.
Key words Conformational space, Property space, Binding space, Ensemble docking simulations,
Ligand mobility, Entropic factors
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_12, © Springer Science+Business Media, LLC, part of Springer Nature 2018
246 Giulio Vistoli et al.
Table 1
Main subdivisions of the pharmacological space
Fig. 2 A schematic representation of the ADME events, extending from administration and absorption to distri-
bution and storage, to wanted and unwanted effects, and ending with elimination, namely metabolism (chemi-
cal elimination) and/or excretion (physical elimination). (Modified from ref. 4)
Fig. 3 (a) Molecular properties of nicotine taken as an example, showing the structural/geometric features
of this xenobiotic. Note the conformational behavior of the compound around its single relevant rotor. The
conformation of lowest energy (M2) is a staggered form; the conformations of highest energy are the
eclipsed ones (M1) and (M3). (b) The distribution in space of the electrostatic field and the lipophilicity
potential are shown here for the M2-conformer (see a). These potentials necessarily vary in space with the
conformation considered
Approaching Pharmacological Space: Events and Components 249
Table 2
The TOX space of drugs and other xenobiotics
Fig. 4 A schematic representation of the logical relations between conformational, property and binding
spaces. In detail, the relations between conformational and property spaces encode the molecular sensitivity
which in fact can be similarly described also between conformational and property spaces. The relations
between the free and bound ligand spaces define the flexibility and property constraints a ligand experiences
upon binding and the relations between these constraints parameterize the ligand plasticity which can be seen
as an index of the residual mobility a ligand retains when bound to its target
Table 3
Average values (± standard deviations) for selected conformational and property spaces, as computed
by Monte Carlo searches in vacuo as well as by docking simulations in the binding cavity of the human
5HT1a receptor in its inactive and active states (rmsd and mobility are expressed in Å, surface values
(means and ranges) are expressed in Å², while all constraint values are dimensionless)
Fig. 5 A schematic representation of the different cases in which the binding space concept can be fruitfully
exploited. The cases differ in the number of ligand states (cases a and b), protein states (cases c and d), or
generated poses (case e) considered for each ligand. The last case corresponds to the various combinations
of the previous cases. Adapted from ref. 31 with permission of ACS, 2017
(a) Ligand structural diversity (Subheading 4.2, i.e., case b); (b)
protein flexibility (Subheading 4.3, i.e., case c); and (c) multiple
binding modes (Subheading 4.4, i.e., cases e & f).
4.2 Molecular Diversity and Binding Space of Ligands
The selection of the optimal ligand input structure plays a critical
role in determining the reliability of a docking simulation [13, 46].
On the one hand, the choice of the starting ligand conformation
has a marginal role since ligand flexibility is directly considered by
most docking programs. Hence, the lowest energy conformation
derived by canonical conformational searches is a commonly
adopted choice [47]. On the other hand, the selection of the
correct ligand state becomes far more critical, especially for
ligands endowed with complex tautomeric and/or ionization
equilibria. In these cases, the selection should fall on the ligand
form that optimizes the interaction with the protein target while
being compatible with the physiological conditions [48].
Even though the effect of ionization equilibria on ligand affinity
can very often be accounted for by simply focusing on the
predominant forms at physiological conditions [49], there are
relevant cases in which the simulated protein target seems to
preferentially interact with specific ionization states regardless of
the pKa values of the involved ligands [50]. Here, the comparison
of the docking results obtained by simulating different ionization
states as well as the calculation of the corresponding score averages
can allow the specific role of each state to be revealed. For example,
comparative docking studies on human carboxylesterases (CES1
and CES2) showed that the best predictive equations were devel-
oped by considering all simulated substrates in their neutral form,
while including the scores of the ionized species or the correspond-
ing score averages worsened the resulting correlative models [51,
52]. This finding, which is in line with the well-known preference
of metabolizing enzymes for lipophilic substrates, emphasizes that
the protein binding site can create specific microenvironments able
to markedly influence the ionization equilibria of the bound ligand
thus selecting specific ionization states [53]. Such a phenomenon
surely becomes more relevant when the simulated ligands have pKa
values very close to the physiological pH. In such cases, the various
ionization states are similarly populated and one might suppose
they are synergistically involved in the observed affinity [54].
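The score-averaging idea discussed above can be sketched in a few lines. The following Python example is only an illustration under stated assumptions: the population of each ionization state is estimated with the Henderson-Hasselbalch equation for a single titratable group, docking scores are assumed to be more favorable when more negative, and the function names, the pKa, and the scores are hypothetical placeholders rather than values from the cited studies.

```python
from typing import Dict


def ionized_fraction(pka: float, ph: float = 7.4, basic: bool = True) -> float:
    """Henderson-Hasselbalch fraction of the ionized form of one titratable group.

    basic=True  -> protonated (cationic) fraction of a base
    basic=False -> deprotonated (anionic) fraction of an acid
    """
    delta = (pka - ph) if basic else (ph - pka)
    return 1.0 / (1.0 + 10.0 ** (-delta))


def combine_state_scores(scores: Dict[str, float],
                         populations: Dict[str, float]) -> Dict[str, float]:
    """Summarize the docking scores of several ionization states of one ligand.

    Assumption: lower (more negative) scores are better.
    """
    total = sum(populations[s] for s in scores)
    weights = {s: populations[s] / total for s in scores}
    predominant = max(weights, key=weights.get)
    return {
        "predominant": scores[predominant],          # most populated state only
        "best": min(scores.values()),                # best-scoring state only
        "plain_average": sum(scores.values()) / len(scores),
        "weighted_average": sum(weights[s] * scores[s] for s in scores),
    }


if __name__ == "__main__":
    # Hypothetical basic ligand (pKa 9.4) with placeholder docking scores.
    f_ion = ionized_fraction(9.4, ph=7.4, basic=True)
    summary = combine_state_scores(
        scores={"neutral": -6.0, "ionized": -8.0},
        populations={"neutral": 1.0 - f_ion, "ionized": f_ion},
    )
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

When the pKa approaches the physiological pH, the two populations become comparable and the weighted average departs from the score of the predominant form, which is the situation in which considering several states is most likely to matter.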
Thus, a benchmarking docking study performed by Park and
coworkers [55] compared the performances of different protocols
amenable to estimating the role of distinct ionization states in
docking simulations. In detail, the study analyzed the predictive
power of the correlations as developed by using (a) only the pre-
dominant ionization state; (b) the ionization state affording the
best score and/or the best pose compared to the experimental one
(when available); (c) a weighted ensemble of all possible states.
Even though the results are influenced by the simulated proteins,
Fig. 6 Application of the binding space concept to account for the synergistic role of multiple tautomers, as
exemplified by the interactions between anthocyanins and SGLT1
4.3 Protein Flexibility and Binding Space
As mentioned above, protein flexibility is rarely accounted for by
docking analyses, and thus the selection of the optimal protein
conformation represents a crucial step in all docking simulations [39].
As a preamble, it should be noted that considering protein
flexibility in docking simulations can result in two different
scenarios. On the one hand, docking calculations can involve ensembles
of protein structures that differ slightly in the fine architecture
of their binding cavity [64]. These structural ensembles can include
diverse resolved protein structures or can be generated by various
MD/MM simulations [65]. They account for the limited
conformational changes the binding site experiences when
interacting with ligands of different shape and size. In other words,
they allow the ligand-induced adaptability of the receptor to be
simulated a priori [66]. Ensemble docking simulations are essentially
performed by selecting, for each docked ligand, the single protein
structure that affords the best complex. The target selection can
involve exhaustive and combinatorial docking simulations or can be
based on the similarity between simulated and cocrystallized
ligands, but simulations rarely combine the docking results
from more than one protein structure for each ligand [see for
example ref. 67]. An exception is represented by a recent
benchmarking analysis, which proposed to consider multiple protein
structures and to generate multiple ligand poses that are clustered
and scored based on the Ruvinsky Colony Entropy approach
[68]. However, such a method did not reveal increased
Fig. 7 Application of the binding space concept for accounting for the different protein states as exemplified
by the interactions between a set of dioxane-based serotoninergic agonists and the human 5HT1a receptor in
its active and inactive conformations
specific contribution of each state and how a ligand adapts its con-
tacts during receptor activation.
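The two ways of using an ensemble of protein structures mentioned in this section can be contrasted with a short sketch. It assumes that each ligand has already been docked into every protein state, that lower scores are better, and that the state labels and score values are hypothetical placeholders. Conventional ensemble docking keeps only the best-scoring structure for each ligand, whereas a binding-space-style treatment summarizes the scores over all states.

```python
from statistics import mean
from typing import Dict, Tuple


def best_structure(scores: Dict[str, float]) -> Tuple[str, float]:
    """Conventional ensemble docking: keep the protein state with the best (lowest) score."""
    state = min(scores, key=scores.get)
    return state, scores[state]


def across_state_summary(scores: Dict[str, float]) -> Dict[str, float]:
    """Combine all protein states: average score and spread (max - min) for one ligand."""
    values = list(scores.values())
    return {"mean": mean(values), "range": max(values) - min(values)}


if __name__ == "__main__":
    # Hypothetical scores of one ligand docked into two receptor states.
    ligand_scores = {"active": -9.0, "inactive": -7.0}
    print(best_structure(ligand_scores))
    print(across_state_summary(ligand_scores))
```

The range term is one simple way to express how differently a ligand scores in the two states; whether it is informative for a given dataset has to be checked against experimental data.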
4.4 Ligand Mobility as Described by Binding Space
The capacity of a ligand to assume multiple poses within a binding
site has found extensive experimental confirmation. Such a
phenomenon is usually studied by MD simulations, which allow
the ligand mobility within a binding cavity to be dynamically
monitored [see for example ref. 76]. In contrast, and although
ligands are routinely treated as flexible, docking simulations are
considered unsuitable for investigating ligand mobility because of
their static nature and the fact that protein structures are kept
fixed. Even though the contribution of protein flexibility cannot be
clearly accounted for by docking simulations [77], one may ask
whether ligand flexibility alone can have a role in describing
ligand mobility. As mentioned above, the basic idea of the binding
space concept is that ligand mobility can be simulated by taking
into consideration all generated poses, instead of focusing attention
on the best pose only, and that the scores of these poses can be
analyzed by calculating the discussed binding space descriptors. Such
an approach was preliminarily tested in a comparative docking study
focused on BChE substrates and involving the same five resolved
protein structures already seen in the previously reported study
[31]. Figure 8 reports the statistics of the thus developed equa-
Fig. 8 Effect of the binding space parameters (i.e., mean scores, score ranges, and sensitivities) on the
reliability of the predictive models, as parameterized by the r² values, considering minimized or nonminimized
complexes for the five utilized BChE structures plus the overall parameters. Adapted from ref. 31 with
permission of ACS, 2017
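As a rough illustration of the pose-based descriptors named in the caption, the sketch below computes the mean score and the score range over all poses of a ligand, and then the squared Pearson correlation (r²) between a descriptor and an activity value. It assumes that lower scores are better; all names and numbers are hypothetical, and the sensitivity descriptor is left out because its definition is not given in this section.

```python
from statistics import mean
from typing import Dict, Sequence


def pose_descriptors(pose_scores: Sequence[float]) -> Dict[str, float]:
    """Best score, mean score, and score range over all poses generated for one ligand."""
    return {
        "best": min(pose_scores),                      # usual single-pose descriptor
        "mean": mean(pose_scores),                     # average over all poses
        "range": max(pose_scores) - min(pose_scores),  # spread of the pose scores
    }


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between a descriptor x and an activity y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r_squared is undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical pose scores for three ligands and placeholder activities.
    poses = {"lig1": [-8.0, -7.0, -6.0], "lig2": [-9.5, -9.0, -8.5], "lig3": [-6.0, -5.0, -3.0]}
    activity = [5.1, 6.3, 4.2]
    means = [pose_descriptors(p)["mean"] for p in poses.values()]
    bests = [pose_descriptors(p)["best"] for p in poses.values()]
    print("r2 (mean score):", r_squared(means, activity))
    print("r2 (best score):", r_squared(bests, activity))
```

Comparing the r² obtained with the best-pose score against the one obtained with the pose-averaged score mirrors, in miniature, the kind of comparison summarized in Fig. 8.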
Table 4
Key structural features of the simulated BChE structures plus their corresponding binding space
performances (as encoded by the r² of the best model). Hydrophobic score and volume of the pockets
were calculated by Fpocket [80], while the protein parameters were taken from the PDB
5 Conclusion
References
1. Testa B (1987) Pharmacokinetic and pharmacodynamic events: can they always be distinguished? Trends Pharmacol Sci 8:381–383
2. Testa B, Krämer SD (2009) The biochemistry of drug metabolism – an introduction. Part 5: metabolism and bioactivity. Chem Biodivers 6:591–684
3. Testa B (2009) Drug metabolism for the perplexed medicinal chemist. Chem Biodivers 6:2055–2070
4. van de Waterbeemd H, Testa B (2009) Introduction: the how and why of bioavailability research. In: van de Waterbeemd H, Testa B (eds) Drug bioavailability – estimation of solubility, permeability, absorption and bioavailability, 2nd edn. Wiley-VCH, Weinheim, pp 1–6
5. Guengerich FP (2006) Cytochrome P450s and other enzymes in drug metabolism and toxicity. AAPS J 8:E101–E111
6. Williams DP, Naisbitt DJ (2002) Toxicophores: groups and metabolic routes associated with increased safety risks. Curr Opin Drug Discov Devel 5:104–115
7. Pirmohamed M, Park BK (2001) Genetic susceptibility to adverse drug reactions. Trends Pharmacol Sci 22:298–230
8. Park BK, Pirmohamed M, Kitteringham NR (1998) Role of drug disposition in drug hypersensitivity: a chemical, molecular, and clinical perspective. Chem Res Toxicol 11:969–988
9. Hofmann KL (2000) Combinatorial optimization: current successes and directions for the future. J Comput Appl Math 124:341–360
10. Eliel E, Allinger N, Angyal S, Morrison G (2007) Conformational analysis. Wiley, New York, p 1965
11. Agrafiotis DK, Gibbs AC, Zhu F, Izrailev S, Martin E (2007) Conformational sampling of bioactive molecules: a comparative study. J Chem Inf Model 47:1067–1086
12. Pagadala NS, Syed K, Tuszynski J (2017) Software for molecular docking: a review. Biophys Rev 9:91–102
13. Wales DJ, Scheraga HA (1999) Global optimization of clusters. Cryst Biomol Sci 285:1368–1372
14. Guedes IA, de Magalhães CS, Dardenne LE (2014) Receptor-ligand molecular docking. Biophys Rev 6:75–87
15. Vajda S, Hall DR, Kozakov D (2013) Sampling and scoring: a marriage made in heaven. Proteins 81:1874–1884
16. Mitsutake A, Mori Y, Okamoto Y (2013) Enhanced sampling algorithms. Methods Mol Biol 924:153–195
17. Hatfield MP, Lovas S (2014) Conformational sampling techniques. Curr Pharm Des 20:3303–3313
18. Zheng Y, Tice CM, Singh SB (2017) Conformational control in structure-based drug design. Bioorg Med Chem Lett 27:2825–2837
19. Vistoli G, Pedretti A, Testa B (2008) Assessing drug-likeness – what are we missing? Drug Discov Today 13:285–294
20. Ballante F, Marshall GR (2016) An automated strategy for binding-pose selection and docking assessment in structure-based drug design. J Chem Inf Model 56:54–72
21. Salmaso V, Sturlese M, Cuzzolin A, Moro S (2017) Combining self- and cross-docking as benchmark tools: the performance of DockBench in the D3R grand challenge 2. J Comput Aided Mol Des 32(1):251–264. https://doi.org/10.1007/s10822-017-0051-4
22. Kabsch W (1978) A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallogr A34:827
23. Veber DF, Johnson SR, Cheng HY, Smith BR, Ward KW, Kopple KD (2002) Molecular properties that influence the oral bioavailability of drug candidates. J Med Chem 45:2615–2623
24. Jamroz M, Kolinski A, Kihara D (2016) Ensemble-based evaluation for protein structure models. Bioinformatics 32:i314–i321
25. Testa B, Vistoli G, Pedretti A, Bojarski AJ (2009) Atomic diversity, molecular diversity, and chemical diversity: the concept of chemodiversity. Chem Biodivers 6:1145–1151
26. Vistoli G, Pedretti A, Testa B (2009) Partition coefficient and molecular flexibility: the concept of lipophilicity space. Chem Biodivers 6:1152–1169
27. Vistoli G, De Maddis D, Straniero V, Pedretti A, Pallavicini M, Valoti E, Carini M, Testa B, Aldini G (2013) Exploring the space of histidine containing dipeptides in search of novel efficient RCS sequestering agents. Eur J Med Chem 66:153–160
28. Vistoli G, Colzani M, Mazzolari A, Maddis DD, Grazioso G, Pedretti A, Carini M, Aldini G (2016) Computational approaches in the rational design of improved carbonyl quenchers: focus on histidine containing dipeptides. Future Med Chem 8:1721–1737
29. Vistoli G, Straniero V, Pedretti A, Fumagalli L, Bolchi C, Pallavicini M, Valoti E, Testa B (2012) Predicting the physicochemical profile of diastereoisomeric histidine-containing dipeptides by property space analysis. Chirality 24:566–576
30. Vistoli G, Pedretti A, Villa L, Testa B (2005) Range and sensitivity as descriptors of molecular property spaces in dynamic QSAR analyses. J Med Chem 48:4947–4952
31. Vistoli G, Mazzolari A, Testa B, Pedretti A (2017) Binding space concept: a new approach to enhance the reliability of docking scores and its application to predicting butyrylcholinesterase hydrolytic activity. J Chem Inf Model 57:1691–1702
32. Vistoli G, Pedretti A, Villa L, Testa B (2005) Solvent constraints on the property space of acetylcholine. I. Isotropic solvents. J Med Chem 48:1759–1767
33. Vistoli G, Pedretti A, Villa L, Testa B (2005) Solvent constraints on the property space of acetylcholine. 2. Ordered media. J Med Chem 48:6926–6935
34. McAuley M, Timson DJ (2017) Modulating mobility: a paradigm for protein engineering? Appl Biochem Biotechnol 181:83–90
35. Alterio V, Di Fiore A, D’Ambrosio K, Supuran CT, De Simone G (2012) Multiple binding modes of inhibitors to carbonic anhydrases: how to design specific drugs targeting 15 different isoforms? Chem Rev 112:4421–4468
36. Chakraborti S, Chakravarty D, Gupta S, Chatterji BP, Dhar G, Poddar A, Panda D, Chakrabarti P, Ghosh Dastidar S, Bhattacharyya B (2012) Discrimination of ligands with different flexibilities resulting from the plasticity of the binding site in tubulin. Biochemistry 51:7138–7148
37. Vistoli G, Pedretti A, Testa B, Matucci R (2007) The conformational and property space of acetylcholine bound to muscarinic receptors: an entropy component accounts for the subtype selectivity of acetylcholine. Arch Biochem Biophys 464:112–121
38. Vistoli G, Pedretti A, Testa B (2011) Chemodiversity and molecular plasticity: recognition processes as explored by property spaces. Future Med Chem 3:995–1010
39. Wong CF (2015) Flexible receptor docking for drug discovery. Expert Opin Drug Discov 10:1189–1200
40. Del Bello F, Bonifazi A, Giannella M, Giorgioni G, Piergentili A, Petrelli R, Cifani C, Micioni Di Bonaventura MV, Keck TM, Mazzolari A, Vistoli G, Cilia A, Poggesi E, Matucci R, Quaglia W (2017) The replacement of the 2-methoxy substituent of N-((6,6-diphenyl-1,4-dioxan-2-yl)methyl)-2-(2-methoxyphenoxy)ethan-1-amine improves the selectivity for 5-HT(1A) receptor over α(1)-adrenoceptor and D(2)-like receptor subtypes. Eur J Med Chem 125:233–244
41. Gaillard P, Carrupt PA, Testa B, Boudon A (1994) Molecular lipophilicity potential, a tool in 3D QSAR: method and applications. J Comput Aided Mol Des 8:83–96
42. Weill N, Therrien E, Campagna-Slater V, Moitessier N (2014) Methods for docking small molecules to macromolecules: a user’s perspective. 1. The theory. Curr Pharm Des 20:3338–3359
43. Campagna-Slater V, Therrien E, Weill N, Moitessier N (2014) Methods for docking small molecules to macromolecules: a user’s perspective. 2. Applications. Curr Pharm Des 20:3360–3372
44. Moitessier N, Englebienne P, Lee D, Lawandi J, Corbeil CR (2008) Towards the development of universal, fast and highly accurate docking/scoring methods: a long way to go. Br J Pharmacol 153(Suppl 1):S7–S26
45. Gao YD, Hu Y, Crespo A, Wang D, Armacost KA, Fells JI, Fradera X, Wang H, Wang H, Sherborne B, Verras A, Peng Z (2018) Workflows and performances in the ranking prediction of 2016 D3R grand challenge 2: lessons learned from a collaborative effort. J Comput Aided Mol Des 32:129–142
46. Sastry GM, Adzhigirey M, Day T, Annabhimoju R, Sherman W (2013) Protein and ligand preparation: parameters, protocols, and influence on virtual screening enrichments. J Comput Aided Mol Des 27:221–234
47. Oda A, Yamaotsu N, Hirono S, Watanabe Y, Fukuyoshi S, Takahashi O (2015) Effects of initial settings on computational protein–ligand docking accuracies for several docking programs. Mol Simul 41:10–12
48. ten Brink T, Exner TE (2009) Influence of protonation, tautomeric, and stereoisomeric states on protein-ligand docking results. J Chem Inf Model 49:1535–1546
49. Krieger E, Dunbrack RL Jr, Hooft RW, Krieger B (2012) Assignment of protonation states in proteins and ligands: combining pKa prediction with hydrogen bonding network optimization. Methods Mol Biol 819:405–421
50. Onufriev AV, Alexov E (2013) Protonation and pK changes in protein-ligand binding. Q Rev Biophys 46:181–209
51. Vistoli G, Pedretti A, Mazzolari A, Testa B (2010) In silico prediction of human carboxylesterase-1 (hCES1) metabolism combining docking analyses and MD simulations. Bioorg Med Chem 18:320–329
52. Vistoli G, Pedretti A, Mazzolari A, Testa B (2010) Homology modeling and metabolism prediction of human carboxylesterase-2 using docking analyses by GriDock: a parallelized tool based on AutoDock 4.0. J Comput Aided Mol Des 24:771–787
53. Aguilar B, Anandakrishnan R, Ruscio JZ, Onufriev AV (2010) Statistics and physical origins of pK and ionization state changes upon protein-ligand binding. Biophys J 98:872–880
54. Petukh M, Stefl S, Alexov E (2013) The role of protonation states in ligand-receptor recognition and binding. Curr Pharm Des 19:4182–4190
55. Park MS, Gao C, Stern HA (2011) Estimating binding affinities by docking/scoring methods using variable protonation states. Proteins 79:304–314
56. Sayle RA (2010) So you think you understand tautomerism? J Comput Aided Mol Des 24:485–496
57. Katritzky AR, Hall CD, El-Gendy B-D, Draghici B (2010) Tautomerism in drug discovery. J Comput Aided Mol Des 24:475–484
58. Martin YC (2009) Let's not forget tautomers. J Comput Aided Mol Des 23:693–704
59. Milletti F, Vulpetti A (2010) Tautomer preference in PDB complexes and its impact on structure-based drug discovery. J Chem Inf Model 50:1062–1074
60. Baron G, Altomare A, Regazzoni L, Redaelli V, Grandi S, Riva A, Morazzoni P, Mazzolari A, Carini M, Vistoli G, Aldini G (2017) Pharmacokinetic profile of bilberry anthocyanins in rats and the role of glucose transporters: LC-MS/MS and computational studies. J Pharm Biomed Anal 144:112–121
61. Wright EM, Ghezzi C, Loo DDF (2017) Novel and unexpected functions of SGLTs. Physiology (Bethesda) 32:435–443
62. Yan N (2017) A glimpse of membrane transport through structures-advances in the structural biology of the GLUT glucose transporters. J Mol Biol 429:2710–2725
63. Smeriglio A, Barreca D, Bellocco E, Trombetta D (2016) Chemistry, pharmacology and health benefits of Anthocyanins. Phytother Res 30:1265–1286
64. Okamoto Y, Kokubo H, Tanaka T (2013) Ligand docking simulations by generalized-ensemble algorithms. Adv Protein Chem Struct Biol 92:63–91
65. Hospital A, Goñi JR, Orozco M, Gelpí JL (2015) Molecular dynamics simulations: advances and applications. Adv Appl Bioinform Chem 8:37–47
66. Lin JH (2011) Accommodating protein flexibility for structure-based drug design. Curr Top Med Chem 11:171–178
67. Lam PC, Abagyan R, Totrov M (2018) Ligand-biased ensemble receptor docking (LigBEnD): a hybrid ligand/receptor structure-based approach. J Comput Aided Mol Des 32:187–198
68. Ruvinsky AM, Kozintsev AV (2006) Novel statistical-thermodynamic methods to predict protein-ligand binding positions using probability distribution functions. Proteins 62:202–208
69. Fradera X, Verras A, Hu Y, Wang D, Wang H, Fells JI, Armacost KA, Crespo A, Sherborne B, Wang H, Peng Z, Gao YD (2018) Performance of multiple docking and refinement methods in the pose prediction D3R prospective grand challenge 2016. J Comput Aided Mol Des 32:113–127
70. Hoeppner A, Schmitt L, SHJ S (2013) Proteins and their ligands: their importance and how to crystallize them. In: Ferreira SO (ed) Advanced topics on crystal growth. Rijeka, InTech
71. Manglik A, Kruse AC (2017) Structural basis for G protein-coupled receptor activation. Biochemistry 56:5628–5634
72. Tehan BG, Bortolato A, Blaney FE, Weir MP, Mason JS (2014) Unifying family A GPCR theories of activation. Pharmacol Ther 143:51–60
73. Lu M, Wu B (2016) Structural studies of G protein-coupled receptors. IUBMB Life 68:894–903
74. Sengupta D, Joshi M, Athale CA, Chattopadhyay A (2016) What can simulations tell us about GPCRs: integrating the scales. Methods Cell Biol 132:429–452
75. Rodríguez D, Gao ZG, Moss SM, Jacobson KA, Carlsson J (2015) Molecular docking screening using agonist-bound GPCR structures: probing the A2A adenosine receptor. J Chem Inf Model 55:550–563
76. Anselmi M, Pisabarro MT (2015) Exploring multiple binding modes using confined replica exchange molecular dynamics. J Chem Theory Comput 11:3906–3918
77. Buonfiglio R, Recanatini M, Masetti M (2015) Protein flexibility in drug discovery: from theory to computation. ChemMedChem 10:1141–1148
78. Brünger AT (1992) Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature 355:472–475
79. Nicolotti O, Giangreco I, Miscioscia TF, Carotti A (2009) Improving quantitative structure-activity relationships through multiobjective optimization. J Chem Inf Model 49:2290–2302
80. Le Guilloux V, Schmidtke P, Tuffery P (2009) Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics 10:168
Chapter 13
Abstract
The discovery of molecular toxicity in a clinical drug candidate can have a significant impact on both the
cost and timeline of the drug discovery process. Early identification of potentially toxic compounds during
screening library preparation or, alternatively, during the hit validation process, is critical to ensure that
valuable time and resources are not spent pursuing compounds that may possess a high propensity for
human toxicity. This chapter focuses on the application of computational molecular filters, applied either
prescreening or postscreening, to identify and remove known reactive and/or potentially toxic compounds
from consideration in drug discovery campaigns.
Key words Molecular toxicity, Computational filter, High-throughput screening, Virtual screening,
Library design, Drug discovery
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_13, © Springer Science+Business Media, LLC, part of Springer Nature 2018
Kirk E. Hevener
2 Applications
2.2 Tools for Prediction of Reactivity/Toxicity
A wide variety of tools have been developed for the prediction of
molecular toxicity, including predictive software, servers, and databases.
A representative selection is discussed here, with a more complete listing,
including system classification and web address, given in Table 1. One
example is the eTOX project. Sponsored by the European Innovative
Medicines Initiative (IMI), the eTOX project collected toxicological data
from the pharmaceutical industry and academia into an online database that
has been used to create a series of toxicity prediction models, including
the online toxicity prediction system, eTOXsys [30]. Key features of this
system are the ability to query the database for toxicities related to the
target rather than the drug, and for chemical substructures showing a high
correlation with specific toxicities.
Several companies offer model-based toxicity prediction ser-
vices, including Leadscope’s® Toxicity Database comprising over
180,000 compounds and over 400,000 toxicity study results.
Leadscope® models include hepatobiliary, cardiological, and uri-
nary effects as well as developmental toxicity, genetic toxicity, and
neurotoxicity. Derek Nexus (Lhasa Limited) is another member-
based (proprietary) service that offers a rapid toxicity assessment
for compounds submitted.
ToxAlerts is an open-source, web-based server integrated with
the Online Chemical Modeling Environment (OCHEM) that col-
lects and stores toxicological data collected from existing literature
or submitted by users [20]. Structural alerts in the form of
SMARTS patterns are generated and can be used to screen indi-
vidual compounds for toxicity prediction or to prefilter libraries
during screening library design. Structure alerts currently exist for
endpoints including mutagenicity, carcinogenicity, skin sensitiza-
tion, and compounds that can form reactive metabolites.
Lastly, the US National Library of Medicine’s Toxicology Data
Network (TOXNET) is a publicly available resource that allows
users to screen specific compounds against a large variety of
toxicology databases (HSDB, CCRIS, GENETOX, etc.) and literature
references (TOXLINE, DART) [31, 32]. More of a data mining tool
than a predictive one, the TOXNET databases are useful during the
hit validation process for deciding whether hit compounds from
virtual or experimental screening merit further advancement. A note
of caution is merited here: the absence of data for a submitted
compound does not necessarily mean there is no risk of toxicity,
only that no literature or database report exists.
Table 1
Selected software and web-based tools for prediction of chemical reactivity/toxicity
2.3 Structure Alerts for Reactive and/or Toxic Functional Groups
The use of structure alerts for the identification of potentially
reactive or toxic compounds is regularly employed in both prefiltering
of chemical libraries during the library design stage and postfiltering
of screening compounds during the hit validation stage [33–36]. There
are advantages and disadvantages to both prefiltering and postfiltering
strategies. For the former, the unilateral removal of compounds
containing known reactive or toxic functional groups may result in the
loss of a valid hit compound which might be modified during the
optimization process to remove the offending chemical moiety. For the
latter, there is an increased personnel and infrastructure expense
related to the cost of screening larger libraries that have not been
prefiltered, as well as a potential high cost of late-stage failure of a
clinical candidate. One possible solution is the use of customizable
threshold cutoffs, an available option for most structure alert
algorithms, based on the number of occurrences of a reactive
substructure (e.g., no more than two nitro groups). Alternatively,
toxicophore filtering may be cautiously employed postscreening to “flag”
compounds identified with potentially problematic groups that may
warrant close attention. Lastly, it is possible to employ a prescreen
filter for the elimination of particularly high-risk compounds, coupled
with a postscreen filter to flag lower-risk compounds, such as PAINS
compounds or compounds with less reactive functional groups occasionally
still seen in approved drugs (e.g., aniline and nitro groups).
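The combination of a prescreen threshold cutoff and a postscreen flag described above can be sketched in a few lines of Python. This is a minimal illustration, not the logic of any particular structure alert tool: the alert names, the limits, and the function triage are hypothetical, and the per-compound alert counts are assumed to come from a separate substructure-matching step.

```python
from typing import Dict, List, Tuple

# Hypothetical policy: alert name -> maximum tolerated occurrences.
# 0 means "reject on sight"; names and limits are illustrative only.
HARD_LIMITS: Dict[str, int] = {"acyl_halide": 0, "nitro": 2}
# Lower-risk alerts that are only flagged for review after screening.
SOFT_ALERTS = {"aniline", "PAINS"}


def triage(alert_counts: Dict[str, int]) -> Tuple[str, List[str]]:
    """Return ("reject" | "flag" | "pass", reasons) for one compound."""
    # Prescreen: reject when any hard alert exceeds its threshold cutoff.
    violations = sorted(
        name for name, limit in HARD_LIMITS.items()
        if alert_counts.get(name, 0) > limit
    )
    if violations:
        return "reject", violations
    # Postscreen: keep the compound but flag lower-risk groups.
    flags = sorted(name for name in SOFT_ALERTS if alert_counts.get(name, 0) > 0)
    if flags:
        return "flag", flags
    return "pass", []
```

With this policy a compound carrying two nitro groups passes, a third nitro group causes rejection, and an aniline is retained but flagged for closer inspection during hit validation.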
In most cases, structure alert algorithms employ the use of
SMARTS (SMILES Arbitrary Target Specification) patterns to
identify predetermined chemical patterns in compounds and
chemical libraries. SMARTS, developed by Daylight Chemical
Information Systems, Inc., is a SMILES-based 2D line notation
that allows for the incorporation of variability, wildcards, atomic
properties, and connectivity in the search [37]. Using SMARTS,
atoms can be represented by atomic number, capital or lowercase
letters. As an example, carbon can be represented as C (aliphatic
carbon atom), c (aromatic carbon atom), or [#6] (any carbon
atom). Wildcard values can be included in SMARTS patterns to
represent * (any atom), a (aromatic atom), A (aliphatic atom), R
(ring membership), r (ring size), X (connectivity), charge, chiral-
ity, valence, mass, and several others. Values for atoms can be
coupled together for greater specificity using brackets and semico-
lons, as with [C;X2] (aliphatic carbon with two total bonds,
including implicit hydrogens). Variability at atomic positions can
be specified using brackets and commas, as with [O,N,S;X2;r3]
(oxygen, nitrogen, or sulfur with two total bonds in a three-mem-
bered ring system). Various symbols are used to represent atomic
bonds making connections between atoms, including - (single
bond), = (double bond), # (triple bond), : (aromatic bond), ~ (any
bond), @ (any ring bond), and others. A missing bond symbol is
interpreted as “single or aromatic,” which can be used to prevent
Computational Toxicology Methods in Chemical Library Design and High-Throughput… 281
Table 2
Commonly filtered reactive groups and their SMARTS patterns
4 Conclusions
Acknowledgments
References
1. Waring MJ, Arrowsmith J, Leach AR, Leeson PD, Mandrell S, Owen RM, Pairaudeau G, Pennie WD, Pickett SD, Wang J, Wallace O, Weir A (2015) An analysis of the attrition of drug candidates from four major pharmaceutical companies. Nat Rev Drug Discov 14:475–486
2. Hughes JD, Blagg J, Price DA, Bailey S, Decrescenzo GA, Devraj RV, Ellsworth E, Fobian YM, Gibbs ME, Gilles RW, Greene N, Huang E, Krieger-Burke T, Loesel J, Wager T, Whiteley L, Zhang Y (2008) Physiochemical drug properties associated with in vivo toxicological outcomes. Bioorg Med Chem Lett 18:4872–4875
3. Price DA, Blagg J, Jones L, Greene N, Wager T (2009) Physicochemical drug properties associated with in vivo toxicological outcomes: a review. Expert Opin Drug Metab Toxicol 5:921–931
4. Barratt MD (2000) Prediction of toxicity from chemical structure. Cell Biol Toxicol 16:1–13
5. Rishton GM (1997) Reactive compounds and in vitro false positives in HTS. Drug Discov Today 2:382–384
6. Bruns RF, Watson IA (2012) Rules for identifying potentially reactive or promiscuous compounds. J Med Chem 55:9763–9772
7. Goh GB, Hodas NO, Vishnu A (2017) Deep learning for computational chemistry. J Comput Chem 38:1291–1307
8. Gawehn E, Hiss JA, Schneider G (2016) Deep learning in drug discovery. Mol Inform 35:3–14
9. van de Waterbeemd H, Gifford E (2003) ADMET in silico modelling: towards prediction paradise? Nat Rev Drug Discov 2:192–204
10. Bugrim A, Nikolskaya T, Nikolsky Y (2004) Early prediction of drug metabolism and toxicity: systems biology approach and modeling. Drug Discov Today 9:127–135
11. Segall MD, Barber C (2014) Addressing toxicity risk when designing and selecting compounds in early drug discovery. Drug Discov Today 19:688–693
12. Gertrudes JC, Maltarollo VG, Silva RA, Oliveira PR, Honorio KM, da Silva AB (2012) Machine learning techniques and drug design. Curr Med Chem 19:4289–4297
13. Moroy G, Martiny VY, Vayer P, Villoutreix BO, Miteva MA (2012) Toward in silico structure-based ADMET prediction in drug discovery. Drug Discov Today 17:44–55
14. Gini G (2016) QSAR methods. Methods Mol Biol 1425:1–20
15. Singh PK, Negi A, Gupta PK, Chauhan M, Kumar R (2016) Toxicophore exploration as a screening technology for drug design and discovery: techniques, scope and limitations. Arch Toxicol 90:1785–1802
16. Rishton GM (2003) Nonleadlikeness and leadlikeness in biochemical screening. Drug Discov Today 8:86–96
17. Pearce BC, Sofia MJ, Good AC, Drexler DM, Stock DA (2006) An empirical process for the design of high-throughput screening deck filters. J Chem Inf Model 46:1060–1068
18. Walters WP, Ajay, Murcko MA (1999) Recognizing molecules with drug-like properties. Curr Opin Chem Biol 3:384–387
19. Cumming JG, Davis AM, Muresan S, Haeberlein M, Chen H (2013) Chemical predictive modelling to improve compound quality. Nat Rev Drug Discov 12:948–962
20. Sushko I, Salmina E, Potemkin VA, Poda G, Tetko IV (2012) ToxAlerts: a Web server of structural alerts for toxic chemicals and compounds with potential adverse reactions. J Chem Inf Model 52:2310–2316
21. Lipinski CA (2000) Drug-like properties and the causes of poor solubility and poor permeability. J Pharmacol Toxicol Methods 44:235–249
22. Veber DF, Johnson SR, Cheng HY, Smith BR, Ward KW, Kopple KD (2002) Molecular properties that influence the oral bioavailability of drug candidates. J Med Chem 45:2615–2623
23. Teague SJ, Davis AM, Leeson PD, Oprea T (1999) The design of leadlike combinatorial libraries. Angew Chem Int Ed Engl 38:3743–3748
24. Baell JB, Holloway GA (2010) New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J Med Chem 53:2719–2740
25. Dahlin JL, Nissink JW, Strasser JM, Francis S, Higgins L, Zhou H, Zhang Z, Walters MA (2015) PAINS in the assay: chemical mechanisms of assay interference and promiscuous enzymatic inhibition observed during a sulfhydryl-scavenging HTS. J Med Chem 58:2091–2113
26. Dolle RE (2011) Historical overview of chemical library design. Methods Mol Biol 685:3–25
27. Lagorce D, Sperandio O, Baell JB, Miteva MA, Villoutreix BO (2015) FAF-Drugs3: a web server for compound property calculation and chemical library design. Nucleic Acids Res 43(W1):W200–W207
28. Sterling T, Irwin JJ (2015) ZINC 15 – ligand discovery for everyone. J Chem Inf Model 55(11):2324–2337
29. Abreu RM, Froufe HJ, Daniel PO, Queiroz MJ, Ferreira IC (2011) ChemT, an open-source software for building template-based chemical libraries. SAR QSAR Environ Res 22:603–610
30. Sanz F, Carrio P, Lopez O, Capoferri L, Kooi DP, Vermeulen NP, Geerke DP, Montanari F, Ecker GF, Schwab CH, Kleinoder T, Magdziarz T, Pastor M (2015) Integrative modeling strategies for predicting drug toxicities at the eTOX project. Mol Inform 34:477–484
31. Fowler S, Schnall JG (2014) TOXNET: information on toxicology and environmental health. Am J Nurs 114:61–63
32. Wexler P (2001) TOXNET: an evolving web resource for toxicology and environmental health information. Toxicology 157:3–10
33. Zhu T, Cao S, Su PC, Patel R, Shah D, Chokshi HB, Szukala R, Johnson ME, Hevener KE (2013) Hit identification and optimization in virtual screening: practical recommendations based on a critical literature analysis. J Med Chem 56:6560–6572
34. Blagg J (2010) Structural alerts for toxicity. In: Abraham DJ, Rotella DP (eds) Burger’s medicinal chemistry and drug discovery, 7th edn. Wiley, Hoboken, pp 301–334
35. Smith GF (2011) Designing drugs to avoid toxicity. Prog Med Chem 50:1–47
36. Kazius J, McGuire R, Bursi R (2005) Derivation and validation of toxicophores for mutagenicity prediction. J Med Chem 48:312–320
37. SMARTS – a language for describing molecular patterns. Daylight Chemical Information Systems, Inc. http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html. Accessed 20 Dec 2017
38. Walters WP, Stahl MT, Murcko MA (1998) Virtual screening - an overview. Drug Discov Today 3:160–178
39. Williams DP, Naisbitt DJ (2002) Toxicophores: groups and metabolic routes associated with increased safety risk. Curr Opin Drug Discov Dev 5:104–115
40. Hakimelahi GH, Khodarahmi GA (2005) The identification of toxicophores for the prediction of mutagenicity, hepatotoxicity and cardiotoxicity. J Iran Chem Soc 2:244–267
Chapter 14
Abstract
In this chapter we present and discuss, with the aid of several representative case studies from drug
discovery and computational toxicology, a new cheminformatics platform, Enalos Suite, that was developed
with open source and freely available software. Enalos Suite (http://enalossuite.novamechanics.com/) was
designed as a tool to address a variety of cheminformatics problems: it expedites tasks performed in
predictive modeling and allows access to, and data mining and manipulation of, multiple chemical databases
(PubChem, UniChem, etc.). Enalos Suite was designed so that it can be extended and adjusted to each user’s
field of interest, including, for instance, nanoinformatics, biomedical, and other applications. To
demonstrate the functionalities of Enalos Suite that are useful in different cheminformatics applications,
we present indicative case studies that include the exploitation of chemical databases within a drug
discovery project, the calculation of molecular descriptors, and finally the development of a predictive
QSAR model validated according to OECD principles. We hope that by the end of this chapter the reader will
appreciate how the different functionalities included in Enalos Suite can be of significant value in a
multitude of cheminformatics applications.
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_14, © Springer Science+Business Media, LLC, part of Springer Nature 2018
Dimitra-Danai Varsou et al.
well-organized database including, among others, compounds
along with their activity information [2]. A systematic, reproducible,
and sustainable exploitation of this chemical information can
be significantly promoted with the aid of cheminformatics in silico
tools [3].
Cheminformatics employs several strategies and methods to
deal with a range of problems, including, among others, data
mining and predictive modeling [4–6]. For a subsequent
cheminformatics analysis, the chemical information extracted from
chemical databases must often be transformed into mathematical
representations (known as molecular descriptors) that can later be
analyzed and used in data mining, modeling, virtual screening,
similarity analysis, and more [7]. Predictive modeling techniques,
including quantitative structure–activity relationship (QSAR)
modeling, can then be employed to correlate the structural
properties of the compounds with their bioactivities or other
properties in an effort to discover patterns in the structure of
molecules that can help explain their activity profile [7–9]. These
models can later be used to obtain reliable predictions for the
activities of novel compounds [1, 7].
QSAR model building includes different steps such as data
curation, modeling, internal and external validation, and defini-
tion of the domain of their applicability, in order to afford robust
models with high predictive power according to OECD princi-
ples [8]. These steps can be performed using different software
tools or programming languages, which often require specific
programming skills.
An important challenge in cheminformatics is the use of
computational techniques and developed models in real drug
discovery applications to reduce the time and cost of experimental
procedures. These procedures often involve animal testing, which
cheminformatics approaches can significantly reduce [10].
Experimental researchers usually do not have a strong programming
or computational background, and they are not expected to easily
handle workflows and scripts in different programming languages.
Therefore, handy tools that experimentalists can use directly to
predict properties of compounds and to explore and interpret the
results easily, within just a few steps, are highly desirable
[11, 12]. Enalos Suite software [13] aims to address the lack of
user-friendly and ready-to-use tools for cheminformatics and
aspires to become a useful tool in the computer-aided drug
discovery process. It provides many functionalities, including the
calculation of molecular descriptors, data mining and manipulation
from popular chemical databases such as PubChem, UniChem,
SureChEMBL, and IBM patents, and the use and development of
custom-made predictive models.
Enalos Suite: New Cheminformatics Platform for Drug Discovery and Computational… 289
2 Enalos Suite
In the main tab, the sketcher [14] enables the user to easily draw
any chemical compound that is then submitted for further analysis.
Different panels offer options such as drawing double and triple
bonds, creating chains, and adding stereo bonds. Propane, butane,
pentane, hexane, octane, and benzene rings are also available for
use, along with some more complex structure templates like
alkaloids, amino acids, beta lactams, carbohydrates, and steroids.
The user can also select among different heteroatoms (N, P, S, F,
Cl, Br, I) that usually exist in organic chemical molecules, enter
an element or group symbol via the keyboard, and select new drawing
symbols from the periodic table. More functionalities are available,
including selecting, deleting, rotating, moving, and removing all or
part of the designed molecules. Finally, the sketcher enables the
user to open, save, and convert files in a variety of chemical
formats such as SMILES or the IUPAC International Chemical
Identifier (InChI).
If the SMILES notation is known, the user can directly submit it
in the Import SMILES tab. If the SMILES notation is not initially
known, the included chemical sketcher gives the user the opportunity
to draw the chemical structure and then convert it to SMILES format.
This facilitates the generation of several structures: multiple
modifications can be performed using the sketcher, and all structures
can then be transferred as SMILES to the appropriate tab so that the
whole set of produced structures is analyzed.
The .sdf files contain chemical structure records, used as a
standard exchange format for chemicals’ information. The
compound structure in .sdf format can be extracted from PubChem
database or other repositories and uploaded to Enalos Suite
through the corresponding field.
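To make the structure of an .sdf file more concrete, the following minimal Python sketch splits SD-file text into records and collects the title line and the data fields of each record. It is an illustration only and does not reflect how Enalos Suite reads these files: the function names are hypothetical, the connection table itself is not interpreted, and the example record uses an empty placeholder molblock rather than a real structure.

```python
from typing import Dict, List


def _parse_record(lines: List[str]) -> Dict[str, str]:
    """Collect the title (first line) and the "> <FIELD>" data items."""
    record = {"_title": lines[0].strip() if lines else ""}
    field = None
    for line in lines:
        if line.startswith(">") and "<" in line:
            # A data header looks like "> <FIELD_NAME>".
            field = line[line.index("<") + 1:line.rindex(">")]
            record[field] = ""
        elif field is not None:
            if line.strip():
                record[field] = (record[field] + "\n" + line).strip()
            else:
                field = None  # a blank line ends the data item
    return record


def read_sdf_records(text: str) -> List[Dict[str, str]]:
    """Split SD-file text on the "$$$$" delimiter lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if current:
                records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    if any(item.strip() for item in current):
        records.append(_parse_record(current))  # file without final "$$$$"
    return records


# Placeholder record: the molblock is empty and purely illustrative.
EXAMPLE_SDF = """abacavir
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <PUBCHEM_COMPOUND_CID>
441300

$$$$
"""
```

Calling read_sdf_records(EXAMPLE_SDF) yields one dictionary per compound, which is the kind of per-record view a descriptor calculation or model would start from.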
2.1 Enalos Suite Database Functions
A main area of interest in the cheminformatics field is the
application of computational methods and the development of predictive
models, in an effort to reduce the time and resources spent on
experiments and to make faster decisions in the drug discovery
framework. This in silico approach demands the availability of
accessible data that can be mined, refined, and processed within
a model development or virtual screening procedure [5, 6].
Enalos Suite Database functions allow for access and data
retrieval from CIR, PubChem, and UniChem. More specifically,
these functions give access to the NCI, PubChem, and UniChem
chemical databases for data mining and manipulation, and also
include further functions that use the PubChem database to retrieve
Assay, Patent Coverage Information, Similar Compounds, and Vendor data.
2.1.1 CIR Function
The CIR function enables the user to get direct access to CIR
(Chemical Identifier Resolver). CIR works as a resolver for different
chemical structure identifiers and allows the conversion of a
given structure identifier into another representation or structure
identifier. Several output formats can be selected through a GUI
menu. The available identifiers are: Standard InChI, Standard
InChIKey, InChIKey Simplified, SMILES, NCI/CADD FICTS
Identifier, NCI/CADD FICuS Identifier, NCI/CADD Identifier,
CACTVS HASHISY Hashcode, IUPAC Name, CAS Registry
Number, ChemSpider ID, Molecular Weight, Chemical Formula,
Number of Hydrogen Bond Donors, Number of Hydrogen Bond
Acceptors, Lipinski Rule of Five Violations, Number of Effectively
Rotatable Bonds, list of chemical names for the structure, and SD
file of the structure [2, 15].
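For readers who want to see what such a conversion looks like outside the GUI, the sketch below builds request URLs for the public NCI/CADD resolver. It is an assumption-laden illustration: the URL pattern and the representation keywords used here (for example smiles and stdinchikey) follow the public CIR web service as we understand it and are not taken from this chapter, the helper names are hypothetical, and Enalos Suite may access the service differently.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the public NCI/CADD Chemical Identifier Resolver.
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier: str, representation: str) -> str:
    """Build a CIR request URL; both path parts are percent-encoded."""
    return (
        f"{CIR_BASE}/{quote(identifier, safe='')}"
        f"/{quote(representation, safe='')}"
    )


def cir_resolve(identifier: str, representation: str, timeout: float = 10.0) -> str:
    """Fetch the resolved value as text (requires network access)."""
    with urlopen(cir_url(identifier, representation), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

For example, cir_url("abacavir", "smiles") produces the address that asks the resolver to convert a name into a SMILES string; percent-encoding matters when the input identifier is itself a SMILES string containing characters such as "=" or "#".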
2.1.4 Searching Chemical Databases: A Case Study
This case study demonstrates the process of searching chemical
databases such as PubChem and UniChem using the Enalos Suite
Database Functions as described above. The example deals with the
chemical compound abacavir, which is a medication used to prevent
and treat HIV/AIDS.
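The same kind of name-based lookup can be sketched programmatically against PubChem. The example below assumes the public PUG REST interface of PubChem, which is not described in this chapter; the helper names are hypothetical, and the example response body is illustrative (it uses CID 441300, the PubChem identifier this chapter gives for abacavir).

```python
import json
from typing import List
from urllib.parse import quote

# Assumed base URL of the PubChem PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cids_by_name_url(name: str) -> str:
    """URL that asks PubChem for the CIDs matching a compound name."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def parse_cids(payload: str) -> List[int]:
    """Extract the CID list from a PUG REST 'cids' JSON response."""
    return json.loads(payload).get("IdentifierList", {}).get("CID", [])


# Illustrative response body, shaped like the service's JSON output.
EXAMPLE_RESPONSE = '{"IdentifierList": {"CID": [441300]}}'
```

Fetching cids_by_name_url("abacavir") with any HTTP client and passing the body to parse_cids would return the matching compound identifiers, which can then be used to retrieve structures or assay data.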
2.2 Enalos Suite Molecular Descriptors
Molecular descriptors play an essential part in model development
as they convey key data on molecular structure [1]. More
specifically, molecular descriptors encode as numerical values the
properties and features of molecules that can later be linked,
through modeling techniques, to the biological activities or other
properties of the molecules [7, 20]. The calculation of molecular
descriptors and their use in modeling can have direct applications
in different research fields such as health and pharmaceutical
sciences, toxicology, and environmental chemistry [7].
There are different methodologies and software tools that
can be applied in order to determine molecular descriptors that
employ organic and quantum chemistry, graph theory, etc. [1,
21, 22]. The National Center for Toxicological Research of the FDA
designed and released Mold2, a freely available software package
that can be employed for the calculation of 777 important molecular
descriptors that encode two-dimensional chemical structure
information, including topological, geometric, and structural
characteristics of compounds [7, 12]. Comparison of Mold2
descriptors with descriptors calculated from commercial software
on several published datasets showed that Mold2 descriptors
produce models with higher quality than other packages. The
calculation of molecular descriptors using the Mold2 software through
Enalos Suite was made available via the Calculate Through Mold2
function [2, 7].
2.2.1 Calculation of Molecular Descriptors: A Case Study
This example is designed to help the interested reader calculate
the molecular descriptors of one or more chemical compounds
using the Calculate Through Mold2 Enalos Suite function. Two
case studies will be considered. The first one deals with the
calculation of the molecular descriptors for a single chemical compound,
abacavir (CID 441300 in PubChem), and the second one deals
with all the small molecules associated with a specific assay in
PubChem: Luminescence Cell-Based Counter screen to Identify
Inhibitors of A1 Apoptosis (AID 449761 in PubChem).
In the first case, the user has three options to insert the com-
pound in the Suite as presented in the previous case study. The
Calculate Through Mold2 function calculates a large and diverse set
of molecular descriptors (777), encoding two-dimensional chemi-
cal structure information, including topological, geometric, and
structural characteristics of compounds. The output is presented as
an .xls file (Fig. 16).
In the second case, the user first searches the PubChem data-
base for the assay with AID 449761 and downloads the .sdf file
including both active and inactive compounds that have been
tested in the specific assay (Fig. 17).
Subsequently, the .sdf file can be inserted in the Import sdf
window. The calculated molecular descriptors for the active and
inactive compounds are presented in Figs. 18 and 19, respectively.
Fig. 18 Molecular descriptors for the active compounds in PubChem Assay, AID 449761
Fig. 19 Molecular descriptors for inactive compounds in the PubChem Assay, AID 449761
2.3 Enalos Suite Predictive Models
Recent efforts within the drug discovery and risk assessment
framework focus on replacing in vivo and in vitro experiments
with computational in silico approaches. Computer-aided research
in a drug discovery process addresses the need to reduce
expensive and labor-intensive experimental procedures, and can also
help answer the ethical questions raised by the use of laboratory
animals. One of the best-known in silico approaches is the
development of predictive models that connect the structural
characteristics of molecules to their activity or toxicity profile
(quantitative structure–activity/toxicity relationship models,
QSAR/QSTR) [9]. Using this kind of model, researchers can seek
strong patterns in the experimental data and use these patterns to
make accurate predictions for future data [23, 24]. Modeling
procedures require a series of important steps, such as variable
selection and internal and external evaluation, in order to build a
robust model that produces trustworthy predictions. Once a model is
developed, it is crucial that it is disseminated to the scientific
community, including researchers with no computational background,
so that it can be used in real-life applications.
KNIME (Konstanz Information Miner) platform is used as the
basic infrastructure in the development of Enalos Suite. KNIME is
a powerful tool for data analysis, integration and modeling, which
is open-source and freely available. This software offers a user-
friendly interface for creating visual data flows which consist of
nodes and connections between them. The ease of visualization of
data “pipelines” gives the user the flexibility to interact and selec-
tively execute some or all analysis steps and investigate the results.
Within the same workflow it is possible to combine tools from dif-
ferent suites in short time, including different cheminformatics
tools (CDK, RDKit, ChEMBL, etc.) and other modeling tools
(e.g., WEKA) [25].
The development of appropriate KNIME workflows can significantly
facilitate data mining, analysis, and modeling in the area
of cheminformatics and nanoinformatics. Within KNIME,
NovaMechanics has developed its proprietary Enalos+ nodes, which
are designed to cover cheminformatics tasks missing from the KNIME
platform that are associated with data preprocessing, modeling, and
data mining of chemical databases (PubChem, UniChem, NCI, etc.).
Enalos Suite, as a part of the Enalos software family, can easily
integrate any KNIME workflow, thereby offering a friendly user
interface for any functionality or model that has been or will be
developed in KNIME.
One of the main advantages of Enalos Suite is that it can host
any predictive model developed within KNIME. Enalos Suite con-
tains several custom-made predictive and validated models includ-
ing the MouseTox and K562 models described below.
2.3.1 MouseTox Model for Cytotoxicity Prediction
Enalos Suite incorporates a fully validated QSTR model, the
MouseTox model, which predicts the cytotoxic effects of a wide
range of compounds [10]. The model was based on molecular
descriptors calculated with the Mold2 software and on the Random
Forest machine learning methodology. The MouseTox model was first
released as a web service via Enalos Cloud Platform
(http://enalos.insilicotox.com/MouseTox/) that allows for
online predictions for novel structures that are designed or
uploaded to the server.
The dataset selected for MouseTox model development,
included 5416 compounds that were tested for cytotoxic effects to
NIH/3T3 cells, as part of a project for the identification of new
drugs for the treatment of Chagas disease [10, 26]. The cytotoxic-
ity profile of compounds was considered as the output variable and
compounds included in the original dataset were classified as
“actives” (compounds cytotoxic to NIH/3T3s) or “inactives”
(compounds noncytotoxic to NIH/3T3s).
The MouseTox model was built based on a workflow developed in
the KNIME analytics platform that included the Enalos+ proprietary
KNIME nodes developed by NovaMechanics [13]. The workflow included
all steps required to afford a fully validated predictive model.
Initially, for each compound in the original dataset, 777 molecular
descriptors were calculated using the Mold2 software. During the
correlation analysis, some descriptors were filtered out, leaving
424 to be used as inputs for the QSTR model development. For
validation purposes the initial dataset was divided into training
and test sets using the Kennard–Stone algorithm. The training set
was used for model development, whereas the test set was used
during the external validation process. InfoGain variable selection
with the Ranker evaluator method was applied to the training set to
identify the 15 descriptors most critical for model development. A
detailed analysis of the physical meaning of these descriptors can
be found in the original publication [10].
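The Kennard-Stone split mentioned above selects training compounds so that they cover the descriptor space as uniformly as possible. The following is a minimal pure-Python sketch of the classic procedure, assuming Euclidean distance on the descriptor vectors; the function name and the toy data are hypothetical, and the KNIME node used by the authors may differ in details such as scaling or tie handling. It requires Python 3.8 or later for math.dist.

```python
from math import dist
from typing import List, Sequence, Tuple


def kennard_stone(
    points: Sequence[Sequence[float]], n_train: int
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of points")
    # Seed the training set with the two mutually most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in (first, second)]
    # min_dist[i]: distance from candidate i to its nearest selected point.
    min_dist = {
        i: min(dist(points[i], points[s]) for s in selected) for i in remaining
    }
    while len(selected) < n_train:
        # Add the candidate farthest from everything already selected.
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], dist(points[i], points[nxt]))
    return selected, remaining
```

The compounds left in the second list form the test set. Because the seed search compares all pairs, this sketch is only practical for small datasets; for thousands of compounds a vectorized implementation would be preferable.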
Special attention was given to the validation of the proposed
model, by employing different strategies for external and internal
validation to meet the criteria recommended by the OECD. For
internal validation, the model’s stability to the inclusion and
exclusion of data was tested by performing leave-k-out
cross-validation tests. In addition, external validation was
performed using the test set, which was not used in model
development. Tables 1 and 2 present the confusion matrices for the
training and the test set, respectively, while Table 3 presents the
relevant statistics that demonstrate the accuracy and the robustness
of the developed model. Finally, the Y-randomization test, when
applied on the data, demonstrated the robustness and the statistical
significance of the proposed model [10].
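As an illustration of the internal validation step, the sketch below generates leave-k-out splits in which every sample is held out exactly once. It is only one possible reading of the procedure: the chapter does not state the value of k or whether the samples were shuffled first, and the function name is hypothetical.

```python
from typing import Iterator, List, Tuple


def leave_k_out(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, held_out) index lists; each sample is held out once."""
    if not 1 <= k <= n_samples:
        raise ValueError("k must be between 1 and n_samples")
    for start in range(0, n_samples, k):
        held_out = list(range(start, min(start + k, n_samples)))
        train = [i for i in range(n_samples) if i < start or i >= start + k]
        yield train, held_out
```

In use, the model is refitted on each train list and evaluated on the corresponding held-out samples; stable performance across the splits indicates that the model does not depend on a few individual compounds. Shuffling the sample order beforehand is usually advisable.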
Table 1
Confusion matrix (training set)
            Active (predicted)    Inactive (predicted)
Active      2114                  165
Inactive    181                   1602
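The usual accuracy statistics follow directly from a confusion matrix such as Table 1. The short sketch below assumes that the rows of Table 1 are the experimental classes, the columns are the predicted classes, and that "active" (cytotoxic) is the positive class; the function name is hypothetical, and the values of Table 3 are not reproduced here.

```python
from typing import Dict


def classification_stats(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and precision from four counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # fraction of correct predictions
        "sensitivity": tp / (tp + fn),   # actives correctly recognized
        "specificity": tn / (tn + fp),   # inactives correctly recognized
        "precision": tp / (tp + fp),     # predicted actives that are active
    }


# Counts read from Table 1 (training set) under the stated assumptions.
stats = classification_stats(tp=2114, fn=165, fp=181, tn=1602)
```

Under these assumptions, 3716 of the 4062 training compounds are classified correctly, an accuracy of about 0.915.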
Table 2
Confusion matrix (test set)
Table 3
Accuracy statistics of the predictive model
For these compounds, the user can sketch the structure of the
molecules in the Enalos Suite sketcher (Figs. 20, 21, 22, 23, and
24), or import their SMILES (Fig. 25) and execute the MouseTox
model. The results are presented in a .csv file and provide predictions
on the cytotoxic class of given compounds (“active”/“inactive”)
together with an indication of whether the predictions fall within
the applicability domain.
Predictions for compounds that fall outside the model’s
applicability domain cannot be considered reliable (Fig. 26).
2.3.2 K562 Inhibition Model
Apart from the MouseTox model, more predictive models are
incorporated within Enalos Suite, covering a wide range of
biological activities. Among these, a K562 inhibition predictive
model is included. Recent efforts in beta thalassemia treatment
suggest the discovery of fetal hemoglobin (HbF) inducers that could
compensate for the effects caused by this disorder. Toward this
goal, a fully validated in silico model for the prediction of K562
functional inhibition, possibly associated with HbF induction, was
proposed [30]. This K562 model was made available online through
Enalos Cloud Platform (http://enalos.insilicotox.com/K562)
and was also incorporated within Enalos Suite.
and after a filtering step, some of them were excluded from further analysis because of their low discrimination power. A consensus modeling scheme was then developed: three models, built with three modeling methodologies (kNN, random tree, and random forest) and two variable selection techniques (the Gain Attribute evaluator and the InfoGain Attribute Ratio Feature), were trained and then combined through a consensus majority vote.
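The consensus step can be pictured with a minimal sketch. The per-model predictions below are invented for illustration, and the function name is hypothetical; the actual models and their outputs are described in the original publication [30].

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class label predicted by most of the models."""
    label, _count = Counter(predictions).most_common(1)[0]
    return label


# Hypothetical predictions of the three models for three compounds.
model_outputs = {
    "kNN":           ["active",   "inactive", "inactive"],
    "random tree":   ["active",   "active",   "inactive"],
    "random forest": ["inactive", "active",   "active"],
}

# zip(*...) regroups the predictions compound by compound.
consensus = [majority_vote(votes) for votes in zip(*model_outputs.values())]
print(consensus)
```

With three models and two classes a tie cannot occur, which is one practical reason for combining an odd number of binary classifiers.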
2.3.3 Virtual Screening
All models included within Enalos Suite can be used in combination to virtually examine the biological effects of a given compound. As an example, consider the in silico screening of a structure included in the PubChem database. Assume that the compound of interest is C18H19ClN4O2, which can be found in PubChem (https://pubchem.ncbi.nlm.nih.gov/compound/16663089). In a first step, this compound is submitted through the main Enalos Suite window, as shown in Fig. 27, and the K562 workflow is selected from the available list for the execution of the corresponding model.
Enalos Suite: New Cheminformatics Platform for Drug Discovery and Computational… 307
Table 4
Model validation results (test set)
3 Conclusions
References
1. Todeschini R, Consonni V (2010) Molecular descriptors for chemoinformatics. Wiley-VCH, Weinheim
2. Leonis G et al (2016) Handbook of computational chemistry. Springer, New York, NY
3. Willett P (2002) Chemistry plans a structural overhaul The rising tide of data being generated by high-throughput. Nature 419:4–7
4. Leach AR, Gillet VJ (2007) An introduction to chemoinformatics, revised edn. Springer, New York, NY
5. Melagraki G, Afantitis A (2016) Editorial: towards open access for cheminformatics. Comb Chem High Throughput Screen 19:260–261
6. Vrontaki E, Melagraki G, Mavromoustakos T, Afantitis A (2014) Exploiting ChEMBL database to identify indole analogs as HCV replication inhibitors. Methods 71:4–13
7. Hong H, Xie Q, Ge W et al (2008) Mold2, molecular descriptors from 2D structures for chemoinformatics and toxicoinformatics. J Chem Inf Model 48:1337–1344
8. Tropsha A (2010) Best practices for QSAR model development, validation, and exploitation. Mol Informatics 29:476–488
9. Melagraki G, Afantitis A, Sarimveis H et al (2006) A novel RBF neural network training methodology to predict toxicity to Vibrio fischeri. Mol Divers 10:213–221
10. Varsou D-D, Melagraki G, Sarimveis H, Afantitis A (2017) MouseTox: an online toxicity assessment tool for small molecules through Enalos Cloud platform. Food Chem Toxicol 110:83–93
11. Melagraki G, Ntougkos E, Rinotas V et al (2017) Cheminformatics-aided discovery of small-molecule Protein-protein interaction (PPI) dual inhibitors of Tumor Necrosis Factor (TNF) and Receptor Activator of NF-κB Ligand (RANKL). PLoS Comput Biol 13:e1005372
12. Melagraki G, Afantitis A (2014) Enalos InSilicoNano platform: an online decision support tool for the design and virtual screening of nanoparticles. RSC Adv 4:50713–50725
13. NovaMechanics Ltd (2017) NovaMechanics Ltd. http://www.novamechanics.com/index.php/what-we-do/software/. Accessed 7 Apr 2017
14. Krause S, Willighagen E, Steinbeck C (2000) JChemPaint - Using the collaborative forces of the internet to develop a free editor for 2D chemical structures. Molecules 5:93–98
15. Chemical Identifier Resolver. https://cactus.nci.nih.gov/chemical/structure. Accessed 22 Dec 2017
16. Kim S, Thiessen PA, Bolton EE et al (2016) PubChem substance and compound databases. Nucleic Acids Res 44:D1202–D1213
17. Melagraki G, Afantitis A (2015) A risk assessment tool for the virtual screening of metal oxide nanoparticles through Enalos InSilicoNano platform. Curr Top Med Chem 15:1827–1836
18. Chen B, Wild DJ (2010) PubChem BioAssays as a data source for predictive models. J Mol Graph Model 28:420–426
19. Cheng T, Pan Y, Hao M et al (2014) PubChem applications in drug discovery: a bibliometric analysis. Drug Discov Today 19:1751–1756
20. Ojha PK, Roy K (2018) Development of a robust and validated 2D-QSPR model for sweetness potency of diverse functional organic molecules. Food Chem Toxicol 112:551
21. Melagraki G, Afantitis A, Sarimveis H et al (2007) A novel QSPR model for predicting θ (lower critical solution temperature) in polymer solutions using molecular descriptors. J Mol Model 13:55–64
22. Afantitis A, Melagraki G, Sarimveis H et al (2008) Development and evaluation of a QSPR model for the prediction of diamagnetic susceptibility. QSAR Comb Sci 27:432–436
23. Melagraki G, Afantitis A (2011) Ligand and structure based virtual screening strategies for hit-finding and optimization of hepatitis C virus (HCV) inhibitors. Curr Med Chem 18:2612–2619
24. Afantitis A, Melagraki G, Sarimveis H et al (2006) A novel QSAR model for evaluating and predicting the inhibition activity of dipeptidyl aspartyl fluoromethylketones. QSAR Comb Sci 25:928–935
25. Leonis G, Melagraki G, Afantitis A (2016) Open source chemoinformatics software including KNIME analytics platform among a multitude. Springer, New York, NY, pp 2201–2230
26. National Center for Biotechnology Information (2012) PubChem BioAssay Database, AID=651744. https://pubchem.ncbi.nlm.nih.gov/bioassay/651744. Accessed 3 Jan 2017
27. Melagraki G, Afantitis A, Sarimveis H et al (2010) In silico exploration for identifying structure-activity relationship of MEK inhibition and oral bioavailability for isothiazole derivatives. Chem Biol Drug Des 76:397–406
28. Papa E, Sangion A, Arnot JA, Gramatica P (2018) Development of human biotransformation QSARs and application for PBT assessment refinement. Food Chem Toxicol 112:535
29. Alves VM, Muratov EN, Zakharov A et al (2018) Chemical toxicity prediction for major classes of industrial chemicals: is it possible to develop universal models covering cosmetics, drugs, and pesticides? Food Chem Toxicol 112:526
30. Afantitis A, Leonis G, Gambari R, Melagraki G (2017) Consensus Predictive Model for the prediction of Human K562 Cell Growth Inhibition through Enalos Cloud Platform. ChemMedChem. https://doi.org/10.1002/cmdc.201700675
31. National Center for Biotechnology Information (2016) SANGER: inhibition of human K-562 cell growth in a cell viability assay. https://pubchem.ncbi.nlm.nih.gov/bioassay/742260. Accessed 19 Dec 2017
Chapter 15
Abstract
Ion channels are membrane proteins involved in almost all physiological processes, including neurotransmission, muscle contraction, pace-making activity, secretion, electrolyte and water balance, immune response, and cell proliferation. Because of their broad distribution in the human body and their physiological roles, ion channels are attractive targets for drug discovery and safety pharmacology. Over the years, ion channels have been associated with many genetic diseases (“channelopathies”). For most of these diseases, therapy is mainly empirical and symptomatic, and is often limited by lack of efficacy and tolerability in a number of patients. New and more specific therapeutic approaches are therefore actively sought. At the same time, acquired channelopathies or dangerous side effects (such as proarrhythmic risk) can develop as a consequence of drugs unexpectedly targeting ion channels. Several noncardiovascular drugs are known to block cardiac ion channels, leading to potentially fatal delayed ventricular repolarization. Reliable preclinical cardiac safety testing in the early stages of drug discovery is therefore mandatory. To fulfill these needs, both ion channel drug discovery and toxicology strategies are evolving toward comprehensive research approaches that integrate ad hoc designed in silico predictions and experimental studies for a more reliable and quicker translation of results to the clinic.
Here we discuss two examples of how the combination of in silico methods and patch clamp experiments can help address drug discovery and safety issues regarding ion channels.
Key words Ion channels, Pharmacovigilance, Patch clamp, Molecular docking, Bartter syndrome,
Cardiotoxicity
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_15, © Springer Science+Business Media, LLC, part of Springer Nature 2018
314 Paola Imbrici et al.
2.1 FDA-AERS Database Searching
Database searching was performed using the Food and Drug Administration Adverse Event Reporting System (FDA-AERS) database [28, 29]. AERS collects spontaneous reports of adverse events following the use of any US-approved drug. Despite AERS cannot
2.2 Evaluation of the Blocking Efficacy of Marketed Drugs on ClC-Ka Channels Through Manual Patch Clamp
To test whether the selected approved drugs cause Bartter-like symptoms in patients by targeting ClC-K proteins, ClC-K channels were expressed in HEK293 cells, and chloride currents were recorded by whole-cell patch clamp before and after the application of 50 μM of each drug, using specific extracellular and intracellular solutions and customized voltage-clamp protocols. These studies demonstrated that mycophenolate mofetil, acenocoumarol, and quinapril had a very low affinity for ClC-Ka channels, whereas the same concentration of valsartan caused a ~60% reduction of ClC-Ka chloride currents [20] (Fig. 1a).
The inhibition of ClC-Ka by valsartan was concentration-dependent, with an IC50 of 21 μM. In contrast, ClC-Kb channels were much less sensitive to valsartan block, being inhibited by ~40% only at a concentration of 100 μM, as were ClC-1 and ClC-5 channels.
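A concentration-dependent block of this kind is commonly described with the Hill equation. The sketch below is a generic illustration, not the fitting procedure used in [20]: the function names are hypothetical, and the Hill coefficient of 1 and the assumption of complete block at saturating concentrations are not given in the text. With these defaults the curve need not reproduce the ~60% block reported at 50 μM.

```python
def fraction_inhibited(conc_um, ic50_um, hill=1.0):
    """Fraction of current blocked at a drug concentration (Hill equation).

    Assumes complete block at saturating concentrations.
    """
    return conc_um ** hill / (conc_um ** hill + ic50_um ** hill)


def concentration_for_block(fraction, ic50_um, hill=1.0):
    """Concentration (uM) needed to block a given fraction of the current."""
    return ic50_um * (fraction / (1.0 - fraction)) ** (1.0 / hill)


IC50_VALSARTAN_CLC_KA = 21.0  # uM, the value reported in the text

for conc in (10.0, 21.0, 50.0, 100.0):
    blocked = fraction_inhibited(conc, IC50_VALSARTAN_CLC_KA)
    print(f"{conc:6.1f} uM -> {blocked:.2f} of the current blocked")
```

By construction, half of the current is blocked at the IC50. Fitting the Hill coefficient to the measured concentration-response points would be needed before drawing any quantitative conclusion.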
Valsartan belongs to a class of antihypertensive drugs acting as AT1R antagonists. To identify the structural determinants of valsartan responsible for chloride current inhibition, other molecules of the same pharmacological class, namely losartan, telmisartan, candesartan cilexetil, and olmesartan, were then tested in vitro for their ability to block ClC-Ka channels. These sartans share a common chemical scaffold consisting of a biphenyl bearing a tetrazole ring at the ortho position, with the exception of telmisartan, in which the tetrazole ring is replaced by a carboxylic group. The patch clamp experiments showed that losartan, candesartan cilexetil, and telmisartan at 50 μM were very weak blockers of ClC-Ka/barttin channels, reducing chloride currents by ~10% to ~25%. In contrast, valsartan and olmesartan, which contain both the tetrazole ring and the carboxylic group, are by far stronger blockers of ClC-Ka/barttin channels (Fig. 1a, b). These experiments clarified that differences in the chemical structures of sartans accounted for a different affinity toward the ClC-Ka
Ion Channels in Drug Discovery and Safety Pharmacology 317
Table 1
Drugs returned from the analysis of the FDA pharmacovigilance database as causing Bartter-like syndrome as a side effect, with the indicated number of case reports
BS-inducing drug        Therapeutic class                     Primary pharmacodynamics                       Number of BS reports (2011–2012)  Proposed mechanism of BS
Furosemide              Loop diuretic                         NKCC2 blocker                                  8 PS                              Inhibits Na/K/2Cl cotransporter, mimicking type 1 BS [31]
Mycophenolate mofetil   Immunosuppressant                     Inosine monophosphate dehydrogenase inhibitor  7 SS                              Unknown
Prednisone              Anti-inflammatory glucocorticoid      Glucocorticoid receptor agonist                4 PS, 2 SS                        Possibly inhibits Na/K-ATPase similarly to the digitoxigenin derivative rostafuroxin [32]
Quinapril               Antihypertensive                      ACE inhibitor                                  4 SS                              Unknown
Candesartan cilexetil   Antihypertensive                      Blocks the AT1R                                2 PS, 2 SS                        Unknown
Valsartan               Antihypertensive                      Blocks the AT1R                                1 PS, 3 SS                        Unknown
Rituximab               Immunosuppressant                     Monoclonal mAb anti-CD20 B cell                4 SS                              Unknown
Tacrolimus              Immunosuppressant                     Phosphatase calcineurin inhibitor              3 SS, 1 C                         Stimulates the renal Na–Cl cotransporter with hypertension [33]
Methylprednisolone      Anti-inflammatory, immunosuppressant  Glucocorticoid receptor agonist                3 SS                              Unknown
Calcitriol              Hormone                               Vitamin D receptor agonist                     2 C                               Increases Ca transport across the epithelial cells and renal phosphate excretion
Acenocoumarol           Anticoagulant                         Vitamin K antagonist                           2 C                               Unknown
BS Bartter syndrome, PS primary suspected drug, SS secondary suspected drug, C concomitant drug, CaSR calcium sensing receptor, HMG-CoA 3-hydroxy-3-methylglutaryl-coenzyme A, AT1R angiotensin type 1 receptor, ACE angiotensin converting enzyme
Fig. 1 Effect of valsartan and olmesartan on ClC-Ka channels expressed in HEK293 cells and docking studies into a homology model of ClC-Ka. (a, b) Representative current traces of ClC-Ka/barttin channels before and after the application of valsartan 50 μM (a) and olmesartan 50 μM (b) in HEK293 cells (scale bars: 5 nA, 100 ms). Data are mean ± SEM of n = 8 cells. (c, d) Top-scored docking poses of valsartan (c; docking score -10.77 kcal/mol) and olmesartan (d; docking score -9.05 kcal/mol) in ClC-Ka, with residues N68, K165, L253, and F347 labeled. Important residues and docked ligands are rendered as sticks, while the protein is shown as a surface. The H-bond interactions are depicted by dotted lines (distances of 1.8, 2.1, and 2.6 Å are marked in c)
channel and that both the carboxylic group and the tetrazole ring
are required for an efficacious in vitro block of ClC-Ka channels, as
in valsartan and olmesartan (Table 2).
2.3 Molecular Docking Simulations
The next step was to gain insight into the putative binding site of ClC-K channels and to obtain information useful for the future design of new drugs targeting ClC-Ks. To this end, docking simulations of valsartan, losartan, and olmesartan were carried out on ClC-Ka channels, assuming the same binding site recently defined for the phenyl-benzofuran carboxylic acid derivatives [36–38] and using a homology model of ClC-Ka built on the 3D structure of ClC-ec [39, 40]. In particular, the carboxylic unit of these compounds is known to interact with three residues, N68, K165, and H346, located at the extracellular side of the ClC-Ka protein.
Table 2
Structure–activity studies of sartans
Chemical structures of tested sartans and percentage of inhibition of ClC-Ka/barttin currents induced by 50 μM of
drug. Blue and red squares indicate the tetrazole ring and the carboxylic group, respectively. Data are mean ± SEM of
n = 8 cells
One important step in the drug discovery process is the early and efficient assessment of drug-induced electrophysiological and structural cardiotoxicity, so that novel and safe drug candidates can be advanced into clinical trials while unsafe compounds are discarded. The most dangerous cardiac arrhythmia that can arise as a side effect of noncardiovascular drugs is torsades de pointes (TdPs), a life-threatening ventricular disorder predominantly associated with the block of hERG channels (KCNH2 or Kv11.1 or IKr) and characterized by a marked prolongation of the QT interval, which during the last decade has been considered its principal surrogate biomarker [21]. For this reason, the preclinical and clinical cardiac safety approaches set out in the International Council for Harmonisation (ICH) guidelines S7B and E14 relied for a long time on in vitro assays in cell lines, aimed at discarding any new chemical entity proven to block this repolarizing potassium current and expected to prolong the QT interval in animal ECGs and in clinical thorough QT (TQT) studies. These screening tests are now considered too simplistic (single-parameter tests) and not specific. They employ nonhuman species, which do not allow a correct prediction of human structural and contractile cardiotoxicity. Moreover, IKr blockade and QT prolongation are now considered incomplete biomarkers of proarrhythmic risk. Cardiac repolarization is in fact affected by the interplay of multiple ionic currents and, following a revised understanding of TdPs mechanisms, repolarization instability and early afterdepolarizations (EADs) are additional critical events for TdPs onset [16].
The drawbacks of these common tests have led to the wrongful exclusion of favorable lead compounds from later-stage development and to the great expense of TQT assays; thus, more reliable preclinical cardiac safety paradigms are being developed. One example is the Comprehensive in vitro Proarrhythmia Assay (CiPA), a mechanism-based, nonclinical regulatory paradigm born from a public–private collaboration, which represents the natural evolution of the previous cardiac safety guidelines ICH S7B and E14. The CiPA initiative integrates drug effects on multiple cardiac ionic currents with in silico modeling of human ventricular electrical activity and with in vitro data obtained from human stem cell-derived ventricular cardiomyocytes, to provide a more accurate assessment of drug proarrhythmic propensity in a cost-effective and high-throughput manner [7] (Fig. 2).
Fig. 2 Schematic diagram of CiPA core assays. hSC-CM human stem cell-derived cardiomyocytes
3.1 In Vitro Studies
According to CiPA, the effect of a new compound should first be evaluated in vitro on seven recombinant ion channels/currents (Cav1.2 or ICaL, Nav1.5 or INaF and INaL, Kv11.1 or IKr, Kv7.1 or IKs, Kv4.2 or IKto, Kir2.1-2.4 or IK1) expressed in heterologous systems. Rigorous and standardized voltage-clamp protocols and experimental conditions have to be established to ensure robust, reliable, and reproducible datasets from automated planar patch clamp. To this end, the Ion Channel Working Group (ICWG) points out that both the potency (IC50 determination) and the voltage, kinetics, and use dependence of drug block are required to better predict the proarrhythmic potential of new chemical entities [21, 43].
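As a rough illustration of how potency data for several currents can be summarized, the sketch below converts IC50 values into the fraction of each current that remains at a given drug concentration. It assumes a simple concentration-dependent reduction of each current and deliberately ignores the voltage, kinetics, and use dependence that the ICWG considers necessary. The drug profile, its IC50 values, and the function names are hypothetical.

```python
def remaining_fraction(conc_um, ic50_um, hill=1.0):
    """Fraction of a current left unblocked at a drug concentration."""
    return 1.0 / (1.0 + (conc_um / ic50_um) ** hill)


# Hypothetical IC50 values (uM) and Hill coefficients for an imaginary drug.
drug_profile = {
    "IKr":  {"ic50": 0.5,  "hill": 1.0},
    "ICaL": {"ic50": 10.0, "hill": 1.0},
    "INa":  {"ic50": 50.0, "hill": 1.0},
}


def current_scaling(profile, conc_um):
    """Map each current to the fraction remaining at the given concentration."""
    return {
        current: remaining_fraction(conc_um, p["ic50"], p["hill"])
        for current, p in profile.items()
    }


scaling = current_scaling(drug_profile, conc_um=1.0)
for current, factor in scaling.items():
    print(f"{current}: {factor:.2f} of control current remains")
```

Scaling factors of this kind are one possible input for an in silico action potential model; a drug with this hypothetical profile would mainly reduce IKr at 1 μM while leaving the other two currents almost untouched.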
3.2 In Silico Studies
The drug effects recorded on multiple human cardiac currents should next be integrated in silico into a mathematical model of the undiseased human ventricular myocyte, which should help translate drug effects on individual currents into a propensity for delayed repolarization and early afterdepolarizations. The In Silico Working Group (ISWG) is in charge of developing and validating the best computational reconstruction of the human ventricular myocyte action potential. The O’Hara-Rudy (OHR) model represents a reliable, albeit improvable, approach [44]. Its strengths rely on the facts that it is open
Table 3
The 12 CiPA training drugs
3.3 Studies on Stem Cell-Derived Myocytes
The third step of the CiPA paradigm consists of confirming the effects of drugs on ionic currents in vitro in human stem cell-derived cardiomyocytes (hSC-CM), recording extracellular field potentials with multielectrode array platforms and changes in transmembrane potential with voltage-sensitive dyes. However, several aspects of the use of hSC-CMs still require clarification and optimization, including the lack of standardized guidelines for evaluating hSC-CM phenotypes and functionality, which may influence drug effects [13, 17]. Finally, clinical assessment of the ECG in Phase I trials is envisaged to detect unanticipated electrophysiological effects [21].
4 Conclusions and Perspectives
References
1. Imbrici P, Liantonio A, Camerino GM et al (2016) Therapeutic Approaches to Genetic Ion Channelopathies and Perspectives in Drug Discovery. Front Pharmacol 7:121
2. Catterall WA, Swanson TM (2015) Structural basis for pharmacology of voltage-gated sodium and calcium channels. Mol Pharmacol 88:141–150
3. Imbrici P, Altamura C, Camerino GM et al (2016) Multidisciplinary study of a new ClC-1 mutation causing myotonia congenita: a paradigm to understand and treat ion channelopathies. FASEB J 30:3285–3295
4. Imbrici P, Conte D, Liantonio A (2017) Paving the way for Bartter syndrome type 3 drug discovery: a hope from basic research. J Physiol 595:5403–5404
5. Jentsch TJ (2015) Discovery of CLC transport proteins: cloning, structure, function and pathophysiology. J Physiol 593:4091–4109
6. Verkman AS, Galietta LJV (2009) Chloride channels as drug targets. Nat Rev Drug Discov 8:153–171
7. Huang H, Pugsley MK, Fermini B, Curtis MJ, Koerner J, Accardi M, Authier S (2017) Cardiac voltage-gated ion channels in safety pharmacology: review of the landscape leading to the CiPA initiative. J Pharmacol Toxicol Methods 87:11–23
8. Lynch JJ, Van Vleet TR, Mittelstadt SW, Blomme EAG (2017) Potential functional and pathological side effects related to off-target pharmacological activity. J Pharmacol Toxicol Methods 87:108–126
9. Accardi MV, Pugsley MK, Forster R, Troncy E, Huang H, Authier S (2016) The emerging role of in vitro electrophysiological methods in CNS safety pharmacology. J Pharmacol Toxicol Methods 81:47–59
10. Tikhonov DB, Zhorov BS (2012) Architecture and pore block of eukaryotic voltage-gated sodium channels in view of NavAb bacterial sodium channel structure. Mol Pharmacol 82:97–104
11. Feng L, Campbell EB, Hsiung Y, MacKinnon R (2010) Structure of a eukaryotic CLC transporter defines an intermediate state in the transport cycle. Science 330:635–641
12. Zou B (2015) Ion channel profiling to advance drug discovery and development. Drug Discov Today Technol 18:18–23
13. Skalova S, Svadlakova T, Shaikh Qureshi WM, Dev K, Mokry J (2015) Induced pluripotent stem cells and their use in cardiac and neural regenerative medicine. Int J Mol Sci 16:4043–4067
14. Jan LY, Jan YN (2012) Voltage-gated potassium channels and the diversity of electrical signalling. J Physiol 590:2591–2599
15. Zamponi GW, Striessnig J, Koschak A, Dolphin AC (2015) The physiology, pathology, and pharmacology of voltage-gated calcium channels and their future therapeutic potential. Pharmacol Rev 67:821–870
16. Gintant G, Sager PT, Stockbridge N (2016) Evolution of strategies to improve preclinical cardiac safety testing. Nat Rev Drug Discov 15:457–471
17. Gintant G, Fermini B, Stockbridge N, Strauss D (2017) The evolving roles of human iPSC-derived cardiomyocytes in drug safety and discovery. Cell Stem Cell 21:14–17
18. Lavecchia A, Cerchia C (2016) In silico methods to address polypharmacology: current status, applications and future perspectives. Drug Discov Today 21:288–298
19. Langer T, Wermuth C-G (2012) Selective optimization of side activities (SOSA): a promising way for drug discovery. In: Peters J-U (ed) Polypharmacology in drug discovery, 1st edn. John Wiley & Sons, Inc., New York, NY
20. Imbrici P, Tricarico D, Mangiatordi GF, Nicolotti O, Lograno MD, Conte D, Liantonio A (2017) Pharmacovigilance database search discloses ClC-K channels as a novel target of the AT1 receptor blockers valsartan and olmesartan. Br J Pharmacol 174:1972–1983
21. Fermini B, Hancox JC, Abi-Gerges N et al (2016) A new perspective in the field of cardiac safety testing through the comprehensive in vitro proarrhythmia assay paradigm. J Biomol Screen 21:1–11
22. Fahlke C, Fischer M (2010) Physiology and pathophysiology of ClC-K/barttin channels. Front Physiol 1:155
23. Birkenhäger R, Otto E, Schürmann MJ et al (2001) Mutation of BSND causes Bartter syndrome with sensorineural deafness and kidney failure. Nat Genet 29:310–314
24. Simon DB, Bindra RS, Mansfield TA et al (1997) Mutations in the chloride channel gene, CLCNKB, cause Bartter’s syndrome type III. Nat Genet 17:171–178
25. Imbrici P, Liantonio A, Gradogna A, Pusch M, Camerino DC (2014) Targeting kidney CLC-K channels: pharmacological profile in a human cell line versus Xenopus oocytes. Biochim Biophys Acta 1838:2484–2491
26. Loudon KW, Fry AC (2014) The renal channelopathies. Ann Clin Biochem 51:441–458
27. Zieg J, Gonsorcikova L, Landau D (2016) Current views on the diagnosis and management of hypokalaemia in children. Acta Paediatr 105:762–772
28. Mele A, Calzolaro S, Cannone G, Cetrone M, Conte D, Tricarico D (2014) Database search of spontaneous reports and pharmacological investigations on the sulfonylureas and glinides-induced atrophy in skeletal muscle. Pharmacol Res Perspect 2(1):e00028
29. Pitts PJ, Louet HL, Moride Y, Conti RM (2016) 21st century pharmacovigilance: efforts, roles, and responsibilities. Lancet Oncol 17:e486–e492
30. Evans SJ, Waller PC, Davis S (2001) Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol Drug Saf 10:483–486
31. Reinalter SC, Jeck N, Peters M, Seyberth HW (2004) Pharmacotyping of hypokalaemic salt-losing tubular disorders. Acta Physiol Scand 181:513–521
32. Lupoli S, Salvi E, Barcella M, Barlassina C (2015) Pharmacogenomics considerations in the control of hypertension. Pharmacogenomics 16:1951–1964
33. Lazelle RA, McCully BH, Terker AS, Himmerkus N, Blankenstein KI, Mutig K, Bleich M, Bachmann S, Yang C-L, Ellison DH (2016) Renal deletion of 12 kDa FK506-binding protein attenuates tacrolimus-induced hypertension. J Am Soc Nephrol 27:1456–1464
34. Chen Y-S, Fang H-C, Chou K-J, Lee P-T, Hsu C-Y, Huang W-C, Chung H-M, Chen C-L (2009) Gentamicin-induced Bartter-like syndrome. Am J Kidney Dis 54:1158–1161
35. Zietse R, Zoutendijk R, Hoorn EJ (2009) Fluid, electrolyte and acid-base disorders associated with antibiotic therapy. Nat Rev Nephrol 5:193–202
36. Liantonio A, Picollo A, Babini E, Carbonara G, Fracchiolla G, Loiodice F, Tortorella V, Pusch M, Camerino DC (2006) Activation and inhibition of kidney CLC-K chloride channels by fenamates. Mol Pharmacol 69:165–173
37. Liantonio A, Picollo A, Carbonara G et al (2008) Molecular switch for CLC-K Cl- channel block/activation: optimal pharmacophoric requirements towards high-affinity ligands. Proc Natl Acad Sci U S A 105:1369–1373
38. Liantonio A, Imbrici P, Camerino GM et al (2016) Kidney CLC-K chloride channels inhibitors: structure-based studies and efficacy in hypertension and associated CLC-K polymorphisms. J Hypertens 34:981–992
39. Dutzler R, Campbell EB, Cadene M, Chait BT, MacKinnon R (2002) X-ray structure of a ClC chloride channel at 3.0 A reveals the molecular basis of anion selectivity. Nature 415:287–294
40. Gradogna A, Imbrici P, Zifarelli G, Liantonio A, Camerino DC, Pusch M (2014) I–J loop involvement in the pharmacological profile of CLC-K channels expressed in Xenopus oocytes. Biochim Biophys Acta 1838:2745–2756
41. Small-molecule drug discovery suite 2015-3: Schrödinger suite 2015-3 induced fit docking protocol; glide version 6.8, Schrödinger, LLC, New York, NY, 2015; Prime version 4.1, Schrödinger, LLC, New York, NY, 2015
42. Cheng C-J, Rodan AR, Huang C-L (2017) Emerging targets of diuretic therapy. Clin Pharmacol Ther 102:420–435
43. Windley MJ, Abi-Gerges N, Fermini B, Hancox JC, Vandenberg JI, Hill AP (2017) Measuring
Abstract
Current therapeutic strategies entail identifying and characterizing a single protein receptor whose inhibition is likely to result in the successful treatment of a disease of interest, and experimentally testing large libraries of small-molecule compounds in vitro and in vivo to identify promising inhibitors in model systems and determine whether the findings extend to humans. This highly complex process relies largely on trial and error and is risky, time-consuming, and costly. Computational screening of compounds against the atomic structures of multiple protein targets, taking protein-inhibitor dynamics into account, can help identify inhibitors efficiently, particularly for complex drug-resistant diseases. The potential benefits of this approach are discussed using HIV-1 and Plasmodium falciparum infections as examples. Some authors have proposed a virtual drug discovery approach that not only identifies efficient inhibitors but also helps to minimize side effects and toxicity, thus increasing the likelihood of successful therapies. This chapter discusses concepts and research on multitarget bioactive compounds in relation to toxicology.
1 Introduction
1.1 Promise of a New Paradigm in Drug Discovery
Therapeutic strategies for several diseases, including human immunodeficiency virus type 1 (HIV-1) infection, have evolved from an initial single-target treatment to treatments aimed at several targets [1]. Single-drug antiretroviral regimens are no longer recommended for clinical use against HIV-1 because of the rapid emergence of drug resistance after initiation of therapy [2]. A combination of antiretroviral drugs targeting different viral proteins is more effective in suppressing viral growth [11]. In many cases, however, these regimens are expensive and result in increased toxicity and poor patient compliance [3]. New paradigms for the discovery of multitarget drugs have emerged, particularly for the treatment of HIV-1 infection. For example, the multitarget antiretroviral drug Cosalane was developed to inhibit several HIV-1 proteins (gp120,
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_16, © Springer Science+Business Media, LLC, part of Springer Nature 2018
328 Luciana Scotti et al.
1.2 Summary and Advantages of Our Computing Paradigm
We have developed a new computational paradigm for the discovery of potential lead inhibitors based on the combination of three principles (Fig. 1) [6]:
(1) Incorporation of protein side-chain and main-chain dynamics during the interaction process to assess binding affinities more accurately.
(2) Selection of single inhibitors that bind to multiple protein targets simultaneously.
(3) Use of a screening library consisting of drugs and drug-like compounds.
Each principle increases the likelihood that a predicted compound will successfully inhibit the disease; in addition, screening with drug-like compounds specifically enhances pharmacological viability. Overall, this new paradigm produces hits with a better and more predictable potential to become viable drugs for various diseases.
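The second principle, selecting single compounds predicted to bind several targets, can be sketched as a simple filter over a table of predicted binding energies. Everything in the example is hypothetical: the compound and target names, the energies, the threshold, and the choice of ranking by the weakest predicted interaction are illustrative assumptions, not the scoring scheme of the protocol described in [6].

```python
# Hypothetical predicted binding energies (kcal/mol; more negative = stronger).
predicted_energy = {
    "compound_A": {"target_1": -9.5,  "target_2": -8.7, "target_3": -9.1},
    "compound_B": {"target_1": -11.2, "target_2": -4.0, "target_3": -5.5},
    "compound_C": {"target_1": -7.9,  "target_2": -8.1, "target_3": -7.6},
}


def rank_multitarget(energies, threshold=-7.5):
    """Keep compounds predicted to bind every target at or below the threshold.

    Hits are ranked by their weakest (least negative) predicted energy, so a
    compound is only as good as its worst target.
    """
    hits = []
    for compound, per_target in energies.items():
        weakest = max(per_target.values())
        if weakest <= threshold:
            hits.append((compound, weakest))
    return sorted(hits, key=lambda item: item[1])


multitarget_hits = rank_multitarget(predicted_energy)
print(multitarget_hits)
```

In this toy data set compound_B has the strongest single interaction but is discarded because it binds two of the targets poorly, which is the kind of trade-off a single-target screen would miss.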
1.3 Justifying the Use of Dynamics
Biologically active proteins are in continuous motion, yet most information on protein structure is limited to the most stable form of a protein when crystallized under artificial conditions. Differences among the conformations seen in crystallographic structures suggest that binding events and protein motions induce variation in the dimensions and electrostatics of the catalytic site. Under physiological conditions, an inhibitor is therefore likely to bind one of these variant conformations with greater affinity than that observed for the artificially stabilized structure. Thus, using dynamics simulations in addition to static crystal structures increases the chance of sampling a physiologically relevant conformation.
We perform docking with dynamics by (1) docking the ligand of interest, (2) solvating the complex in a shell of water and salt, (3) applying 100 steps of energy minimization, (4) simulating protein
Computational Approaches in Multitarget Drug Discovery 329
Fig. 1 Comparison of our protocol for the discovery of multitarget inhibitors with traditional approaches
currently used. Our novel protocol for the discovery of broad-spectrum multitarget inhibitors against key
pathogens and diseases (right) is contrasted with traditional approaches (left). The main differences in our
protocol, corresponding to the reasons why it is more effective, are as follows: (1) The use of a docking
algorithm coupled with molecular dynamics that takes the flexibility of proteins and inhibitors completely into
account (http://compbio.washington.edu/papers/therapeutics.html). This algorithm is effective because all
molecules in biology are subject to thermal motion. Traditional rigid docking approaches do not account for
this motion, resulting in poorer prediction of binding energies or inhibitory constants than our approach.
(2) The use of compounds that bind to multiple targets simultaneously. The most effective drugs in humans
(e.g., aspirin or Gleevec) inevitably interact with and bind to multiple proteins, a feature that traditional
models based on single-target drugs do not take into account. The multitarget approach is necessary because
each drug has to be effective at its site of action (for example, HIV-1 protease inhibitors must bind and
inhibit the protease molecule) and must be promptly metabolized by the body (e.g., by cytochrome P450 (CYP450)
enzymes, which are responsible for the metabolism of most drugs) [6]. Computational screening for multitarget
binding and inhibition is effective because it exploits the evolutionary fact that protein structure is
conserved in nature much more than function or sequence. (3) The use of drugs and drug-like compounds, both
FDA-approved and experimental, in the computational screening process. Screening drugs developed for other
conditions against infectious diseases is likely to lead to fewer side effects because their toxicity and
pharmacokinetics (absorption, distribution, metabolism, and excretion) are typically well established in human
and animal models. To our knowledge, this is the first time these three elements have been combined to create
an effective inhibitor and drug discovery protocol with predictions that have been experimentally verified to
produce highly promising lead inhibitors for further drug development. The computational aspects of our
protocol are fully automated, can be executed completely in parallel, and require only a fixed initial
investment in the number of processors purchased (i.e., the more CPUs, the more targets and compounds that can
be screened). Our new protocol is highly effective and increases success rates in downstream preclinical and
clinical use, with a considerable reduction in the time, effort, and cost spent
2 Computational Studies
2.1 Antimalarials
Shibi et al. [18] virtually screened 292 phytochemicals present in 72
traditional herbs used as antimalarials, found principally in Africa,
China, and Asia, in order to identify inhibitors of plasmepsin-2 and
falcipain-2 of P. falciparum.
Bioassay datasets AID 504850, with potential to inhibit the
malarial parasite Plasmodium, and AID 2302, measured on levels
of P. falciparum lactate dehydrogenase as surrogate of parasite
2.2 Alzheimer
Azam et al. [33], considering the potential of green tea to reduce the
risk of various neurodegenerative diseases, such as Parkinson’s disease,
studied the binding interactions between catechins and 18 potential
protein receptors through molecular docking simulations. Green tea
contains several polyphenolic catechins, such as catechin, epicatechin,
epicatechin gallate, epigallocatechin, and epigallocatechin gallate, which
are believed to be the active components [34]. Green tea polyphenols were
observed to protect SH-SY5Y cells against apoptosis induced by the
pro-Parkinsonian neurotoxin 6-hydroxydopamine [35].
Eighteen crystal structures of potential protein receptors,
extracted from PDB, and five catechin derivatives and gallic acid
were employed for the docking calculations. For every target pro-
tein, ten poses were visualized for each docked compound in order
to identify the model with minimum binding energy and best
ligand–receptor interaction. From docking analysis, it was observed
that the studied catechins were capable of interacting with all
docked targets, indicating that these ligands have a broad spectrum
of structural features. Drugs that interact with multiple targets
might have a better chance of affecting the complex equilibrium of
whole cellular networks than drugs that act on a single target.
Monoamine oxidase-B (MAO-B), adenosine A2A receptor, and nitric
oxide synthase (NOS) emerged as the most promising anti-Parkinsonian
drug targets, and the NMDA (N-methyl-D-aspartate) receptor as the
least favorable anti-Parkinsonian drug target.
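The pose-selection step described above (keeping, for each compound and target, the pose with the minimum binding energy) can be sketched as follows. This is only a minimal illustration, assuming that docking results are already available as (target, ligand, pose index, binding energy) records. The record layout, the function name `best_poses`, and the example energies are hypothetical and are not taken from Azam et al. [33].

```python
from typing import Dict, Iterable, Tuple

# One docking record: (target, ligand, pose index, binding energy in kcal/mol).
# This layout is an assumption made for the sketch.
Record = Tuple[str, str, int, float]


def best_poses(records: Iterable[Record]) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Return, for each (target, ligand) pair, the pose with the lowest energy."""
    best: Dict[Tuple[str, str], Tuple[int, float]] = {}
    for target, ligand, pose, energy in records:
        key = (target, ligand)
        # A more negative binding energy means a more favorable predicted pose.
        if key not in best or energy < best[key][1]:
            best[key] = (pose, energy)
    return best


# Made-up energies, for illustration only.
example = [
    ("MAO-B", "epicatechin", 1, -7.1),
    ("MAO-B", "epicatechin", 2, -8.3),
    ("MAO-B", "epicatechin", 3, -6.9),
    ("NMDA receptor", "epicatechin", 1, -4.2),
    ("NMDA receptor", "epicatechin", 2, -3.8),
]
selected = best_poses(example)
for (target, ligand), (pose, energy) in sorted(selected.items()):
    print(f"{target} / {ligand}: pose {pose}, {energy:.1f} kcal/mol")
```

Comparing the selected energies of one ligand across targets gives a simple ranking of targets. Binding energy is only one of the two criteria named in the text; the quality of the ligand–receptor interactions still has to be inspected separately.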
A structure–activity relationship analysis identifies the importance
of various functionalities for ligand–receptor interactions.
(a) Except for NMDA and α-amino-3-hydroxy-5-methyl-4-isoxazole
propionic acid receptors, the benzopyran moiety and aromatic ring CB
are essential for activity.
(b) The dihydroxyl groups at positions 5 and 7 are required for activity
at almost all targets.
(c) OH substitution at R2 increases the activity at c-Jun N-terminal
kinase, metabotropic glutamate receptor 1, cyclooxygenase-1,
cyclooxygenase-2, glutamate dehydrogenase, and ionotropic glutamate
receptor.
(d) The heterocyclic benzopyran ring contains chiral centers at
positions 2 and 3, which are important for the activity at c-Jun
N-terminal kinase, P38 MAP kinase, catechol-O-methyl transferase,
metabotropic glutamate receptor 3, glycogen synthase kinase 3, and the
α-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid receptor.
(e) R1 substitution with OH in epicatechin, epigallocatechin, and
catechin is important for the activity at P38 MAP kinase, glycogen
synthase kinase 3, and glutamate dehydrogenase.
[53]. On the other hand, some of the best docking poses showed a
root-mean-square deviation (RMSD) ranging from 1.57 to 2.35 Å
when compared with the experimentally determined structures. To
analyze this discrepancy, a 50 ns all-atom explicit-water molecular
dynamics (MD) simulation was performed for all active derivatives,
starting from the docking results. The results showed that MD
simulations can reproduce the experimentally observed cocrystal
protein–ligand complexes much more accurately than docking alone.
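For reference, the RMSD used to compare a docked pose with an experimental structure is RMSD = sqrt((1/N) Σ |r_i − r'_i|²), summed over the N matched atoms. The following is a minimal sketch, assuming that both poses list the same atoms in the same order and are already expressed in the same receptor frame, so that no superposition is applied. The function name `rmsd` and the three-atom coordinates are hypothetical and are not data from ref. [53].

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose_a: Sequence[Coord], pose_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched atoms, in the input units."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(pose_a))


# Hypothetical three-atom ligand; coordinates in angstroms.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(x + 2.0, y, z) for x, y, z in crystal]  # rigid 2 Å shift along x
print(f"RMSD = {rmsd(crystal, docked):.2f} Å")
```

In practice the same calculation would be applied to the ligand heavy atoms of each MD frame or docking pose against the cocrystal ligand.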
The optimized models were used to redock the 908 compounds
from which the lead compounds had been originally identified. All
N-[3-(2-oxo-pyrrolidinyl)phenyl]-benzenesulfonamide derivatives
scored better with the optimized models than with the original
models. These compounds contain a pyrrolidinone moiety, which forms
a hydrogen bond with the conserved N140 in BRD4-1. On the other
hand, unlike many of the most potent BET inhibitors, such as (+)-JQ1
and I-BET151, these compounds lack many of the π–π stacking and
hydrophobic interactions necessary for further stabilization inside
the acetyl-lysine binding site. These data are important for
designing more effective compounds.
2.5 Treatment of Influenza Virus Infection by Computational Approaches
Gu et al. [66] selected eighteen anti-influenza herbs that have been
widely used historically to treat patients with influenza. From the
Traditional Chinese Medicine (TCM) Database, the 312 available
structures of compounds from these herbs were collected. For
simplicity, the study focused on influenza viral proteins that are
structurally available for docking, omitting related human protein
targets. The structure of H7 in complex with its substrate (PDB ID
4DJ7) and a complex structure of N9 with a sulfate ion at the binding
site (PDB ID 2B8H) were chosen. To select TCM compounds that could
potentially inhibit H7 and N9, molecules were chosen with a docking
score at least 1 kcal/mol lower than that of the corresponding
positive control (LSTc for H7, −6.4 kcal/mol; sialic acid for N9,
−7.2 kcal/mol). Among the screened compounds, seven (11–17) were
predicted to be dual inhibitors of H7 and N9: compounds 11 and 12
from Scutellaria baicalensis, 13 from Radix isatidis, 14 from
Glycyrrhiza uralensis, 15 from Artemisia annua, 16 from Cinnamomum
aromaticum, and 17 from Folium isatidis.
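The selection rule described above can be sketched as a simple filter. This is only an illustration, assuming that docking scores are stored per compound and per target. The control scores are those quoted in the text, whereas the function name `dual_inhibitors`, the compound labels, and the example scores are hypothetical placeholders and do not correspond to compounds 11–17.

```python
from typing import Dict, List

# Positive-control docking scores quoted in the text (kcal/mol).
CONTROL_SCORES = {"H7": -6.4, "N9": -7.2}
MARGIN = 1.0  # a hit must score at least 1 kcal/mol lower than the control


def dual_inhibitors(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Return compounds that beat the control by MARGIN on every target."""
    hits = []
    for compound, by_target in scores.items():
        if all(
            target in by_target and by_target[target] <= control - MARGIN
            for target, control in CONTROL_SCORES.items()
        ):
            hits.append(compound)
    return sorted(hits)


# Hypothetical scores; the compound names are placeholders.
example_scores = {
    "cmpd_A": {"H7": -7.9, "N9": -8.6},  # passes both cutoffs
    "cmpd_B": {"H7": -8.1, "N9": -7.5},  # fails N9 (needs <= -8.2)
    "cmpd_C": {"H7": -7.0, "N9": -9.0},  # fails H7 (needs <= -7.4)
}
print(dual_inhibitors(example_scores))
```

Requiring the margin on every target, rather than on the average score, is what makes a compound a predicted dual inhibitor instead of a strong binder of only one protein.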
This research illustrates TCM’s multitarget/multicomponent
strategy for disease treatment. In this way, TCM offers an efficient
combinatorial intervention through mechanistic redundancy, with less
potential for drug resistance. For example, the herbal drug Radix
isatidis is frequently used in China to prevent H7N9 and other types
of influenza, but its effectiveness is unclear and often debated.
The data from this study suggest that Radix isatidis may prevent
influenza, but this should be validated by additional experimental
studies [66].
References
1. Hammer SM, Abergh AJ, Erro JJ et al (2006) Treatment for Adult HIV Infection. JAMA 296(7):410–425
2. Hopkins AL, Mason JS, Overington JP (2006) Can we rationally design promiscuous drugs? Curr Opin Struct Biol 16:127. Available on: http://www.sciencedirect.com/science/article/pii/S0959440X06000157. Access 20 October, 2017
3. Jenwitheesuk E, Horst AJ, Rivas KL, Van Voorhis WC, Samudrala R (2008) Novel paradigms for drug discovery: computational multitarget screening. Trends Pharmacol Sci 29(2):62–71
4. Jenwitheesuk E, Samudrala R (2003) Improved prediction of HIV-1 protease-inhibitor binding energies by molecular dynamics simulations. BMC Struct Biol 3(1):2
5. Jenwitheesuk E, Samudrala R (2005) Identification of potential multitarget antimalarial drugs. JAMA 294(12):1490–1491
6. Jones R, Gazzard B (2006) The cost of antiretroviral drugs and influence on prescribing policies. Int J STD AIDS 17(8):499–506
7. Metzner KJ, Allers K, Rauch P, Harrer T (2007) Rapid selection of drug-resistant HIV-1 during the first months of suppressive ART in treatment-naive patients. AIDS 21(6):703–711
8. Santhosh KC, Paul GC, De Clercq E, Pannecouque C, Witvrouw M, Loftus TL, Turpin JA, Buckheit RW Jr, Cushman M (2001) Correlation of anti-HIV activity with anion spacing in a series of cosalane analogues with extended polycarboxylate pharmacophores. J Med Chem 44(5):703–714
9. Scheeff ED, Bourne PE (2005) Structural evolution of the protein kinase–like superfamily. PLoS Comput Biol 1(5):e49
10. Shoichet BK (2005) Virtual screening of chemical libraries. Nature 432(7019):862–865
11. Volberding PA, Lagakos SW, Grimes JM, Stein DS, Rooney J, Meng TC, Fischl MA, Collier AC, Phair JP, Hirsch MS (1995) A comparison of immediate with deferred zidovudine therapy for asymptomatic HIV-infected adults with CD4 cell counts of 500 or more per cubic millimeter. N Engl J Med 333(7):401–407
12. Wierenga RK (2001) The TIM-barrel fold: a versatile framework for efficient enzymes. FEBS Lett 492:193. Available on http://doi.wiley.com/10.1016/S0014-5793%2801%2902236-0. Access 9 October, 2017
13. Pei J, Yin N, Ma X, Lai L (2014) Systems biology brings new dimensions for structure-based drug design. J Am Chem Soc 136(33):11556–11565
14. Brötz-Oesterhelt H, Brunner NA (2008) How many modes of action should an antibiotic have? Curr Opin Pharmacol 8(5):564–573
15. Knight ZA, Lin H, Shokat KM (2010) Targeting the cancer kinome through polypharmacology. Nat Rev Cancer 10(2):130–137
16. Kumar R, Sharma A, Tiwari RK (2016) Computational drug designing and development: an insight. IRJET 03(05):10–14
17. Jadhav MN, Kokil GR, Harak SS, Wagh SB (2013) Direct and indirect drug design approaches for the development of novel tricyclic antipsychotics: potential 5-HT2A antagonist. J Chem 2013:1–10
18. Shibi IG, Aswathy L, Jisha RS, Masand VH, Gajbhiye JM (2016) Virtual Screening Techniques to Probe the Antimalarial Activity of some Traditionally Used Phytochemicals. Comb Chem High Throughput Screen 19(7):572–591
19. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res 37(Web Server issue):W623–W633
20. Liu K, Feng J, Young SS (2005) PowerMV: a software environment for molecular viewing, descriptor generation, data analysis and hit evaluation. J Chem Inf Model 45(2):515–522
21. Kumar M, Makhal B, Gupta VK, Sharma A (2014) In silico investigation of medicinal spectrum of imidazo-azines from the perspective of multitarget screening against malaria, tuberculosis and Chagas disease. J Mol Graph Model 50:1–9
22. Yuvaniyama J, Chitnumsub P, Kamchonwongpaisan S, Vanichtanankul J, Sirawaraporn W, Taylor P, Walkinshaw MD, Yuthavong Y (2003) Insights into antifolate resistance from malarial DHFR-TS structures. Nat Struct Biol 10(5):357–365
23. Perozzo R, Kuo M, Sidhu AB, Valiyaveettil JT, Bittman R, Jacobs WR Jr, Fidock DA, Sacchettini JC (2002) Structural elucidation of the specificity of the antibacterial agent triclosan for malarial enoyl acyl carrier protein reductase. J Biol Chem 277(15):13106–13114
24. Merckx A, Echalier A, Langford K, Sicard A, Langsley G, Joore J, Doerig C, Noble M, Endicott J (2008) Structures of P. falciparum protein kinase 7 identify an activation motif and leads for inhibitor design. Structure 16(2):228–238
25. Dias MV, Faim LM, Vasconcelos IB, De Oliveira JS, Basso LA, Santos DS, De Azevedo W (2007) Structure of shikimate kinase from Mycobacterium tuberculosis complexed with ADP and shikimate at 1.9 Å of resolution. Acta Crystallogr Sect F 63:1–6
26. Wang S, Eisenberg D (2003) Crystal structures of a pantothenate synthetase from M. tuberculosis and its complexes with substrates and a reaction intermediate. Protein Sci 12(5):1097–1108
27. Li de la Sierra I, Munier-Lehmann H, Gilles AM, Bârzu O, Delarue M (2001) X-ray structure of TMP kinase from Mycobacterium tuberculosis complexed with TMP at 1.95 Å resolution. J Mol Biol 311(1):87–100
28. Basavannacharya C, Robertson G, Munshi T, Keep NH, Bhakta S (2010) ATP-dependent MurE ligase in Mycobacterium tuberculosis: biochemical and structural characterisation. Tuberculosis 90(1):16–24
29. Bond CS, Zhang Y, Berriman M, Cunningham ML, Fairlamb AH, Hunter WN (1999) Crystal structure of Trypanosoma cruzi trypanothione reductase in complex with trypanothione, and the structure-based discovery of new natural product inhibitors. Structure 7(1):81–89
30. Ladame S, Castilho MS, Silva CH, Denier C, Hannaert V, Périé J, Oliva G, Willson M (2003) Crystal structure of Trypanosoma cruzi glyceraldehyde-3-phosphate dehydrogenase complexed with an analogue of 1,3-bisphospho-d-glyceric acid. Eur J Biochem 270(22):4574–4586
31. Amaya MF, Watts AG, Damager I, Wehenkel A, Nguyen T, Buschiazzo A, Paris G, Frasch AC, Withers SG, Alzari PM (2004) Structural insights into the catalytic mechanism of Trypanosoma cruzi trans-sialidase. Structure 12(5):775–784
32. Abad-Zapatero C, Metz JT (2005) Ligand efficiency indices as guideposts for drug discovery. Drug Discov Today 10(7):464–469
33. Azam F, Mohamed N, Alhussen F (2015) Molecular interaction studies of green tea catechins as multitarget drug candidates for the treatment of Parkinson’s disease: computational and structural insights. Network 26(3-4):97–115
34. Braicu C, Ladomery MR, Chedea VS, Irimie A, Berindan-Neagoe I (2013) The relationship between the structure and biological actions of green tea catechins. Food Chem 141(3):3282–3289
35. Guo S, Bezard E, Zhao B (2005) Protective effect of green tea polyphenols on the SH-SY5Y cells against 6-OHDA induced apoptosis through ROS-NO pathway. Free Radic Biol Med 39(5):682–695
36. Maqbool M, Manral A, Jameel E, Kumar J, Saini V, Shandilya A, Tiwari M, Hoda N, Jayaram B (2016) Development of cyanopyridine-triazine hybrids as lead multitarget anti-Alzheimer agents. Bioorg Med Chem 24(12):2777–2788
37. Palanimuthu D, Poon R, Sahni S, Anjum R, Hibbs D, Lin HY, Bernhardt PV, Kalinowski DS, Richardson DR (2017) A novel class of thiosemicarbazones show multi-functional activity for the treatment of Alzheimer’s disease. Eur J Med Chem 9(139):612–632
38. Barnham KJ, Masters CL, Bush AI (2004) Neurodegenerative diseases and oxidative stress. Nat Rev Drug Discov 3(3):205–214
39. Ayton S, Lei P, Bush AI (2013) Metallostasis in Alzheimer’s disease. Free Radic Biol Med 62:76–89
40. Hegde ML, Bharathi P, Suram A, Venugopal C, Jagannathan R, Poddar P, Srinivas P, Sambamurti K, Rao KJ, Scancar J, Messori L, Zecca L, Zatta P (2009) Challenges associated with metal chelation therapy in Alzheimer’s disease. J Alzheimers Dis 17(3):457–468
41. Vickers JC, Dickson TC, Adlard PA, Saunders HL, King CE, McCormack G (2000) The cause of neuronal degeneration in Alzheimer’s disease. Prog Neurobiol 60(2):139–165
42. Wolfe DM, Lee JH, Kumar A, Lee S, Orenstein SJ, Nixon RA (2013) Autophagy failure in Alzheimer’s disease and the role of defective lysosomal acidification. Eur J Neurosci 37(12):1949–1961
43. Morphy R (2010) Selectively nonselective kinase inhibition: striking the right balance. J Med Chem 53(4):1413–1437
44. Martin MP, Olesen SH, Georg GI, Schönbrunn E (2013) Cyclin-dependent kinase inhibitor dinaciclib interacts with the acetyl-lysine recognition site of bromodomains. ACS Chem Biol 8(11):2360–2365
45. Parry D, Guzi T, Shanahan F, Davis N, Prabhavalkar D, Wiswell D, Seghezzi W, Paruch K, Dwyer MP, Doll R, Nomeir A, Windsor W, Fischmann T, Wang Y, Oft M, Chen T, Kirschmeier P, Lees EM (2010) Dinaciclib (SCH 727965), a novel and potent cyclin-dependent kinase inhibitor. Mol Cancer Ther 9(8):2344–2353
46. Carlino L, Rastelli G (2016) Dual kinase-bromodomain inhibitors in anticancer drug discovery: a structural and pharmacological perspective. J Med Chem 59(20):9305–9320
47. Dixon SL, Smondyrev AM, Knoll EH, Rao SN, Shaw DE, Friesner RA (2006) PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results. J Comput Aided Mol Des 20(10-11):647–671
48. Dixon SL, Smondyrev AM, Rao SN (2006) PHASE: a novel approach to pharmacophore modeling and 3D database searching. Chem Biol Drug Des 67(5):370–372
49. Allen BK, Mehta S, Ember SWJ, Schönbrunn E, Ayad N, Schürer SC (2015) Large-scale computational screening identifies first in class multitarget inhibitor of EGFR kinase and BRD4. Sci Rep 05:16924
50. Schürer SC, Muskal SM (2013) Kinome-wide activity modeling from diverse public high-quality data sets. J Chem Inf Model 53(1):27–38
51. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40(Database issue):D1100–D1107
52. Huang N, Shoichet BK, Irwin JJ (2006) Benchmarking sets for molecular docking. J Med Chem 49(23):6789–6801
53. Allen BK, Mehta S, Ember SWJ, Zhu JY, Schönbrunn E, Ayad NG, Schürer SC (2017) Identification of a novel class of BRD4 inhibitors by computational screening and binding simulations. ACS Omega 2(8):4760–4771
54. Engler AC, Wiradharma N, Ong ZY, Coady DJ, Hedrick JL, Yang YY (2016) Emerging trends in macromolecular antimicrobials to fight multi-drug-resistant infections. Nano Today 7(3):201–222
55. Speck-Planche A, Kleandrova VV, Ruso JM, Cordeiro MN (2016) First multitarget chemo-bioinformatic model to enable the discovery of antibacterial peptides against multiple gram-positive pathogens. J Chem Inf Model 56:588–598
56. Jenssen H, Hamill P, Hancock RE (2006) Peptide antimicrobial agents. Clin Microbiol Rev 19:491–511
57. Brogden KA (2005) Antimicrobial peptides: pore formers or metabolic inhibitors in bacteria? Nat Rev Microbiol 3:238–250
58. Gogoladze G, Grigolava M, Vishnepolsky B, Chubinidze M, Duroux P, Lefranc MP, Pirtskhalava M (2014) DBAASP: database of antimicrobial activity and structure of peptides. FEMS Microbiol Lett 357:63–68
59. Verslyppe B, De Smet W, De Baets B, De Vos P, Dawyndt P (2014) Straininfo introduces electronic passports for microorganisms. Syst Appl Microbiol 37:42–50
60. Dresen S, Ferreirós N, Gnann H, Zimmermann R, Weinmann W (2010) Detection and identification of 700 drugs by multi-target screening with a 3200 Q TRAP® LC-MS/MS system and library searching. Anal Bioanal Chem 396:2425–2434
61. Kell DB (2013) Finding novel pharmaceuticals in the systems biology era using multiple effective drug targets, phenotypic screening and knowledge of transporters: where drug discovery went wrong and how to fix it. FEBS J 280:5957–5980
62. Durrant JD, Amaro RE, Xie L, Urbaniak MD, Ferguson MAJ, Haapalainen A et al (2010) A multidimensional strategy to detect polypharmacological targets in the absence of structural and sequence homology. PLoS Comput Biol 6:e1000648
63. Ouyang H, Liu S, Zeng W, Levitt RC, Candiotti KA, Hao S (2012) An emerging new paradigm in opioid withdrawal: a critical role for glia-neuron signaling in the periaqueductal gray. Sci World J 2012:1–9
64. Del Bello F, Diamanti E, Giannella M, Mammoli V, Mattioli L, Titomanlio F (2013) Exploring multitarget interactions to reduce opiate withdrawal syndrome and psychiatric comorbidity. ACS Med Chem Lett 4:875–879
65. Ma S, Feng C, Zhang X, Dai G, Li C, Cheng X et al (2013) The multi-target capabilities of the compounds in a TCM used to treat sepsis and their in silico pharmacology. Complement Ther Med 21:35–41
66. Gu S, Yin N, Pei J, Lai L (2013) Understanding molecular mechanisms of traditional Chinese medicine for the treatment of influenza viruses infection by computational approaches. Mol Biosyst 9:1–5
Chapter 17
Abstract
This chapter presents an outline of recently available information regarding the safety, toxicity, and efficacy
of nano drug delivery systems. Of particular importance is the evaluation of several key factors to design
nontoxic and effective nanoformulations. Among them, we focus on nanostructure materials and synthesis
methods, mechanisms of interactions with biological systems, treatment of nanoparticles, manufacture
impurities, and nanostability. Emphasis is given to in silico, in vitro, and in vivo models used to assess and
predict the toxicity of these new formulations. Additionally, some examples of in vitro and in vivo studies
of specific nanoderivatives are also presented in this chapter.
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_17, © Springer Science+Business Media, LLC, part of Springer Nature 2018
348 Antonio Lopalco and Nunzio Denora
Table 1
Types of chemical structures, methods of preparation, and applications of some NPs used as
pharmaceutical carrier systems in drug/gene delivery, photodynamic therapy, in vitro diagnosis, and imaging

| NP class (nanostructured material) | Method of preparation | Application |
| --- | --- | --- |
| Polymer carrier (PLA, polyalkylcyanoacrylates, PEI, PCL, PLGA, PEG-PLA, PEG-PLGA); polymer carrier of natural and/or derivative materials (chitosan, dextran, gelatin, albumin, alginates, liposomes, starch) [15–17] | Colloidal mill, emulsification-solvent evaporation or diffusion, emulsification-reverse salting-out, gelation of emulsion droplets, ionic gelation, polymerization, interfacial polycondensation, nanoprecipitation, formation of polyelectrolyte complexes, formation of NPs from neutral nanogels [7, 10, 12] | Drug/gene delivery |
| Dendrimers (PAMAM, blocked polymers) | Divergent, convergent, click chemistry [11, 18, 19] | Drug/gene delivery |
| Fullerenes (carbon-based carriers) | Arc and combustion method [20, 21] | Drug delivery, photodynamic therapy |
| Ferrofluid (SPIONS, USPIONS) | Physical, chemical, and biological methods [22, 23] | Imaging (MRI) |
| Quantum dots (Cd/Zn-selenides) | Top-down and bottom-up [24] | In vitro diagnosis, imaging |
chemical toxicity can offer sufficient results for many bulk materi-
als; but in vivo interaction of nanoformulations and biological sys-
tems is relatively complex and dynamic [13, 72].
In the past two decades, several papers have described the safety
and toxicity of NPs formed from biodegradable materials, like
PLGA, and nonbiodegradable materials, including NPs like den-
drimers, carbon nanotubes, and quantum dots [2, 3, 13, 14, 73].
Usually, nanoformulations made from biodegradable polymers are
expected to display fewer toxic effects than those made from
nonbiodegradable polymers. Semete et al. investigated the in vivo
toxicity and biodistribution of poly(lactic-co-glycolic acid) NPs. One
week after oral administration in mice, 40% of those NPs were
concentrated in the liver, and the rest were distributed in the brain
and kidney without apparent toxicity [13, 74]. The chemical composition
of biodegradable NPs and their degradation products can affect their
biological effects. Polyesters such as PLGA or polycaprolactone (PCL)
undergo hydrolysis and enzymatic degradation after implantation into
the body, forming lactic acid, glycolic acid, or capronic acid, which
are biocompatible entities. Aside from size-dependent toxicity due to
ROS-generating ability, the size of a PLGA or PCL particle can change
the rate of degradation of the polymeric matrix [75].
Polyacrylate NPs were among the first NPs investigated for
controlled delivery of biological agents. Recently, Song et al. [76]
examined the human toxicity of polyacrylate NPs prepared by
polymerization of unsaturated monomers, such as methyl methacrylic
amide or cyanoacrylates. The toxicity of the larger cyanoacrylate NPs
was correlated with their chemical properties and molecular chain
length; pathological investigation indicated nonspecific pulmonary
inflammation, pulmonary fibrosis, and foreign-body granulomas of the
pleura after exposure [13, 76–78].
Because of their specific nature, dendrimers can find applications
in drug delivery, gene transfection, and technology [8, 14]. Even
though their small size limits extensive drug incorporation into the
dendrimers, their nature and polymeric structure allow for drug
loading onto the internal dendritic branches. It has been shown that
functionalization of the surface with specific antibodies may further
enhance potential targeting and reduce side effects [79]. The limited
clinical studies and published data available on the toxicity of this
class of particles make it impossible to define their safety and
toxicity. More recently, they have been explored as potential
antimicrobial agents [80].
In view of their potency for induction of reactive oxygen spe-
cies (ROS) after photoexcitation, fullerenes have also been studied
Nanoformulations for Drug Delivery: Safety, Toxicity, and Efficacy 359
7 Final Considerations
References
60. Gibaud S, Demoy M, Andreux JP, Weingarten C, Gouritin B, Couvreur P (1996) Cells involved in the capture of nanoparticles in hematopoietic organs. J Pharm Sci 85:944–950
61. Demoy M, Gibaud S, Andreux JP, Weingarten C, Gouritin B, Couvreur P (1997) Splenic trapping of nanoparticles: complementary approaches for in situ studies. Pharm Res 14:463–468
62. Silva AL, Peres C, Conniot J, Matos AI, Loura L, Carriera B, Sainz V, Scomparin A, Satchi-Fainaro R, Preat V, Florindo HF (2017) Nanoparticle impact on innate immune cell pattern-recognition receptors and inflammasomes activation. Semin Immunol 34:3–24
63. Sayes CM, Reed KL, Warheit DB (2007) Assessing toxicity of fine and nanoparticles: comparing in vitro measurements to in vivo pulmonary toxicity profiles. Toxicol Sci 97:163–180
64. Ryman-Rasmussen JP, Riviere JE, Monteiro-Riviere NA (2007) Variables influencing interactions of untargeted quantum dot nanoparticles with skin cells and identification of biochemical modulators. Nano Lett 7:1344–1348
65. Oberdörster G, Maynard A, Donaldson K, Castranova V, Fitzpatrick J, Ausman K, Carter J, Karn B, Kreyling W, Lai D, Olin S, Monteiro-Riviere N, Warheit D, Yang H (2005) Principles for characterizing the potential human health effects from exposure to nanomaterials: elements of a screening strategy. Part Fibre Toxicol 2:8–35
66. Keelan JA, Leong JW, Ho D, Iyer KS (2015) Therapeutic and safety considerations of nanoparticle-mediated drug delivery in pregnancy. Nanomedicine 10:2229–2247
67. Calvaresi M, Zerbetto F (2011) Fullerene sorting proteins. Nanoscale 3:2873–2881
68. Hansch C (1969) Quantitative approach to biochemical structure-activity relationships. Acc Chem Res 2:232–239
69. Puzyn T, Rasulev B, Gajewicz A, Hu X, Dasari TD, Michalkova A, Hwang HM, Toropov A, Leszczynska D, Leszczynski J (2011) Using nano-QSAR to predict the cytotoxicity of metal oxide nanoparticles. Nat Nanotechnol 6:175–178
70. Puzyn T, Leszczynska D, Leszczynski J (2009) Toward the development of “nano-QSARs”: advances and challenges. Small 5:2494–2509
71. Sayes C, Ivanov I (2010) Comparative study of predictive computational models for nanoparticle-induced cytotoxicity. Risk Anal 30:1723–1734
72. Yanamala N, Kagan VE, Shvedova AA (2013) Molecular modeling in structural nano-toxicology: interactions of nano-particles with nano-machinery of cells. Adv Drug Deliv Rev 65:2070–2077
73. Aragao-Santiago L, Hillaireau H, Grabowski N, Mura S, Nascimento TL, Dufort S, Coll JL, Tsapis N, Fattal E (2016) Compared in vivo toxicity in mice of lung delivered biodegradable and non-biodegradable nanoparticles. Nanotoxicology 10:292–302
74. Semete B, Booysen L, Lemmer Y, Kalombo L, Katana L, Verschoor J, Swai HS (2010) In vivo evaluation of the biodistribution and safety of PLGA nanoparticles as drug delivery systems. Nanomedicine 6:662–671
75. Panyam J, Dali MM, Sahoo SK, Ma W, Chakravarthi SS, Amidon GL, Levy RJ, Labhasetwar V (2003) Polymer degradation and in vitro release of a model protein from poly (D, L-lactide-co-glycolide) nano- and microparticles. J Control Release 92:173–187
76. Song Y, Li X, Du X (2009) Exposure to nanoparticles is related to pleural effusion, pulmonary fibrosis and granuloma. Eur Respir J 34:559–567
77. Kreuter J (2007) Nanoparticles - a historical perspective. Int J Pharm 331:1–10
78. Lherm C, Muller RH, Puisieux F, Couvreur P (1992) Alkylcyanoacrylate drug carriers: II. Cytotoxicity of cyanoacrylate nanoparticles with different alkyl chain length. Int J Pharm 84:13–22
79. Lee CC, MacKay JA, Frechet JM, Szoka FC (2005) Designing dendrimers for biological applications. Nat Biotechnol 23:1517–1526
80. Balogh L, Swanson DR, Tomalia DA, Hagnauer GL, McManus AT (2001) Dendrimer-silver complexes and nanocomposites as antimicrobial agents. Nano Lett 1:18–21
81. Kolosnjaj J, Szwarc H, Moussa F (2007) Toxicity studies of fullerenes and derivatives. Bio-applications of nanoparticles. Springer, New York, NY, pp 168–180
82. Yamakoshi Y, Umezawa N, Ryu A, Arakane K, Miyata N, Goda Y, Masumizu T, Nagano T (2003) Active oxygen species generated from photoexcited fullerene (C60) as potential medicines: O2-• versus 1O2. J Am Chem Soc 125:12803–12809
83. Lovern SB, Klaper R (2006) Daphnia magna mortality when exposed to titanium dioxide and fullerene (C60) nanoparticles. Environ Toxicol Chem 25:1132–1137
84. Zhu S, Oberdörster E, Haasch ML (2006) Toxicity of an engineered nanoparticle (fullerene, C60) in two aquatic species, Daphnia and fathead minnow. Mar Environ Res 62:S5–S9
85. Oberdörster E (2004) Manufactured nanomaterials (fullerene, C60) induce oxidative stress in the brain of juvenile largemouth bass. Environ Health Perspect 112:1058–1062
86. Shvedova A, Castranova V, Kisin E, Schwegler-Berry D, Murray A, Gandelsman V, Maynard A, Baron P (2003) Exposure to carbon nanotube material: assessment of nanotube cytotoxicity using human keratinocyte cells. J Toxicol Environ Health Part A 66:1909–1926
87. Radomski A, Jurasz P, Alonso-Escolano D, Drews M, Morandi M, Malinski T, Radomski MW (2005) Nanoparticle-induced platelet aggregation and vascular thrombosis. Br J Pharmacol 146:882–893
88. Warheit DB, Laurence BR, Reed KL, Roach DH, Reynolds GAM, Webb TR (2004) Comparative pulmonary toxicity assessment of single-wall carbon nanotubes in rats. Toxicol Sci 77:117–125
89. Lam CW, James JT, McCluskey R, Hunter RL (2004) Pulmonary toxicity of single-wall carbon nanotubes in mice 7 and 90 days after intra-tracheal instillation. Toxicol Sci 77:126–134
90. Hardman R (2006) A toxicologic review of quantum dots: toxicity depends on physico-
96. Connor EE, Mwamuka J, Gole A, Murphy CJ (2005) Gold nanoparticles are taken up by human cells but do not cause acute cytotoxicity. Small 1:325–327
97. Niidome T, Yamagata M, Okamoto Y, Akiyama Y, Takahashi H, Kawano T, Katayama Y (2006) PEG-modified gold nanorods with a stealth character for in vivo applications. J Control Release 114:343–347
98. Su CH, Sheu HS, Lin CY, Huang CC, Lo YW, Pu YC, Weng JC, Shien JH, Chen JH, Yeh CS (2007) Nanoshell magnetic resonance imaging contrast agents. J Am Chem Soc 129:2139–2146
99. Bernardi RJ, Lowery AR, Thompson PA, Blaney SM, West JL (2008) Immunonanoshells for targeted photothermal ablation in medulloblastoma and glioma: an in vitro evaluation using human cell lines. J Neurooncol 86:165–172
100. Lowery AR, Gobin AM, Day ES, Halas NJ, West JL (2006) Immunonanoshells for targeted photothermal ablation of tumor cells. Int J Nanomedicine 1:149–154
101. O’Neal DP, Hirsch LR, Halas NJ, Payne JD, West JL (2004) Photo-thermal tumor ablation in mice using near infrared-absorbing nanoparticles. Cancer Lett 209:171–176
chemical and environmental factors. Environ 102. Lin W, Huang YW, Zhou XD, Ma Y (2006) In
Health Perspect 114:165–172 vitro toxicity of silica nanoparticles in human
91. Hoshino A, Fujioka K, Oku T, Suga M, Sasaki lung cancer cells. Toxicol Appl Pharmacol
YF, Ohta T, Yasuhara M, Suzuki K, Yamamoto 217:252–259
K (2004) Physicochemical properties and cel- 103. Chang JS, Chang KLB, Hwang DF, Kong ZL
lular toxicity of nanocrystal quantum dots (2007) In vitro cytotoxicity of silica nanopar-
depend on their surface modification. Nano ticles at high concentrations strongly depends
Lett 4:2163–2169 on the metabolic activity type of the cell line.
92. Hoshino A, Manabe N, Fujioka K, Suzuki K, Environ Sci Technol 41:2064–2068
Yasuhara M, Yamamoto K (2007) Use of flu- 104. Jin Y, Kannan S, Wu M, Zhao JX (2007)
orescent quantum dot bioconjugates for cel- Toxicity of luminescent silica nanoparticles to
lular imaging of immune cells, cell organelle living cells. Chem Res Toxicol 20:1126–1133
labeling, and nanomedicine: surface modifica- 105. Kumar MNV, Sameti M, Mohapatra SS, Kong
tion regulates biological function, including X, Lockey RF, Bakowsky U, Lindenblatt G,
cytotoxicity. J Artif Organs 10:149–157 Schmidt CH, Lehr CM (2004) Cationic
93. Choi AO, Cho SJ, Desbarats J, Lovric J, silica nanoparticles as gene carriers: synthesis,
Maysinger D (2007) Quantum dot-induced characterization and transfection efficiency
cell death involves Fas upregulation and lipid in vitro and in vivo. J Nanosci Nanotechnol
peroxidation in human neuroblastoma cells. 4:876–881
J Nanobiotechnol 5:1 106. Howard DH (1960) Effect of mycostatin
94. Lovric J, Cho SJ, Winnik FM, Maysinger D and fungizone on the growth of Histoplasma
(2005) Unmodified cadmium telluride quan- capsulatum in tissue culture. J Bacteriol
tum dots induce reactive oxygen species for- 79:442–449
mation leading to multiple organelle damage 107. Perlman D, Giuffre NA, Brindle SA (1961)
and cell death. Chem Biol 12:1227–1234 Use of Fungizone® in control of fungi and
95. Derfus AM, Chan WC, Bhatia SN (2004) yeasts in tissue culture. Proc Soc Exp Biol
Probing the cytotoxicity of semiconductor Med 106:880–883
quantum dots. Nano Lett 4:11–18
Chapter 18
Abstract
By the turn of the twenty-first century, the use of nutraceuticals had become increasingly popular in both humans and animals because of their easy access, cost-effectiveness, and tolerability with a wide margin of safety. While some nutraceuticals are safe, others have toxic potential. For a large number of nutraceuticals, no toxicity/safety data are available because pharmacological/toxicological studies are lacking. The safety of some nutraceuticals can be compromised by contamination with toxic plants, metals, mycotoxins, pesticides, fertilizers, drugs of abuse, etc. Knowledge from pharmacokinetic/toxicokinetic studies appears to play a pivotal role in the safety and toxicity assessment of nutraceuticals. Interaction studies are essential to determine efficacy, safety, and toxicity when nutraceuticals and therapeutic drugs are used concomitantly. This chapter describes various aspects of nutraceuticals, particularly their toxic potential, and the factors influencing their safety.
Key words Nutraceuticals, Toxicity testing models, Toxic potential, Safe nutraceuticals, Pesticides,
Metals, Mycotoxins, Plant alkaloids, Drugs of abuse
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_18, © Springer Science+Business Media, LLC, part of Springer Nature 2018
368 Ramesh C. Gupta et al.
2 Common Nutraceuticals
Table 1
List of common nutraceuticals or their ingredients
Nutraceutical/ingredient
Acacia catechu
Acacia senegal
Aloe vera
Andrographis paniculata
Anthocyanins and anthocyanidins
Arginine and citrulline
Ashwagandha, Indian ginseng (Withania somnifera)
Astaxanthin (Haematococcus pluvialis)
Bee pollen
Berberine
Black cohosh (Actaea racemosa)
Boswellia serrata
Caffeine
Cannabis sativa
Caralluma fimbriata
Capsicum spp.
Cassia fistula
Cassia occidentalis
Chondroitin sulfate
Chromium (trivalent) polynicotinate
Elderberry (Sambucus nigra)
Ephedra sinica/Ephedra equisetina/Ma Huang
Eremophila maculata
Eyebright
Fenugreek (Trigonella foenum-graecum)
Garcinia cambogia
Garlic and onion (Allium spp.)
Ginger (Zingiber officinale)
Ginkgo (Ginkgo biloba)
Glucosamine
Glucosinolates (Brassica napus)
Goldenseal (Hydrastis canadensis)
Grape seed extract
Green coffee bean (Coffea arabica; Coffea canephora)
Green tea (Camellia sinensis) extract
Guava (Psidium guajava)
Guarana (Paullinia cupana)
Hawthorn (Crataegus oxyacantha)
Isoflavones
Kava (Piper methysticum)
Krill oil (Euphausia superba)
Lychee (Litchi chinensis Sonn)
Maqui Berry (Aristotelia chilensis)
Melatonin
Methylsulfonylmethane (MSM)
Milk thistle (Silybum marianum)
Momordica charantia
Monsonia angustifolia
Mulberry branch bark powder (Ramulus mori)
Neem (Azadirachta indica) extract/oil
Olive leaf (Olea europaea L.)
Omega-3 fatty acids
Panax ginseng (Chinese ginseng)
Pequi oil (Caryocar brasiliense, Camb)
Perilla frutescens
Perna canaliculus (Green-lipped mussel)
Phyllanthus emblica (Amla extract)
Quercetin
Red marine alga (Alsidium corallinum)
Resveratrol
Royal jelly proteins
Sarcandra glabra
Shilajit
Spirulina (Arthrospira platensis)
St. John’s wort (Hypericum perforatum)
Terminalia chebula extract
Terminalia muelleri Benth
Thymoquinone (Nigella sativa)
Tibouchina granulosa
Turmeric/curcumin (Curcuma longa)
Type II collagen
Yucca glauca (Yucca)
Valerian (Valeriana officinalis)
Wild garlic (Allium ursinum)
3.2 Ginkgo biloba
Ginkgo biloba, commonly referred to as a “living fossil,” is a hardy tree native to Asia, Europe, North America, Argentina, and New Zealand. The main constituents of G. biloba are terpenoids (ginkgolides A, B, and C, and bilobalide), flavonoids (quercetin and its metabolites kaempferol and isorhamnetin), and alkylphenols (ginkgolic acid). These phytochemicals possess antioxidative, anti-inflammatory, and anticancer activities. G. biloba has been used medicinally for brain disorders, circulatory disorders, respiratory disorders such as asthma, urinary tract diseases, and ear, nose, and throat problems, and as an antiparasitic. Currently, G. biloba is one of the most commonly used herbal medicines in both the USA and Europe [8]. G. biloba extract is regulated as a dietary supplement in the USA, but in Germany and France it is regulated as a prescription drug.
The long-term use of G. biloba extract has been associated with spontaneous bleeding (including intracranial hemorrhage, hyphema, and retinal hemorrhage) in humans [9]. Serious side effects, such as bleeding, hematoma, hyphema, apraxia, neurological deficits, and even death, have been reported in humans from the concurrent use of G. biloba extract with anesthetics, analgesics, and anticoagulant or antiplatelet agents [10–12]. The ginkgotoxins from the seeds of G. biloba are thought to be responsible for epileptiform seizures, unconsciousness, and paralysis of the legs [13]. In a number of toxicological studies, G. biloba components have shown potential for mutagenicity and carcinogenicity [7, 14].
3.3 Green Tea Extract
Green tea (Camellia sinensis) infusions and extracts are used as a beverage, nutraceutical, and phytopharmaceutical around the world. Green tea comprises more than 20% of the tea products sold and has the highest annual increase in consumption [15]. There are more than 2000 different chemical substances (including many flavan-3-ols, flavones, and flavonols) associated with green tea. Green tea chemicals have antioxidant, anti-inflammatory, immunomodulatory, and anticancer properties. Green tea extract (GTE) is claimed to
Table 2
List of nutraceuticals/dietary supplements/ingredients, and their toxic effects
3.4 Green Coffee Bean/Caffeine
Green coffee bean (GCB) refers to unroasted mature or immature coffee beans from Coffea arabica and Coffea canephora. Although the coffee bean contains a mixture of more than 1000 phytochemicals, caffeine is the most significant from a nutraceutical perspective [23]. Caffeine is present in a wide range of products, including coffee (40–180 mg/150 mL), tea (24–50 mg/150 mL), soft drinks (15–29 mg/180 mL), cocoa beverages (2–7 mg/150 mL), chocolate bars (1–36 mg/28 g), and maté (~160 mg/L) [24]. Because of their antioxidative, anti-inflammatory, and immunomodulatory properties, GCB phytochemicals are the subject of health claims for cardiovascular, hepatic, pulmonary, and neurodegenerative diseases, type 2 diabetes, obesity, and cancer.
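The per-serving caffeine ranges quoted above can be combined into a rough daily intake estimate. The sketch below is only an illustration: the function name, the dictionary keys, and the example day are assumptions introduced here, and the serving sizes are those given in the text (150 mL for coffee, tea, and cocoa beverages, 180 mL for soft drinks, 28 g for chocolate bars).

```python
# Caffeine per serving in mg, as (low, high) ranges taken from the text.
# The keys are illustrative labels, not standardized product categories.
CAFFEINE_MG_PER_SERVING = {
    "coffee": (40, 180),        # per 150 mL
    "tea": (24, 50),            # per 150 mL
    "soft_drink": (15, 29),     # per 180 mL
    "cocoa": (2, 7),            # per 150 mL
    "chocolate_bar": (1, 36),   # per 28 g
}


def daily_intake_range(servings):
    """Return the (low, high) caffeine intake in mg for servings per day."""
    low = sum(CAFFEINE_MG_PER_SERVING[item][0] * n for item, n in servings.items())
    high = sum(CAFFEINE_MG_PER_SERVING[item][1] * n for item, n in servings.items())
    return low, high


# Hypothetical day: three cups of coffee, two cups of tea, one chocolate bar.
low, high = daily_intake_range({"coffee": 3, "tea": 2, "chocolate_bar": 1})
print(f"Estimated caffeine intake: {low}-{high} mg/day")
```

The wide spread between the low and high estimates reflects the variability in caffeine content reported for each product, which is one reason intake-based risk assessment is difficult.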
Although caffeine has GRAS status from the US FDA and falls into the category of functional foods, its risk assessment appears complex. Toxic concentrations of caffeine are several times higher than the concentration required to block adenosine receptors. Inhibition of cyclic nucleotide breakdown through phosphodiesterase inhibition requires 20 times higher concentrations of caffeine; blockade of GABA-A receptors, 40 times higher concentrations; and mobilization of intracellular calcium depots, 100 times higher concentrations [24]. In rats and other laboratory animal species, the LD50 of caffeine has been determined to be 150–265 mg/kg. In humans, ingestion of caffeine at 20 mg/kg body wt is considered toxic, and the lethal dose has been estimated to be in the range of 150–200 mg/kg [25]. Blood concentrations of caffeine in excess of 30 μg/mL are associated with clinical signs of intoxication, such as anxiety, restlessness, and tachycardia. The symptoms of acute and chronic caffeine intoxication after consumption of high doses (300–800 mg/person/day) have been described as caffeinism. These symptoms include dizziness, restlessness, agitation, anxiety, irritability, muscle tremor, hyperventilation, arrhythmia, tachycardia, hypertension followed by hypotension, diuresis, gastrointestinal disorders, renal failure, respiratory failure, and seizures. Caffeine overdose may cause hypokalemia, hyponatremia, impaired iron and zinc absorption, rhabdomyolysis, and, finally, circulatory collapse [23, 26, 27].
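The human thresholds above are expressed per kilogram of body weight, so an ingested amount has to be normalized before it can be compared with them. The following sketch shows that arithmetic under stated assumptions: the function names, the category labels, and the example intake are hypothetical, and the thresholds (20 mg/kg as toxic, 150–200 mg/kg as the estimated lethal range) are simply those quoted in the text, not clinical guidance.

```python
# Thresholds quoted in the text (mg caffeine per kg body weight).
TOXIC_MG_PER_KG = 20.0
LETHAL_RANGE_MG_PER_KG = (150.0, 200.0)


def caffeine_dose_mg_per_kg(intake_mg, body_weight_kg):
    """Normalize an ingested caffeine amount to body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return intake_mg / body_weight_kg


def classify_dose(dose_mg_per_kg):
    """Place a dose relative to the thresholds quoted in the text."""
    if dose_mg_per_kg >= LETHAL_RANGE_MG_PER_KG[0]:
        return "within or above estimated lethal range"
    if dose_mg_per_kg >= TOXIC_MG_PER_KG:
        return "toxic"
    return "below toxic threshold"


# Hypothetical example: 800 mg ingested by a 70 kg adult.
dose = caffeine_dose_mg_per_kg(800, 70)
print(f"{dose:.1f} mg/kg: {classify_dose(dose)}")
```

Note that the upper end of the caffeinism range (800 mg/person/day) still falls below 20 mg/kg for a 70 kg adult, which is consistent with the text treating caffeinism and acute toxic ingestion as distinct dose ranges.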
During pregnancy, caffeine is considered the most consumed psychostimulant xenobiotic. The UK Government’s Food Standards Agency, Committee on Toxicity and the US FDA consider that caffeine intake during pregnancy is safe up to 300 mg/person/day. Recently, Menezes and Da Silva [27] described in
3.5 Garcinia cambogia
Garcinia cambogia is a tropical tree native to Asia and Africa. The extract of the fruit and fruit rind has been used to expel intestinal parasites, to treat dysentery, rheumatism, and tumors, and as a cardiotonic in angina. The fruit rind is also used in rickets, enlargement of the spleen, and for healing of bone fractures. The fruit extracts from G. cambogia, G. indica, and G. atroviridis are very rich in hydroxycitric acid (HCA). Other phytochemicals include garcinol, guttiferones (A, E, F, and M), and xanthones. HCA is a popular component of dietary supplements for promotion of weight loss. Garcinia/HCA has been found to be efficacious in ameliorating obesity-related complications, such as inflammation, oxidative stress, and insulin resistance. G. cambogia extract has also been shown to have antiulcerative properties.
In a number of in vivo studies (including two-generation reproductive and developmental studies), HCA/HCA-SX has been found to be very safe [30]. A dose of up to 2800 mg HCA/day in humans has been regarded as the no observed adverse effect level. Rats receiving an ethanolic seed extract of G. cambogia (100 or 200 mg/kg body wt/day for 6 days a week for 6 weeks) showed, in a dose-dependent manner, an increase in the interstitial spaces, degeneration of the Leydig cells, and distortion in the arrangement of the cells of the spermatogenic series on histological examination of the testes [31]. Dietary feeding of high doses of G. cambogia (102 or 154 mmol of HCA/kg diet) to male Zucker obese rats for 90 days caused toxicity in the testes. The weight of the testes was reduced by half, and histopathological changes included marked testicular atrophy and impairment of spermatogenesis. Kim et al. [32] fed mice G. cambogia (1%) for 16 weeks and noted hepatic fibrosis, inflammation, lipid peroxidation, and increased mRNA levels of genes related to oxidative stress. A number of cases of hepatotoxicity were reported with the use of Hydroxycut®, an over-the-counter weight loss product in the USA [33–35]. However, hepatotoxicity might also have resulted from co-consumption of (−)-epigallocatechin gallate from green tea extract, acetaminophen, alcohol, anabolic steroids, drugs,
Toxicity Potential of Nutraceuticals 377
3.6 Kava
Kava (Piper methysticum), also known as kava kava, is an herbal shrub that has been used for centuries in the South Pacific as a social beverage and in traditional ceremonial rituals. For the last two decades, kava root and rhizome extracts have been used in Western industrialized countries for treating mild to moderate anxiety, stress, insomnia, restlessness, and muscle fatigue. Several reports of severe hepatotoxicity in kava consumers led the US FDA and authorities in Europe to restrict sales of kava-containing products. Nevertheless, kava is still widely used in the USA as a dietary supplement to relieve anxiety and stress.
Human studies using kava at therapeutic dosages have failed to demonstrate any toxic effects. Prolonged use of a dose equivalent to 400 mg or more of kavalactones/day is likely to cause the characteristic skin lesions of kava toxicity (pigmented, dry, covered with scales), which heal upon discontinuation of the kava extract. At doses >9 g/day, liver enzymes can become elevated and should be monitored. Kavalactones extracted in organic solvents have been proposed to account for kava-induced liver toxicity. In experimental studies (in vitro and in vivo), Zhou et al. [37] identified flavokawain B (FKB) as a hepatotoxin and suggested its mechanism of hepatotoxicity. FKB inhibits IKK activity (directly or indirectly), leading to downregulation of NF-κB transcriptional activity, which is crucial for hepatocellular survival. FKB also alters intracellular redox levels and induces oxidative stress that is likely to be further enhanced by blockade of NF-κB-mediated transcription and downregulation of SOD2. In essence, FKB induces GSH-sensitive oxidative stress and hepatotoxicity through modulation of IKK/NF-κB and MAPK signaling pathways.
3.7 St. John’s Wort
St. John’s wort (Hypericum perforatum) is native to Europe, Asia, and North Africa. Extracts of this plant are known to contain at least 150 compounds. The side effects observed in large clinical trials conducted over several weeks include minor gastrointestinal irritation, allergic reactions, tiredness, and restlessness [38]. Seizures and confusion were reported in a 16-year-old female who consumed hypericum extract tablets in an apparent suicide attempt [39]. In some other cases, hypericum extract might have caused autonomic arousal [40], hepatocellular carcinoma [41], and bone marrow necrosis [42]. In addition, the plant extracts can upregulate and downregulate gut and hepatic CYP enzymes and xenobiotic transporters, and by these mechanisms they can alter the pharmacokinetics and efficacy of concurrent medications, such as theophylline, warfa-
3.8 Bitter Melon
The plant bitter melon (Momordica charantia) originated in India and was introduced into China in the fourteenth century. Currently, it is used worldwide as a food and a medicine. The fruit contains several biologically active chemical compounds, such as glycosides, saponins, alkaloids, fixed oils, triterpenes, proteins, and steroids. Consumption of bitter melon can lower blood glucose levels by increasing hepatic utilization of glucose and decreasing hepatic glucose output [44], and it is therefore used as an herbal remedy to control diabetes. The fruit has shown not only potent antihyperglycemic activity in various animal models and clinical trials, but also a dose-related hypoglycemic effect at higher doses in normoglycemic rats [45]. Therefore, caution is warranted when bitter melon is used with insulin, sulfonylureas, or meglitinides in the pharmacotherapeutic management of type 2 diabetes, because of the theoretical risk of increased hypoglycemic events [46]. Other side effects of bitter melon are diarrhea, abdominal pain, headache, fever, hypoglycemia, atrial fibrillation, urinary incontinence, and chest pain. Bitter melon is contraindicated in pregnant women because it can cause bleeding, contractions, stimulation of menstruation, and miscarriage [47]. Also, individuals with glucose-6-phosphate dehydrogenase deficiency should avoid consumption of bitter melon preparations because of the presence of vicine in the seeds.
3.9 Aloe vera
Aloe vera, also referred to as the “plant of immortality,” is cultivated around the world. It is a stemless plant, and the whole leaf extract (WLE: gel and latex) contains more than 200 chemicals, including amino acids, vitamins, minerals, lignin, and phytosterols. The latex contains ~80 chemical constituents, most of them anthrones and anthraquinones (mainly barbaloin, also known as aloin A; isobarbaloin, also known as aloin B; aloeresin A; and aloeresin B). Aloe vera has many properties, including wound-healing, antiaging, anti-inflammatory, fungicidal, antiviral, antibacterial, laxative, immunological, antiseptic, and antitumor activities [48]. It is claimed to have therapeutic effects in arthritis, asthma, chronic fatigue syndrome, digestive and bowel disorders, external and internal ulcers, psoriasis, genital herpes, hypertension, and diabetes [49].
Use of topical Aloe vera is not associated with significant side effects, but its oral ingestion may cause abdominal cramps and diarrhea and thereby decrease the absorption of drugs (http://nccih.nih.gov/health/aloevera). IARC [50] has found ingested nondecolorized liquid Aloe vera to be carcinogenic in animals and states that it is a possible carcinogen in humans as well. Under the guidelines of California Proposition 65, orally ingested nondecolorized Aloe vera WLE has been listed by the OEHHA among
3.10 Ephedrine Alkaloids
The ingredient sources of the ephedrine alkaloids in dietary supplements include raw botanicals and extracts from those plant sources [53]. Ephedra sinica Stapf, Ephedra equisetina Bunge, Ephedra intermedia var. tibetica Stapf, Ephedra distachya L. (the Ephedras), Sida cordifolia, and Pinellia ternata (Thunb) are sources of ephedrine alkaloids. Other common names that have been used for the various plants that contain ephedrine alkaloids include sea grape, yellow horse, joint fir, popotillo, and country mallow. Ephedra spp. are used to produce Ma huang, an herbal drug used in asthma, allergy, and cold formulations, diet pills, and various supplements [54–56].
Cardiotoxicity associated with ephedra has been well documented [57, 58]. Drugs containing ephedrine, in the form of Ma huang, are often combined with caffeine, in the form of guarana. The two drugs act synergistically, enhancing the toxicity of the combined product. Doses of 1.3–88.9 mg/kg Ma huang given concurrently with 4.4–296.2 mg/kg guarana have been associated with clinical toxicosis in dogs. One dog given a dose of 5.8 mg/kg Ma huang and 19.1 mg/kg guarana died. In a number of studies in rodents, the combined use of ephedra and caffeine has been associated with cardiovascular toxicity and death [59, 60]. In the Federal Register of February 11, 2004 (69 FR 6788), the FDA concluded that dietary supplements containing ephedrine alkaloids are adulterated under section 402(f)(1)(A) [21 U.S.C. 342(f)(1)(A)] of the Act because they present an unreasonable risk of illness or injury under the conditions of use recommended or suggested in labeling, or if
3.11 Pennyroyal Oil
Even though pennyroyal (Mentha pulegium) oil is extremely toxic, people have relied on the fresh and dried herb for centuries. Pennyroyal oil contains 85% pulegone, a ketone. The oil has been used in folk medicine for many years as an abortifacient and as a means to induce menstruation. The abortive effect of the oil is thought to be due to irritation of the uterus with subsequent uterine contractions. Sullivan et al. [61] reported two cases of pennyroyal oil ingestion for the purpose of abortion and noted shock, disseminated intravascular coagulation, massive hepatic necrosis, and death. Anderson et al. [62] identified pulegone and menthofuran in patients’ serum as the agents causing hepatotoxicity and death. In addition to hepatotoxicity, the oil is known to cause gastrointestinal upset and bleeding, hematuria, vaginal bleeding, renal failure, and coma.
3.12 Pequi Almond Oil
Pequi (Caryocar brasiliense, Camb) almond oil has been reported to contain unsaturated fatty acids and antioxidant compounds (phenolics, gallic acid, quercetin, and carotenoids) related to beneficial effects on oxidative stress and inflammatory conditions, and thereby to provide hepatoprotective effects [63]. The oil extracted from Pequi pulp has been evaluated in oral acute and subchronic toxicity studies in rats [64]. The acute toxicity study, conducted in female Wistar rats, revealed an LD50 > 2000 mg/kg body wt. In the subchronic study, male and female Wistar rats received repeated doses of 125, 250, 500, or 1000 mg/kg body wt for 28 days. In both the acute and subchronic studies, Pequi oil was found to be of low toxicity, although in the subchronic study it caused some hematopoietic abnormalities.
5.1 Pyrrolizidine Alkaloids
Among phytotoxicants, pyrrolizidine alkaloids (PAs) are the most potent toxins present in plants, including those of the genera Crotalaria, Senecio, Heliotropium, Symphytum, Cynoglossum, Amsinckia, and Echium [70–73]. More than 150 PAs have been identified, and some of these commonly contaminate food and herbal nutraceuticals. The PAs contain the pyrrolizidine nucleus and can be represented by the basic structures of senecionine and heliotrine. The toxic effects of PAs are somewhat similar, although their potency varies because of differences in their bioactivation in the liver to toxic metabolites called pyrroles [70]. These pyrroles are powerful alkylating agents that react with cellular proteins and cross-link DNA, resulting in cellular dysfunction, abnormal mitosis, and tissue necrosis. Many PAs can cause adverse health effects, including hepatotoxicity, in humans and animals [72, 74–76]. Retrorsine and monocrotaline share the same core structure (retronecine) and a similar metabolic activation pathway, but retrorsine is more hepatotoxic than monocrotaline [76]. PA toxicity may also disrupt other hepatic functions, such as copper metabolism, clotting factors, NH3 metabolism, and protein metabolism. Recently, Zhu et al. [77] reported that the significant persistence of PA-derived DNA adducts in vivo supports their role as a mechanism-based biomarker of PA exposure and toxicity.
The FDA’s position is clear: PAs are harmful following exposure by oral or nonoral (suppository) routes or through broken skin. Although liver damage is the major documented form of injury to humans from herbs containing PAs, animal studies suggest that their toxicity is much broader, affecting the lungs, kidneys, GI tract, and CNS [53, 70]. In addition, many of the PAs have potential for cytotoxicity and carcinogenicity. Risk assessment of these alkaloids is currently based on the carcinogenicity of certain PAs after chronic application to rats, using the sum of detected PAs as the dose metric [71]. The same authors have derived interim Relative Potency (REP) factors for a number of abundant PAs for use in toxicological risk management. The FDA ruling banned the internal use of Symphytum officinale (common comfrey), S. asperum (prickly comfrey), and Symphytum × uplandicum (Russian comfrey), as well as any other plant/substance containing PAs.
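The difference between the two dose metrics mentioned above, the plain sum of detected PAs and a sum weighted by relative potency, can be made concrete with a short calculation. This is a minimal sketch of the general idea only: the compound labels, concentrations, and REP values below are arbitrary placeholders chosen for illustration and are not the interim factors derived in [71].

```python
def plain_sum(concentrations):
    """Dose metric 1: unweighted sum of all detected PA concentrations."""
    return sum(concentrations.values())


def rep_weighted_sum(concentrations, rep_factors):
    """Dose metric 2: each PA concentration scaled by its REP factor, then summed."""
    missing = set(concentrations) - set(rep_factors)
    if missing:
        raise KeyError(f"no REP factor for: {sorted(missing)}")
    return sum(c * rep_factors[name] for name, c in concentrations.items())


# Placeholder data: hypothetical PAs and concentrations in one sample (ug/kg).
sample = {"PA_1": 10.0, "PA_2": 20.0, "PA_3": 5.0}
# Placeholder REP factors (dimensionless); NOT the published interim values.
rep = {"PA_1": 1.0, "PA_2": 0.5, "PA_3": 0.1}

print("Plain sum:", plain_sum(sample))
print("REP-weighted sum:", rep_weighted_sum(sample, rep))
```

With factors at or below 1, the weighted metric can never exceed the plain sum, so the unweighted sum acts as the more conservative of the two.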
5.2 Heavy Metals
Nutraceuticals, in general, are considered relatively safe, but contamination by certain metals can make them unsafe. In fact, heavy metal contamination in traditional medicines has been well documented [78–81]. Because heavy metals accumulate and are toxic, their concentrations could potentially reach levels that lead to hazardous effects on human and animal health. Cadmium (Cd), mercury (Hg), lead (Pb), and arsenic (As) are nonessential toxic elements of special concern because of their toxicity even at low concentrations [3, 80]. Lead exposure has been associated with renal tumors, reduced cognitive development, increased blood pressure, and cardiovascular toxicity. In a recent retrospective observational study, traditional and folk remedies were found to contain toxic amounts of Pb, causing abdominal pain, constipation, hypertension, anemia, neurological symptoms, and kidney and liver dysfunction [82]. Cadmium may induce toxicity in the kidneys, lungs, and bones, and in the reproductive and developmental systems [83]. Arsenic can cause toxic effects on the dermal, respiratory, gastrointestinal, hepatic, cardiovascular, nervous, and hematological systems [83]. Both Cd and As are carcinogenic. Toxic effects of Hg poisoning can occur in the nervous, renal, cardiovascular, reproductive and developmental, and immunological systems [84, 85].
The World Health Organization (WHO) has set maximum permissible limits for the toxic metals As, Hg, Cd, and Pb in herbal medicines of 10 ppm, 1.0 ppm, 0.3 ppm, and 10 ppm, respectively [3]. The USA has standardized recommended daily dietary allowances for essential dietary trace elements, but not for toxic metals [81]. In Europe, the following limits have been set: 3 ppm for Pb; 1 ppm for Cd (except for seaweed products, where the limit is 3 ppm); and 0.1 ppm for Hg [86]. Cooper et al. [87] found that certain traditional Chinese medicines (TCM) were severely contaminated with As, Pb, and Hg. Such contamination can place consumers at risk of severe or even fatal heavy metal/metalloid poisoning from the consumption of these phytomedicines. Thus, measuring metal concentrations in nutraceuticals and food ingredients is important for the assessment of safety and toxicity [88].
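Because the WHO and European limits quoted above differ, the same analytical result can pass one scheme and fail the other. The sketch below illustrates a simple screening step under stated assumptions: the limits are transcribed from the text, while the function names, the `seaweed` flag, and the measured values are hypothetical and introduced only for this example.

```python
# Maximum permissible limits in ppm, transcribed from the text.
WHO_LIMITS_PPM = {"As": 10.0, "Hg": 1.0, "Cd": 0.3, "Pb": 10.0}


def eu_limits_ppm(seaweed=False):
    """European limits from the text; Cd is 3 ppm for seaweed products, else 1 ppm."""
    return {"Pb": 3.0, "Cd": 3.0 if seaweed else 1.0, "Hg": 0.1}


def exceedances(measured_ppm, limits_ppm):
    """Return the metals whose measured concentration exceeds the given limit.

    Metals without a limit in the chosen scheme are skipped, not flagged.
    """
    return {
        metal: value
        for metal, value in measured_ppm.items()
        if metal in limits_ppm and value > limits_ppm[metal]
    }


# Hypothetical measurements for one herbal product (ppm).
sample = {"As": 2.0, "Hg": 0.5, "Cd": 0.4, "Pb": 4.0}

print("Exceeds WHO limits:", exceedances(sample, WHO_LIMITS_PPM))
print("Exceeds European limits:", exceedances(sample, eu_limits_ppm()))
```

In this invented sample, Cd is the only metal above the WHO limit, whereas Pb and Hg are above the European limits, which shows why the applicable regulatory scheme has to be stated alongside any compliance claim.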
5.3 Pesticides
For the large-scale cultivation of herbal and food plants, the use of pesticides is inevitable. Analysis of crude herbal material has often shown the presence of pesticide residues [3, 89–92]. Among the many types of pesticides, insecticides and herbicides are the most heavily used, and insecticides are significantly more toxic than any other type of pesticide. Insecticides are of several types, including organophosphates (chlorpyrifos, diazinon, malathion, methyl parathion, etc.), carbamates (aldicarb, bendiocarb, carbaryl, carbofuran, methomyl, etc.), organochlorines (aldrin, DDT, heptachlor, lindane, methoxychlor, etc.), pyrethroids (allethrin, cyphenothrin, deltamethrin, permethrin, etc.), and neonicotinoids (acetamiprid,
5.4 Mycotoxins
Mycotoxins are secondary metabolites produced by fungi that can contaminate herbs and dietary supplements. Mycotoxins can be produced during cultivation of plants, during transportation of raw plant material, and during storage of supplements. Mycotoxins commonly found in nutraceuticals and dietary ingredients include aflatoxins, ochratoxin A, citrinin, and others. Raman et al. [93] evaluated several botanical supplements and found molds in a great number of samples. The molds most frequently isolated from medicinal plants were Penicillium, Aspergillus, and Fusarium [94]. Packed samples of herbal plants have a higher probability of being infected with molds than nonpacked samples because of increased humidity inside the pack and unsuitable storage methods [3].
A few studies that found mycotoxins in herbal products are cited here in brief. Bungo et al. [95] analyzed raw medicinal plants collected from Brazilian markets and found contamination with Fusarium and Penicillium. Santos et al. [96] reported that nearly 87% of medicinal herb samples analyzed in Spain contained four or more mycotoxins. Romagnoli et al. [97] detected aflatoxin B1 contamination in different kinds of spices, aromatic herbs, and medicinal plants from Italy. Aflatoxin B1 is mainly produced by the fungus Aspergillus flavus, and its toxicity includes hepatotoxicity, hepatocarcinogenicity, and mutagenic, teratogenic, and immunosuppressive effects [98]. The European Pharmacopeia has set a limit of 2 ppb for aflatoxin B1 and 4 ppb for the sum of aflatoxins (B1, B2, G1, and G2) for some medicinal herbs [3].
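The aflatoxin limits above impose two conditions at once: one on aflatoxin B1 alone and one on the sum of the four aflatoxins. A minimal sketch of that double check follows. The limits are those quoted in the text; the function name, the treatment of a value exactly at the limit as compliant, and the example concentrations are assumptions made for illustration.

```python
# Limits quoted in the text for some medicinal herbs (ppb).
AFB1_LIMIT_PPB = 2.0
TOTAL_AFLATOXIN_LIMIT_PPB = 4.0


def aflatoxin_compliant(b1, b2, g1, g2):
    """Check both limits: B1 alone and the sum of B1, B2, G1, and G2.

    Assumption: a value exactly equal to a limit is treated as compliant.
    """
    total = b1 + b2 + g1 + g2
    return b1 <= AFB1_LIMIT_PPB and total <= TOTAL_AFLATOXIN_LIMIT_PPB


# Hypothetical samples (ppb), given as (B1, B2, G1, G2).
samples = {
    "sample_a": (1.5, 0.5, 0.5, 0.5),  # B1 and total both within limits
    "sample_b": (2.5, 0.0, 0.0, 0.0),  # B1 alone exceeds its limit
    "sample_c": (1.5, 1.0, 1.0, 1.0),  # B1 within limit, total exceeds 4 ppb
}

for name, values in samples.items():
    print(name, "compliant" if aflatoxin_compliant(*values) else "non-compliant")
```

The third sample shows why the sum limit matters: a product can satisfy the B1 limit and still fail because of the other aflatoxins.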
Ochratoxin A (OTA), produced by Penicillium verrucosum, Aspergillus ochraceus, A. carbonarius, and A. niger, is another mycotoxin of health concern, as it is known to produce nephrotoxic, hepatotoxic, embryotoxic, teratogenic, neurotoxic, immunotoxic, genotoxic, and carcinogenic effects via multiple mechanisms [3, 99–101]. In a recent study, Gan et al. [99] showed that OTA at 400 and 800 μg/kg diet significantly increased OTA concentrations in serum, kidney, and spleen, and induced histopathological lesions in the kidney and spleen of pigs. OTA decreased T-cell receptor (TCR)-stimulated T lymphocyte viability and IL-2 concentration, increased TNF-α concentration, and decreased
Currently, there are many models for efficacy, safety, and toxicity evaluation of nutraceuticals and food ingredients. The majority of nutraceuticals are effective against diseases because their ingredients exert antioxidative, anti-inflammatory, sedative, adaptogenic, and immunomodulatory properties [103–107]. To evaluate such properties, in addition to in vivo studies, high-throughput in vitro, in silico, and omics technology-based assays are available. To evaluate the safety and toxicity of nutraceuticals, some of the models utilize vertebrate species using invasive and noninvasive approaches [108], while others utilize alternative model organisms, such as zebrafish and Caenorhabditis elegans [109, 110]. Presently, alternative in vitro models for safety and toxicity evaluation of nutraceuticals are considered very informative [111]. Additionally, there are many novel mechanistic and predictive models to envisage adverse outcome pathways and evaluate the safety and toxicity of nutraceuticals [67, 107, 112–114]. Due to space constraints, these models are not discussed here, and readers are referred to a comprehensive resource [2].
Acknowledgment
The authors would like to thank Ms. Robin B. Doss for her technical
assistance in the preparation of this chapter.
References
1. Pandey M, Verma RK, Saraf SA (2010) Nutraceuticals: new era of medicine and health. Asian J Pharmaceut Clin Res 3:11–15
2. Gupta RC (ed) (2016) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam. 1022 pages
3. Gil F, Hernández AF, Martín-Domingo MC (2016) Toxic contamination of nutraceuticals and food ingredients. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 825–837
4. FDA (1994) Dietary supplement health and education act of 1994. Congress, Pub. L. www.fda.gov/DietarySupplement/default.htm
5. NTP (2010) Toxicology and carcinogenesis studies of goldenseal root powder (Hydrastis canadensis) in F344/N rats and B6C3F1 mice (feed studies). Natl Toxicol Program Tech Rep Ser 562:1–188
6. Gurley BJ, Gardner SF, Hubbard MA et al (2005) In vivo effects of goldenseal, kava kava, black cohosh, and valerian on human cytochrome P450 1A2, 2D6, 2E1, and 3A4/5 phenotypes. Clin Pharmacol Ther 77:415–426
7. Zhang Z, Mei N, Chen S et al (2016) Assessment of genotoxic effects of selected herbal dietary supplements. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 883–892
8. Heinonen T, Wilhelm G (2015) Cross matching observations on toxicological and clinical data for the assessment of tolerability and safety of Ginkgo biloba leaf extract. Toxicology 327:95–115
9. Bent S, Goldberg H, Padula A et al (2005) Spontaneous bleeding associated with Ginkgo biloba; a case report and systematic review of the literature. J Gen Intern Med 20:657–661
10. Posadzki P, Watson L, Ernst E (2012) Herb-drug interactions: an overview of systematic reviews. Br J Clin Pharmacol 75:603–618
11. Diamond B, Baily M (2013) Ginkgo biloba; indications, mechanisms and safety. Psychiatr Clin North Am 36:73–78
12. Dziwenka M, Coppock RW (2016) Ginkgo biloba. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 681–691
13. Ude C, Schubert-Zsilavecz M, Wurglics M (2013) Ginkgo biloba extracts: a review of the pharmacokinetics of the active ingredients. Clin Pharmacokinet 52:727–749
14. NTP (2013) Toxicology and carcinogenesis studies of Ginkgo biloba extract (CAS No. 90045-36-6) in F344/N rats and B6C3F1/N mice (Gavage studies). Natl Toxicol Program Tech Rep Ser 578:1–183
Toxicity Potential of Nutraceuticals 389
15. Coppock RW, Dziwenka M (2016) Green tea extract. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 633–652
16. Kapetanovic IM, Crowell JA, Krishnaraj R et al (2009) Exposure and toxicity of green tea polyphenols in fasted and nonfasted dogs. Toxicology 260:28–38
17. Lambert JD, Kennett MJ, Sang S et al (2010) Hepatotoxicity of high oral dose (-)-epigallocatechin-3-gallate in mice. Food Chem Toxicol 48:409–416
18. Chandra AK, Choudhury SR, De N et al (2011) Effect of green tea (Camellia sinensis) extract on morphological and functional changes in adult male gonads of albino rats. Indian J Exp Biol 49:689–697
19. Shimizu M, Shirakami Y, Sakai H et al (2015) Chemopreventive potential of green tea catechins in hepatocellular carcinoma. Int J Mol Sci 16:6124–6139
20. Ogunleye AA, Xue F, Michels KB (2010) Green tea consumption and breast cancer risk or recurrence: a meta-analysis. Breast Cancer Res Treat 119:477–484
21. Abd El-Aty AM, Choi JH, Rahman MM et al (2014) Residues and contaminants in tea and tea infusions: a review. Food Addit Contam Part A 31:1794–1804
22. Wang J, Cheung W, Leung D (2014) Determination of pesticide residue transfer rates (percent) from dried tea leaves to brewed tea. J Agric Food Chem 62:966–983
23. Garg SK (2016) Green coffee bean. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 653–667
24. Fredholm BB, Bättig K, Holmén J et al (1999) Actions of caffeine in the brain with special reference to factors that contribute to its widespread use. Pharmacol Rev 51:83–133
25. Rudolph T, Knudsen K (2010) A case of fatal caffeine poisoning. Acta Anesth Scand 54:521–523
26. Campana C, Griffin PL, Simon EL (2014) Caffeine overdose resulting in severe rhabdomyolysis and acute renal failure. Am J Emerg Med 32:111.e3–111.e4
27. Menezes FP, Da Silva RS (2017) Caffeine. In: Gupta RC (ed) Reproductive and developmental toxicology, 2nd edn. Academic Press/Elsevier, Amsterdam, pp 399–411
28. Roberts A (2016) Caffeine: an evaluation of the safety database. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 417–434
29. EFSA (2015) Scientific opinion on the safety of caffeine. EFSA J 13(5):1–120
30. Raina R, Mondhe DM, Malik JK, Gupta RC (2016) Garcinia cambogia. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 669–680
31. Kayode OA, Jimoh Olusegun R, Adesanya Olamide A et al (2007) Effects of crude ethanolic extract of Garcinia cambogia on the reproductive system of male Wistar rats (Rattus norwegicus). Afr J Biotechnol 6:1236–1238
32. Kim YJ, Choi MS, Park YB et al (2013) Garcinia cambogia attenuates diet induced adiposity but exacerbates hepatic collagen accumulation and inflammation. World J Gastroenterol 19:4689–4701
33. Dara L, Hewett J, Lim JK (2008) Hydroxycut hepatotoxicity: a case series and review of liver toxicity from herbal weight loss supplements. World J Gastroenterol 14:6999–7004
34. Sharma T, Wong L, Tsai N et al (2010) Hydroxycut® (herbal weight loss supplement) induced hepatotoxicity: a case report and review of literature. Hawaii Med J 69:188–190
35. Kaswala DH, Shah S, Patel N et al (2014) Hydroxycut-induced liver toxicity. Ann Med Health Sci Res 4:143–145
36. Lopez AM, Kornegay J, Hendrickson RG (2014) Serotonin toxicity associated with Garcinia cambogia over-the-counter supplement. J Med Toxicol 10:399–401
37. Zhou P, Gross S, Liu J-H et al (2010) Flavokawain B, the hepatotoxic constituent from kava root, induces GSH-sensitive oxidative stress through modulation of IKK/NF-κB and MAPK signaling pathways. FASEB J 24:4722–4732
38. Coppock RW, Dziwenka M (2016) St. John’s wort. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 619–631
39. Becker LC, Bergfeld WF, Belsito DV et al (2014) Amended safety assessment of Hypericum perforatum-derived ingredients as used in cosmetics. Int J Toxicol 33(3 suppl):5S–23S
40. Brown TM (2000) Acute St. John’s wort toxicity. Am J Emerg Med 18:231–232
41. Lampri ES, Ioachim E, Harissis H et al (2014) Pleomorphic hepatocellular carcinoma following consumption of Hypericum perforatum in alcoholic cirrhosis. World J Gastroenterol 20:2113–2116
42. Demiroglu YZ, Yeter TT, Boga C et al (2005) Bone marrow necrosis: a rare complication of herbal treatment with Hypericum perforatum (St. John’s Wort). Acta Med Austriaca 48:91–94
43. Gupta RC, Chang D, Nammi S et al (2017) Interactions between antidiabetic drugs and herbs: an overview of mechanisms of action and clinical implications. Diabetol Metab Syndr 9:59
44. Joseph B, Jini D (2013) Antidiabetic effects of Momordica charantia (bitter melon) and its medicinal potency. Asian Pac J Trop Dis 3(2):93–102
45. Ojewole JA, Adewole SO, Olayiwola G (2006) Hypoglycemic and hypotensive effects of Momordica charantia Linn (Cucurbitaceae) whole-plant aqueous extract in rats. Cardiovasc J South Afr 17:227–232
46. Chan N, Li S, Perez E (2016) Interactions between Chinese nutraceuticals and western medicines. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 875–882
47. Bitter Mellon (2013) Bitter Mellon. http://www.zmescience.com. Accessed 20 March 2013
48. Gupta VK, Malhotra S (2012) Pharmacological attribute of Aloe vera: revalidation through experimental and clinical studies. AYU 33:193–196. http://www.ayujournal.org/text.asp?2012/33/2/193/105237
49. Choudhary M, Kochhar A, Sangha J (2014) Hypoglycemic and hypolipidemic effect of Aloe vera L. in non-insulin dependent diabetics. J Food Sci Technol 51:90–96
50. IARC (2006) IARC Monograph on the evaluation of carcinogenic risks to humans. Preamble. http://monographs.iarc.fr/ENG/Preamble/CurrentPreamble.pdf
51. Proposition 65 (2015) Chemicals listed effective December 4, 2015 as Known to the State of California to cause cancer: Aloe vera, non-decolorized whole leaf extract and goldenseal root powder. US Office of Environmental Health Hazard Assessment. 4 December 2015
52. Nesslany F, Simar-Meintieres S, Ficheux H et al (2009) Aloe-emodin-induced DNA fragmentation in the mouse in vivo comet assay. Mutat Res 678:13–19
53. Hilmas CJ, Fabricant DS (2014) Biomarkers of toxicity for dietary ingredients contained in dietary supplements. In: Gupta RC (ed) Biomarkers in toxicology. Academic Press/Elsevier, Amsterdam, pp 609–627
54. Means C (1999) Ma huang: all natural but not always innocuous. Vet Med 94:511–512
55. Means C (2005) Decongestants. In: Plumlee K (ed) Clinical veterinary toxicology. Mosby, St. Louis, MO, pp 309–311
56. Ooms TG, Khan S (2001) Suspected caffeine and ephedrine toxicosis resulting from ingestion of an herbal supplement containing guarana and ma huang in dogs: 47 cases (1997-1999). J Am Vet Med Assoc 218:225–229
57. Andraws R, Chawla P, Brown DL (2005) Cardiovascular effects of ephedra alkaloids: a comprehensive review. Prog Cardiovasc Dis 47:217–225
58. Flanagan CM, Kaesberg JL, Mitchell ES et al (2010) Coronary artery and thrombosis following chronic ephedra use. Int J Cardiol 139(1):e11–e13
59. Nyska A, Murphy E, Foley JE et al (2005) Acute hemorrhagic myocardial necrosis and sudden death of rats exposed to a combination of ephedrine and caffeine. Toxicol Sci 83:388–396
60. Dunnick JK, Kissling G, Gerken DK et al (2007) Cardiotoxicity of Ma Huang/caffeine in a rodent model system. Toxicol Pathol 35:657–666
61. Sullivan JB, Rumack BH, Thomas H et al (1979) Pennyroyal oil poisoning and hepatotoxicity. JAMA 242:2873–2874
62. Anderson IB, Mullen WH, Mecker JE et al (1996) Pennyroyal toxicity: measurement of toxic metabolite levels in two cases and review of the literature. Ann Intern Med 124:726–734
63. Torres LRDO, de Santana FC, Torres-Leal FL et al (2016) Pequi (Caryocar brasiliense Camb) almond oil attenuates carbon tetrachloride-induced acute hepatic injury in rats: antioxidant and anti-inflammatory effects. Food Chem Toxicol 97:206–216
64. Traesel GK, Menegati SELT, dos Santos AC et al (2016) Oral acute and subchronic toxicity studies of the oil extracted from Pequi (Caryocar brasiliense, Camb) pulp in rats. Food Chem Toxicol 97:224–231
65. Ang-Lee MK, Moss J, Chen-Su Y (2001) Herbal medicines and perioperative care. Review. JAMA 286(2):208–216
66. Chan K (2003) Some aspects of toxic contaminants in herbal medicines. Chemosphere 52(9):1361–1371
67. Jordan SA, Cunningham DG, Marles RJ (2010) Assessment of herbal products: challenges, and opportunities to increase the knowledge base for safety assessment. Toxicol Appl Pharmacol 243:198–216
68. Shi S, Klotz U (2012) Drug interactions with herbal medicines. Clin Pharm 51:77–104
69. Guo B, Wang M, Liu Y et al (2015) Wide-scope screening of illegal adulterants in dietary and herbal supplements via rapid polarity-switching and multistage accurate mass confirmation using an LC-IT/TOF hybrid instrument. J Agric Food Chem 63:6954–6967
70. Panter KE, Welch KD, Gardner DR (2014) Poisonous plants: biomarkers for diagnosis. In: Gupta RC (ed) Biomarkers in toxicology. Academic Press/Elsevier, Amsterdam, pp 563–589
71. Merz K-H, Schrenk D (2016) Interim relative potency factors for the toxicological risk assessment of pyrrolizidine alkaloids in food and herbal medicines. Toxicol Lett 263:44–57
72. Preliasco M, Gardner D, Moraes J et al (2017) Senecio grisebachii Baker: Pyrrolizidine alkaloids and experimental poisoning in calves. Toxicon 133:66–73
73. Wang Y, Xiang L, Yi X et al (2017) Potential anti-inflammatory steroidal saponins from the berries of Solanum nigrum (European black nightshade). J Agric Food Chem 65:4262–4272
74. Winship KA (1991) Toxicity of comfrey. Adverse Drug React Toxicol Rev 10:47–59
75. Fu PP, Xia QS, He XB et al (2017) Detection of pyrrolizidine alkaloid DNA adducts in livers of cattle poisoned with Heliotropium europaeum. Chem Res Toxicol 30:851–858
76. Yang XJ, Li WW, Sun Y et al (2017) Comparative study of hepatotoxicity of pyrrolizidine alkaloids retrorsine and monocrotaline. Chem Res Toxicol 30:532–539
77. Zhu L, Xue J, Xia Q et al (2017) The long persistence of pyrrolizidine alkaloid-derived DNA adducts in vivo: kinetic study following single and multiple exposures in male ICR mice. Genotox Carcinogen 91:949–965
78. Meena AK, Bansal P, Kumar S et al (2010) Estimation of heavy metals in commonly used medicinal plants: a market basket survey. Environ Monit Assess 170:657–660
79. Harris ES, Cao S, Littlefield BA et al (2011) Heavy metal and pesticide content in commonly prescribed individual raw Chinese herbal medicines. Sci Total Environ 409:4297–4305
80. Rao MM, Meena AK et al (2011) Detection of toxic heavy metals and pesticide residue in herbal plants which are commonly used in the herbal formulations. Environ Monit Assess 181:267–271
81. Sarma H, Deka S, Deka H et al (2011) Accumulation of heavy metals in selected medicinal plants. Rev Environ Contam Toxicol 214:63–86
82. Mehta V, Midha V, Mahajan R et al (2017) Lead intoxication due to Ayurvedic medications as a cause of abdominal pain in adults. Clin Toxicol 55:97–101
83. Flora SJS, Agrawal S (2017) Arsenic, cadmium, and lead. In: Gupta RC (ed) Reproductive and developmental toxicology. Academic Press/Elsevier, Amsterdam, pp 537–566
84. Flora SJS (2014) Metals. In: Gupta RC (ed) Biomarkers in toxicology. Academic Press/Elsevier, Amsterdam, pp 485–519
85. Bridges CC, Zalpus RK (2017) The aging kidney and the nephrotoxic effects of mercury. J Toxicol Environ Health Part B 20:55–80
86. Gasser U, Klier B, Kuhn AV et al (2009) Current findings on the heavy metal content in herbal drugs. Pharmeur Sci Notes 1:37–49
87. Cooper K, Noller B, Connell D et al (2007) Public health risks from heavy metals and metalloids present in traditional Chinese medicines. J Toxicol Environ Health A 70:1694–1699
88. Bhat R, Kiran K, Arun AB (2010) Determination of mineral composition and heavy metal content of some nutraceutically valued plant products. Food Anal Methods 3:181–187
89. Ahmed MT, Loutfy N, Yousef Y (2001) Contamination of medicinal herbs with organophosphorus insecticides. Bull Environ Contam Toxicol 66:421–426
90. Wong TC, Lee FS, Hu GL et al (2007) A survey of heavy metal and organochlorine pesticide contaminations on commercial Lingzhi products. J Food Drug Anal 15:472–479
91. Sarkhail P, Yunesian M, Ahmadkhaniha R et al (2012) Levels of organophosphorus pesticides in medicinal plants commonly consumed in Iran. Daru J Pharm Sci 20:9
92. Tong M, Gao W, Jiao W et al (2017) Uptake, translocation, metabolism, and distribution of glyphosate in nontarget tea plant (Camellia sinensis L.). J Agric Food Chem 65:7638–7646
93. Raman P, Patino LC, Nair MG (2004) Evaluation of metal and microbial contamination in botanical supplements. J Agric Food Chem 52:7822–7827
94. Abou-Arab AAK, Soliman Kawther M, Tantawy MEEI et al (1999) Quantity estimation of some contaminants in commonly used medicinal plants in the Egyptian market. Food Chem 67:357–363
95. Bungo A, Almodovar AAB, Pereira TC (2006) Occurrence of toxigenic fungi in herbal drugs. Braz J Microbiol 37:47–51
96. Santos L, Marin S, Sanchis V et al (2009) Screening of mycotoxin multicontamination in medicinal and aromatic herbs sampled in Spain. J Sci Food Agric 89:1802–1807
97. Romagnoli B, Menna V, Gruppioni N et al (2007) Aflatoxins in spices, aromatic herbs, herb-teas, and medicinal plants marketed in Italy. Food Control 18:697–701
98. Prado G, Altoé AF, Gomes TC et al (2012) Occurrence of aflatoxin B1 in natural products. Braz J Microbiol 43:1428–1436
99. Gan F, Hou L, Zhou Y et al (2017) Effects of ochratoxin A on ER stress, MAPK signaling pathway and autophagy of kidney and spleen in pigs. Environ Toxicol 32:2277–2286
100. Gupta RC, Lasher MA, Miller Mukherjee IR, Srivastava A, Lall R (2017) Aflatoxins, ochratoxins, and citrinin. In: Gupta RC (ed) Reproductive and developmental toxicology, 2nd edn. Academic Press/Elsevier, Amsterdam, pp 945–962
101. Gupta RC, Srivastava A, Lall R (2018) Ochratoxins and citrinin. In: Gupta RC (ed) Veterinary toxicology: basic and clinical principles, 3rd edn. Academic Press/Elsevier, Amsterdam, pp 1019–1027
102. Ostry V, Malir F, Ruprich J (2013) Producers and important dietary sources of ochratoxin A and citrinin. Toxins 5(9):1574–1586
103. Gulati K, Anand R, Ray A (2016) Nutraceuticals as adaptogens: their role in health and disease. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 193–205
104. Ajayi AM, Umukoro S, Ben-Aju B et al (2017) Toxicity and protective effect of phenolic-enriched ethylacetate fraction of Ocimum gratissimum (Linn.) leaf against acute inflammation and oxidative stress in rats. Drug Dev Res 78:135–145
105. Burnaz NA, Kücük M, Akar Z (2017) An on-line HPLC system for detection of antioxidant compounds in some plant extracts by comparing three different methods. J Chromatogr B 1052:66–72
106. Sobrinho AP, Minho AS, Ferreira LLC et al (2017) Characterization of anti-inflammatory effect and possible mechanism of action of Tibouchina granulosa. J Pharm Pharmacol 69:706–713
107. Wang K (2016) Adverse reaction prediction and pharmacovigilance of nutraceuticals: examples of computational and statistical analysis on big data. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 239–248
108. Peterson JD (2016) Noninvasive in vivo optical imaging models for safety and toxicity testing. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 305–317
109. Barnet RE, Bailey DC, Hatfield HE, Fitsanakis VA (2016) Caenorhabditis elegans: a model organism for nutraceutical safety and toxicity evaluation. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 341–354
110. Bian W-P, Pei D-S (2016) Zebrafish model for safety and toxicity testing of nutraceuticals. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 333–339
111. Krishna G, Gopalakrishnan G (2016) Alternative in vitro models for safety and toxicity evaluation of nutraceuticals. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 355–385
112. Kadakkuzha BM, Liu X-A, Swarnkar S, Chen Y (2016) Genomic and proteomic mechanisms and models in toxicity and safety evaluation of nutraceuticals. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 227–237
113. Gonzalez-Suarez I, Martin F, Hoenig J, Peitsch MC (2016) Mechanistic network models in safety and toxicity evaluation of nutraceuticals. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 287–304
114. Mindukshew I, Kudryavtsev I, Serebriakova M et al (2016) Flow cytometry and light scattering technique in evaluation of nutraceuticals. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 319–332
115. Anadón A, Martínez-Laraaga MR, Ires I et al (2016) Interactions between nutraceuticals/nutrients and therapeutic drugs. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 855–874
116. Sechi S, Di Cerbo A, Canello S et al (2016) Effects in dogs with behavioral disorders of a commercial nutraceutical diet on stress and neuroendocrine parameters. Vet Rec. https://doi.org/10.1136/vr.103865
117. Mouly S, Lloret-Linares C, Sellier PO et al (2017) Is the clinical relevance of drug-food and drug-herb interactions limited to grapefruit juice and Saint-John’s Wort? Pharmacol Res 18:82–92
118. Mooiman KD, Maas-Bakker RF, Hendrikx JJ et al (2014) The effect of complementary and alternative medicines on CYP3A4-mediated metabolism of three different substrates: 7-benzyloxy-4-trifluoromethyl-coumarin, midazolam and docetaxel. J Pharm Pharmacol 166(6):865–874
119. Oh HA, Lee H, Kim D et al (2017) Development of GC-MS based cytochrome P450 assay for the investigation of multi-herb interaction. Anal Biochem 519:71–83
120. Shao F, Zhang H, Xie L et al (2017) Pharmacokinetics of ginkgolides A, B and K after single and multiple intravenous infusions and their interactions with midazolam in healthy Chinese male subjects. Eur J Clin Pharmacol 73:537–546
121. Zhang L, Sparreboom A (2017) Predicting transporter-mediated drug interactions. Clin Pharmacol Ther 101(4):447–449
122. Gaudineau C, Beckerman R, Welbourn S et al (2004) Inhibition of human P450 enzymes by multiple constituents of the Ginkgo biloba extract. Biochem Biophys Res Commun 318(4):1072–1078
123. Unger M, Frank A (2004) Simultaneous determination of the inhibitory potency of herbal extracts on the activity of six major cytochrome P450 enzymes using liquid chromatography/mass spectrometry and automated online extraction. Rapid Commun Mass Spectrom 18:2273–2281
124. Goey AK, Mooiman KD, Beijnen JH et al (2013) Relevance of in vitro and clinical data for predicting CYP3A4-mediated herb-drug interactions in cancer patients. Cancer Treat Rev 39:773–778
125. Durr D, Stieger B, Kullak-Ublick GA et al (2000) St. John’s Wort induces intestinal P-glycoprotein/MDR1 and intestinal and hepatic CYP3A4. Clin Pharmacol Ther 68:598–604
126. Dormán G, Flachner B, Hajdú I, András CD (2016) Target identification and polypharmacology of nutraceuticals. In: Gupta RC (ed) Nutraceuticals: efficacy, safety and toxicity. Academic Press/Elsevier, Amsterdam, pp 263–286
127. Herr M, Grondin H, Sanchez S et al (2017) Polypharmacy and potentially inappropriate medications: a cross-sectional analysis among 451 nursing homes in France. Eur J Clin Pharmacol 73:601–608
128. Heuberger R (2012) Polypharmacy and food-drug interactions among older persons: a review. J Nutri Gerontol Geriatr 31:325–403
129. Gochfeld M (2017) Sex differences in human and animal toxicology: toxicokinetics. Toxicol Pathol 45:172–118
130. Lee KW, Bode AM, Dong Z (2011) Molecular targets of phytochemicals for cancer prevention. Nat Rev Cancer 11:211–218
131. Posma JM, Garcia-Perez I, Heaton JC et al (2017) Integrated analytical and statistical two-dimensional spectroscopy strategy for metabolite identification: application to dietary biomarkers. Anal Chem 89:3300–3309
132. Penman AD, Kaufman GE, Daniels KK (2014) MicroRNA expression as an indicator of tissue toxicity. In: Gupta RC (ed) Biomarkers in toxicology. Academic Press/Elsevier, Amsterdam, pp 1003–1018
133. Srivastava A, Kumar A, Thomas JD et al (2017) Association of acute toxic encephalopathy with litchi consumption in an outbreak in Muzaffarpur, India, 2014: a case-control study. Lancet Glob Health 5:e458–e466
134. Hsu CC, Lin MH, Cheng JT et al (2017) Antihyperglycemic action of diosmin, a citrus flavonoid, is induced through endogenous-endorphin in type I-like diabetes rats. Clin Exp Pharmacol Physiol 44:549–555
135. Sander J, Terhardt M, Sander S et al (2017) Quantification of methylenecyclopropyl compounds and acyl conjugates by UPLC-MS/MS in the study of the biochemical effects of the ingestion of canned Ackee (Blighia sapida) and Lychee (Litchi chinensis). J Agric Food Chem 65:2603–2608
136. Gupta RC (2014) Biomarkers in toxicology. Academic Press/Elsevier, Amsterdam. 1128 pages
137. Rao N, Spiller HA, Hodges NL et al (2017) An increase in dietary supplement exposures reported to US poison control centers. J Med Toxicol 13:227–237
138. Coulson JM, Caparrota TM, Thompson JP (2017) The management of ventricular dysrhythmia in aconite poisoning. Clin Toxicol 55:313–321
139. Cope RB (2005) Toxicology brief: Allium species poisoning in dogs and cats. Vet Med:462–566
140. IARC (2002) Some traditional herbal medicines, some mycotoxins, naphthalene and styrene. Aristolochia species and aristolochic acids. IARC monographs on the evaluation of carcinogenic risks to humans, vol 82. WHO, Lyon, pp 69–128
Abstract
The extensive use of pharmaceuticals and their widespread improper disposal have made these products
contaminants of emerging concern (CEC). In particular, active pharmaceutical ingredients (APIs) are
ubiquitously detected in surface water and soil, mainly in the aquatic compartment, where they affect
living systems. Unfortunately, there is a huge gap in the availability of data on the environmental
behavior and ecotoxicity of pharmaceuticals, which has prompted the European Medicines Agency (EMA) to
release guidelines for their risk assessment. In silico modeling approaches are vital tools for exploiting
existing information to rapidly identify the potentially most hazardous pharmaceuticals and prioritize
them for further experimental study. Quantitative structure–activity relationship (QSAR) models can
predict missing properties for the toxic endpoints required to prioritize existing or newly synthesized
chemicals by their potential hazard. This chapter reviews the occurrence and impact of pharmaceuticals
and their metabolites in the environment, along with their persistence, environmental fate, risk
assessment, and risk management. It gives a bird’s-eye view of the need for in silico methods for
predicting the fate of pharmaceuticals in the environment and discusses existing successful models of the
ecotoxicity of pharmaceuticals. Available toxicity endpoints, ecotoxicity databases, and expert systems
frequently used for ecotoxicity predictions of pharmaceuticals are also reported. The overall discussion
justifies the need to build additional in silico models for quick and economical prediction of the
ecotoxicity of pharmaceuticals with limited or no animal testing.
Key words APIs, Ecotoxicity, CEC, In silico, Pharmaceuticals, QSAR, Risk assessment, Risk management,
Waste management
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_19, © Springer Science+Business Media, LLC, part of Springer Nature 2018
396 Supratik Kar et al.
2.1 Source and Entry Routes
To understand the ecotoxicity of pharmaceuticals, the first step is
to identify their sources and entry routes into the environment.
Major sources and familiar pathways for environmental pollution
by pharmaceuticals are described below.
(a) Household disposal: Because instructions about medication
disposal are missing or improper, expired and unused medicines
are in many cases dumped through the toilet or via waste bins,
before being transferred to landfill sites as terrestrial ecosystem
Impact of Pharmaceuticals on the Environment: Risk Assessment Using QSAR Modeling… 399
2.3 Pharmaceutical Hazardous Wastes and Their Treatment
The EPA defines medical waste as “all waste materials generated at
health care facilities, such as hospitals, clinics, physicians’ offices,
dental practices, blood banks, and veterinary hospitals/clinics, as
well as medical research facilities and laboratories.” Among medical
wastes, pharmaceutical waste is one of the most prominent and
perilous. The USA spent around $2.5 billion on the disposal of
medical waste in 2012. With annual growth of 4.8%, the cost was
expected to reach $3.2 billion by 2017 [91]. Hospitals in the USA
alone produce more than 5.9 million tons of waste annually. One can
therefore imagine the level of hazard generated by medical waste
all over the world, as all healthcare activities for humans generate
medical waste. The danger increases manifold when these wastes are
mishandled or improperly disposed of. Therefore, persons engaged in
risk assessment and management must be aware of the types of medical
waste, especially pharmaceutical ones, along with the different
approaches to treating them efficiently so as to minimize the hazards
to the environment and living systems. A typical list of
pharmaceutical hazardous wastes with a few examples is illustrated in
Fig. 2, and the most commonly employed treatments for pharmaceutical
as well as medical waste to avoid a high risk of ecotoxicity are
reported in Fig. 3.
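The cost projection quoted above can be checked with a short compound-growth calculation. The Python sketch below is only an arithmetic illustration using the figures given in the text ($2.5 billion in 2012, 4.8% annual growth, five years to 2017); the function name is hypothetical.

```python
def project_cost(base_cost, annual_growth, years):
    """Compound a base cost forward at a fixed annual growth rate."""
    return base_cost * (1.0 + annual_growth) ** years


# Figures quoted in the text: $2.5 billion in 2012, 4.8% growth per year.
projected_2017 = project_cost(2.5, 0.048, 2017 - 2012)
print(f"Projected 2017 cost: ${projected_2017:.2f} billion")
# prints "Projected 2017 cost: $3.16 billion"
```

The result, about $3.16 billion, rounds to the $3.2 billion figure cited for 2017.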
Fig. 2 Types of pharmaceutical hazardous wastes with a few examples [the color of each box represents the
color of the corresponding waste container]
Fig. 3 Different ways of treatment for medical wastes to avoid high risk of ecotoxicity
3.1 Risk Assessment Approaches
The most commonly employed risk assessment approaches for
pharmaceuticals and their metabolites in various environmental
compartments are discussed below [93].
3.1.1 Hazard Identification
The first and foremost step in risk assessment is identifying the
source and occurrence of hazards, which informs the intensity of the
risk posed by a pharmaceutical. Most scientists have relied heavily
on in vivo data, but because reasonable data are severely lacking for
the majority of pharmaceuticals with respect to specific species and
defined environmental compartments, greater effort should be devoted
to the efficient use of in vitro assays and in silico analysis, as
well as computational techniques in systems biology [94].
3.1.2 Dose-Response Assessment
Detecting the threshold dose of a toxic effect is imperative for the scientific risk assessment of any hazard. Dose-response information over a wide range of test concentrations should be generated through quantitative high-throughput screening (q-HTS). Additionally, sensitive assays should be able to detect toxicity at very low doses, at or below the environmental levels experienced by living organisms. There should be sufficient scope to extrapolate adverse responses and to assess critical concentration data using statistical approaches [95].
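As a rough illustration of how a critical concentration might be read off dose-response data, the sketch below estimates the concentration producing a 50% response by interpolating on a log-concentration scale. It is a simplification: regulatory practice usually fits a full dose-response model, and the function name, the data, and the choice of interpolation are assumptions made here for illustration, not something prescribed by the text.

```python
import math


def interpolate_ec50(concentrations, responses):
    """Estimate the concentration giving a 50% response.

    concentrations -- strictly increasing, positive test concentrations
    responses      -- fraction of organisms affected (0..1) at each concentration

    Interpolates linearly on log10(concentration) between the two tested
    concentrations that bracket a 50% response. Returns None when no pair
    of neighbouring concentrations brackets 50%.
    """
    points = list(zip(concentrations, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo <= 0.5 <= r_hi and r_hi > r_lo:
            # Position of the 50% response between the two bracketing points.
            frac = (0.5 - r_lo) / (r_hi - r_lo)
            log_ec50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ec50
    return None


# Hypothetical assay results (concentrations in mg/L), for illustration only.
test_concentrations = [0.1, 1.0, 10.0, 100.0]
affected_fraction = [0.0, 0.25, 0.75, 1.0]
ec50 = interpolate_ec50(test_concentrations, affected_fraction)
print(f"Estimated EC50: {ec50:.2f} mg/L")
```

Interpolating on the log scale reflects the common practice of spacing test concentrations geometrically; a fitted model would additionally provide confidence limits, which this sketch does not.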
3.1.3 Dose and Species Extrapolation
Major difficulties in risk assessment are low-dose toxicity and the lack of interspecies extrapolation data. In some cases, regulatory authorities and government organizations have implemented in silico models and expert systems as alternatives to deal with these problems. In vitro to in vivo extrapolation and physiologically based pharmacokinetic (PBPK) models are amenable to sensitivity, variability, and uncertainty analysis using conventional tools [96].
3.1.4 Risk Characterization
The final phase of the ecological risk assessment is risk characterization, which integrates the analyses from the exposure and ecological effects characterizations along with the uncertainties, assumptions, strengths, and limitations of those analyses. Risk characterization has two major components: risk estimation and risk description. Risk estimation compares integrated exposure and effects data against Levels of Concern (LOCs) and states the potential for risk [97].
408 Supratik Kar et al.
3.1.5 Deterministic Approach and Calculation of Risk Quotients
The EPA uses a deterministic approach, the risk quotient (RQ), to evaluate the toxicity associated with environmental exposure. The RQ is calculated by dividing a point estimate of exposure by a point estimate of effects. The calculation is based on ecological effects data, hazard use data, fate and transport data, and estimates of exposure to the hazard. Thus, the estimated environmental concentration (EEC) is compared to an effect level, such as an LC50 (the concentration at which 50% of the organisms die).
RQ = Exposure/Toxicity
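The risk quotient calculation can be sketched in a few lines. The example below divides an EEC by an effect level and compares the result with a Level of Concern. The function and class names, the numerical values, and the LOC of 0.5 are hypothetical and chosen only for illustration; the appropriate LOC depends on the regulatory context and is not given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskQuotient:
    """Result of a deterministic (point-estimate) risk calculation."""
    value: float        # RQ = exposure / toxicity (dimensionless)
    exceeds_loc: bool   # True when RQ is at or above the level of concern


def risk_quotient(eec: float, effect_level: float, loc: float) -> RiskQuotient:
    """Divide a point estimate of exposure by a point estimate of effect.

    eec          -- estimated environmental concentration
    effect_level -- toxicity endpoint such as an LC50, in the same units as eec
    loc          -- level of concern to compare against (user-supplied assumption)
    """
    if eec < 0:
        raise ValueError("exposure cannot be negative")
    if effect_level <= 0:
        raise ValueError("effect level must be positive")
    rq = eec / effect_level
    return RiskQuotient(value=rq, exceeds_loc=rq >= loc)


# Hypothetical numbers for illustration only (both concentrations in mg/L).
result = risk_quotient(eec=0.5, effect_level=2.0, loc=0.5)
print(f"RQ = {result.value:.2f}, exceeds LOC: {result.exceeds_loc}")
```

Because both inputs are point estimates, the result carries no information about variability or uncertainty, which is the main limitation of the deterministic approach.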
3.2 Environmental Risk Assessment Modeling of Pharmaceuticals
The risk assessment model considers the safety issues and the RQ of individual pharmaceutical products. The most common approaches are given in the guidance for environmental assessments for regulatory drug approvals issued by the US FDA [28] and by the European Medicines Agency (EMA) [26]. Before a toxicological study is modeled, it is important to evaluate the exposure to any pharmaceutical in the following ways [98]:
(a) For modeling purposes, exposure is assessed as occurrence, i.e., the environmental concentration to which the biological system is exposed, together with the duration and frequency of exposure, rather than as the concentrations to which an individual living system is exposed. Exposure also depends on many other factors such as sorption effects, metabolism, transformation processes, and fate.
(b) The life cycle of the organism must be taken into account to understand the effect of pharmaceuticals on it.
(c) The MOA of the pharmaceutical needs to be determined to depict each step of its molecular and functional effects.
(d) The pathways and target sites of pharmaceuticals in the biological system need to be properly understood.
(e) The bioavailability and toxicokinetic properties of the pharmaceutical need to be studied.
(f) Complete pharmacokinetic and pharmacodynamic information is required to understand the absorption, distribution, metabolism, excretion, and toxicity pattern.
(g) The hazard arising from the inherent toxicity of the pharmaceuticals, as determined by their chemical properties, needs to be studied.
The most important steps in the risk assessment and management process are reported in Fig. 4.
Fig. 4 Reasons for ecotoxicity study along with steps for risk assessment and risk management of pharmaceutical hazards
4.1 Accomplishment of Preventive Measures
Pharmaceuticals are essential, often emergency, products whose use cannot be stopped, but the environmental risks they pose can be managed by implementing appropriate preventive measures and safeguards. The EMEA set out guidelines in 2006 as safety measures for risk management:
1. Initial assessment of risk for individual products,
2. Appropriate product labeling and summary of product characteristics (SPC) for each pharmaceutical package,
3. Educating patients about the possible toxicity toward humans as well as the environment through the package leaflet (PL),
4. Safe and appropriate storage as well as disposal of pharmaceutical products,
4.2 Minimizing the Input of Pharmaceutical Hazards into the Environment
To reduce the input of pharmaceutical products and their metabolites, the following steps can be employed effectively.
4.2.1 Awareness and Training
Awareness of and training on the occurrence of individual pharmaceutical products and their effects on the environment is the most crucial step. In addition, knowledge of the disposal processes for the various types of pharmaceutical hazards is the first step toward reducing their input into the ecosystem. Awareness needs to be spread among shareholders, stakeholders, and the community that uses pharmaceuticals, including patients, doctors, nurses, and pharmacists. The most important role needs to be played by industry, which is the major source of pharmaceutical hazards, many of them APIs, when these are released into the environment without adequate waste treatment. In addition, each raw material, in-process molecule, and API should be accompanied by a material safety data sheet (MSDS) intended to give workers and emergency personnel procedures for handling the product safely, with information such as physical data, toxicity and health hazards, first aid, reactivity, storage, disposal, protective equipment, and spill-handling procedures. People involved in risk management should have information about drug flows from the diverse sources: households, industries, hospitals, and pharmacies [100].
4.2.2 High-End and Advanced Sewage Treatment
Most risk management can be achieved by improving sewage treatment. Implementing sophisticated and enhanced wastewater and sewage treatment can reduce the hormonal effects on living systems, the ecotoxicity, and the pathogenic effects of the effluent manifold. Recently, advanced effluent sewage treatment has been practiced widely, employing photochemical oxidation, filtration, and adsorption processes [101].
4.2.3 Green and Viable Pharmacy
The final approach is green and sustainable pharmacy, which favors environmentally benign compounds that, on coming into contact with the environment, degrade rapidly with minimal hazardous effects [100]. At present it is the least practiced method, but for long-term sustainability it is urgently needed.
Furthermore, possible measures, roles, and actions by which diverse stakeholders can reduce the ecotoxicity imposed by pharmaceutical hazards are addressed in Table 1.
Table 1
Roles of different stakeholders in reducing pharmaceutical-induced ecotoxicity
Table 2
A list of endpoints for modeling purposes under OECD and areas where QSAR models can be employed

Endpoints for modeling under OECD
• Physical-chemical properties: boiling point, melting point, vapor pressure, octanol–water partition coefficient, organic carbon–water partition coefficient, and water solubility
• Ecological effects on endpoints: long-term toxicity, acute Daphnia toxicity, acute fish toxicity, terrestrial toxicity, algal toxicity, marine organism toxicity, microorganism toxicity in sewage treatment plant
• Environmental fate: biodegradation, hydrolysis in water, atmospheric oxidation, and bioaccumulation
• Human health effects: acute oral, acute inhalation, eye irritation, acute dermal, skin sensitization, skin irritation, repeated dose toxicity, reproductive toxicity, genotoxicity, systemic toxicity, developmental toxicity, mutagenicity, carcinogenicity, etc.

Areas where QSAR models can be used
• Prioritization of existing pharmaceuticals for toxicity to environment
• Classification and labeling of new pharmaceuticals
• Risk assessment of new and existing pharmaceuticals
• Guiding experimental design of regulatory tests or testing strategies
• Providing mechanistic information
• Filling up the large data gaps
• Building a proper database of each pharmaceutical to different species regarding ecotoxicity
• Development of expert systems for each therapeutic class for diverse compartments of the environment
• Construction of efficient interspecies models to extrapolate data from one species to another species when data of a specific species is missing
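To make the "filling data gaps" use of QSAR in Table 2 concrete, the sketch below fits a one-descriptor linear model relating a physical-chemical property (here the logarithm of the octanol–water partition coefficient) to a toxicity endpoint, and then predicts the endpoint for an untested compound. The training values are synthetic and invented purely for illustration, and a single-descriptor least-squares fit is an assumption made for brevity; real QSAR models typically use several descriptors and require validation.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic training set (NOT measured data): descriptor and endpoint values.
log_kow = [1.0, 2.0, 3.0, 4.0]      # hypothetical log Kow values
toxicity = [1.5, 2.5, 3.5, 4.5]     # hypothetical log(1/LC50) values

slope, intercept = fit_line(log_kow, toxicity)


def predict_toxicity(descriptor_value: float) -> float:
    """Predict the endpoint for a compound lacking experimental data."""
    return slope * descriptor_value + intercept


print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"predicted endpoint at log Kow 2.5: {predict_toxicity(2.5):.2f}")
```

Predictions of this kind are only meaningful for compounds that resemble the training set, so any practical use would also need a check of the model's applicability domain.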
Table 3
Global regulatory agencies which deal with the environmental risk assessment and risk management of pharmaceuticals
Regulatory agencies; Objective; Responsibility and method of risk assessment
AEA (Australian Environment Agency)
Objective: Advises clients on the environmental hazards and potential risks associated with the production, use, and disposal of chemicals. AEA is a member of the Society of Environmental Toxicology and Chemistry (SETAC).
Responsibility and method of risk assessment: AEA has undertaken reports for the Department of Sustainability, Environment, Water, Population and Communities (DSEWPaC), particularly with respect to its environmental assessments performed on new and existing agricultural and veterinary chemicals for the Australian Pesticides and Veterinary Medicines Authority (APVMA), and on industrial chemicals for the National Industrial Chemicals Notification and Assessment Scheme (NICNAS).
CDER (Center for Drug Evaluation and Research)
Objective: CDER reviews New Drug Applications to ensure that the drugs are safe and effective. Its primary objective is to ensure that all prescription and over-the-counter (OTC) medications are safe and effective when used as directed.
Responsibility and method of risk assessment: An assessment of risk to the environment is required for the manufacture, use, and distribution of human drugs under the National Environmental Policy Act of 1969, and an environmental assessment procedure was developed by the US Food and Drug Administration (US FDA) as part of the registration procedure for new human pharmaceutical drugs. Additionally, in 1995, the FDA-CDER issued a new guidance for the Submission of an Environmental Assessment in Human Drugs. In 1997 the FDA implemented a Note for Guidance paper in which all drugs entering the aquatic compartment at a Predicted Environmental Concentration (PEC_EFFLUENT) below 1 μg/L were exempted from a detailed risk assessment.
EMEA (European Agency for the Evaluation of Medicinal Products)
Objective: EMEA sets out the scope and legal basis for risk assessment of pharmaceuticals and outlines the general considerations and the recommended stepwise procedure for their risk assessment. The guideline considers the specific features of pharmaceuticals, e.g., the use of available pharmacological information. Previously, environmental risk assessments were performed mainly on acute ecotoxicity data, but a recent EMEA draft has proposed including pharmacokinetic and pharmacodynamic data in environmental risk assessment.
Responsibility and method of risk assessment: Environmental risk assessment is divided into three phases:
(a) Phase I: Pre-screening and estimation of exposure based on the drug only, irrespective of its route of administration, pharmaceutical form, metabolism, and excretion
(b) Phase II Tier A: Screening and initial prediction of risk, where all relevant data should be taken into account, e.g., data on physical-chemical properties, primary and secondary pharmacodynamics, toxicology, metabolism, excretion, degradability, and persistence of the drug substance and/or relevant metabolites
(c) Phase II Tier B: Extended, substance- and compartment-specific risk assessment. A refined data set is available, comprising information on route(s) of excretion, qualitative and quantitative information on excreted compounds, and possibly additional long-term toxicity data
EU-CSTEE (European Union Commission’s Scientific Committee on Toxicity, Ecotoxicity and Environment)
Objective: The CSTEE has identified the need for a proactive approach in obtaining data on the environmental effects of pharmaceuticals. Thus, it is recognized that a prioritization procedure needs to be developed for environmental risk assessment of pharmaceuticals, and that this should follow the general scheme for chemicals described in the White Paper for future EU chemicals policy, i.e., the REACH guideline.
Responsibility and method of risk assessment: QSAR, as a nonanimal alternative method, is the first step in gaining more general knowledge on the risk assessment issue. In contrast to the amount of analytical data, information about the ecotoxicological effects of drug residues is scarce. To create a broader basis for evaluating the ecotoxicological relevance of pharmaceutical compounds, their effects and the reasons for them are identified and properly documented.
MHLW (Ministry of Health, Labour and Welfare of Japan)
Objective: The MHLW constructed a research group to build up a concept on the regulation of pharmaceuticals for environmental safety in 2007. The regulation system is similar to that of general chemicals in Japan and the Guideline by EMEA. The main function of this group is to establish a risk-benefit analysis committee for the pharmaceuticals which have a high risk for environmental organisms and to human health.
Responsibility and method of risk assessment: The risk assessment is judged by the PEC/PNEC (Predicted Environmental Concentration/Predicted No Effect Concentration) ratio or ΣPECi/PNECi. In addition, the Organization for Pharmaceutical Safety and Research (OPSR) conducted compliance reviews on application data. This was followed by the integration of the aforementioned Evaluation Center, OPSR, and part of the Medical Devices Center to form a new independent administrative organization, the Pharmaceutical and Medical Devices Agency (PMDA). The MHLW and PMDA handle a wide range of activities from clinical studies to approval reviews, reviews throughout post-marketing stage, and pharmaceutical safety measures.
NICNAS (National Industrial Chemicals Notification and Assessment Scheme)
Objective: NICNAS was established in July 1990 under the Industrial Chemicals (Notification and Assessment) Act 1989 by the Australian Government Department of Health. A range of state, territory, and Commonwealth government agencies share regulatory responsibility for chemical safety in Australia, with each chemical being regulated according to its use, whether as a pharmaceutical, veterinary medicine, pesticide, food additive, or industrial chemical.
Responsibility and method of risk assessment: The major responsibilities of NICNAS are:
• Assessing new industrial chemicals for human health and/or environmental effects
• Maintaining the Australian Inventory of Chemical Substances (AICS)
• Circulating information on the human health and environmental impacts of chemicals and making recommendations on their safe use
• Registering new industrial chemicals
REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals)
Objective: Aims to improve the protection of human health and the environment through the better and earlier identification of the intrinsic properties of chemical substances.
Responsibility and method of risk assessment: “No data no market”: the REACH Regulation places responsibility on industry to manage the risks from chemicals and to provide safety information on the substances. Manufacturers and importers of substances have a general obligation to submit a registration to the European Chemicals Agency for each substance manufactured or imported in quantities of 1 tonne or more per year per company
SECIS (Swedish Environmental Classification and Information System for pharmaceuticals)
Objective: An authorized regulatory body which was initiated in 2005 by the Swedish Association for the Pharmaceutical Industry. The rationale of the classification system is to provide the public and the health care sector with environmental information about all active pharmaceutical ingredients (API) on the Swedish market up to now.
Responsibility and method of risk assessment: To improve risk management decision making, sufficient knowledge about environmental exposures and effects in nontarget species is needed for all relevant pharmaceutical substances. Within SECIS, the pharmaceutical companies provide environmental data and classify their products according to predefined criteria and a guidance document. The guidance document is developed for the purposes of SECIS, but it is based on the European Medicines Agency (EMA) guideline for environmental risk assessment of pharmaceuticals and the European Commission Technical Guidance Document (TGD).
UBA (Federal Environment Agency)
Objective: The German Medicines Act provides that the UBA is responsible for the environmental risk assessment. The UBA started assessing the environmental impact of veterinary and human pharmaceuticals in an authorization routine in 1998 and 2003, respectively.
Responsibility and method of risk assessment: The UBA already assessed around 180 veterinary and around 240 human pharmaceutical formulations. Filtering concepts established between UBA and the authorization agency responsible for veterinary medicines focused the ERA on antibiotics, parasiticidal substances and analgesics. Cytostatic medicines, hormones and contrast agents dominated the human medicine dossiers assessed by UBA.
VICH (International Cooperation on Harmonization of Technical Requirements for Registration of Veterinary Medicinal Products)
Objective: VICH is a trilateral (EU–Japan–USA) program aimed at harmonizing technical requirements for veterinary product registration; it was officially launched in April 1996. The initiative to begin the harmonization process came about in 1983 when the first International Technical Consultation on Veterinary Drug Registration (ITCVDR) was held.
Responsibility and method of risk assessment: Veterinary medicinal products (VMPs) are regulated for environmental safety as described in Environmental Impact Assessment for VMPs; Phase I in 2000 and Phase II in 2004.
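The PEC/PNEC ratio and its summed form ΣPECi/PNECi mentioned in Table 3 can be sketched as follows. The substance names and concentrations are hypothetical, and treating a ratio of 1 or more as the trigger for concern is an assumption adopted here for illustration; Table 3 does not state a threshold.

```python
def pec_pnec_ratio(pec: float, pnec: float) -> float:
    """Ratio of predicted environmental to predicted no-effect concentration.

    Both values must be expressed in the same units.
    """
    if pec < 0:
        raise ValueError("PEC cannot be negative")
    if pnec <= 0:
        raise ValueError("PNEC must be positive")
    return pec / pnec


def summed_ratio(substances: dict) -> float:
    """Sum PEC_i / PNEC_i over all substances in a mixture.

    substances maps a substance name to a (PEC, PNEC) pair.
    """
    return sum(pec_pnec_ratio(pec, pnec) for pec, pnec in substances.values())


# Hypothetical mixture (concentrations in ug/L), for illustration only.
mixture = {
    "drug_a": (0.5, 2.0),    # ratio 0.25
    "drug_b": (0.25, 1.0),   # ratio 0.25
}
total = summed_ratio(mixture)
ASSUMED_TRIGGER = 1.0  # assumption: ratios at or above 1 flag potential risk
print(f"Sum of PEC/PNEC = {total:.2f}, flagged: {total >= ASSUMED_TRIGGER}")
```

Summing the individual ratios lets several substances that are each below the trigger still be flagged together, which is the practical reason for using the summed form.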
Fig. 5 Areas of risk assessment and modeling for ecotoxicity prediction as stated by the OECD
Fig. 6 Category of information included in predicting health and environmental effects according to the OECD
guidelines
Fig. 7 Most common in silico tools for the prediction of pharmaceuticals ecotoxicity
6.1 Why Are In Silico Models Required?
● The 3Rs concept stands for “Reduction,” “Replacement,” and “Refinement” of animal experimentation in scientific experiments. “Reduction” refers to lessening the number of animals used to obtain precise results, “Replacement” corresponds to the use of nonliving resources to substitute for conscious living higher animals, and “Refinement” means reducing the severity or cruelty of inhumane methodologies applied to experimental animals [110]. Thus, in silico techniques are one of the best options available for implementing the 3Rs concept. The European Centre for the Validation of Alternative
8 Endpoints
9 Databases
Table 4
Toxicity endpoints for the modeling of pharmaceuticals ecotoxicity
Table 5
Representative list of publicly available databases related to the environmental toxicity due to pharmaceuticals
for Risk)
IUCLID (International Uniform ChemicaL Information Database); OECD, EU Biocides and EU REACH; http://iuclid.echa.europa.eu/
JECDB; Japanese Ministry of Health, Labour and Welfare; http://dra4.nihs.go.jp/mhlw_data/jsp/SearchPageENG.jsp
JRC QSTR Database; European Commission, Joint Research Centre; http://ecb.jrc.ec.europa.eu/QSTR/background/
KATE (KAshinhou Tool for Ecotoxicity); Japanese National Institute for Environmental Studies (NIES), Ministry of the Environment (MoE), Government of Japan; http://kate.nies.go.jp
MRL (Minimal Risk Levels); US DHHS and Agency for Toxic Substances and Disease Registry; http://www.atsdr.cdc.gov/mrls/index.html
NPIC; National Pesticide Information Center through Oregon State University and US EPA; http://npic.orst.edu/
NTP (National Toxicology Program); US NIH/NIEHS; http://ntp.niehs.nih.gov/
PAN Pesticide; Pesticide Action Network, North America; http://www.pesticideinfo.org/
Pesticide Database; Toyohashi University of Technology, Japan; http://chrom.tutms.tut.ac.jp/JINNO/PESDATA/00alphabet.html
RITA (Registry of Industrial Toxicology Animal-data); Fraunhofer Institute of Toxicology and Experimental Medicine (ITEM) Hannover; http://www.item.fraunhofer.de/reni/public/rita/index.php
TETRATOX; The University of Tennessee Institute of Agriculture; http://www.vet.utk.edu/TETRATOX/index.php
TOXNET; US National Library of Medicine (NLM); http://toxnet.nlm.nih.gov/
ToxRefDB; US EPA; http://www.epa.gov/ncct/toxrefdb/
Toxtree; European Commission, Joint Research Centre; http://ecb.jrc.ec.europa.eu/QSTR/QSTR-tools/index.php?c=TOXTREE
TSCATS (Toxic Substances Control Act Test Submissions); US EPA; https://toxplanet.com/tscats/
USGS; US Geological Survey; http://137.227.231.90/data/acute/acute.html
WikiPharma; Swedish research programme MistraPharma; www.wikipharma.org
10 Software
Table 6
12 Conclusion
Acknowledgments
References
1. Aherne G, English J, Marks V (1985) The role of immunoassay in the analysis of microcontaminants in water samples. Ecotoxicol Environ Saf 9:79–83
2. Richardson M, Bowron J (1985) The fate of pharmaceutical chemicals in the aquatic environment. J Pharm Pharmacol 37:1–12
3. Santos LHMLM, Araújo AN, Fachini A et al (2010) Ecotoxicological aspects related to the presence of pharmaceuticals in the aquatic environment. J Hazard Mater 175:45–95
4. Li WC (2014) Occurrence, sources, and fate of pharmaceuticals in aquatic environment and soil. Environ Pollut 187:193–201
5. WHO (2011) The World Medicines Situation
6. Busfield J (2015) Assessing the over use of medicines. Soc Sci Med 131:199–206
7. IWW (2014) Pharmaceuticals in the environment: occurrence, effects and options for action. Research project funded by German Federal Environment Agency (UBA) within the Environmental Research Plan No. 371265408.
8. Roy K, Kar S (2016) In silico models for ecotoxicity of pharmaceuticals. In: Benfenati E (ed) In silico methods for predicting drug toxicity, Methods in molecular biology, vol 1425. Springer, New York, NY
9. Taylor D, Senac T (2014) Human pharmaceutical products in the environment – the “problem” in perspective. Chemosphere 115:95–99
10. Kümmerer K (2013) Pharmaceuticals in the environment: sources, fate, effects and risks. Springer Science & Business Media, Berlin
11. Hughes SR, Kay P, Brown LE (2013) Global synthesis and critical evaluation of pharmaceutical datasets collected from river systems. Environ Sci Technol 47:661–677
12. Han GH, Hur HG, Kim SD (2006) Ecotoxicological risk of pharmaceuticals from wastewater treatment plants in Korea: occurrence and toxicity to Daphnia magna. Environ Toxicol Chem 25:265–271
13. Fernandez C, Gonzalez-Doncel M, Pro J et al (2010) Occurrence of pharmaceutically active compounds in surface waters of the Henares-Jarama-Tajo river system (Madrid, Spain) and a potential risk characterization. Sci Total Environ 408:543–551
14. Papageorgiou M, Kosma C, Lambropoulou D (2016) Seasonal occurrence, removal, mass loading and environmental risk assessment of 55 pharmaceuticals and personal care products in a municipal wastewater treatment plant in Central Greece. Sci Total Environ 543:547–569
15. Oliveira TS, Murphy M, Mendola N et al (2015) Characterization of pharmaceuticals and personal care products in hospital effluent and waste water influent/effluent by direct-injection LC-MS-MS. Sci Total Environ 518:459–478
16. Rivas J, Encinas A, Beltran F, Grahan N (2011) Application of advanced oxidation processes to doxycycline and norfloxacin removal from water. J Environ Sci Health A Tox Hazard Subst Environ Eng A 46:944–951
17. Kolpin DW, Furlong ET, Meyer MT et al (2002) Pharmaceuticals, hormones, and other organic wastewater contaminants in U.S. streams, 1999-2000: a national reconnaissance. Environ Sci Technol 36:1202–1211
18. Halling-Sørensen B, Nors Nielsen S, Lanzky PF et al (1998) Occurrence, fate and effects of pharmaceutical substances in the environment-a review. Chemosphere 36(2):357–393
19. Daughton CG, Ternes TA (1999) Pharmaceuticals and personal care products in the environment: agents of subtle change? Environ Health Perspect 107:907–937
20. Adler P, Steger-Hartmann T, Kalbfus W (2001) Distribution of natural and synthetic estrogenic steroid hormones in water samples from southern and middle Germany. Acta Hydrochim Hydrobiol 29:227–241
21. Ternes T (1998) Occurrence of drugs in German sewage treatment plants and rivers. Water Res 32:3245–3260
22. Ahrer W, Scherwenk E, Buchberger W (2001) Occurrence and fate of fluoroquinolone, macrolide, and sulphonamide antibiotics during wastewater treatment and in ambient waters in Switzerland. In: Daughton CG, Jones-Lepp T (eds) Pharmaceuticals and personal care products in the environment: scientific and regulatory issues. Symposium Series 791. American Chemical Society, Washington, DC, pp 56–69
23. Frick EA, Henderson AK, Moll DM, et al (2001) Presented at the Georgia Water Resources Conference, Athens, GA.
24. Tauber R (2003) Quantitative analysis of pharmaceuticals in drinking water from ten Canadian cities. Enviro-Test Laboratories, Winnipeg, MB
25. Seiler JP (2002) Pharmacodynamic activity of drugs and ecotoxicology can the two be connected? Toxicol Lett 131:105–115
26. EMEA (2006) Guideline on the environmental impact assessment of medicinal products for human use (Report no. CPMP/SWP/4447/00). European Agency for the Evaluation of Medicinal Products, London
27. FDA-CDER (1998) Guidance for industry-environmental assessment of human drugs and biologics applications, Revision 1. FDA Center for Drug Evaluation and Research, Rockville, VA
28. FDA: U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER) (1998) CMC 6 - Revision 1
29. European Commission, Directive 2006/121/EC of the European Parliament and of the Council of 18 December 2006 amending Council Directive 67/548/EEC on the approximation of laws, regulations and administrative provisions relating to the classification, packaging and labelling of dangerous substances in order to adapt it to Regulation (EC) No. 1907/2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) and establishing a European Chemicals Agency. Off. J. Eur. Union, L 396/850 of 30.12.2006, Office for Official Publications of the European Communities (OPOCE), Luxembourg
30. Directive 2004/27/EC of the European Parliament and of the Council of 31 March 2004 amending Directive 2001/83/EC on the Community code relating to medicinal products for human use. Official Journal L 136, 30/04/2004 pp. 34-57. 2004
31. Directive 2004/28/EC of the European Parliament and of the Council of 31 March 2004 amending Directive 2001/82/EC on the Community code relating to veterinary medicinal products. Official Journal L 136, 30/04/2004 pp. 58-84. 2004
32. DIRECTIVE 2013/39/EU OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 12 August 2013 amending Directives 2000/60/EC and 2008/105/EC as regards priority substances in the field of water policy. 2013
33. COMMISSION IMPLEMENTING DECISION (EU) 2015/495 of 20 March 2015 establishing a watch list of substances for Union-wide monitoring in the field of water policy pursuant to Directive 2008/105/EC of the European Parliament and of the Council. 2015
34. Barbosa MO, Moreira NFF, Ribeiro AR et al (2016) Occurrence and removal of organic micropollutants: an overview of the watch list of EU Decision 2015/495. Water Res 94:257–279
35. Cassani S, Gramatica P (2015) Identification of potential PBT behavior of personal care products by structural approaches. Sustain Chem Pharm 1:19–27
36. Roy K, Kar S, Das RN (2015) Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment. Academic Press, San Diego, CA
37. Roy K, Kar S, Das RN (2015) A primer on QSAR/QSPR modeling: fundamental concepts (SpringerBriefs in Molecular Science). Springer, New York, NY
38. Howard PH, Muir DCG (2011) Identifying new persistent and bioaccumulative organics among chemicals in commerce II: pharmaceuticals. Environ Sci Technol 45:6938–6946
39. Sangion A, Gramatica P (2016) PBT assessment and prioritization of contaminants of emerging concern: pharmaceuticals. Environ Res 147:297–306
40. European Commission (2001) CSTEE. Discussion paper on environmental risk assessment of medical products for human use (non-genetically modified organisms (non-GMO) containing). CPMPpaperRAssessHumPharm12062001/D(01)
41. US EPA (2012) The ECOSAR (ECOlogical Structure Activity Relationship) Class Program.
42. US EPA (2015) ECOTOX user guide: ECOTOXicology database system. Version 4.0. http://www.epa.gov/ecotox/. Accessed 4 July, 2017
43. Mendoza A, Acena J, Perez S et al (2015) Pharmaceuticals and iodinated contrast media in a hospital wastewater: a case study to analyse their presence and characterise their environmental risk and hazard. Environ Res 140:225–241
44. Ortiz de García S, Pinto GP, García-Encina PA et al (2013) Ranking of concern, based on environmental indexes, for pharmaceutical and personal care products: an application to the Spanish case. J Environ Manage 129:384–397
45. Sanderson H, Thomsen M (2009) Comparative analysis of pharmaceuticals versus industrial chemicals acute aquatic toxicity classification according to the United Nations classification system for chemicals. Assessment of the (Q)SAR predictability of pharmaceuticals acute aquatic toxicity and their predominant acute toxic mode-of-action. Toxicol Lett 187:84–93
46. Singh KP, Gupta S, Basant N (2015) QSTR modeling for predicting aquatic toxicity of pharmacological active compounds in multiple test species for regulatory purpose. Chemosphere 120:680–689
47. Persson M, Sabelström E, Gunnarsson B (2009) Handling of unused prescription drugs-knowledge, behaviour and attitude among Swedish people. Environ Int 35:771–774
48. Li D, Yang M, Hu J et al (2008) Determination and fate of oxytetracycline and related compounds in oxytetracycline production wastewater and the receiving river. Environ Toxicol Chem 27:80–86
49. José Gómez M, Petrovic M, Fernández-Alba AR et al (2006) Determination of pharmaceuticals of various therapeutic classes by solid-phase extraction and liquid chromatography-tandem mass spectrometry analysis in hospital effluent wastewaters. J Chromatogr 1114:224–233
50. Ternes TA, Hirsch R (2000) Occurrence and behavior of X-ray contrast media in sewage facilities and the aquatic environment. Environ Sci Technol 34:2741–2748
51. Serrano PH (2005) Responsible use of antibiotics in aquaculture. Fisheries technical paper 469. Food and Agriculture Organization of the United Nations (FAO), Rome
52. Kreuzig R, Höltge S, Brunotte J et al (2005) Test plot studies on runoff of sulfonamides from manured soil after sprinkler irrigation. Environ Toxicol Chem 24:777–781
53. http://www.apsnet.org/online/feature/Antibiotics/. Accessed 4 July, 2017
Impact of Pharmaceuticals on the Environment: Risk Assessment Using QSAR Modeling… 441
80. Jones-Brando L, Torrey EF, Yolkem R (2003) managing the process. National Academies
Drugs used in the treatment of schizophrenia Press, Washington, DC. Accessed 4 July 2017
and bipolar disorder inhibit the replication of 94. NRC (National Research Council) (2007)
Toxoplasma gondii. Schizophr Res Toxicity testing in the 21st century: a vision
62:237–244 and a strategy. National Academies Press,
81. Pawlowski S, van Aerle R, Tyler CR et al Washington, DC. Available: https://down-
(2004) Effects of 17α-ethinylestradiol in a load.nap.edu/login.php?record_
fathead minnow (Pimephales promelas) id=11970&page=%2Fdownload.
gonadal recrudescence assay. Ecotoxicol php%3Frecord_id%3D11970. Accessed 4 July
Environ Saf 57:330–345 2015
82. Arcay L (1985) Influence of sex hormones on 95. Wignall JA, Shapiro AJ, Wright FA et al
the experimental infection produced by a (2014) Standardizing benchmark dose calcu-
strain of Leishmania mexicana amazonensis lations to improve science-based decisions in
from Venezuela. Rev Latinoam Microbiol human health assessments. Environ Health
27:195–207 Perspect 122:499–505
83. Wang R, Belosevic M (1994) Estradiol 96. Wetmore BA, Wambaugh JF, Ferguson SS
increases susceptibility of goldfish to et al (2012) Integration of dosimetry, expo-
Trypanosoma danilewskyi. Dev Comp Immunol sure, and high-throughput screening data in
18:377–387 chemical toxicity assessment. Toxicol Sci
84. Martín J, Camacho-Muñoz D, Santos JL et al 125:157–174
(2012) Occurrence of pharmaceutical com- 97. Jones DP, Park Y, Ziegler TR (2012)
pounds in wastewater and sludge from waste- Nutritional metabolomics: progress in
water treatment plants: removal and addressing complexity in diet and health.
ecotoxicological impact of wastewater dis- Annu Rev Nutr 32:183–202
charges and sludge disposal. J Hazard Mater 98. van Gestel CA, Jonker M, Kammenga JE
239-240:40–47 (2010) Mixture toxicity: linking approaches
85. Boxall AB, Fogg LA, Baird DJ et al (2005) from ecological and human toxicology.
Targeted monitoring study for veterinary Taylor& Francis, New York, NY
medicines in the environment. Environment 99. Presidential/Congressional Commission on
Agency, Bristol Risk Assessment Risk Management (1997).
86. Grønvold J, Svendsen TS, Kraglund HO et al Risk assessment and risk management in regu-
(2004) Effect of the antiparasitic drugs fen- latory decision-making. Final Report. Vol. 2.
bendazole and ivermectin on the soil nema- Washington, DC:PCRARM. Available:
tode Pristionchus maupasi. Vet Parasitol http://www.riskworld.com. Accessed 4 July,
24:91–99 2017
87. Svendsen TS, Hansen PE, Sommer C et al 100. Kümmerer K (2007) Sustainable from the
(2005) Life history characteristics of very beginning: rational design of molecules
Lumbricus terrestris and effects of the veteri- by life cycle engineering as an important
nary antiparasitics compounds Ivermectin and approach for green pharmacy and green
fenbendazole. Soil Biol Biochem 37:927–936 chemistry. Green Chem 9:899–907
88. Sun Y, Diao X, Zhang Q et al (2005) 101. Nowotny N, Epp B, Von Sonntag C (2007)
Bioaccumulation and elimination of avermectin Quantification and modeling of the elimina-
B1a in the earthworm (Eisenia fetida). tion behavior of ecologically problematic
Chemosphere 60:699–704 wastewater micropollutants by adsorption on
89. Singer AC, Johnson AC, Anderson PD powdered and granulated activated carbon.
(2008) Reassessing the risks of Tamiflu use Environ Sci Technol 41:2050–2055
during a pandemic to the Lower Colorado 102. Fjodorova N, Novich M, Vrachko N et al
River. Environ Health Perspect 116:A285– (2008) Directions in QSAR modeling for
A286 regulatory uses in OECD member countries,
90. Soderstrom H, Jarhult JD, Fick J, et al (2010) EU and in Russia. J Environ Sci Health Part
Levels of antivirals and antibiotics in the river C 26:201–236
Thames, UK, during the pandemic 2009. 103. Knacker T, Duis K, Ternes T et al (2005) The
SETAC Europe Annual Meeting, Seville EU-project ERAPharmsIncentives for the
91. http://www.biomedicalwastesolutions.com/ further development of guidance documents?
medical-waste-disposal/. Accessed 4 July, 2017 Environ Sci Pollut Res 12:62–65
92. Rappaport SM (2011) Implications of the 104. Cleuvers M (2002) Aquatic ecotoxicology of
exposome for exposure science. J Expo Sci selected pharmaceutical agents algal and acute
Environ Epidemiol 21:5–9 Daphnia tests. Umweltwissenschaften und
93. NRC (National Research Council) (1983) Schadstoff-Forschung 14:85–89
Risk assessment in the federal government: 105. VICH (2004) Environmental impact assess-
ment (EIAs) for veterinary medicinal prod-
Impact of Pharmaceuticals on the Environment: Risk Assessment Using QSAR Modeling… 443
ucts (VMPs)–Phase II, Draft Guidance, 117. Escher BI, Bramaz N, Richter M et al (2006)
August, 2003, International Cooperation on Comparative ecotoxicological hazard assess-
Harmonization of the Technical Requirements ment of beta-blockers and their human
for Registration of Veterinary Medicinal metabolites using a mode-of-action-based
Products. Available at: http://vich.eudra. test battery and a QSTR approach. Environ
org/pdf/10_2003/g138_st4.pdf Sci Technol 40:7402–7408
106. http://www.nicnas.gov.au/. Accessed 4 July, 118.
Sanderson H, Thomsen M (2007)
2017 Ecotoxicological quantitative structure–
107. Dearden JC (2016) The history and develop- activity relationships for pharmaceuticals. Bull
ment of quantitative structure-activity rela- Environ Contam Toxicol 79:331–335
tionships (QSARs). Int J Quant Struct-Prop 119. Kar S, Roy K (2010) First report on
Relat 1:1–44 interspecies quantitative correlation of eco-
108. Roy K, Kar S (2015) How to judge predictive toxicity of pharmaceuticals. Chemosphere
quality of classification and regression based 81:738–747
QSAR models? In: Haq ZU, Madura J (eds) 120. Christen V, Hickmann S, Rechenberg B et al
Frontiers of computational chemistry. Bentham, (2010) Highly active human pharmaceuticals
pp 71–120 in aquatic systems: a concept for their identi-
109. Roy K, Kar S (2015) Importance of applica- fication based on their mode of action. Aquat
bility domain of QSAR models. In: Roy K Toxicol 96:167–181
(ed) Quantitative structure-activity relation- 121. http://www.biograf.ch. Accessed 4 July, 2017
ships in drug design, predictive toxicology, 122. Das RN, Sanderson H, Mwambo AE et al
and risk assessment. IGI Global, Hershey PA, (2013) Preliminary studies on model devel-
pp 180–211 opment for rodent toxicity and its interspecies
110. http://ec.europa.eu/research/biosociety/ correlation with aquatic toxicities of pharma-
pdf/anim_al_see_final_report.pdf. Accessed ceuticals. Bull Environ Contam Toxicol
4 July, 2017 90:375–381
111. Kar S, Roy K (2012) Risk assessment for eco- 123. de García MSAO, Pinto GP, García-Encina
toxicity of pharmaceuticals - an emerging PA et al (2014) Ecotoxicity and environmen-
issue. Expert Opin Drug Saf 11:235–274 tal risk assessment of pharmaceuticals and
112. Kanter J (2013) E.U. Bans cosmetics with
personal care products in aquatic environ-
animal-tested ingredients. http://www. ments and wastewater treatment plants.
nytimes.com. Accessed 4 July, 2017 Ecotoxicology 23:1517–1533
113.
Cronin MTD, Livingstone DJ (2004) 124. Sangion A, Gramatica P (2016) Hazard of
Predicting chemical toxicity and fate. CRC pharmaceuticals for aquatic environment:
press, Washington DC prioritization by structural approaches and
114. Schultz TW, Cronin MTD, Walker JD (2003) prediction of ecotoxicity. Environ Int
Quantitative structure-activity relationships 95:131–143
(QSARs) in toxicology: a historical perspec- 125. Sangion A, Gramatica P (2016) Ecotoxicity
tive. J Mol Struct (THEOCHEM) 622:1–22 interspecies QAAR models from Daphnia
115. Kar S, Roy K (2010) Predictive toxicology toxicity of pharmaceuticals and personal care
using QSAR: a perspective. J Indian Chem products. SAR QSAR Environ Res
Soc 87:1455–1515 27:781–798
116. Sanderson H, Johnson D, Reitsma T et al
126. Kümmerer K (2009) The presence of phar-
(2004) Ranking and prioritization of environ- maceuticals in the environment due to human
mental risks of pharmaceuticals in surface use-present knowledge and future challenges.
waters. Reg Pharm Toxicol 39:158–183 J Environ Manage 90:2354–2366
Part IV
Abstract
Knowledge of the genotoxic and carcinogenic potential of chemical substances is one of the key
scientific elements needed to protect human health. Genotoxicity assessment is also regarded as a
prescreen for carcinogenicity. The assessment of both endpoints is a fundamental component of national
and international legislation for all types of substances, and it has stimulated the development of
alternative, nontesting methods. Over recent decades, much attention has been given to the use and
further development of approaches based on structure–activity relationships, to be used in isolation or
in combination with in vitro assays for predictive purposes. In this chapter, we briefly introduce the
rationale for the main (Q)SAR approaches and detail the most important regulatory initiatives and
frameworks. It appears that the existence and needs of regulatory frameworks stimulate the development
of better predictive tools; in turn, this allows regulators to fine-tune their requirements for an
improved defense of human health.
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_20, © Springer Science+Business Media, LLC, part of Springer Nature 2018
448 Cecilia Bossa et al.
2 (Q)SAR Methodologies
Table 1
Endpoint columns in the original table (grouped under Mutagenicity and Genotoxicity): gene mutations,
structural aberrations, numerical aberrations, DNA repair activity, DNA damage, germ cell mutagenicity.
The ✓ marks are listed per test; their column positions are not preserved here.

Test method                                                      Adopted       OECD TG         In vitro/in vivo  Endpoints
Bacterial reverse mutation test (Ames test)                      July 1997     TG 471          In vitro          ✓
In vitro Mammalian chromosome aberration test                    July 2016     TG 473          In vitro          ✓
In vitro Mammalian cell gene mutation test                       July 2016     TG 476, TG 490  In vitro          ✓ ✓
In vitro Mammalian cell micronucleus test                        July 2016     TG 487          In vitro          ✓ ✓
Mammalian bone marrow Sister Chromatid Exchanges (SCE)           October 1986  TG 479          In vitro          ✓
Mammalian erythrocyte micronucleus test                          July 2016     TG 474          In vivo           ✓ ✓
Mammalian bone marrow chromosome aberration test                 July 2016     TG 475          In vivo           ✓
Transgenic rodent somatic and germ cell gene mutation assays     July 2013     TG 488          In vivo           ✓ ✓
Unscheduled DNA Synthesis (UDS) test with mammalian liver cells  July 1997     TG 486          In vivo           ✓ ✓

(Q)SAR Methods for Predicting Genotoxicity and Carcinogenicity: Scientific Rationale…
2.2 Read-Across
Read-across is a technique for filling data gaps by predicting endpoint information for one or more
chemicals (the target chemicals), using data for the same endpoint from one or more similar (see
Note 1) substances (the source chemicals). Read-across can be qualitative or quantitative, depending
on the nature of the endpoint to be predicted and on the type of data required in the specific process.
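As an illustration of qualitative read-across, the following minimal sketch predicts a categorical endpoint for a target chemical from its most similar source chemicals. It assumes that each chemical is described by a set of structural feature labels and that similarity is measured with the Tanimoto coefficient; the feature names, the similarity threshold, and the outcomes are hypothetical and are not taken from the chapter.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def read_across(target, sources, k=3, min_similarity=0.5):
    """Qualitative read-across: majority call among the k most similar sources.

    `sources` is a list of (features, label) pairs. Returns None when no
    source chemical is similar enough, i.e. the data gap cannot be filled.
    """
    scored = sorted(
        ((tanimoto(target, feats), label) for feats, label in sources),
        key=lambda pair: pair[0],
        reverse=True,
    )
    neighbours = [(s, lab) for s, lab in scored[:k] if s >= min_similarity]
    if not neighbours:
        return None
    votes = Counter(lab for _, lab in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical feature sets and outcomes, for illustration only.
SOURCES = [
    (frozenset({"aromatic_ring", "nitro", "methyl"}), "positive"),
    (frozenset({"aromatic_ring", "nitro", "chloro"}), "positive"),
    (frozenset({"aromatic_ring", "hydroxyl"}), "negative"),
    (frozenset({"alkyl_chain", "hydroxyl"}), "negative"),
]
TARGET = frozenset({"aromatic_ring", "nitro", "ethyl"})
prediction = read_across(TARGET, SOURCES)
```

A quantitative variant would replace the majority vote with, for example, a similarity-weighted average of the source values. Returning None rather than a default call keeps an unfilled data gap distinguishable from a negative prediction.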
2.3 Trend Analysis
When the substances in a chemical group are related by a trend in one or more properties (e.g.,
molecular mass and carbon chain length), also experimental data for a given endpoint may change
2.4 (Q)SAR
QSARs are mathematical equations linking biological activity to a limited number of physicochemical
or other molecular properties (descriptors). To fill a data gap for a defined endpoint in a group of
chemicals, local QSAR models may be calculated. Such “internal models” are developed as part of the
category formation process, based on the experimental data available for the category members.
Alternatively, external local or global models can be used for data gap filling. Local QSARs are
calculated on congeneric sets of chemicals, i.e., chemicals with similar structures that act through
the same mechanism of action, and are generally more effective models [13, 14]. However, because the
domain of applicability is strictly tied to the set of chemicals used to define the model, local
QSARs are narrower in scope.
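A local QSAR of the simplest kind can be sketched as a one-descriptor linear model fitted by ordinary least squares on the category members. The example below is an assumption-laden illustration: the descriptor (logP), the activity values, and the use of the training descriptor range as the applicability domain are all hypothetical choices, not the method of any specific tool.

```python
def fit_linear_qsar(descriptor, activity):
    """Fit activity = slope * descriptor + intercept by ordinary least squares."""
    n = len(descriptor)
    if n != len(activity) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    if sxx == 0:
        raise ValueError("descriptor values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_in_domain(model, x, domain):
    """Predict only inside the descriptor range spanned by the training set."""
    low, high = domain
    if not low <= x <= high:
        return None  # outside the applicability domain of the local model
    slope, intercept = model
    return slope * x + intercept


# Hypothetical congeneric series: logP values and a made-up activity measure.
LOGP = [1.0, 2.0, 3.0, 4.0]
ACTIVITY = [0.5, 1.0, 1.5, 2.0]
qsar_model = fit_linear_qsar(LOGP, ACTIVITY)
qsar_domain = (min(LOGP), max(LOGP))
```

Refusing to predict outside the training range mirrors the point made above: a local model is only as broad as the set of chemicals used to define it.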
Global models, often implemented in software tools, are constructed so as to ensure a much broader
applicability, extending to more than one chemical class. They can be classified into three main
modeling approaches: rule-based, statistically based, and hybrid methods. Rule-based models rest on
the recognition and codification of functional groups or structural features associated with a
potential reactivity for a defined endpoint, e.g., the structural alerts (SAs). The structural alerts
codify a mechanistic understanding and can therefore inform the decision-making process. The drawback
of these methodologies is that, in general, they are rarely accompanied by a defined applicability
domain. In fact, if a chemical does not include SAs, this does not necessarily indicate an absence of
toxicity; instead it may point to a lack of knowledge. Statistically based systems use a variety of
machine learning techniques to make associations between structural features and chemical activity.
These models are driven by the data available to the computer algorithm, in principle without any
expert supervision. In this case, whilst mechanistic understanding is often not straightforward,
negative predictions are generally more accurate. Hybrid methods, by definition, integrate both expert
knowledge and statistically derived rules, trying to overcome the disadvantages of both approaches.
In practice, the distinction among the methods is seldom absolute.
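The three approaches can be contrasted in a small sketch. It assumes that chemicals are represented as sets of feature labels; the alert list, the training data, and the averaging scheme used for the statistical model are hypothetical simplifications chosen for illustration, far cruder than the rulebases and machine learning methods used in practice.

```python
# Hypothetical structural alerts (SAs); real rulebases are far richer.
STRUCTURAL_ALERTS = {"aromatic_nitro", "epoxide", "n_nitroso"}


def rule_based(features):
    """Return the alerts that fire. An empty result means 'no alert found',
    which is a lack of knowledge rather than evidence of no toxicity."""
    return sorted(STRUCTURAL_ALERTS & set(features))


def train_statistical(training_set):
    """Learn, for each feature, the fraction of training chemicals carrying it
    that were active. `training_set` is a list of (features, is_active)."""
    seen, active = {}, {}
    for features, is_active in training_set:
        for f in features:
            seen[f] = seen.get(f, 0) + 1
            active[f] = active.get(f, 0) + int(is_active)
    return {f: active[f] / seen[f] for f in seen}


def statistical(model, features, threshold=0.5):
    """Average the learned per-feature rates; None if no feature is known."""
    known = [model[f] for f in features if f in model]
    if not known:
        return None
    return sum(known) / len(known) >= threshold


def hybrid(model, features):
    """Expert alerts take precedence; otherwise defer to the statistical call."""
    alerts = rule_based(features)
    if alerts:
        return "positive", "alert(s): " + ", ".join(alerts)
    call = statistical(model, features)
    if call is None:
        return "no prediction", "no alert and no known features"
    return ("positive" if call else "negative"), "statistical model"


# Hypothetical training data, for illustration only.
TRAINING = [
    ({"aromatic_nitro", "methyl"}, True),
    ({"michael_acceptor"}, True),
    ({"michael_acceptor", "methyl"}, True),
    ({"hydroxyl", "methyl"}, False),
    ({"hydroxyl"}, False),
]
stat_model = train_statistical(TRAINING)
```

In this toy setup, a chemical carrying only the feature `michael_acceptor` triggers no alert but is still called positive by the statistical model, while a chemical with no known feature yields no prediction at all rather than a negative call.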
In recent years, there has been a great effort to put these nontesting methods into a more structured
perspective, through the investigation and formalization of integrated approaches to testing and
assessment (IATA). IATA are structured approaches that integrate and weight information from
different methodological approaches,
3.1 Toxtree
The European Commission Joint Research Centre (JRC), formerly through the European Chemicals Bureau
(ECB) and now through the EU Reference Laboratory for alternatives to animal testing (EURL ECVAM),
has a long-standing commitment to the development, assessment, and application of computational
methods to predict the potential toxicological effects of chemicals.
The JRC curates an inventory of QSAR models, available at http://qsardb.jrc.it/qmrf/. The database
provides key information on the models, reported according to a standard template (the QSAR Model
Reporting Format, QMRF). The JRC also commissioned the development of freely available (Q)SAR tools,
including Toxtree, a computer program capable of estimating different types of toxic hazard by
applying decision tree approaches.
Toxtree (v2.6.13) is an open source software application developed by Ideaconsult Ltd under a Joint
Research Centre contract [23]. The tool and its documentation are available for download at
https://eurl-ecvam.jrc.ec.europa.eu/laboratories-research/predictive_toxicology/qsar_tools/toxtree
and http://toxtree.sourceforge.net. Modules encoding rulebases for carcinogenicity (genotoxic and
nongenotoxic) and genotoxicity endpoints (Ames test mutagenicity and in vivo micronucleus) have been
curated by the Istituto Superiore di Sanità (Rome, Italy). Each module consists of a refined
compilation of SAs derived from mechanistic knowledge [6, 24–26]. Because the software is open
source, the user can inspect the implemented rules in full detail and with full transparency.
Although it is not possible to define an applicability domain, the alerts, when feasible, are
accompanied by detoxifying chemical functionalities, which help discriminate positive from negative
outcomes for substances belonging to the same chemical class. However, since neither all the possible
mechanisms of toxicity are coded in the alerts, nor all the possible exceptions to the known
3.2 OncoLogic™
In the evaluation and regulation of new and existing chemicals, the US EPA Office of Pollution
Prevention and Toxics (OPPT) routinely uses QSAR methods to assist in evaluating genotoxicity and
carcinogenicity endpoints, among others. OPPT has also developed and made publicly available a number
of (Q)SAR tools that are used in regulating substances under the Toxic Substances Control Act (TSCA).
Among them, the OncoLogic software for carcinogenicity prediction is worth mentioning.
OncoLogic is a publicly available, rule-based expert system for predicting carcinogenic potential,
developed by the US Environmental Protection Agency (EPA)
(https://www.epa.gov/tsca-screening-tools/oncologictm-computer-system-evaluate-carcinogenic-potential-chemicals).
It is a computerized system that mimics the thinking and reasoning of human experts based on their
knowledge of the toxicological effects of certain classes of compounds [28, 29]. The output includes
a prediction of the carcinogenic potential of the chemical, expressed semiquantitatively, and the
underlying scientific rationale in the form of a report. This document is specific to each evaluation
and keeps track of the chemical structure submitted together with the details of the rules used to
assign a level of carcinogenic concern (namely Low, Marginal, Low–Moderate, Moderate, High–Moderate,
High). Only a finite number of chemical structures may be entered into the system and evaluated,
which intrinsically defines a domain of applicability. Chemicals are classified into predefined,
potentially reactive organic classes, each of which is assigned a baseline level of concern. The
evaluation considers the effect of different types of substituents in modulating the initial concern,
potentially giving rise to a negligible concern, which in practice equals a negative prediction [18].
These features permit the integration of the information provided, together with different types of
evidence and data, into a wider risk assessment framework.
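The logic described above (a baseline concern per chemical class, modulated by substituents, with a report of the rules applied) can be sketched as follows. This is not OncoLogic's rulebase or interface: the class names, baseline levels, and substituent shifts are invented for illustration, and only the six concern levels come from the text (written here with plain hyphens).

```python
# Ordered concern scale as listed in the text, from lowest to highest.
CONCERN_LEVELS = ["Low", "Marginal", "Low-Moderate", "Moderate", "High-Moderate", "High"]

# Hypothetical baseline concern per chemical class (not OncoLogic's actual rules).
BASE_CONCERN = {"aromatic_amine": "High-Moderate", "aliphatic_alcohol": "Low"}

# Hypothetical effect of substituents, in steps along the concern scale.
SUBSTITUENT_SHIFT = {"sulfonic_acid": -3, "ortho_methyl": +1}


def assess(chemical_class, substituents):
    """Return (final concern level, trace of applied rules), or None if the
    class is not covered, i.e. the chemical is outside the domain."""
    if chemical_class not in BASE_CONCERN:
        return None
    index = CONCERN_LEVELS.index(BASE_CONCERN[chemical_class])
    trace = [f"class {chemical_class}: baseline concern {CONCERN_LEVELS[index]}"]
    for substituent in substituents:
        shift = SUBSTITUENT_SHIFT.get(substituent, 0)
        # Clamp so that the result always stays on the six-level scale.
        index = max(0, min(len(CONCERN_LEVELS) - 1, index + shift))
        trace.append(f"substituent {substituent}: shift {shift:+d} -> {CONCERN_LEVELS[index]}")
    return CONCERN_LEVELS[index], trace


level, report = assess("aromatic_amine", ["sulfonic_acid"])
```

Returning the trace alongside the level reflects the emphasis on a per-evaluation report, and returning None for an unknown class reflects the intrinsically limited domain of applicability.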
3.3 QSAR Toolbox
Since the 1990s, the Organization for Economic Cooperation and Development (OECD) has been actively
committed to promoting and improving the regulatory acceptance of QSAR methods
(http://www.oecd.org/chemicalsafety/risk-assessment/oecdquantitativestructure-activityrelationshipsprojectqsars.htm).
As a result of the international cooperation among OECD member countries, a considerable amount of
scientific work has been produced. Milestones for users and developers of QSAR methodologies are
(1) the final drafting of the Validation Principles for [30], (2) the Guidance on Grouping of
Chemicals [11], (3) the OECD QSAR Toolbox (in cooperation with the European Chemicals Agency, ECHA),
and (4) the Fundamental and Guiding Principles for (Q)SAR Analysis of Chemical Carcinogens with
Mechanistic Considerations [31].
The OECD (Q)SAR Toolbox is a standalone, free software application developed by the Laboratory of
Mathematical Chemistry (Burgas, Bulgaria) under the coordination of OECD and ECHA
(www.qsartoolbox.org). It was designed to facilitate the practical application of QSAR approaches
within regulatory frameworks. Crucial to the workflow that culminates in data gap filling is the
grouping of chemicals into chemical categories [32]. The Toolbox incorporates information and tools
from many different sources, such as toxicological databases, expert rulebases, QSAR models, and
chemoinformatic and statistical tools, giving rise to a very powerful and comprehensive instrument,
one of a kind in the public domain. The flexibility of the tool allows for different levels of
application, depending on the final purpose and on the experience of the user. These are well
summarized as “Typical Actions performed by the Toolbox” in the brochure
(https://www.qsartoolbox.org/it/support):
(a) Describes the structure of a chemical.
(b) Indicates if a chemical is included in national/regional regulatory inventories or existing
chemical categories.
(c) Searches for available experimental results for the chemical of
interest.
(d) Explores a chemical list for possible similar chemicals.
(e) Groups chemicals based on mechanism of action and/or structural similarity.
(f) Groups chemicals based on a common metabolite.
(g) Enables exclusion of different chemicals from the group.
(h) Extracts experimental data for similar chemicals.
(i) Fills data gaps for chemicals using read-across, trend analysis
or QSAR models, where applicable.
(j) Designs a data matrix of a chemical category for printing/
exporting results.
(k) Connects to IUCLID software for direct data exchange.
(l) Generates reports.
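Several of the actions listed above (grouping into categories, filling data gaps by trend analysis, and assembling a data matrix) can be sketched in a few lines. The code below is not the Toolbox's interface; the record layout, the category labels, the endpoint values, and the choice of linear interpolation on carbon chain length are hypothetical assumptions made for illustration.

```python
from collections import defaultdict


def group_by_category(chemicals):
    """Group chemical records by their (hypothetical) category label."""
    categories = defaultdict(list)
    for chem in chemicals:
        categories[chem["category"]].append(chem)
    return dict(categories)


def fill_gap_by_trend(members, chain_length):
    """Interpolate linearly between the two measured category members that
    bracket `chain_length`; return None if the target is not bracketed."""
    measured = sorted(
        (m["chain_length"], m["endpoint"])
        for m in members
        if m["endpoint"] is not None
    )
    for (x0, y0), (x1, y1) in zip(measured, measured[1:]):
        if x0 <= chain_length <= x1 and x0 != x1:
            return y0 + (y1 - y0) * (chain_length - x0) / (x1 - x0)
    return None


def build_data_matrix(chemicals):
    """One row per chemical: (category, name, value, provenance of the value)."""
    rows = []
    for category, members in group_by_category(chemicals).items():
        for chem in members:
            value, source = chem["endpoint"], "experimental"
            if value is None:
                value = fill_gap_by_trend(members, chem["chain_length"])
                source = "trend analysis" if value is not None else "data gap"
            rows.append((category, chem["name"], value, source))
    return rows


# Hypothetical records with made-up endpoint values.
CHEMICALS = [
    {"name": "alcohol_C4", "category": "linear_alcohols", "chain_length": 4, "endpoint": 1.0},
    {"name": "alcohol_C6", "category": "linear_alcohols", "chain_length": 6, "endpoint": None},
    {"name": "alcohol_C8", "category": "linear_alcohols", "chain_length": 8, "endpoint": 3.0},
    {"name": "alcohol_C12", "category": "linear_alcohols", "chain_length": 12, "endpoint": None},
    {"name": "ketone_C5", "category": "linear_ketones", "chain_length": 5, "endpoint": 0.7},
]
matrix = build_data_matrix(CHEMICALS)
```

Recording the provenance of each value keeps experimental results, estimates, and remaining gaps distinguishable in the exported matrix; the C12 member stays a data gap because the sketch interpolates only and does not extrapolate beyond the measured range.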
Table 3
Databases for carcinogenicity and genotoxicity endpoints in the public domain

EURL ECVAM Genotoxicity and Carcinogenicity Consolidated Database
  URL: https://eurl-ecvam.jrc.ec.europa.eu/news/launch-of-consolidated-database-of-ames-positive-chemicals
  Description: Database by the European Reference Laboratory for Alternatives to Animal Testing. Curated
    genotoxicity and carcinogenicity data for 726 Ames positive chemicals compiled from a variety of
    sources. Toxicity results critically reviewed. Downloadable in xls format. Included in OECD QSAR
    Toolbox
  Type: Toxicity experimental data

GENE-TOX
  URL: https://www.nlm.nih.gov/databases/download/genetox.html
  Description: Genetic Toxicology Data Bank. Peer-reviewed genetic toxicology test data for over 3000
    chemicals (1991–1998). Included in TOXNET.
  Type: Toxicity experimental data

HSDB
  URL: https://toxnet.nlm.nih.gov/newtoxnet/hsdb.htm
  Description: Hazardous Substances Data Bank. Peer-reviewed toxicology data for over 5000 hazardous
    chemicals. Included in TOXNET database
  Type: Toxicology database

IARC
  URL: http://monographs.iarc.fr/
  Description: Open access International Agency for Research on Cancer (IARC) monographs including
    carcinogenicity classification
  Type: Experimental data for Risk Assessment

IPCS INCHEM
  URL: http://www.inchem.org/
  Description: International Programme on Chemical Safety (IPCS) and Canadian Centre for Occupational
    Health and Safety (CCOHS). Open access to thousands of searchable full-text documents on chemical
    risks and the sound management of chemicals; searchable by CAS number, chemical name, or synonym;
    reports downloadable in pdf form
  Type: Tool for chemical risk assessment and management

IRIS
  URL: https://toxnet.nlm.nih.gov/newtoxnet/iris.htm
  Description: Integrated Risk Information System. Hazard identification and dose-response assessment
    for over 500 chemicals. Included in TOXNET database
  Type: Risk Assessment

ISSTOX
  URL: http://www.iss.it/ampp/index.php?id=233&tipo=7&lang=1
  Description: Cluster of databases, by Istituto Superiore di Sanità; each database contains
    experimental results relative to various types of chemical toxicity, such as long-term
    carcinogenicity on rodents (ISSCAN), in vitro (ISSSTY) and in vivo mutagenicity (ISSMIC), cell
    transformation (ISSCTA), and long-term carcinogenicity on rodents and in vitro mutagenicity of
    biocides and plant protection products (ISSBIOC). Toxicity results critically reviewed.
    Downloadable in xls, sdf, and pdf formats. Included in OECD QSAR Toolbox
  Type: Cluster of chemical relational databases for toxicity endpoints

ITER
  URL: https://toxnet.nlm.nih.gov/newtoxnet/iter.htm
  Description: International Toxicity Estimates for Risk. Includes risk information for over 600
    chemicals from authoritative groups worldwide. Included in TOXNET database
  Type: Risk Assessment

JECDB (Toxicity Japan MHLW)
  URL: http://dra4.nihs.go.jp/mhlw_data/jsp/SearchPageENG.jsp
  Description: Open access Japanese Existing Chemical Data Base (JECDB) containing toxicity data
    (mostly in Japanese) on high production volume chemicals. The database contains experimental
    results from single dose toxicity test and mutagenicity test results performed under Japan’s
    Existing Chemicals Programme. Included in OECD QSAR Toolbox (as Toxicity Japan MHLW)
  Type: Toxicity experimental data

NTP
  URL: https://ntp.niehs.nih.gov/results/summaries/chronicstudies/index.html
  Description: National Toxicology Program (NTP), U.S. Department of Health and Human Services; NTP
    evaluates substances that pose a hazard to U.S. people, for a variety of health-related effects,
    using rodent models for study and protocols specifically designed to fully characterize the toxic
    potential; studies are reported as technical reports. Searchable by chemical name or CAS number;
    reports downloadable in pdf form. “Chemically-indexed” in the DSSTox database
  Type: Historical control database; Primary toxicology experimental data

OECD QSAR Toolbox
  URL: https://www.qsartoolbox.org/
  Description: OECD software application to identify and fill (eco)toxicological data gaps for
    chemicals hazard assessment. The Toolbox contains databases with results from experimental
    studies.
  Type: Platform that incorporates modules and databases from other sources

Open Food Tox
  URL: https://dwh.efsa.europa.eu/bi/asp/Main.aspx?rwtrep=400
  Description: EFSA chemical hazards database; a compilation of chemical and toxicological
    information on chemicals assessed by EFSA since its creation and included in already published
    scientific opinions. Summary data sheets for each individual substance downloadable in pdf or xls
    format.
  Type: Chemical and toxicological information on chemicals assessed by EFSA

RTECS
  URL: https://www.cdc.gov/niosh/rtecs/default.html
  Description: Registry of Toxic Effects of Chemical Substances (RTECS) by the National Institute for
    Occupational Safety and Health (NIOSH); collection of basic toxicity information (including
    mutation and tumorigenic studies) on substances used in industrial and household situations,
    extracted from the open scientific literature.
  Type: Toxicity data

PAN Pesticide Database
  URL: http://pesticideinfo.org/Index.html
  Description: (Eco)Toxicity and regulatory information for pesticide active ingredients and their
    transformation products, as well as adjuvants and solvents used in pesticide products
  Type: Toxicity experimental data

PubChem
  Description: National Center for Biotechnology Information (NCBI); depositor system
  Type: Toxicological/biological
5 Regulatory Frameworks
Fig. 1 Overview of the submitted “endpoint study records” (ESRs) in REACH dossiers from 2009 to 2016 [38]
5.2 EFSA Panel on Plant Protection Products and Their Residues
In the risk assessment of pesticides, according to the requirements of Commission Regulation (EU)
No 283/2013, a comprehensive toxicological dossier is developed for active substances, traditionally
relying on extensive in vivo and in vitro testing. On the other hand, for substances resulting from
the metabolism and degradation of pesticides, toxicological characterization is often very limited.
In this context, nontesting methods have been proposed for the identification of all possible
metabolites and degradates of toxicological relevance that have to be included in the residue
definition for risk assessment.
The use of nontesting approaches for pesticide assessment is under evaluation and adoption in other
regulatory frameworks as well. Notably, the US EPA’s Office of Pesticide Programs (OPP) and the Pest
Management Regulatory Agency (PMRA) of Health Canada are developing common strategies to apply IATA
and (Q)SAR techniques to pesticide assessment [43].
In 2016 the European Food Safety Authority (EFSA) Panel on Plant Protection Products and their
Residues (PPR) published a practical Guidance on the establishment of the residue definition
Fig. 2 Schematic representation of the EFSA approach for genotoxicity assessment of metabolites, in the
procedure of derivation of the residue definition for dietary risk assessment. Adapted from [44]
5.3 The ICH M7 Guideline
Recently, the ICH M7 guideline for the “Assessment and control of DNA reactive (mutagenic) impurities
in pharmaceuticals” was finalized [47]. This guideline, elaborated within the mission of the ICH to
support the development and registration of safe and effective medicines, provides a framework for
the minimization of the risk of human exposure to DNA-reactive chemicals.
As shown in Fig. 3, the strategy to detect potentially carcinogenic impurities is driven by two key
elements: the analysis of
Fig. 3 Schematic representation of the ICH-M7 strategy to detect potentially carcinogenic impurities. Adapted
from [47]
6 Notes
References
1. Huff J, Haseman J (1991) Long-term chemical carcinogenesis experiments for identifying potential human cancer hazards: collective database of the National Cancer Institute and National Toxicology Program (1976-1991). Environ Health Perspect 96:23–31
2. Huff J, Haseman J, Rall D (1991) Scientific concepts, value, and significance of chemical carcinogenesis studies. Annu Rev Pharmacol Toxicol 31:621–652
3. EFSA (2011) Scientific opinion on genotoxicity testing strategies applicable to food and feed safety assessment. EFSA J 9:2379
4. Benigni R, Bossa C (2011) Mechanisms of chemical carcinogenicity and mutagenicity: a review with implications for predictive toxicology. Chem Rev 111:2507–2536
5. OECD (2007) Detailed review paper on cell transformation assays for detection of chemical carcinogens. OECD Publishing, Paris. ENV/JM/MONO(2007)18
6. Benigni R, Bossa C, Tcheremenskaia O, Battistelli CL, Giuliani A (2015) The Syrian hamster embryo cells transformation assay identifies efficiently nongenotoxic carcinogens, and can contribute to alternative, integrated testing strategies. Mutat Res Genet Toxicol Environ Mutagen 779:35–38
7. OECD (2016) Guidance document on the in vitro Bhas 42 cell transformation assay (BHAS 42 CTA). OECD Publishing, Paris. ENV/JM/MONO(2016)1
8. OECD (2015) Guidance document on the in vitro Syrian hamster embryo (SHE) cell transformation assay. OECD Publishing, Paris. ENV/JM/MONO(2015)18
9. Cherkasov A, Muratov EN, Fourches D et al (2014) QSAR modeling: where have you been? Where are you going to? J Med Chem 57:4977–5010
10. Nicolotti O, Benfenati E, Carotti A, Gadaleta D, Gissi A, Mangiatordi GF, Novellino E (2014) REACH and in silico methods: an attractive opportunity for medicinal chemists. Drug Discov Today 19:1757–1768
11. OECD (2014) Guidance on grouping of chemicals, 2nd edn. OECD Publishing, Paris. ENV/JM/MONO(2014)4
12. ECETOC (2012) Category approaches, Read-across, (Q)SAR. Technical Report no. 116. Brussels
13. Benigni R, Bossa C (2008) Predictivity and reliability of QSAR models: the case of mutagens and carcinogens. Toxicol Mech Methods 18:137–147
14. Benigni R, Bossa C, Netzeva T, Worth A (2007) Collection and evaluation of (Q)SAR models for mutagenicity and carcinogenicity. EUR - Scientific and Technical Research Reports. EUR 22772 EN
15. OECD (2008) Report of a Workshop on Integrated Approaches to Testing and Assessment (IATA). OECD Publishing, Paris. ENV/JM/MONO(2008)10
16. USEPA (2011) Integrated approaches to testing and assessment strategy: use of new computational and molecular tools. FIFRA Scientific Advisory Panel Consultation US Environmental Protection Agency, Office of Pesticide Programs
17. Tollefsen KE, Scholz S, Cronin MT, Edwards SW, de Knecht J, Crofton K, Garcia-Reyero N, Hartung T, Worth A, Patlewicz G (2014) Applying adverse outcome pathways (AOPs) to [...] assessment (IATA). Regul Toxicol Pharmacol 70:629–640
18. Benigni R, Battistelli CL, Bossa C, Colafranceschi M, Tcheremenskaia O (2013) Mutagenicity, carcinogenicity, and other end points. Methods Mol Biol 930:67–98
19. Serafimova R, Gatnik MF, Worth A (2010) Review of QSAR models and software tools for predicting genotoxicity and carcinogenicity. EUR - Scientific and Technical Research Reports. EUR 24427 EN
20. Worth A, Barroso J, Bremer S, Burton J, Casati S, Coecke S, Corvi R, Desprez B, Dumont C, Gouliarmou V, Goumenou M, Gräpel R, Griesinger C, Halder M, Roi AJ, Kienzler A, Madia F, Munn S, Nepelska M, Paini A, Price A, Prieto P, Rolaki A, Schäffer M, Triebe J, Whelan M, Wittwehr C, Zuang V (2014) Alternative methods for regulatory toxicology – a state-of-the-art review. EUR - Scientific and Technical Research Reports. EUR 26797
21. Cassano A, Raitano G, Mombelli E, Fernández A, Cester J, Roncaglioni A, Benfenati E (2014) Evaluation of QSAR models for the prediction of Ames genotoxicity: a retrospective exercise on the chemical substances registered under the EU REACH regulation. J Environ Sci Health C 32:273–298
22. OECD (2007) Guidance document on the validation of (quantitative) structure-activity relationship [(Q)SAR] models, vol ENV/JM/MONO(2007)2. OECD Publishing, Paris
23. Patlewicz G, Jeliazkova N, Safford RJ, Worth AP, Aleksiev B (2008) An evaluation of the implementation of the Cramer classification scheme in the Toxtree software. SAR QSAR Environ Res 19:495–524
24. Benigni R, Bossa C (2008) Structure alerts for carcinogenicity, and the Salmonella assay system: a novel insight through the chemical relational databases technology. Mutat Res 659:248–261
25. Benigni R, Bossa C, Jeliazkova N, Netzeva T, Worth A (2008) The Benigni/Bossa rulebase for mutagenicity and carcinogenicity - a module of Toxtree. EUR - Scientific and Technical Research Reports. EUR 23241 EN
26. Benigni R, Bossa C, Tcheremenskaia O (2013) Nongenotoxic carcinogenicity of chemicals: mechanisms of action and early recognition through a new set of structural alerts. Chem Rev 113(5):2940–2957. https://doi.org/10.1021/cr300206t
27. Benigni R, Bossa C, Tcheremenskaia O, Battistelli CL, Crettaz P (2012) The new ISSMIC database on in vivo micronucleus and
support integrated approaches to testing and its role in assessing genotoxicity testing strate-
gies. Mutagenesis 27:87–92
472 Cecilia Bossa et al.
28. Lai D, Woo Y-T (2005) OncoLogic. In: recommendations to registrants. European
Predictive toxicology. CRC Press, Boca Raton, Chemicals Agency
FL, pp 385–413 40. ECHA (2017) Read-across assessment frame-
29. Woo YTLD, Argus MF, Arcos JC (1998) An work (RAAF). European Chemicals Agency
integrative approach of combining mechanisti- 41. ECHA (2016) Practical guide – how to use
cally complementary short-term predictive and report (Q)SARs. European Chemicals
tests as a basis for assessing the carcinogenic Agenc (ECHA)
potential of chemicals. J Environ Sci Health C 42. ECHA (2016) Practical guide: how to use
Environ Carcinog Ecotoxicol Rev C alternatives to animal testing to fulfil the infor-
16(2):101–122 mation requirements for REACH registration.
30. OECD (2004) OECD Principles for the vali- European Chemicals Agency
dation, for regulatory purposes, of (quantita- 43. NAFTA (2012) (Quantitative) Structure
tive) structure-activity relationship models. Activity Relationship [(Q)SAR] Guidance
OECD Publishing, Paris Document. US Environmental Protection
31. OECD (2015) Fundamental and guiding prin- Agency, Technical Working Group on
ciples for (Q)SAR analysis of chemical carcino- Pesticides
gens with mechanistic considerations. OECD 44. EFSA-PPR (2016) Guidance on the establish-
Publishing, Paris. ENV/JM/MONO(2015)46 ment of the residue definition for dietary risk
32. Dimitrov SD, Diderich R, Sobanski T et al assessment. EFSA J 14:4549
(2016) QSAR Toolbox - workflow and major 45. EU-JRC (2010) Applicability of QSAR analysis
functionalities. SAR QSAR Environ Res:1–17 to the evaluation of the toxicological relevance
33. Benigni R, Battistelli CL, Bossa C, of metabolites and degradates of pesticide
Tcheremenskaia O, Crettaz P (2013) New per- active substances for dietary risk assessment.
spectives in toxicological information manage- EFSA Support Publ 7(5):50E
ment, and the role of ISSTOX databases in 46. EFSA-PPR (2012) Scientific opinion on evalu-
assessing chemical mutagenicity and carcinoge- ation of the toxicological relevance of pesticide
nicity. Mutagenesis 28:401–409 metabolites for dietary risk assessment. EFSA
34. Hardy B, Apic G, Carthew P, Clark D, Cook J 10(07):2799
D, Dix I, Escher S, Hastings J, Heard DJ, 47. ICH-M7 (2017) Assessment and control of
Jeliazkova N, Judson P, Matis-Mitchell S, Mitic DNA reactive (mutagenic) impurities in phar-
D, Myatt G, Shah I, Spjuth O, Tcheremenskaia maceuticals to limit potential carcinogenic risk.
O, Toldo L, Watson D, White A, Yang C http://www.ich.org/fileadmin/Public_Web_
(2012) Food for thought ... A toxicology Site/ICH_Products/Guidelines/
ontology roadmap. ALTEX 29(2):129–137 Multidisciplinary/M7/M7_R1_Addendum_
35. Hardy B, Apic G, Carthew P, Clark D, Cook Step_4_2017_0331.pdf
D, Dix I, Escher S, Hastings J, Heard DJ, 48. Greene N, Dobo KL, Kenyon MO, Cheung J,
Jeliazkova N, Judson P, Matis-Mitchell S, Mitic Munzner J, Sobol Z, Sluggett G, Zelesky T,
D, Myatt G, Shah I, Spjuth O, Tcheremenskaia Sutter A, Wichard J (2015) A practical applica-
O, Toldo L, Watson D, White A, Yang C tion of two in silico systems for identification of
(2012) Toxicology ontology perspectives. potentially mutagenic impurities. Regul
ALTEX 29:139–156 Toxicol Pharmacol 72:335–349
36. Tcheremenskaia O, Benigni R, Nikolova I, 49. Amberg A, Beilke L, Bercu J, Bower D, Brigo
Jeliazkova N, Escher SE, Batke M, Baier T, A, Cross KP, Custer L, Dobo K, Dowdy E,
Poroikov V, Lagunin A, Rautenberg M, Hardy Ford KA, Glowienke S, Van Gompel J, Harvey
B (2012) OpenTox predictive toxicology J, Hasselgren C, Honma M, Jolly R, Kemper
framework: toxicological ontology and R, Kenyon M, Kruhlak N, Leavitt P, Miller S,
semantic media wiki-based OpenToxipedia.
Muster W, Nicolette J, Plaper A, Powley M,
J Biomed Semantics 3(Suppl 1):S7 Quigley DP, Reddy MV, Spirkl HP, Stavitskaya
37. ECHA (2008) QSARs and grouping of chemi- L, Teasdale A, Weiner S, Welch DS, White A,
cals, vol R.6. Guidance on information require- Wichard J, Myatt GJ (2016) Principles and
ments and chemical safety assessment. procedures for implementation of ICH M7
Guidance for the implementation of REACH recommended (Q)SAR analyses. Regul Toxicol
38. ECHA (2017) The use of alternatives to test- Pharmacol 77:13–24
ing on animals for the REACH Regulation. 50. Barber C, Amberg A, Custer L, Dobo KL,
European Chemicals Agency Glowienke S, Van Gompel J, Gutsell S, Harvey
39. ECHA (2016) Evaluation under REACH J, Honma M, Kenyon MO, Kruhlak N, Muster
progress report 2016 – executive summary and W, Stavitskaya L, Teasdale A, Vessey J, Wichard
J (2015) Establishing best practise in the appli-
(Q)SAR Methods for Predicting Genotoxicity and Carcinogenicity: Scientific Rationale… 473
cation of expert review of mutagenicity under 55. Teasdale A (2017) Regulatory highlights. Org
ICH M7. Regul Toxicol Pharmacol Process Res Dev 21:1209–1212
73:367–377 56. Williams RV, Amberg A, Brigo A, Coquin L,
51. Barber C, Cayley A, Hanser T, Harding A, Giddings A, Glowienke S, Greene N, Jolly R,
Heghes C, Vessey JD, Werner S, Weiner SK, Kemper R, O’Leary-Steele C, Parenty A, Spirkl
Wichard J, Giddings A, Glowienke S, Parenty H-P, Stalford SA, Weiner SK, Wichard J (2016)
A, Brigo A, Spirkl H-P, Amberg A, Kemper R, It’s difficult, but important, to make negative
Greene N (2016) Evaluation of a statistics- predictions. Regul Toxicol Pharmacol 76(Suppl
based Ames mutagenicity QSAR model and C):79–86
interpretation of the results obtained. Regul 57. Sutter A, Amberg A, Boyer S, Brigo A,
Toxicol Pharmacol 76(Suppl C):7–20 Contrera JF, Custer LL, Dobo KL, Gervais V,
52. Barber C, Hanser T, Judson P, Williams R Glowienke S, Gompel JV, Greene N, Muster
(2017) Distinguishing between expert and sta- W, Nicolette J, Reddy MV, Thybaud V, Vock
tistical systems for application under ICH M7. E, White AT, Müller L (2013) Use of in silico
Regul Toxicol Pharmacol 84(Suppl systems and expert knowledge for structure-
C):124–130 based assessment of potentially mutagenic
53. Cartus A, Schrenk D (2017) Current methods impurities. Regul Toxicol Pharmacol
in risk assessment of genotoxic chemicals. Food 67:39–52
Chem Toxicol 106(Part B):574–582 58. Floris M, Manganaro A, Nicolotti O, Medda
54. Powley MW (2015) (Q)SAR assessments of R, Mangiatordi GF, Benfenati E (2014) A gen-
potentially mutagenic impurities: a regulatory eralizable definition of chemical similarity for
perspective on the utility of expert knowledge read-across. J Cheminformatics 6:39
and data submission. Regul Toxicol Pharmacol
71:295–300
Chapter 21
Abstract
Human pluripotent stem cells such as embryonic stem (ES) and induced pluripotent stem (iPS) cells, combined with sophisticated bioinformatics methods, are powerful tools to predict developmental chemical toxicity. Because cell differentiation is not necessary, these cells can facilitate cost-effective assays, thus providing a practical system for the toxicity assessment of various types of chemicals. Here we describe how to apply machine learning techniques to different types of data for toxic chemicals, such as qRT-PCR data, gene networks, and molecular descriptors, as well as how to integrate these data to predict toxicity categories. Our results using data for 20 chemicals covering neurotoxins (NTs), genotoxic carcinogens (GCs), and nongenotoxic carcinogens (NGCs) demonstrated that the highest and most robust prediction performance was obtained by using gene networks as the input. We also observed that qRT-PCR data and molecular descriptors tend to contribute to specific toxicity categories.
Key words Embryonic stem cell, Chemical toxicity prediction, Developmental effect, Gene network,
Molecular descriptor, Multi-kernel support vector machine
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_21, © Springer Science+Business Media, LLC, part of Springer Nature 2018
476 Hiroki Takahashi et al.
2 Materials
2.1 Computational Environments
1. Cluster machines with a minimum performance of 3.0 GHz, 256 cores, 4 GB memory/core, and 1 TB disk
2. Linux CentOS or other operating systems
3. RX-TAOgen (http://stemcellinformatics.org/toxicology/) [9, 11, 12] or other Bayesian network inference programs
4. Kemba-svm (http://www.net-machine.net/~kato/kemba-svm1-ts/) [13] or other kernel SVMs
5. SHOGUN package (http://www.shogun-toolbox.org) [14] or other MK-SVMs
3 Methods
3.3 Inference of Bayesian Networks by Replica-Exchange Method
1. Scale the series of gene expression data across samples for the same chemical to the standard normal distribution, N(0,1).
2. Run the network inference under MPI (this assumes that Intel MPI is installed on the machine):
% mpdboot -r ssh -n 8 (start MPI daemon with 8 processors)
% I_MPI_DEVICE=ssm mpiexec -configfile mpi.config < mpi.input (see Note 4).
Example of mpi.config:
-s all
-n 1 -host h01 mpiTaogen
-n 1 -host h02 mpiTaogen
-n 1 -host h03 mpiTaogen
-n 1 -host h04 mpiTaogen
-n 1 -host h05 mpiTaogen
-n 1 -host h06 mpiTaogen
-n 1 -host h07 mpiTaogen
-n 1 -host h08 mpiTaogen
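The mpi.config example above follows a regular pattern (one mpiTaogen process per host), so it can be generated for a different number of replicas. The following is a minimal Python sketch under the assumption that hosts are named h01, h02, and so on, as in the example; the function name and its parameters are hypothetical and are not part of RX-TAOgen.

```python
def make_mpi_config(n_hosts, program="mpiTaogen", host_prefix="h"):
    """Return the text of an mpi.config file with one process per host.

    The layout mirrors the example in the text: a first line "-s all",
    then one "-n 1 -host <name> <program>" line per host. The host naming
    scheme (prefix plus two-digit index) is an assumption.
    """
    if n_hosts < 1:
        raise ValueError("n_hosts must be at least 1")
    lines = ["-s all"]
    for i in range(1, n_hosts + 1):
        lines.append(f"-n 1 -host {host_prefix}{i:02d} {program}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the eight-host example shown above.
    print(make_mpi_config(8), end="")
```

The number of hosts should match the number of replicas given on the first line of mpi.input.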
Example of mpi.input:
8
1,1,1,1,1,1,1,1
0.0001,0.0002,0.0004,0.0008,0.002,0.005,0.01,0.02
0.0001,0.0002,0.0004,0.0008,0.002,0.005,0.01,0.02
&input_data
sample=100,
gene=8,
sampling=10,
array_data="/home/w.fujibuchi/Research/rxtao-
gen.20141211/data.txt",
replica_exchange=1,
exchange=10000,
replica_exchange_order=1,
r=1,
measure=0,
burn_in=0,
count_in=5000,
verbose=1,
verbose_measure=200,
verbose_parameter=200
/
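Step 1 of Subheading 3.3 asks for each gene's expression series to be scaled to N(0,1) before network inference. A minimal Python sketch of that scaling is given below. It assumes that the data are held as a mapping from gene name to a list of values across samples and that the population standard deviation is used; both choices, and the function names, are illustrative assumptions rather than the behavior of RX-TAOgen.

```python
from statistics import mean, pstdev


def scale_to_standard_normal(values):
    """Return z-scores (mean 0, standard deviation 1) for one gene."""
    m = mean(values)
    s = pstdev(values)  # population SD; using the sample SD is an equally valid choice
    if s == 0:
        raise ValueError("constant expression series cannot be scaled")
    return [(v - m) / s for v in values]


def scale_expression_table(table):
    """Scale every gene in a {gene: [values across samples]} mapping."""
    return {gene: scale_to_standard_normal(vals) for gene, vals in table.items()}


if __name__ == "__main__":
    # Toy values for two of the marker genes named in Note 3.
    demo = {"NANOG": [8.1, 7.9, 6.5, 5.2], "SOX2": [9.0, 9.4, 8.8, 7.1]}
    for gene, z in scale_expression_table(demo).items():
        print(gene, [round(x, 3) for x in z])
```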
Fig. 1 Network diagrams for ten chemicals. Upper: neurotoxins (NTs); Lower: genotoxic carcinogens (GCs)
3.5 Support Vector Machine
1. Once you obtain one or a combination of qRT-PCRs, gene networks, and molecular descriptors for toxic chemicals, use t-statistics to rank the inputs (features) in each input type in the learning data so that the order of the inputs reflects their power to discriminate between the positive and negative chemicals.
2. Select proper kernels such as linear, polynomial, radial basis function (RBF), saigoone, saigotwo, kernemb, or others in SVM analysis [9].
3. Execute a grid search with various parameter values and calculate SVM performances (see Note 5).
4. Perform the leave-one-chemical-out prediction (LOCOP), i.e., the set of repeat data for each chemical is eliminated in turn during the learning process and used to test the prediction (Table 1; see Note 6).
5. Instead of combining and ranking the three inputs, i.e., qRT-PCRs, gene networks, and molecular descriptors, for each toxic chemical, you may use distinct kernels for each input. Use the SHOGUN package to perform the LOCOP analysis by MK-SVM. In Fig. 2, linear, RBF, and polynomial kernels are tested (see Note 7). The highest accuracies for NTs, GCs, and NGCs are 85.0%, 95.0%, and 100.0%, respectively (see Note 8).
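The feature ranking of step 1 can be sketched in Python as follows. The text only says "t-statistics"; this sketch assumes Welch's unequal-variance form and ranks features by the absolute value of the statistic. The function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(pos, neg):
    """Welch's t-statistic between two groups of values (each of size >= 2)."""
    diff = mean(pos) - mean(neg)
    se = sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    if se == 0:
        # Both groups are constant: infinite separation if the means differ.
        return float("inf") if diff != 0 else 0.0
    return diff / se


def rank_features(pos_samples, neg_samples):
    """Return feature indices ordered from most to least discriminating.

    pos_samples and neg_samples are lists of equal-length feature vectors
    for the positive and negative chemicals in the learning data.
    """
    n_features = len(pos_samples[0])
    scored = []
    for j in range(n_features):
        pos = [row[j] for row in pos_samples]
        neg = [row[j] for row in neg_samples]
        scored.append((abs(welch_t(pos, neg)), j))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scored]


if __name__ == "__main__":
    positives = [[1.0, 5.0], [1.1, 6.0], [0.9, 7.0]]
    negatives = [[1.0, 1.0], [1.2, 2.0], [0.8, 3.0]]
    print(rank_features(positives, negatives))
```

In a LOCOP setting the ranking should be computed on the learning data only, so that the held-out chemical does not influence feature selection.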
Table 1
Accuracies for combinations of three inputs: qRT-PCR (PCR), Bayesian network (BN), and quantitative
structure–activity relationships (QSAR)
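The leave-one-chemical-out prediction (LOCOP) of Subheading 3.5, step 4, differs from ordinary leave-one-out in that all repeat measurements of one chemical are held out together. A minimal Python sketch of the splitting and accuracy calculation is shown below; the `fit` and `predict` callables stand in for whichever SVM implementation is used and are assumptions of this example, as are all function names.

```python
def locop_splits(chemical_ids):
    """Yield (held_out_chemical, train_indices, test_indices).

    chemical_ids[i] names the chemical that sample i was measured for, so
    all repeats of the held-out chemical end up in the test set together.
    """
    for chem in sorted(set(chemical_ids)):
        test = [i for i, c in enumerate(chemical_ids) if c == chem]
        train = [i for i, c in enumerate(chemical_ids) if c != chem]
        yield chem, train, test


def locop_accuracy(features, labels, chemical_ids, fit, predict):
    """Fraction of samples predicted correctly under LOCOP.

    fit(X, y) must return a model; predict(model, x) must return a label.
    """
    correct = 0
    for _, train, test in locop_splits(chemical_ids):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in test:
            correct += int(predict(model, features[i]) == labels[i])
    return correct / len(labels)


if __name__ == "__main__":
    ids = ["chemA", "chemA", "chemB", "chemC", "chemC"]
    for chem, train, test in locop_splits(ids):
        print(chem, "train:", train, "test:", test)
```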
4 Notes
1. This threshold is set for Illumina BeadArray chip. Users can set
proper thresholds for different microarray types such as
Affymetrix or Agilent.
2. It is important to select informative marker genes whose expression changes significantly on exposure to toxic chemicals.
3. In this paper, we use the following ten genes: NANOG, SOX2,
DMTF1, ZNF208, ADRM1, TRIB1, CRY1, SMAD6, SMAD7,
and VHL1.
4. The lines of mpi.input are:
Stem Cell-Based Methods to Predict Developmental Chemical Toxicity 481
Fig. 2 Weights of kernels from qRT-PCR, BN, and QSAR data by multi-kernel SVM predictions. Vertical axis
indicates the ids of 20 chemicals. Weights that attained the highest accuracy in each toxic category are aver-
aged and shown by heat maps (see Note 8)
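The kernel weights shown in Fig. 2 belong to a multi-kernel model in which each input type (qRT-PCR, BN, QSAR) has its own kernel and the kernels are combined as a weighted sum. The Python sketch below only illustrates that combination step for fixed, non-negative weights. It does not learn the weights, and it does not use the SHOGUN API; all function names and parameter defaults are assumptions made for illustration.

```python
from math import exp


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram_matrix(samples, kernel):
    """Kernel (Gram) matrix for one input type."""
    return [[kernel(x, y) for y in samples] for x in samples]


def combine_kernels(gram_matrices, weights):
    """Weighted sum of Gram matrices, one matrix per input type."""
    if len(gram_matrices) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    n = len(gram_matrices[0])
    return [
        [sum(w * g[i][j] for g, w in zip(gram_matrices, weights)) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    pcr = [[1.0, 0.0], [0.0, 1.0]]   # toy qRT-PCR features for two samples
    qsar = [[0.2, 0.4], [0.1, 0.9]]  # toy molecular descriptors for the same samples
    k = combine_kernels(
        [gram_matrix(pcr, linear_kernel), gram_matrix(qsar, rbf_kernel)],
        [0.7, 0.3],
    )
    print(k)
```

A non-negative weighted sum of valid kernels is itself a valid kernel, which is why the combined matrix can be passed to a standard kernel SVM.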
Acknowledgment
References
1. Waters MD, Fostel JM (2004) Toxicogenomics and systems toxicology: aims and prospects. Nat Rev Genet 5:936–948
2. Schwartz MP, Hou Z, Propson NE et al (2015) Human pluripotent stem cell-derived neural constructs for predicting neural toxicity. Proc Natl Acad Sci 112:12516–12521
3. Lee EK, Kurokawa YK et al (2015) Machine learning plus optical flow: a simple and sensitive method to detect cardioactive drugs. Sci Rep 5:11817
4. Kandasamy K, Chuah JK et al (2015) Prediction of drug-induced nephrotoxicity and injury mechanisms with human induced pluripotent stem cell-derived cells and machine learning methods. Sci Rep 5:12337
5. Huh D, Matthews BD et al (2010) Reconstituting organ-level lung functions on a chip. Science 328:1662–1668
6. Esch MB, King TL et al (2011) The role of body-on-a-chip devices in drug and toxicity studies. Annu Rev Biomed Eng 13:55–72
7. Takayama K, Inamura M et al (2012) Efficient generation of functional hepatocytes from human embryonic stem cells and induced pluripotent stem cells by HNF4α transduction. Mol Ther 20:127–137
8. Seiler AE, Spielmann H (2011) The validated embryonic stem cell test to predict embryotoxicity in vitro. Nat Protoc 6:961
9. Yamane J, Aburatani S et al (2016) Prediction of developmental chemical toxicity based on gene networks of human embryonic stem cells. Nucleic Acids Res 44:5515–5528
10. Suemori H, Yasuchika K et al (2006) Efficient establishment of human embryonic stem cell lines and long-term maintenance with stable karyotype by enzymatic bulk passage. Biochem Biophys Res Commun 345:926–932
11. Yamanaka T, Toyoshiba H et al (2004) The TAO-Gen algorithm for identifying gene interaction networks with application to SOS repair in E. coli. Environ Health Perspect 112:1614–1621
12. Nagano R, Akanuma H et al (2011) Multi-parametric profiling network based on gene expression and phenotype data: a novel approach to developmental neurotoxicity testing. Int J Mol Sci 13:187–207
13. Kato T, Fujibuchi W (2010) Kernel classification methods for cancer microarray data. In: Emmert-Streib F, Dehmer M (eds) Medical biostatistics for complex diseases. John Wiley-VCH, Weinheim
14. Sonnenburg SĆ, Henschel S et al (2010) The SHOGUN machine learning toolbox. J Mach Learn Res 11:1799–1802
15. Benzécri JP (1973) L’Analyse des Données. Volume II. L’Analyse des Correspondances. Dunod, Paris
16. Ripley B (2016) MASS: support functions and datasets for Venables and Ripley’s MASS. R package version 7.3-45. https://cran.r-project.org/web/packages/MASS
17. Leek JT, Johnson WE et al (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28:882–883
18. Wiese R, Eiglsperger M et al (2004) yfiles: visualization and automatic layout of graphs. In: Jünger M, Mutzel P (eds) Graph drawing software. Springer, Berlin
Chapter 22
Abstract
In recent years, the assessment of skin sensitization hazard has moved toward non-animal testing methods. These methods are based on the key events described in the OECD Adverse Outcome Pathway (AOP) for skin sensitization initiated by covalent binding to proteins. As each individual method addresses mainly one mechanistic event (key event, KE) in the initiation of skin sensitization, combinations of different methods are needed to conclude on the skin sensitization hazard. Validated and regulatory adopted (EU and OECD) in chemico/in vitro methods are available for KEs 1–3 and are presented here. This chapter also illustrates how individual test methods can be combined by providing two examples of defined approaches to testing and assessment for skin sensitization hazard identification and assessment.
Key words In vitro, In chemico, Adverse outcome pathway (AOP), Key event (KE), Skin
sensitization
1 Introduction
Laura H. Rossi is a staff member of the European Chemicals Agency; the views and opinions expressed in this article
represent exclusively the personal ideas of the author and do not represent the official position of the Agency.
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_22, © Springer Science+Business Media, LLC, part of Springer Nature 2018
486 Laura H. Rossi and Janine Ezendam
Fig. 1 Representation of different key events from skin sensitization AOP (from AOPWiki). Column labels in the figure: Chemical Structure/Properties, MIE, Cellular Level, Organ Level. Sequence of boxes: Electrophilic Chemicals, Covalent Binding to Skin Proteins, Keratinocyte Activation, Dendritic Cell Activation, T-cell Activation and Proliferation, Skin Sensitization
2 Methods
2.1 In Vitro/In Chemico Assay(s) for the AOP Molecular Initiation Event/Key Event 1: Peptide Binding/Protein Binding
Currently there is one in chemico assay that has been validated and adopted under the OECD test guidelines programme, i.e., In chemico skin sensitization: Direct Peptide Reactivity Assay (DPRA), OECD TG 442C.
The method is based on the fact that the majority of organic chemicals inducing skin sensitization are either inherently electrophilic or will be converted into electrophilic chemicals via metabolic activation or autooxidation/abiotic degradation. Electrophilic skin sensitizing chemicals will bind to nucleophilic amino acids such as cysteine and/or lysine. This reaction can be measured in this assay. The DPRA uses synthetic heptapeptides containing either cysteine (Ac-RFAACAA-COOH) or lysine (Ac-RFAAKAA-COOH), and binding to the test chemical is measured by high-pressure liquid chromatography (HPLC) with an ultraviolet (UV) detector, by determining the concentration of free peptide that is available after incubating with the test chemical for 24 h at 25 ± 2.5 °C. The synthetic peptides contain phenylalanine to aid in the detection. The standard operating procedures are described in more detail in the EURL ECVAM DB-ALM protocol no. 154 (https://ecvam-dbalm.jrc.ec.europa.eu/public/datasheet/?id=4C1CE4B5521172A599E751542A0D8CED59DECC02B7D3B4D6CB9CD2CDCA67E19FA3291B895581F634), which provides step-by-step instructions on how to perform the assay, the reagents needed, and how to calculate the results using the prediction model (cysteine 1:10/lysine 1:50, or cysteine 1:10).
Based on the mean peptide depletion values, calculated for the test chemical and the positive control substance (cinnamic aldehyde, CAS 104-55-2; ≥95% food grade purity), a reactivity class is assigned, ranging from “No or minimal reactivity” (negative prediction) to “High reactivity” (positive prediction). A mean cysteine and lysine peptide percent depletion value of 6.38 is used as the cut-off to discriminate between peptide nonreactive and peptide reactive chemicals (Table 1).
Table 1
Reactivity classes of the DPRA
Predicting Chemically Induced Skin Sensitization by Using In Chemico/In Vitro Methods
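The cut-off logic described above can be illustrated with a short Python sketch. Only the 6.38 cut-off is taken from the text. The depletion formula (one minus the ratio of the sample peak area to the mean reference-control peak area), the treatment of negative depletion values as zero, and the treatment of a mean exactly equal to 6.38 as nonreactive are assumptions of this example that should be checked against OECD TG 442C; the boundaries of the higher reactivity classes belong to Table 1 and are not reproduced here. All function names are hypothetical.

```python
from statistics import mean

REACTIVITY_CUTOFF = 6.38  # mean % depletion cut-off quoted in the text


def percent_depletion(sample_peak_area, reference_mean_peak_area):
    """Percent peptide depletion from HPLC-UV peak areas (assumed formula)."""
    if reference_mean_peak_area <= 0:
        raise ValueError("reference peak area must be positive")
    return (1.0 - sample_peak_area / reference_mean_peak_area) * 100.0


def mean_depletion(cys_depletions, lys_depletions):
    """Mean of the cysteine and lysine mean percent depletions.

    Negative replicate values are set to 0 before averaging (assumption).
    """
    cys = mean([max(d, 0.0) for d in cys_depletions])
    lys = mean([max(d, 0.0) for d in lys_depletions])
    return (cys + lys) / 2.0


def is_peptide_reactive(mean_depletion_value):
    """True if the mean depletion exceeds the 6.38 cut-off."""
    return mean_depletion_value > REACTIVITY_CUTOFF


if __name__ == "__main__":
    m = mean_depletion([10.0, 12.0, 14.0], [2.0, 4.0, 0.0])
    print(m, is_peptide_reactive(m))
```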
2.2 In Vitro Assay(s) for the AOP Key Event 2: Keratinocyte Activation
Under the OECD test guidelines programme, there is currently one test guideline that covers this key event (In Vitro Skin Sensitization: ARE-Nrf2 Luciferase Test Method; OECD TG 442D). Currently, the only ARE-Nrf2 luciferase test method covered by this test guideline is the KeratinoSens™ test method. Two other keratinocyte-based assays, the LuSens and SENS-IS, are included in the OECD work program.
2.4 LuSens Method
The LuSens ARE-Nrf2 luciferase test method is based on the same principle as the KeratinoSens™ assay; it measures activation of the Nrf2-KEAP1 pathway in a human keratinocyte reporter cell line. In the LuSens test method the luciferase gene is under the control of an ARE element from the rat NADPH quinone oxidoreductase 1 (NQO1) gene, instead of the human AKR1C2 gene used in the KeratinoSens™ assay [10].
The protocol and data interpretation procedure of the LuSens are similar to those of the KeratinoSens™, with a few minor differences. The LuSens uses ethylene glycol dimethacrylate (EGDMA, CAS 97-90-5) as a positive control. More details on the standard operating procedures of the LuSens can be found in the EURL ECVAM DB-ALM protocol no. 184 (https://ecvam-dbalm.jrc.ec.europa.eu/methods-and-protocols/protocol/lusens-assay-protocol-no.-184/key/p_1392).
LuSens underwent a Performance Standards-based validation study with the KeratinoSens™ as the validated reference method [11]. An independent review was conducted by ESAC [12], which concluded that the assay was easily transferred to other labs and had very good within- and between-laboratory reproducibility (100% in both cases). The accuracy of the LuSens was comparable to that of the KeratinoSens™ and ranged between 71% and 85% [9, 11].
The accuracy of the LuSens for discriminating skin sensitizers (i.e.,
2.6 In Vitro Assays for the AOP Key Event 3: Activation of Dendritic Cells
Under the OECD test guidelines programme, there is currently one test guideline that covers this key event (In Vitro Skin Sensitization assays addressing AOP Key Event 3: activation of dendritic cells, OECD TG 442E). Currently, the following test methods are included in this test guideline: Human Cell Line Activation test (h-CLAT), U937 cell line activation Test (U-SENS™), and Interleukin-8 Reporter Gene Assay (IL-8 Luc assay).
The currently available test methods either quantify the change in the expression of cell surface markers linked to the activation of dendritic cells (DCs) when exposed to skin sensitizers (h-CLAT and U-SENS™) or the changes in a cytokine linked to activation of DCs, i.e., IL-8 (IL-8 Luc assay).
2.7 Assays Measuring Expression of Cell Surface Markers
2.7.1 h-CLAT
The Human Cell Line Activation test (h-CLAT) uses a human monocytic leukemia cell line (THP-1) as a surrogate for dendritic cells to investigate DC activation. In this test method the expression of two surface markers linked to DC maturation, i.e., CD86 (costimulatory molecule) and CD54 (adhesion molecule), is measured by flow cytometry [15]. These two surface markers are upregulated in activated DCs and are important when DCs present antigens to T-cells [16, 17]. As such, DC activation is key in the process of skin sensitization.
In the test method THP-1 cells are exposed for 24 h to eight different concentrations of the test chemical in a dilution series. The concentrations are selected based on the concentration giving a cell viability of 75% (CV75) in order to verify that the upregulation of the surface markers occurs at sub-cytotoxic concentrations. Propidium iodide or other cytotoxicity markers are to be used to calculate the cell viability. The test chemical is tested in at least two independent runs in order to obtain a single prediction (positive or negative). In each run a concurrent positive control (2,4-dinitrochlorobenzene, CAS 97-00-7, purity ≥99%) and a solvent/vehicle control are to be used.
2.7.2 U-SENS™ Method
The U-SENS™ method uses the human myeloid U937 cell line to quantify changes in the CD86 cell surface marker by flow cytometry following 45 ± 3 h exposure to different concentrations of the test chemical. The CD86 surface marker has been selected because it is a typical marker of U937 activation and a costimulatory molecule needed for DC activation [15, 25].
In the assay, no dose-range-finding experiment is performed; instead, multiple concentrations up to 200 μg/mL are used directly, either in complete medium or in 0.4% DMSO. At least four different concentrations and at least two independent runs performed on different days are needed to derive a single prediction, i.e., positive or negative. In the second run, the concentrations can be adjusted to the test chemical depending on the EC150 and cytotoxicity noted in the first run. As in the h-CLAT assay, the EC150 and CV70 are used to derive the prediction, i.e., for each run, if the CD86 induction has been increased by at least 150% with <30% cytotoxicity, the run is considered positive (predicted skin sensitizer). In each run a concurrent positive control (2,4-dinitrochlorobenzene, CAS 97-00-7, purity ≥99%) and a solvent/vehicle control are to be used.
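The per-run decision rule stated above can be written as a short Python sketch. It assumes that each tested concentration yields a CD86 induction value and a cytotoxicity value, both expressed in percent on the scale used in the text, and that a single qualifying concentration makes the run positive. How the independent runs are combined into the final prediction is not described in this section and is therefore not modeled; the function name and defaults are hypothetical.

```python
def usens_run_is_positive(readings, induction_cutoff=150.0, max_cytotoxicity=30.0):
    """Classify one U-SENS run.

    readings: iterable of (cd86_induction_percent, cytotoxicity_percent)
    pairs, one per tested concentration. The run is called positive if any
    concentration reaches the induction cut-off while cytotoxicity stays
    below the limit.
    """
    return any(
        induction >= induction_cutoff and cytotoxicity < max_cytotoxicity
        for induction, cytotoxicity in readings
    )


if __name__ == "__main__":
    run = [(110.0, 2.0), (135.0, 8.0), (162.0, 12.0), (240.0, 45.0)]
    print(usens_run_is_positive(run))
```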
The standard operating procedures are described in more detail in the EURL ECVAM DB-ALM protocol no. 183 (https://ecvam-dbalm.jrc.ec.europa.eu/methods-and-protocols/protocol/1567/u937-cell-line-activation-test-for-skin-sensitization-(u-sens-)/datasheet), which provides step-by-step instructions on how to perform the assay, the reagents needed, and how to calculate the results using the prediction model. The results from this assay cannot be used directly to discriminate between skin sensitization potency classes (UN GHS Cat 1A (extreme/strong) or Cat 1B (moderate)). However, the U-SENS™ assay may provide useful information when used together with other information such as in vitro data obtained for other skin sensitization key events.
The assay is not applicable to chemicals that are either not soluble or do not form a stable dispersion in an appropriate solvent/vehicle (e.g., complete medium or 0.4% DMSO). Membrane-disrupting chemicals, e.g., surfactants, may lead to false positive predictions due to a nonspecific increase in the induction of CD86 [26]. Strongly fluorescent chemicals emitting at the same wavelength as the fluorochrome used may lead to false negative predictions.
The test method has gone through industry-led validation. The ESAC Opinion on the method is available and was published in June 2016 (EURL ECVAM Scientific Advisory Committee, EURL ECVAM validation available at: https://ec.europa.eu/jrc/en/publication/esac-opinion-lor-al-coordinated-study-transferability-and-reliability-u-sens-test-
2.8 IL-8 Luc Assay
The IL-8 Luc assay addresses KE3 by measuring the induction of IL-8 mRNA in the THP-1 cell line [28]. IL-8 is a cytokine associated with activation of dendritic cells [29].
IL-8 Luc is based on a stable reporter cell line, THP-G8, which was established by transfection of plasmid vectors into THP-1 cells. In these vectors, expression of the stable luciferase orange (SLO) and stable luciferase red (SLR) genes is under the control of the IL-8 and GAPDH promoters, respectively. In that way IL-8 Luc allows quantitative assessment of IL-8 by measurement of luciferase expression [28, 30].
THP-G8 cells are treated for 16 h with the test chemical dissolved in X-VIVO™15, a commercially available serum-free medium. Both test chemicals that are fully soluble and those that are poorly soluble or insoluble in this medium can be tested in this assay. Poorly soluble or insoluble test chemicals are shaken for 30 min, after which the supernatants are used in the assay. Test concentrations are based on preliminary dose-range finding using GAPDH expression as a read-out. In each experiment, the positive control 4-nitrobenzyl bromide (4-NBB) (CAS 100-11-8, ≥99% purity) and the negative control lactic acid (LA) (CAS 50-21-5, ≥85% purity) need to be included. After incubation, SLO luciferase activity (IL8LA), reflecting IL-8 promoter activity, and SLR luciferase activity (GAPLA), reflecting GAPDH promoter activity, are measured. The measured values of the quadruplicate measurements are used to calculate the following parameters:
(a) normalised IL8LA (nIL8LA), which is the ratio of IL8LA to
GAPLA;
(b) the induction of nIL8LA (Ind-IL8LA), which is the ratio of the
means of nIL8LA of THP-G8 cells treated with a test chemical
and the values of the nIL8LA of untreated THP-G8 cells;
(c) the inhibition of GAPLA (Inh-GAPLA), which is the ratio of
the values of the GAPLA of THP-G8 cells treated with a test
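Parameters (a) and (b) above can be sketched in Python as follows. The sketch assumes that each replicate well provides one IL8LA and one GAPLA reading and that the untreated replicates are averaged in the same way as the treated ones; parameter (c) is not implemented because its definition is incomplete in this section. The function names are hypothetical.

```python
from statistics import mean


def normalised_il8la(il8la, gapla):
    """(a) nIL8LA: ratio of IL8LA to GAPLA for one well."""
    if gapla <= 0:
        raise ValueError("GAPLA must be positive")
    return il8la / gapla


def induction_il8la(treated, untreated):
    """(b) Ind-IL8LA: mean nIL8LA of treated cells over that of untreated cells.

    treated and untreated are lists of (IL8LA, GAPLA) pairs, one per replicate.
    Averaging the untreated replicates is an assumption of this sketch.
    """
    n_treated = mean([normalised_il8la(i, g) for i, g in treated])
    n_untreated = mean([normalised_il8la(i, g) for i, g in untreated])
    return n_treated / n_untreated


if __name__ == "__main__":
    treated_wells = [(200.0, 100.0), (300.0, 100.0), (250.0, 100.0), (250.0, 100.0)]
    untreated_wells = [(100.0, 100.0)] * 4
    print(induction_il8la(treated_wells, untreated_wells))
```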
2.9 GARD Method
The Genomic Allergen Rapid Detection (GARD) test method uses gene expression profiling of transcripts and machine learning approaches for predicting skin sensitization hazard and potency. The human myeloid leukemia cell line MUTZ-3 is used in this assay as a surrogate for human dendritic cells.
In the assay, prior to chemical stimulation, a qualitative phenotypic analysis of the MUTZ-3 cells is performed to verify that the proliferating cells are at an immature stage [32]. The test chemicals are dissolved either in water or in DMSO. A final concentration of 500 μM is targeted for noncytotoxic and soluble chemicals; for chemicals with limited solubility the highest possible concentration is used, and for cytotoxic chemicals the concentration resulting in a relative cell viability of 90%.
The cells are exposed to the test chemical for 24 h, after which they are lysed with TRIzol reagent (Life Technologies) and RNA is isolated according to the instructions of the TRIzol supplier. From the RNA, cDNA is prepared for gene chip hybridization according to the Affymetrix GeneChip® protocol using the recommended kits and controls (GeneChip® Expression Analysis Technical Material. Available at: https://www.affymetrix.com/support/downloads/manuals/expression_analysis_plates_manual.pdf). More information about the protocol can be obtained from the publications of Johansson et al. and Zeller et al. [32, 33]. The standard operating procedures are not yet available for this method (as of September 2017).
In the GARD assay, the prediction signature consists of 200 transcripts (GARD prediction signature), which is proposed to be refined to 52 transcripts [33]. The changes measured in the transcripts are linked to the maturation and activation of dendritic cells. The results obtained from the test chemical are compared to
2.10 Defined Approaches
The currently available and regulatory adopted in vitro/in chemico test methods cover different key events of the skin sensitization AOP. To replace the LLNA, it is not sufficient to use only one such method addressing a specific key event; a combination of test methods covering different steps of the AOP is needed. For skin sensitization, several approaches combining different methods, including in silico, in chemico, and in vitro methods, to assess skin sensitization hazard and/or potency have been proposed [35, 36]. However, there is currently no generally approved and/or validated way to combine those methods in order to obtain reliable results similar to those of the currently available in vivo methods (OECD TGs 406 and 429). For this reason, an OECD project was approved in 2017 to develop assessment criteria on how to combine in vitro and other data to obtain information that is equivalent, in relation to hazard identification and characterization, to that from the currently available in vivo methods. The aim is to generate standardized data sets and procedures that yield a prediction of a hazard (or the lack thereof) without the need to apply expert judgment. For illustrative purposes, two case studies are presented here: one for hazard identification and one for potency assessment. More case studies are included in OECD GD 256, Annex 1 [37] and are described in a recent review [35].
2.12 Integrated Testing Strategy (ITS) for Skin Sensitization Potency Classification Based on In Silico, In Chemico and In Vitro Data
The ITS approach published by Kao Corporation combines information
obtained from DEREK Nexus (version 2.0 from Lhasa Limited), DPRA
(key event 1, OECD TG 442C) and h-CLAT (key event 3, OECD TG 442E).
The DIP uses the quantitative parameters obtained from the DPRA
and h-CLAT assays and the outcomes obtained from DEREK to assign
scores that are used to predict the skin sensitization potential and
potency (Table 2) [39, 40].
Table 2
Conversion of the outcome of the individual assays of the ITS
Table 3
Classification of the ITS based on total battery scores
Table 4
Predictive performance of the defined approach for hazard identification
For DEREK, absence of an alert is assigned a score of 0 and
presence of an alert a score of 1. For DPRA, scores are assigned
based on the mean peptide depletion values obtained, and for
h-CLAT based on the concentrations at the minimum induction
thresholds.
The individual scores are then summed into a total battery score
from 0 to 7 to predict skin sensitization hazard (skin sensitizer
vs. non-sensitizer) and skin sensitization potency (two rank
classes: EC3 <1% in the Local Lymph Node Assay (LLNA), strong
sensitizer; EC3 >1% in the LLNA, weak sensitizer) (Table 3).
Note that the rank classes used here, i.e., the EC3 thresholds,
are not the same as the thresholds used to classify skin sensitizers
into the UN GHS potency classes, where EC3 ≤2% corresponds to Cat
1A (strong/extreme) and EC3 >2% to moderate skin sensitizers [41].
The results obtained therefore cannot be used directly to classify
into UN GHS potency categories.
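The scoring logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-assay conversion rules of Table 2 are not reproduced, so the DPRA and h-CLAT scores are taken as already converted and assumed to lie between 0 and 3 each (which is consistent with a DEREK score of 0 or 1 and a maximum total of 7). The class cut-offs `strong_min` and `weak_min` are assumed placeholder values and should be checked against Table 3. All function names are hypothetical.

```python
def its_total_score(derek_alert, dpra_score, hclat_score):
    """Sum the three assay scores into the total battery score (0-7)."""
    # Assumption: DPRA and h-CLAT scores were already derived (Table 2) and lie in 0..3.
    if dpra_score not in range(4) or hclat_score not in range(4):
        raise ValueError("DPRA and h-CLAT scores are assumed to lie in 0..3")
    return (1 if derek_alert else 0) + dpra_score + hclat_score


def its_class(total, strong_min=7, weak_min=2):
    """Map a total battery score to a rank class.

    strong_min and weak_min are ASSUMED cut-offs standing in for Table 3.
    """
    if total >= strong_min:
        return "strong sensitizer"
    if total >= weak_min:
        return "weak sensitizer"
    return "non-sensitizer"


# Hypothetical chemical: DEREK alert present, DPRA score 2, h-CLAT score 1.
total = its_total_score(True, 2, 1)
print(total, its_class(total))
```

Keeping the cut-offs as parameters makes it easy to substitute the values actually listed in Table 3.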
The predictive capacity for skin sensitization hazard identifica-
tion (sensitizer vs. non-sensitizer) of this defined approach has
been calculated to be 89% for sensitivity, 70% for specificity and
84% for accuracy based on data generated from 139 chemicals
[40], when compared to LLNA (Table 4).
Predicting Chemically Induced Skin Sensitization by Using In Chemico/In Vitro Methods 501
Table 5
Predictive performance of the defined approach for potency estimation
ITS
3 Conclusion
References
1. OECD (2012) The Adverse Outcome Pathway for Skin Sensitisation initiated by Covalent Binding for Proteins. Part 1. Scientific Evidence. OECD Environment, Health and Safety Publications Series on Testing and Assessment. No. 168. Available at: http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=env/jm/mono(2012)10/part1&doclanguage=en
2. Schmidt M, Goebeler M (2015) Immunology of metal allergies. J Dtsch Dermatol Ges 13:653–660
3. EURL-ECVAM (2013) EURL ECVAM recommendation on the Direct Peptide Reactivity Assay (DPRA) for skin sensitisation testing. European Commission Joint Research Centre. Institute for Health and Consumer Protection. European Union Reference Laboratory for Alternatives to Animal Testing (EURL ECVAM). Available at: http://ihcp.jrc.ec.europa.eu/our_labs/eurl-ecvam/eurl-ecvam-recommendations/EURL-ECVAM-REC-DPRA-DRAFT-ver02August2013.pdf
4. Natsch A, Ryan CA, Foertsch L, Emter R, Jaworska J, Gerberick F, Kern P (2013) A dataset on 145 chemicals tested in alternative assays for skin sensitization undergoing prevalidation. J Appl Toxicol 33:1337–1352
5. Natsch A (2010) The Nrf2-Keap1-ARE toxicity pathway as a cellular sensor for skin sensitizers--functional relevance and a hypothesis on innate reactions to skin sensitizers. Toxicol Sci 113:284–292
6. Natsch A, Emter R, Gfeller H, Haupt T, Ellis G (2015) Predicting skin sensitizer potency based on in vitro data from keratinosens and kinetic peptide binding: global versus domain-based assessment. Toxicol Sci 143:319–332
7. EURL-ECVAM (2013) EURL ECVAM recommendation on the Keratinosens™ assay for skin sensitisation testing. European Commission Joint Research Centre. Institute for Health and Consumer Protection. European Union Reference Laboratory for Alternatives to Animal Testing (EURL ECVAM). Available at: https://eurl-ecvam.jrc.ec.europa.eu/eurl-ecvam-recommendations/recommendation-keratinosens-skin-sensitisation
8. Bauch C, Kolle SN, Fabian E, Pachel C, Ramirez T, Wiench B, Wruck CJ, Ravenzwaay BV, Landsiedel R (2011) Intralaboratory validation of four in vitro assays for the prediction of the skin sensitizing potential of chemicals. Toxicol In Vitro 63:489–504
9. Urbisch D, Mehling A, Guth K, Ramirez T, Honarvar N, Kolle S, Landsiedel R, Jaworska J, Kern PS, Gerberick F, Natsch A, Emter R, Ashikaga T, Miyazawa M, Sakaguchi H (2015) Assessing skin sensitization hazard in mice and men using non-animal test methods. Regul Toxicol Pharmacol 71:337–351
10. Ramirez T, Mehling A, Kolle SN, Wruck CJ, Teubner W, Eltze T, Aumann A, Urbisch D, van Ravenzwaay B, Landsiedel R (2014) LuSens: a keratinocyte based ARE reporter gene assay for use in integrated testing strategies for skin sensitization hazard identification. Toxicol In Vitro 28:1482–1497
11. Ramirez T, Stein N, Aumann A, Remus T, Edwards A, Norman KG, Ryan C, Bader JE, Fehr M, Burleson F, Foertsch L, Wang X, Gerberick F, Beilstein P, Hoffmann S, Mehling A, van Ravenzwaay B, Landsiedel R (2016) Intra- and inter-laboratory reproducibility and accuracy of the LuSens assay: a reporter gene-cell line to detect keratinocyte activation by skin sensitizers. Toxicol In Vitro 32:278–286
12. ESAC (2016) EURL ECVAM Scientific Advisory Committee (ESAC) opinion on the BASF-coordinated Performance Standards-based validation of the LuSens test method for skin sensitisation testing. Available at: http://publications.jrc.ec.europa.eu/repository/bitstream/JRC103706/esac_opinion_2016-04_lusens_final.pdf
13. Cottrez F, Boitel E, Auriault C, Aeby P, Groux H (2015) Genes specifically modulated in sensitized skins allow the detection of sensitizers in a reconstructed human skin model. Development of the SENS-IS assay. Toxicol In Vitro 29:787–802
14. Cottrez F, Boitel E, Ourlin J-C, Peiffer J-L, Fabre I, Henaoui I-S, Mari B, Vallauri A, Paquet A, Barbry P, Auriault C, Aeby P, Groux H (2016) SENS-IS, a 3D reconstituted epidermis based model for quantifying chemical sensitization potency: reproducibility and predictivity results from an inter-laboratory study. Toxicol In Vitro 32:248–260
15. Ashikaga T, Yoshida Y, Hirota M, Yoneyama K, Itagaki H, Sakaguchi H, Miyazawa M, Ito Y, Suzuki H, Toyoda H (2006) Development of an in vitro skin sensitization test using human cell lines: the human Cell Line Activation Test (h-CLAT). I. Optimization of the h-CLAT protocol. Toxicol In Vitro 20:767–773
16. Harris NL, Ronchese F (1999) The role of B7 costimulation in T-cell immunity. Immunol Cell Biol 77:304–311
17. Hwang I, Shen X, Sprent J (2003) Direct stimulation of naive T cells by membrane vesicles from antigen-presenting cells: distinct roles for CD54 and B7 molecules. Proc Natl Acad Sci U S A 100:6670–6675
18. EURL-ECVAM (2015) EURL ECVAM Recommendation on the human Cell Line Activation Test (h-CLAT) for skin sensitisation testing. European Commission, Joint Research Centre, Institute for Health and Consumer Protection, accessible at https://eurl-ecvam.jrc.ec.europa.eu/news/news_docs/eurl-ecvam-recommendation-on-the-human-cell-line-activation-test-h-clat-for-skin-sensitisation-testing
19. EURL-ECVAM (2015) Re-analysis of the within and between laboratory reproducibility of the human Cell Line Activation Test (h-CLAT). European Commission, Joint Research Centre, Institute for Health and Consumer Protection, accessible at: https://eurl-ecvam.jrc.ec.europa.eu/eurl-ecvam-recommendations/
38. Bauch C, Kolle SN, Ramirez T, Eltze T, Fabian E, Mehling A, Teubner W, Ravenzwaay BV, Landsiedel R (2012) Putting the parts together: combining in vitro methods to test for skin sensitizing potentials. Regul Toxicol Pharmacol 63:489–504
39. Nukada Y, Miyazawa M, Kazutoshi S, Sakaguchi H, Nishiyama N (2013) Data integration of non-animal tests for the development of a test battery to predict the skin sensitizing potential and potency of chemicals. Toxicol In Vitro 27:609–618
40. Takenouchi O, Fukui S, Okamoto K, Kurotani S, Imai N, Fujishiro M, Kyotani D, Kato Y, Kasahara T, Fujita M, Toyoda A, Sekiya D, Watanabe S, Seto H, Hirota M, Ashikaga T, Miyazawa M (2015) Test battery with the human cell line activation test, direct peptide reactivity assay and DEREK based on a 139 chemical data set for predicting skin sensitizing potential and potency of chemicals. J Appl Toxicol 3:1318–1332
41. UN (2017) United Nations (UN) Globally Harmonised System (GHS) for Classification and Labeling. Revision 7, available at https://www.unece.org/fileadmin/DAM/trans/danger/publi/ghs/ghs_rev07/English/03e_part3.pdf
Chapter 23
Abstract
The present method describes a systems biology approach for the in silico predictive modeling of drug
toxicity. The data from LINCS were used to determine the type and number of pathways disturbed by each
compound and to estimate the extent of disturbance (network perturbation elasticity). Moreover, the most
frequently disturbed metabolic pathways and reactions were determined across the studied toxicants. The
process was exemplified by successful predictions on various statins. In conclusion, an entirely new approach
linking gene expression alterations to the prediction of complex organ toxicity was developed.
Key words Systems biology, Predictive modeling, Drug toxicity, Hepatotoxicity, Gene regulation
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_23, © Springer Science+Business Media, LLC, part of Springer Nature 2018
506 Oriol López-Massaguer et al.
Fig. 1 Global diagram of the method. Panel (1) consists of extracting the chemical-induced gene upregulation/
downregulation data for the substances of interest from the LINCS database. Panel (2) shows how gene–
protein–reactions (GPR) associations of the hepatocyte model are used in order to map the regulated genes
into the set of altered reactions. Panel (3) illustrates how flux variability analysis is applied to the hepatocyte
model in order to determine lower and upper bound limits. These limits are used in panel (4) to perform an
elasticity analysis in order to determine the perturbation in the network. Finally, panel (5) shows how altered
reactions can be mapped to identify perturbed pathways
2 Materials
Table 1
Supplementary material file listing
3 Methods
3.1 Extraction of Gene Expression Data from the LINCS Database
The first step is to download the differential gene expression data
for the chemical perturbation (compound of interest), the so-called
gene expression signature, from the LINCS database.
In the LINCS database, 22,119 perturbations were available
for compounds with gene expression information based on the
L1000 technology (see Note 1). We focused on the 50 most
highly upregulated/downregulated genes for each chemical
perturbation (see Note 2).
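As a minimal sketch of this selection step, the following Python function splits a signature into its most upregulated and most downregulated genes. It assumes the signature has already been loaded into a dictionary mapping a gene identifier to a differential expression score. The LINCS file format is not reproduced here, the example values are hypothetical, and reading "the 50 most highly upregulated/downregulated genes" as 50 genes in each direction is an assumption.

```python
def top_regulated(signature, n=50):
    """Return (up, down): the n most up- and n most down-regulated genes.

    signature: dict mapping gene identifier -> differential expression score.
    Genes are only reported in a direction if their score has the matching sign.
    """
    ranked = sorted(signature, key=signature.get)  # ascending by score
    down = {g for g in ranked[:n] if signature[g] < 0}
    up = {g for g in ranked[-n:] if signature[g] > 0}
    return up, down


# Hypothetical toy signature (gene names and scores are illustrative only).
toy = {"A": 2.5, "B": -1.2, "C": 0.3, "D": -3.1, "E": 1.1}
up, down = top_regulated(toy, n=2)
print(sorted(up), sorted(down))
```

The two returned sets play the role of the gene sets g+(si) and g−(si) used in the mapping step.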
3.2 Cell Type-Specific Metabolic Model
The next step is to map the gene expression data obtained in order
to identify the perturbed reactions in the metabolic network of the
liver.
1. For this purpose we use Recon 2, a consensus metabolic model
for human cell metabolism [2]. The model can be downloaded
from the BioModels database [6], MODEL1109130000 (see
footnote 1). The model is represented in SBML and contains 65
cell-type specific models, including liver hepatocytes. In the
models, each of the 7440 metabolic reactions is annotated with
its associated reactants and corresponding stoichiometric
coefficients (see Note 4).
2. Reactions are annotated with pathway information, enzyme
class, and gene–protein–reaction (GPR) associations. GPRs are
Boolean functions that reflect the requirements in terms of
genes for their associated enzymes and reactions fi(gj), includ-
ing those requirements for enzyme complexes that are formed
by different subunits, the presence of isozymes, enzyme pro-
miscuity, etc. [7]. This step is independent of the compounds
we are interested in because it maps genes to reactions using
GPR rules (see Note 5). The data is available in File S1.
3. We obtained in the previous step from LINCS the upregu-
lated/downregulated gene sets g+(si) and g−(si) for several
chemical perturbations si. In our case the cell type is human
hepatocyte. To compute the perturbation on each reaction we
map the upregulated/downregulated genes into the GPR
Boolean rules defined in the model between genes and reac-
tions (File S2). The way we mapped this information is by
translating AND/OR into min/max relationships as follows,
for a pair of genes gi and gj:
1
https://www.ebi.ac.uk/biomodels-main/MODEL1109130000
Hepatotoxicity Prediction by Systems Biology Modeling of Disturbed Metabolic… 509
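$$
\begin{aligned}
g_i \ \mathrm{AND}\ g_j &\rightarrow \min(g_i, g_j) \\
g_i \ \mathrm{OR}\ g_j &\rightarrow \max(g_i, g_j) \\
g_i &= \begin{cases} -1 & \text{if } g_i \in g^{-} \\ 0 & \text{otherwise} \\ +1 & \text{if } g_i \in g^{+} \end{cases}
\end{aligned}
\qquad (1)
$$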
These previous relationships can be interpreted as follows:
(a) When two genes are associated with an AND operation
according to the GPR rule, then downregulation can be
caused by any of the two genes (min function). In the
AND case, it suffices for one of the genes to be downregulated
to potentially alter the reaction; if one of the genes is
overexpressed, we cannot assume overexpression as the
reaction depends on both genes.
(b) When the genes are associated with an OR rule, then only
one of the genes is needed in order to have upregulation
(max function). In this case, the downregulation of one of
the genes would not be a sufficient condition for expecting
negative regulation of the reaction (see Note 6).
4. By applying these criteria systematically to each gene involved
in a reaction flux, as described by the previous equation, we
obtained the overall effect of the gene regulation δijk on that
reaction under the given chemical perturbation si.
For example, the following GPR rule:
(6520.1) and (23428.1) or (6520.1) and (23428.2) or
(56301.1) and (6520.1) → R_CYSGLYexR
is transformed into:
max(min(6520.1, 23428.1), min(6520.1, 23428.2), min(56301.1,
6520.1)) → R_CYSGLYexR
The pseudo code of the steps:
1. Load liver hepatocytes metabolic model.
2. Obtain from the metabolic model the GPR rules that map
genes to reactions.
3. Map chemical-induced gene perturbation (altered expression)
to reactions using GPR rules (see
https://github.com/phi-grib/Hepatoxicity-SysBio/blob/master/notebook/Statins.ipynb).
(a) For each chemical perturbation that produces gene
overexpression and underexpression (obtained from
the LINCS database).
(b) Gene expression alteration is mapped using GPR rules
to the associated altered reactions.
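The mapping step can be illustrated with a small, self-contained Python sketch. It is not the authors' notebook code: it is a hypothetical recursive-descent evaluator that follows the convention of Eq. 1 (AND becomes min, OR becomes max, genes take the values −1, 0 or +1) and assumes that AND binds more tightly than OR, as in ordinary Boolean expressions. The gene sets in the usage lines are invented for illustration.

```python
import re


def gene_state(gene, up, down):
    """Return +1 for an upregulated gene, -1 for a downregulated one, 0 otherwise."""
    if gene in up:
        return 1
    if gene in down:
        return -1
    return 0


def evaluate_gpr(rule, up, down):
    """Evaluate a GPR rule string with AND -> min and OR -> max (Eq. 1)."""
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            value = max(value, parse_and())
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            value = min(value, parse_atom())
        return value

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # skip the closing parenthesis
            return value
        return gene_state(token, up, down)

    return parse_or()


rule = "(6520.1) and (23428.1) or (6520.1) and (23428.2) or (56301.1) and (6520.1)"
# Hypothetical perturbation: both subunits of the first complex are upregulated.
print(evaluate_gpr(rule, up={"6520.1", "23428.1"}, down=set()))
# Hypothetical perturbation: the shared subunit 6520.1 is downregulated.
print(evaluate_gpr(rule, up=set(), down={"6520.1"}))
```

A nonzero result marks the reaction (here R_CYSGLYexR) as altered, with the sign giving the direction of the expected change.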
3.3 Flux Variability Analysis
The procedure described in the previous section provides an estimate
of the main perturbed reactions in the cell. In the next step, we
perform a simulation of the steady state of the cell model in order to
determine the distribution of fluxes for each reaction. A systems
analysis of the model can determine how critical such perturbations
may be in terms of disrupting the steady state equilibrium.
1. We use a sensitivity analysis method known as flux variability
analysis (FVA), which provides an estimate of the upper and
lower bounds of each steady state reaction flux vi in the cell [8]
(see Note 7). We used the COBRApy package [4]. The method
is based on determining the solution, i.e., the distribution of
fluxes in the cell, that corresponds to the optimal state of the
cell in the sense of some fitness objective function (see Note 8).
2. First, we determined the space of solution fluxes vi that maxi-
mizes the objective function defined for each cell in Recon 2
by solving the following constraint-based optimization
problem
$$
\begin{aligned}
& S v = 0 \\
& \alpha_i \le v_i \le \beta_i \\
& \max\ b = c^{T} v
\end{aligned}
\qquad (2)
$$
where S is the stoichiometry matrix, where each metabo-
lite has a row and each reaction a column, and stoichiometric
coefficients are mapped into each column; αi and βi are the
nominal lower and upper bounds for each flux vi, and cT is a
vector of coefficients corresponding to the biomass function
in the cell model.
3. Once the optimal biomass b0 has been calculated, we determine
the allowed lower and upper bounds, vilb and viub respectively,
for each reaction flux vi by solving an optimization that
minimizes/maximizes the flux under the additional constraint
of fixing the biomass at its optimal value b0:
$$
\begin{aligned}
& S v = 0 \\
& \alpha_i \le v_i \le \beta_i \\
& c^{T} v = b_0 \\
& \min/\max\ v_i, \quad i \ne j
\end{aligned}
\qquad (3)
$$
3.4 Estimation of the Perturbation Elasticity
In the fourth step, we compute the global perturbation of the
metabolic network caused by the chemical perturbation.
1. From the previous computations, we can define a coefficient
that estimates the global network perturbation. It uses the
flux variability information as a measure of the robustness of
the system, in combination with the differential gene expression
information, which identifies the points of perturbation in the
network, to ultimately characterize cell responses under
chemical perturbations. To that end, we define a coefficient
that estimates the overall level of network elasticity
perturbation of a given reaction flux k due to the chemical
perturbation i in cell type j as follows:
$$
\Delta_{ijk} = \delta_{ijk}\, v_k^{lb}\, H(-\delta_{ijk}) + \delta_{ijk}\, v_k^{ub}\, H(\delta_{ijk}) \qquad (4)
$$
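Eq. 4 can be evaluated reaction by reaction with a few lines of Python. The sketch below assumes that H is the Heaviside step function (as footnote 2 suggests) and takes H(0) = 0, a choice that has no effect because the corresponding term is multiplied by a regulation value of 0. The function names are illustrative, and the input values are the ACACT1r and 2OXOADOXm rows of Table 4.

```python
def heaviside(x):
    """Heaviside step function; H(0) is taken as 0 here (assumption)."""
    return 1.0 if x > 0 else 0.0


def perturbation(delta, v_lb, v_ub):
    """Eq. 4: delta * v_lb * H(-delta) + delta * v_ub * H(delta).

    delta: regulation of the reaction (-1, 0 or +1) from the GPR mapping.
    v_lb, v_ub: lower and upper flux bounds from flux variability analysis.
    """
    return delta * v_lb * heaviside(-delta) + delta * v_ub * heaviside(delta)


# ACACT1r under simvastatin (Table 4): delta = 1, bounds -1000 and 333.67.
print(perturbation(1, -1000, 333.67))
# 2OXOADOXm under pravastatin (Table 4): delta = -1, bounds 0 and 3.
print(perturbation(-1, 0, 3))
```

An upregulated reaction therefore contributes its upper bound, a downregulated reaction contributes the negative of its lower bound, and an unaffected reaction contributes nothing.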
3.5 Determination of Altered Pathways
We can also determine the metabolic pathways that are altered
by the chemical perturbation. Such information may be used to
obtain additional insights about the toxicity mechanisms.
1. Collect a consensus list of pathways associated with each reac-
tion in the hepatocyte cell model (see Note 12). Such a list can
be obtained from Recon 2, which annotates reactions in the
cell models with associated pathways, also called “subsystems.”
Alternatively, other databases like BioCyc, KEGG or Reactome
associate genes and reactions with pathways and are therefore
valid sources for pathway information (see Note 13).
2. For each compound, according to the list of perturbed
reactions obtained from Eq. 1, determine the number of
perturbed reactions per pathway.
3. Pathways containing a high percentage of perturbed reactions
are more likely to be associated with a mechanism of response
to toxicity (see Note 14).
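The three steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the reaction-to-pathway annotation and the per-reaction regulation values are available as plain dictionaries, and it uses only the four reactions shown in the Table 4 fragment (simvastatin column), so each pathway contains a single reaction here.

```python
from collections import Counter


def perturbed_fraction(reaction_pathway, reaction_effect):
    """Return {pathway: (perturbed, total, fraction)} for one compound.

    reaction_pathway: dict reaction -> pathway annotation.
    reaction_effect: dict reaction -> regulation value (-1, 0 or +1).
    """
    total = Counter(reaction_pathway.values())
    hit = Counter(
        reaction_pathway[r]
        for r, d in reaction_effect.items()
        if d != 0 and r in reaction_pathway
    )
    return {p: (hit[p], total[p], hit[p] / total[p]) for p in total}


# Fragment of Table 4: pathway annotation and simvastatin regulation values.
pathways = {
    "2OXOADOXm": "Lysine metabolism",
    "ACACT1r": "Tryptophan metabolism",
    "ACACT1x": "Cholesterol metabolism",
    "ACACT4p": "Fatty acid oxidation",
}
simvastatin = {"2OXOADOXm": -1, "ACACT1r": 1, "ACACT1x": 1, "ACACT4p": 0}

for pathway, (n_hit, n_all, frac) in perturbed_fraction(pathways, simvastatin).items():
    print(pathway, n_hit, n_all, frac)
```

With the full model, sorting the result by fraction gives a ranked list of candidate pathways for mechanistic follow-up.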
2
http://mathworld.wolfram.com/HeavisideStepFunction.html
4 Example
Table 2
LINCS database identifiers corresponding to the three statins
Table 3
LINCS gene expression profiling signatures for the three statins
4.2 Mapping Gene Expression to Altered Reactions
We mapped the gene expression data obtained in the previous step
(File S3) to the altered reactions, based on the GPR rules from
Recon 2 (File S1), using the method described above.
See the Jupyter notebook provided for further details of the
implementation.
4.3 FVA for Statins
The flux elasticity results obtained from the Recon 2 metabolic
model (File S2) were applied to the three statins, yielding a list
of the upper and lower bounds for each perturbed reaction in the
statins example. We obtained the altered pathways as well (see
the fragment in Table 4).
See the Jupyter notebook provided for further details of the
implementation. The full list is in File S4.
4.4 Estimation of the Perturbation Elasticity
For the statins example, the network perturbation results are
shown in Table 5 and Fig. 2.
See the Jupyter notebook provided for further details of the
implementation.
Table 4
List of pathways altered for the statins example (fragment)
Reaction    Pravastatin  Simvastatin  Rosuvastatin  Lower bound  Upper bound  Pathway
2OXOADOXm   −1           −1           0             0            3            Lysine metabolism
ACACT1r     0            1            1             −1000        333.67       Tryptophan metabolism
ACACT1x     0            1            1             0            774.83       Cholesterol metabolism
ACACT4p     0            0            1             0            501.48       Fatty acid oxidation
Table 5
Network perturbation results for the statins example
Fig. 2 Network perturbation results bar chart for the statins example (pravastatin, simvastatin, and
rosuvastatin)
4.5 Altered Pathways
For the statins example, File S5 lists the pathways that were
altered in each case and the number of reactions in each
pathway. See Table 6.
See the Jupyter notebook provided for further details of the
implementation.
5 Notes
Table 6
List of pathways altered with the number of reactions in each pathway
Acknowledgment
References
1. Duan Q, Flynn C, Niepel M et al (2014) LINCS Canvas Browser: interactive web app to query, browse and interrogate LINCS L1000 gene expression signatures. Nucleic Acids Res 42:W449–W460
2. Thiele I, Swainston N, Fleming RMT et al (2013) A community-driven global reconstruction of human metabolism. Nat Biotechnol 31:419–425
3. Carbonell P, Lopez O, Amberg A, Pastor M, Sanz F (2017) Hepatotoxicity prediction by systems biology modeling of disturbed metabolic pathways using gene expression data. ALTEX 34:219–234
4. Ebrahim A, Lerman JA, Palsson BO, Hyduke DR (2013) COBRApy: COnstraints-Based Reconstruction and Analysis for python. BMC Syst Biol 7:74
5. (2007) GNU General Public License, version 3
6. Juty N, Ali R, Glont M et al (2015) BioModels: content, features, functionality, and use. CPT Pharmacometrics Syst Pharmacol 4:55–68
7. Thiele I, Palsson BØ (2010) A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc 5:93–121
8. Orth JD, Thiele I, Palsson BØ (2010) What is flux balance analysis? Nat Biotechnol 28:245–248
9. Zhu X, Kruhlak NL (2014) Construction and analysis of a human hepatotoxicity database suitable for QSAR modeling using post-market safety data. Toxicology 321:62–72
10. Ahmed MH, Al-Atta A, Hamad MA (2012) The safety and effectiveness of statins as treatment for HIV-dyslipidemia: the evidence so far and the future challenges. Expert Opin Pharmacother 13:1901–1909
11. Liu R, AbdulHameed MDM, Wallqvist A (2017) Molecular structure-based large-scale prediction of chemical-induced gene expression changes. J Chem Inf Model 57:2194–2202
12. Larhlimi A, Blachon S, Selbig J, Nikoloski Z (2011) Robustness of metabolic networks: a review of existing definitions. Biosystems 106:1–8
13. King ZA, Dräger A, Ebrahim A, Sonnenschein N, Lewis NE, Palsson BO (2015) Escher: a web application for building, sharing, and embedding data-rich visualizations of biological pathways. PLoS Comput Biol 11:e1004321
Chapter 24
Abstract
The assessment of the acute toxicity of chemicals by in silico methods is currently carried out with two
methodologies, read-across and QSAR. Both approaches rely strongly on the similarity between the chemical
for which a risk assessment is required and the reference chemical(s) for which experimental data are
known. Here, we describe the two methodologies, illustrated by some key publications, and the in
silico data associated with acute toxicity endpoints (ECHA, REACH) accessible via eChemPortal.
Key words Acute toxicity, Endpoints, In silico methods, Read-across, QSAR, REACH
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_24, © Springer Science+Business Media, LLC, part of Springer Nature 2018
520 Ronan Bureau
Table 1
GHS acute toxicity scheme
Fig. 1 Decision tree to assess whether an in vivo acute toxicity test is required, when the registrant has to
generate a novel repeated dose subacute oral toxicity study [2]
2 Materials
The databases and the software for QSAR and read-across have been
described in recent reviews [7–9] and are summarized in Table 2.
In silico methods for the prediction of chemical toxicity have
also been described in recent reviews [8, 25, 26]. Guidance on
information requirements and chemical safety assessment related to
QSARs and grouping of chemicals has been published by ECHA [27].
Table 2
Software and databases related to acute toxicity
Software Databases
TEST (US EPA) [10] ECHA [11]
ACD/TOX [12] OECD QSAR Toolbox [13]
ADMET Predictor [14] RTECS [15]
Multicase MCASE/MC4PC [16] ACToR/EPA [17]
TerraQSAR [18] TOXNET/NLM [19]
TOPKAT [15] ChemIDplus/NLM [20]
Leadscope [21]
eChemPortal [22]
The Merck Index/RSC [23]
PAN [24]
Nontest Methods to Predict Acute Toxicity: State of the Art for Applications of In Silico… 523
3 Methods
3.1 Read-Across/Category Approaches/Structural Analog
The assessment of acute toxicity by read-across is one of the main
approaches for in silico studies. This approach requires robust
experimental data for chemicals similar to the reference chemical.
From these similar chemicals (classically called nearest neighbors
(NNs) in the following sections), the prediction for the reference
chemical is either the same toxicity category (like LD50 > 2000
mg/kg bw in case of nontoxicity) or an average value of the ATE
associated with the NNs (potentially weighted by the molecular
similarities to the reference chemical). The molecular similarity
[28] between two compounds classically depends on two parameters:
the molecular fingerprint used to describe the chemicals and the
metric used to compare these fingerprints (similarity coefficient
or distance between chemicals). A large number of fingerprints
[28, 29] are available, from substructure keys-based fingerprints
(like MACCS keys) to topological fingerprints. Circular fingerprints
(a kind of topological fingerprint) describe the environment of
each atom within a given radius (like ECFP4 (extended-connectivity
fingerprints), whose diameter corresponds to four chemical bonds,
i.e., a radius of two bonds). The metric [29] is often a similarity
coefficient, like the Tanimoto or Dice coefficients, with values
ranging between 0 and 1 (1: highly similar). It is also possible
to explain how a chemical is similar to another one, starting from
the definition of a particular substructure associated with a
chemical family. This is described below.
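To make the similarity-based prediction concrete, here is a minimal Python sketch. It assumes fingerprints are represented as sets of "on" bit positions (how they are generated is outside its scope), uses the standard definitions of the Tanimoto and Dice coefficients, and predicts the value for the reference chemical as a similarity-weighted average over its k nearest neighbors. The fingerprints and toxicity values in the example are hypothetical, and returning 0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(a, b):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Dice coefficient: twice the shared bits divided by the total bits set."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def read_across(target_fp, neighbours, k=5):
    """Similarity-weighted average of the k most similar neighbours.

    neighbours: list of (fingerprint, experimental_value) pairs.
    """
    ranked = sorted(neighbours, key=lambda n: tanimoto(target_fp, n[0]), reverse=True)[:k]
    weights = [tanimoto(target_fp, fp) for fp, _ in ranked]
    if sum(weights) == 0:
        raise ValueError("no neighbour shares any bit with the target")
    return sum(w * value for w, (_, value) in zip(weights, ranked)) / sum(weights)


# Hypothetical fingerprints and experimental values (e.g., LD50 in mg/kg bw).
target = {1, 2, 3, 4}
data = [({1, 2, 3, 4}, 100.0), ({1, 2}, 300.0)]
print(tanimoto({1, 2, 3, 4}, {3, 4, 5, 6}), dice({1, 2, 3, 4}, {3, 4, 5, 6}))
print(read_across(target, data))
```

Replacing `tanimoto` with `dice` inside `read_across` changes the ranking weights but not the overall procedure.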
To analyze the in silico methodologies used, we have consid-
ered the endpoint data recorded in eChemPortal (OECD). Among
the databases recorded in eChemPortal, ECHA CHEM represents
the main database (48,250 substances with 598,842 endpoints
recorded in September, 2017). For the property search, we have
considered the three acute toxicities (oral, inhalation, and dermal)
and the associated in silico study result type ((Q)SAR/read-across
based on grouping of substances (category approach)/read-across
from supporting substance (structural analog or surrogate)).
3.1.1 A Typical Case
Before analyzing the data from ECHA, we note that a recent
publication [30] describes the principle well and discusses the
benefits of using read-across to fulfill hazard data requirements.
This publication is based on a category approach used by the
American Cleaning Institute (ACI) to fill hazard data gaps across
261 high production volume substances. Nine chemical categories
were defined, with separations based on common functional group(s)
and an incremental and constant change across the category (e.g., a
chain-length category). Subcategories were also established within a
category based on some specific characteristics. For instance, the
Fig. 2 Process for developing categories described in the publication of Stanton et al. [30]
3.1.2 General View of the Methodology Through the Registration Files (ECHA, eChemPortal)
The property search “oral acute toxicity” with a study result type
associated with read-across based on grouping of substances
(category approach) leads to 2654 results. After filtering on the
substance name (several records exist for the same substance), 566
substances remain. The term “dermal acute toxicity” leads to 499
substances and the term “inhalation acute toxicity” to 511
substances. Considering the three property searches together gives
806 unique substances. The property search “oral acute toxicity”
with a study result type associated with read-across from supporting
substances (structural analog or surrogate) leads to 3194 results
with 1119 unique substances (based on the substance name). The
term “dermal acute toxicity” leads to 936 substances and the term
“inhalation acute toxicity” to 766 substances. Considering the
three property searches together gives 1568 unique substances. In
the registration files, the two in silico methods (grouping or
supporting substances) are very close, and the consideration of
supporting substances with robust experimental data is largely the
basis for the assessment of the prediction.
For grouping of substances and read-across, a typical example
is associated with the use of the QSAR Toolbox [13]. The main
approach is to use, for the risk assessment, the average value
(LD50) of the nearest neighbors (NN, classically 5 NN). Associated
with this molecular similarity, the notion of validation domain is
defined. The predictions are made on a chemical subset that
respects a validation domain specified by a logical expression like:
“a” and “b” (not “c”). An illustration of the parameters of the
validation domain is for instance:
–– “a”: the target chemical should be classified as high (Class III)
by Toxic hazard classification by Cramer.
–– “b”: the target chemical should be classified as No alert found
by DNA binding by OASIS.
Fig. 3 Assessment of two enantiomers of 1 and three similar chemicals by considering 1 as basis
3.2 QSAR
3.2.1 LogKOW as Main Chemical Descriptor Coupled with Hansch’s Approach
The partition coefficient between octanol and water (logKOW), as a
hydrophobic descriptor, features prominently in QSARs for acute
mammalian toxicity. For instance, Eq. 1 was derived for alcohols
(n = 57, s = 0.24, r2 = 0.956). LogKOW is also the main descriptor
for acute toxicity in ecotoxicology, where narcosis represents the
main mode of action (MOA) of the chemicals [32]. Based on the ratio
[33] between the value predicted from logKOW (see Eq. 1, acute oral
toxicity) and the experimental value, the notion of Toxic Ratio (TR)
was exploited to assign a particular MOA to some derivatives (this
was also done in ecotoxicology). From that analysis (108 chemicals,
acute oral toxicity), baseline toxicity may be expected for alcohols,
acids, ketones, and one-ring aromatics. Aldehydes and amines show
excess toxicity, whereas simple aliphatic hydrocarbons have a
miscellaneous mechanism of action. Esters show less than baseline
toxicity.
$$-\log \mathrm{LD}_{50}\,(\mathrm{mol/kg}) = 0.6 \log K_{\mathrm{ow}} - 0.8 \log\left(0.08\,K_{\mathrm{ow}} + 1\right) + 1.13 \qquad (1)$$
This TR study was also carried out for inhalation toxicity with
another equation (Eq. 2, n = 18, r2 = 0.964, s = 0.140) related to
the air–water partition coefficient Kaw. Kaw values were lacking for
65 of the 108 chemicals. The authors also indicated that the
reliability of the TR values of some of the remaining 43 chemicals
may be questionable, owing either to problems in calculating the
baseline toxicity value or to errors in the experimental values.
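As a worked illustration, the baseline prediction of Eq. 1 and a Toxic Ratio can be computed as follows. This is a minimal sketch under two assumptions that the text does not spell out: the logarithms are base 10, and TR is taken as predicted LD50 divided by experimental LD50 (so TR > 1 would indicate excess toxicity relative to baseline). The function names are hypothetical.

```python
import math


def baseline_neg_log_ld50(log_kow):
    """Eq. 1: predicted -log LD50 (mol/kg) from logKow (base-10 logs assumed)."""
    kow = 10.0 ** log_kow
    return 0.6 * log_kow - 0.8 * math.log10(0.08 * kow + 1.0) + 1.13


def toxic_ratio(log_kow, experimental_ld50):
    """Assumed definition: TR = predicted LD50 / experimental LD50 (both mol/kg)."""
    predicted_ld50 = 10.0 ** (-baseline_neg_log_ld50(log_kow))
    return predicted_ld50 / experimental_ld50


# A chemical whose measured LD50 equals the baseline prediction has TR = 1.
baseline_ld50 = 10.0 ** (-baseline_neg_log_ld50(2.0))
tr_baseline = toxic_ratio(2.0, baseline_ld50)
```

A measured LD50 ten times lower than the baseline prediction would give TR = 10 under this convention.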
Fig. 5 Modeling workflow. (a) Preparation of the target data set. (b) Modeling procedure for qHTS LD50 data
set from [42]
Fig. 6 External prediction results of kNN models using the distribution of the predicted values (chemical only
descriptors (left) vs the best hybrid descriptors (right)) [42]
set associated with the prediction (RI values above 0.5 are
considered reliable by the software). The RI value takes into account
the molecular similarity between the training set and the compound.
The software displays up to the five most similar structures with
their experimental data to help validate the prediction. When the RI
value is close to the limit (RI = 0.3, no reliability), the
prediction is clearly described in the conclusions as borderline
reliable but, for the administrative data (registration file), the
reliability is still defined as 2. A QMRF ((Q)SAR Model Reporting
Format) is recorded at the JRC [43] (ACD/Percepta, Q32-48-43-425 and
426) for rat and mouse acute oral toxicity. It documents the
statistical method and the descriptors used for the prediction. The
statistical method, named GALAS, combines two systems: a first PLS
model for the prediction of LD50 and a local correction based on the
model performance for similar compounds.
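The idea of a global model followed by a local correction can be illustrated with a short sketch. This is not the ACD/Percepta algorithm: it simply assumes that the correction is the similarity-weighted mean of the baseline model's residuals (experimental minus predicted) over the most similar training compounds. The function name and the weighting scheme are hypothetical.

```python
def locally_corrected_prediction(baseline, neighbours):
    """Shift a global-model prediction by a similarity-weighted mean residual.

    baseline   -- prediction of the global model for the query compound
    neighbours -- list of (similarity, residual) pairs for similar training
                  compounds, where residual = experimental - global prediction
    """
    total_weight = sum(sim for sim, _ in neighbours)
    if total_weight == 0:
        # No similar compounds: fall back to the uncorrected global prediction.
        return baseline
    correction = sum(sim * res for sim, res in neighbours) / total_weight
    return baseline + correction


# Invented numbers: the global model under-predicts one neighbour by 0.5
# and over-predicts another by 0.1, with equal similarity weights.
corrected = locally_corrected_prediction(2.0, [(1.0, 0.5), (1.0, -0.1)])
```

With few or dissimilar neighbours the correction carries little information, which is the intuition behind attaching a reliability index to the prediction.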
With TEST version 4.1, the ten compounds with a low reliability
always carry the same conclusion: “A prediction is made but the
result may not be reliable as only moderately similar compounds with
known experimental values in the training set have been found”.
Besides these ten compounds, four compounds have a prediction based
on TEST version 3.2 with a reliability of 2 (but it is impossible to
get an overall view of the analysis). We compared these results with
version 4.2.1 of TEST by considering two of the ten compounds and one
(CAS number: 24650-42-8) of the four compounds predicted with version
3.2. We used the consensus method for the prediction (consensus of
hierarchical clustering, FDA, and nearest neighbors). An
applicability domain is defined before a prediction is carried out
[10]. For CAS number 24650-42-8, associated with a reliability of 2,
the prediction is explained by considering similar compounds in the
external test set (mean absolute error (MAE) of 0.78 for compounds
with a similarity coefficient ≥0.5). If we start with one of the
derivatives with a reliability of 3, such as 2-phenylisobutyric acid,
the final result seems better in terms of MAE (MAE of 0.29). The
problem could be related to the absence of an acidic function in the
NNs. However, for dicyclohexyl ketone (CAS number: 119-60-8), another
chemical with a reliability of 3, a highly similar compound is found
(CAS number 90-42-6, similarity = 0.88) and the quality of the
prediction for the NNs is correct. So, it is difficult to understand
why the reliability is 2 in one case and 3 in the two other cases.
The quality of the data (robust data) associated with the NNs must be
the explanation (ChemIDplus database from the US National Library of
Medicine). Indeed, the source for the LD50 associated with the NN of
dicyclohexyl ketone (CAS number 90-42-6) is “raw material data
handbook, vol. 1, 1974”. This information (experimental
4 Conclusion
future. These new data will be the basis for a wider application of
in silico methodology. Currently, most predictions certainly relate
to derivatives with low toxicity (the WoE proposed primarily applies
to low toxicity substances), and the challenge will be to integrate
chemicals with higher toxicity in the future. Data from systems
biology should offer an opportunity to reach this objective, as
described briefly with the combined use of chemical and biological
descriptors in a QSAR approach.
References
1. Walum E (1998) Acute oral toxicity. Environ Health Perspect 106(Suppl 2):497–503
2. ECHA (2016) Acute toxicity. Guidance on IR&CSA, Section R.7.4. https://echa.europa.eu/guidance-documents/guidance-on-information-requirements-and-chemical-safety-assessment
3. ECHA (2016) Guidance on the application of CLP criteria. https://echa.europa.eu/regulations/clp/classification
4. UNECE globally harmonized system of classification and labelling of chemicals (GHS). http://www.unece.org/trans/danger/publi/ghs/ghs_rev00/00files_e.html
5. Gissi A, Louekari K, Hoffstadt L, Bornatowicz N, Aparicio AM (2016) Alternative acute oral toxicity assessment under REACH based on sub-acute toxicity values. ALTEX 34:353–361
6. Blomme EA, Will Y (2016) Toxicology strategies for drug discovery: present and future. Chem Res Toxicol 29:473–504
7. Lapenna S, Fuart-Gatnik M, Worth A (2010) Review of QSAR models and software tools for predicting acute and chronic systemic toxicity. JRC Scientific and Technical Reports. http://publications.jrc.ec.europa.eu/repository/bitstream/JRC61930/eur_24639_en.pdf
8. Nicolotti O, Benfenati E, Carotti A, Gadaleta D, Gissi A, Mangiatordi GF, Novellino E (2014) REACH and in silico methods: an attractive opportunity for medicinal chemists. Drug Discov Today 19:1757–1768
9. Toropov AA, Toropova AP, Raska I, Leszczynska D, Leszczynski J (2014) Comprehension of drug toxicity: software and databases. Comput Biol Med 45:20–25
10. TEST user’s guide for T.E.S.T. (version 4.2) (toxicity estimation software tool) a program to estimate toxicity from molecular structure. https://www.epa.gov/chemical-research/users-guide-test-version-42-toxicity-estimation-software-tool-program-estimate
11. ECHA (2017) https://echa.europa.eu/
12. ACD ACD/Percepta. http://www.acdlabs.com/products/percepta/physchem_adme_tox/
13. QSAR Toolbox. https://www.qsartoolbox.org/
14. SimulationsPLUS ADMET predictors. http://www.simulations-plus.com/software/admet-property-prediction-qsar/
15. BIOVIA QSAR, ADMET and predictive toxicology. http://accelrys.com/products/collaborative-science/biovia-discovery-studio/qsar-admet-and-predictive-toxicology.html
16. Multicase. http://www.multicase.com/
17. actor. https://actor.epa.gov/actor/home.xhtml
18. Terrabase TerraTox. http://www.terrabase-inc.com/
19. TOXNET. https://toxnet.nlm.nih.gov/
20. ChemIDplus. http://sis.nlm.nih.gov/chem/alllocators.html
21. Leadscope. http://www.leadscope.com/
22. eChemPortal. https://www.echemportal.org/echemportal/index.action
23. The Merck Index. https://www.rsc.org/Merck-Index/
24. PAN Pesticide Action Network. http://www.pesticideinfo.org/
25. Raies AB, Bajic VB (2016) In silico toxicology: computational methods for the prediction of chemical toxicity. Wiley Interdiscip Rev Comput Mol Sci 6:147–172
26. Patlewicz G, Fitzpatrick JM (2016) Current and future perspectives on the development, evaluation, and application of in silico approaches for predicting toxicity. Chem Res Toxicol 29:438–451
27. ECHA (2008) Guidance on information requirements and chemical safety assessment, QSAR and grouping of chemicals (Chapter R.6). https://echa.europa.eu/guidance-documents/guidance-on-information-requirements-and-chemical-safety-assessment
28. Muegge I, Mukherjee P (2016) An overview of molecular fingerprint similarity search in virtual screening. Expert Opin Drug Discov 11:137–148
29. Cereto-Massague A, Ojeda MJ, Valls C, Mulero M, Garcia-Vallve S, Pujadas G (2015) Molecular fingerprint similarity search in virtual screening. Methods 71:58–63
30. Stanton K, Kruszewski FH (2016) Quantifying the benefits of using read-across and in silico techniques to fulfill hazard data requirements for chemical categories. Regul Toxicol Pharmacol 81:250–259
31. Klimisch HJ, Andreae M, Tillmann U (1997) A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul Toxicol Pharmacol 25:1–5
32. Lipnick RL (1999) Correlative and mechanistic QSAR models in toxicology. SAR QSAR Environ Res 10:239–248
33. de Wolf W, Lieder PH, Walker JD (2004) Application of QSARs: correlation of acute toxicity in the rat following oral or inhalation exposure. QSAR Comb Sci 23:521–525
34. Hansch C, Leo A, Hoekman D (eds) (1995) Exploring QSAR: hydrophobic, electronic, and steric constants, vol 2. ACS professional reference book. American Chemical Society, Washington, DC
35. Tsakovska I, Lessigiarska I, Netzeva T, Worth AP (2008) A mini review of mammalian toxicity (Q)SAR models. QSAR Comb Sci 27:41–48
36. Cronin MTD, Dearden JC, Duffy JC, Edwards R, Manga N, Worth AP, Worgan ADP (2002) The importance of hydrophobicity and electrophilicity descriptors in mechanistically-based QSARs for toxicological endpoints. SAR QSAR Environ Res 13:167–176
37. Devillers J, Devillers H (2009) Prediction of acute mammalian toxicity from QSARs and interspecies correlations. SAR QSAR Environ Res 20:467–500
38. Gonella Diaza R, Manganelli S, Esposito A, Roncaglioni A, Manganaro A, Benfenati E (2015) Comparison of in silico tools for evaluating rat oral acute toxicity. SAR QSAR Environ Res 26:1–27
39. Chavan S, Nicholls IA, Karlsson BC, Rosengren AM, Ballabio D, Consonni V, Todeschini R (2014) Towards global QSAR model building for acute toxicity: Munro database case study. Int J Mol Sci 15:18162–18174
40. Munro IC, Ford RA, Kennepohl E, Sprenger JG (1996) Correlation of structural class with no-observed-effect levels: a proposal for establishing a threshold of concern. Food Chem Toxicol 34:829–867
41. TALETE Dragon 7. http://www.talete.mi.it/products/dragon_description.htm
42. Sedykh A, Zhu H, Tang H, Zhang L, Richard A, Rusyn I, Tropsha A (2011) Use of in vitro HTS-derived concentration-response data as biological descriptors improves the accuracy of QSAR models of in vivo toxicity. Environ Health Perspect 119:364–370
43. QMRF, JRC European Commission. http://qsardb.jrc.it/qmrf/search_catalogs.jsp
Chapter 25
Abstract
In this review we address to what extent computational techniques can augment our ability to predict
toxicity. The first section provides a brief history of empirical observations on toxicity dating back to the
dawn of Sumerian civilization. Interestingly, the concept of dose emerged very early on, leading up to the
modern emphasis on kinetic properties, which in turn encodes the insight that toxicity is not solely a prop-
erty of a compound but instead depends on the interaction with the host organism. The next logical step
is the current conception of evaluating drugs from a personalized medicine point of view. We review recent
work on integrating what could be referred to as classical pharmacokinetic analysis with emerging systems
biology approaches incorporating multiple omics data. These systems approaches employ advanced statisti-
cal analytical data processing complemented with machine learning techniques and use both pharmacoki-
netic and omics data. We find that such integrated approaches not only provide improved predictions of
toxicity but also enable mechanistic interpretations of the molecular mechanisms underpinning toxicity
and drug resistance. We conclude the chapter by discussing some of the main challenges, such as how to
balance the inherent tension between the predictive capacity of models, which in practice amounts to
constraining the number of features in the models, and rich mechanistic interpretability, i.e.,
equipping models with numerous molecular features. This challenge also requires patient-specific
predictions of toxicity, which in turn require proper stratification of patients according to how they respond, with
or without adverse toxic effects. In summary, the transformation of the ancient concept of dose is currently
successfully operationalized using rich integrative data encoded in patient-specific models.
Key words Toxicology, Systems biology, Network pharmacology, Algorithmic complexity, Omics
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_25, © Springer Science+Business Media, LLC, part of Springer Nature 2018
536 Narsis A. Kiani et al.
between drugs and targets can facilitate the process of drug discov-
ery by deciphering a drug’s mechanism of action, thereby assisting
researchers seeking new targets for an old (FDA approved) drug as
well as new drug candidates for a known target [41–45]. The main
source of information in reconstruction of the drug–target
interaction network (DTN) is DrugBank, which is one of the major
publicly available integrated sources of drugs and targets. It is a
highly comprehensive database combining chemical properties and
detailed clinical information about drugs and their targets. It also
provides drug-related data feeds for well-known databases such as
UniProt, PubChem, PDB, and KEGG [46, 47].
In spite of the fact that mining drug–target interaction data is
increasing at an amazing rate [42], drug–target interaction data
currently available from public sources are largely incomplete and
biased toward targets of common therapeutic interest [48–50].
Biochemical experiments or in vitro methods for finding drug–tar-
get interaction are costly and time-consuming. An attempt to
address the issue of data completeness of drug–target interaction
involves using in silico methods [51]. For example, docking simu-
lations are extensively used in pharmacology. AutoDock [52] is
one of the most complete suites of free open-source software for
the computational docking and virtual screening of small mole-
cules to macromolecular receptors. Xie et al. identified drug off-
targets by docking the drug into protein binding pockets similar to
those of its primary target, followed by mapping the proteins with
the best docking scores to known biological pathways, thus
predicting potential side effects [53]. Classically, the process
starts with a target of known three-dimensional structure, and
docking is used to predict the bound conformation and binding energy.
In most cases, the three-dimensional structure of a target is needed
to compute the binding of each drug candidate to the target, and for
many targets this structure is still unavailable [54–56]. Wallach et al. have
developed a method to mitigate the impact of this important limi-
tation. They utilize a dataset where there is a pairing of drugs with
their observed adverse drug reactions (ADRs), the protein struc-
ture database and in silico virtual docking to identify putative pro-
tein targets for each drug and search for correlated pairs of side
effects and biological pathways [57]. Another challenge when per-
forming docking simulation is that it is computationally expensive
and most of the methods must simplify the problem to make the
computation feasible. The reduction of conformational space by
imposing limitations on the system, such as fixed bond angles and
lengths in the ligand or a simplified scoring function such as those
based on empirical free energies of binding to score poses quickly
at each step of the conformation search, are the most common
short-cuts that are currently used in the field [52, 58].
In a more recent effort, machine-learning approaches have
been used for larger-scale predictions of drug–target interactions.
The new interactions between drugs and targets can lead to potential
insights into previously unidentified side effects of a particular
drug. This idea is the basis of another category of systems
toxicology methods. Machine learning-based methods mostly use
structural and chemical descriptors of drugs, sequences of targets,
similarity matrices, and/or any other pharmacological information
about drugs as input. They then apply a machine learning method, such
as support vector machines (SVMs) or kernel regression, to predict
the drug–target interactions [59–63]. Cobanoglu et al. used the known
interactions in DrugBank, in the form of a bipartite network, to
train a model that represents each drug and target as a vector of
latent variables and assigns weights to drug–target interactions
using probabilistic matrix factorization [64]. Approaches that use
similarity scores as input are reported to be more promising than
other approaches [41].
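To make the latent-variable idea concrete, the following NumPy sketch factorizes a small drug–target interaction matrix. It is a generic illustration, not the method of Cobanoglu et al.: it assumes the common formulation in which probabilistic matrix factorization with Gaussian priors reduces to squared error on observed entries plus an L2 penalty, minimized here by plain gradient descent. The toy matrix, the mask, and all hyperparameters are invented.

```python
import numpy as np


def pmf(R, mask, rank=2, lam=0.05, lr=0.05, epochs=2000, seed=0):
    """Factorize R ~ U @ V.T using only the entries where mask == 1."""
    rng = np.random.default_rng(seed)
    n_drugs, n_targets = R.shape
    U = 0.1 * rng.standard_normal((n_drugs, rank))    # drug latent vectors
    V = 0.1 * rng.standard_normal((n_targets, rank))  # target latent vectors
    for _ in range(epochs):
        err = mask * (R - U @ V.T)       # residuals on observed entries only
        grad_U = -err @ V + lam * U      # gradient of squared error + L2 term
        grad_V = -err.T @ U + lam * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V


def observed_mse(R, mask, U, V):
    """Mean squared error restricted to the observed entries."""
    return float(np.sum(mask * (R - U @ V.T) ** 2) / np.sum(mask))


# Toy interaction matrix: 4 drugs x 3 targets, with one entry held out.
R = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=float)
mask = np.ones_like(R)
mask[1, 1] = 0.0                 # treat this drug-target pair as unknown
U, V = pmf(R, mask)
scores = U @ V.T                 # predicted interaction weights for all pairs
```

The entry of `scores` at the held-out position is the model's weight for the unobserved pair; ranking such weights is how candidate interactions would be proposed.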
In general, the use of machine-learning algorithms is one of the
most promising approaches to extracting knowledge from big data
using a data-driven framework. However, the performance of
machine-learning algorithms relies heavily on data representations
called features, and identifying which features are most appropriate
for a given task is very difficult. Deep learning has recently
emerged as a promising technique in which the features do not need
to be hand-crafted a priori. Recent success has been made possible
by the availability of fast computation, massive (labeled) datasets,
and sophisticated algorithms [65]. Deep learning is machine learning
with neural networks that have multiple hidden layers. Each layer
basically constructs a feature from the preceding layers [66]. The
training process allows layers deeper in the network to contribute
to the refinement of earlier layers. For this reason, these
algorithms can automatically engineer or discover features that are
suitable for representing the data at hand. When sufficient data are
available, these methods construct features attuned to a specific
problem and combine those features into a predictor [67]. Deep
learning algorithms have shown promise in fields as diverse as
high-energy physics [68], dermatology [69], and translation [70].
DeepTox is one of the first methods using deep learning for
computational toxicity prediction [65].
DeepTox normalizes the chemical representations of the compounds and
computes a large number of chemical descriptors that are used as
input to machine learning methods. As a next step, DeepTox trains
several models, evaluates them, and combines the best of them into
ensembles. Finally, DeepTox predicts the toxicity of new compounds.
In DeepTox, SVMs, random forests, and elastic nets are used for
cross-checking, for supplementing the deep learning models, and for
ensemble learning to complement deep neural networks (DNNs). The
networks consist of multiple layers of rectified linear units
(ReLUs) to enforce sparse representations and counteract the
vanishing gradient problem. ReLUs are
Predictive Systems Toxicology 543
followed by a final layer of sigmoid output units, one for each
task. One output unit is used for single-task learning. Stochastic
gradient descent was used to train the DNNs, and both dropout and L2
weight decay were implemented for the DNNs in the DeepTox pipeline to
regularize learning and avoid overfitting. Of note, DeepTox
outperformed many other computational approaches, such as naive
Bayes, support vector machines, and random forests, in toxicity
prediction of 12,000 environmental chemicals.
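The network layout described above (ReLU hidden layers, one sigmoid output per task, dropout, and L2 weight decay) can be sketched in NumPy as follows. This is not the DeepTox code: layer sizes, the dropout rate, the penalty strength, and the function names are illustrative assumptions, and only the forward pass and the regularized loss are shown (the stochastic gradient descent update is omitted).

```python
import numpy as np


def init_params(sizes, rng):
    """One (weights, bias) pair per layer; He-style scaling is an assumption."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x, rng=None, drop_rate=0.0):
    """ReLU hidden layers with optional dropout, then sigmoid outputs per task."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)                 # ReLU activation
        if rng is not None and drop_rate > 0.0:
            keep = rng.random(h.shape) >= drop_rate    # inverted dropout mask
            h = h * keep / (1.0 - drop_rate)
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))          # one sigmoid per task


def loss(params, x, y, l2=1e-4, rng=None, drop_rate=0.0):
    """Mean binary cross-entropy over all tasks plus an L2 weight penalty."""
    p = np.clip(forward(params, x, rng, drop_rate), 1e-7, 1.0 - 1e-7)
    bce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    penalty = l2 * sum(np.sum(W ** 2) for W, _ in params)
    return float(bce + penalty)


rng = np.random.default_rng(0)
params = init_params([8, 16, 16, 3], rng)   # 8 descriptors, 3 toxicity tasks
x = rng.standard_normal((5, 8))             # 5 hypothetical compounds
y = rng.integers(0, 2, size=(5, 3)).astype(float)
probs = forward(params, x)                  # inference: dropout switched off
train_loss = loss(params, x, y, rng=rng, drop_rate=0.5)
```

Setting the last layer size to 1 gives the single-task case mentioned in the text.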
The output of all the abovementioned methods is a DTN, an undirected
bipartite network composed of two sets of nodes, drugs and targets.
DTNs have a complex topology that reflects the inherently rich
polypharmacology of drugs (also known as drug repurposing) [51]. The
analysis of DTNs has recently emerged as an effective means to study
targets and to identify new targets for known drugs. In one of the
very first attempts, Ma’ayan et al. [71] reconstructed such a
bipartite network, in which nodes are connected if there is an
association between a drug and a target on the basis of data from
DrugBank. They report several classes of proteins as better targets
for drugs based on network statistics and gene ontology. A decade
later, Lin et al. [72] followed the same approach to study
drug–target interactions and characterized the drug–target relations
of different kinds of drugs. They showed that the number of
multitarget new molecular entities (NMEs) has increased over the
years, but less than that of single-target NMEs. In both these cases
and several others in the literature, it has proven useful to analyze
the general structure of a network in order to extract new knowledge
facilitating the classification of drugs and/or their targets.
Structural (graphical) analysis of a network provides insights into
the organization and topology of the DTN and suggests targets for
hypothesis generation and experimental testing. As a rule, this is
performed through computation and analysis of network parameters,
that is, parameters that quantify different aspects of the network’s
internal structure, such as node-level parameters measuring
centrality or more global parameters such as the modularity index,
network density, network entropy, or network diameter [36]. Several
methods have been developed and applied based on network topology,
graph theory, and cluster analysis (see [8] for a recent review).
Methods based on the similarity of networks are another set of
techniques that have been used to uncover novel targets or
disease-specific changes [73, 74]. A wide range of similarity
measures have been used in the literature, ranging from intuitive
measures, such as the number of edge changes required to get from one
network to another or the comparison of the top-k nodes, to more
complicated ones, such as using an ensemble of different model
networks and the distribution of the best-fitting ensemble. However,
it should be kept in mind that the fundamental question of checking
whether two
Table 1
Top 50 pathways
[Figure residue: workflow diagram. Recoverable labels: molecular data (DREAM); STRING, HPRD, and HI-II PPI; filtering and weighting PPI; PPI-customized heat diffusion; diffusion score; pharmacological data (DREAM); normalizing and calculating pharmacological AUC (IC25-75) score; weighted score multiplication]
Fig. 3 Drug profiling by (algorithmic) information indexes: (a) The molecular (chemical) graph of atorvastatin
(C33H35FN2O5), a member of the drug class known as statins, used primarily as a lipid-lowering agent for
prevention associated with treatment of cardiovascular diseases. (b) In a molecular network, geographical
coordinates and shapes are no longer important, but rather the topology (which element is connected to which
other), which can be built upon (c) the molecular contact map, where the grey scale (left matrix) indicates proximity
between elements and can be binarized (right) using a cutoff value based on the grey scale median. (d)
Algorithmic information landscape of more than 4000 drugs from DrugBank (extracted from the Wolfram
Language), constructed by taking the entropy of their SMILES codes, the valence sequences of each of the
elements in their formula (from SMILES), given the importance they have for bonding. (e) Algorithmic
information landscape of the drugs involved in the DREAM challenge. Color is determined by the “contact map
complexity”: the less complex (the shorter the length of the algorithm generating it), the closer to blue; the longer
(more algorithmic-random), the closer to red
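Two of the operations named in the caption, the entropy of a SMILES code and the median-based binarization of a contact map, can be sketched as below. This is a simplified illustration, not the authors' pipeline: it assumes character-level Shannon entropy (in bits) for the SMILES string and treats values strictly above the median as 1. The SMILES string used is that of ethanol, and the small matrix is invented.

```python
import math
from collections import Counter
from statistics import median


def shannon_entropy(text):
    """Character-level Shannon entropy of a string, in bits per symbol."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def binarize_by_median(matrix):
    """Turn a grey-scale contact map into 0/1 using the median as cutoff."""
    cutoff = median(value for row in matrix for value in row)
    return [[1 if value > cutoff else 0 for value in row] for row in matrix]


entropy_ethanol = shannon_entropy("CCO")                  # SMILES of ethanol
contact_map = binarize_by_median([[0.1, 0.9], [0.4, 0.6]])
```

Shannon entropy only captures symbol frequencies; the algorithmic complexity measures referred to in the caption are intended to capture structure that entropy misses.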
5 Concluding Remarks
Acknowledgments
References
1. Petrovska BB (2012) Historical review of medicinal plants’ usage. Pharmacogn Rev 6:1–5
2. Hunter P (2008) A toxic brew we cannot live without. Micronutrients give insights into the interplay between geochemistry and evolutionary biology. EMBO Rep 9:15–18
3. Bottini AA, Amcoff P, Hartung T (2007) Food for thought … on globalisation of alternative methods. ALTEX 24:255–269
4. Adeleye Y, Andersen M, Clewell R et al (2015) Implementing Toxicity Testing in the 21st Century (TT21C): making safety decisions using toxicity pathways, and progress in a prototype risk assessment. Toxicology 332:102–111
5. Pease W (1997) Toxic ignorance: the continuing absence of basic health testing for top-selling chemicals in the United States. Diane Pub. Co., Darby
6. Mak IW, Evaniew N, Ghert M (2014) Lost in translation: animal models and clinical trials in cancer treatment. Am J Transl Res 6:114–118
7. Hansch C, Maloney PP, Fujita T, Muir RM (1962) Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature 194:178–180
8. Raies AB, Bajic VB (2016) In silico toxicology: computational methods for the prediction of chemical toxicity. Wiley Interdiscip Rev Comput Mol Sci 6:147–172
9. Lepoittevin J-P, Benezra C (1991) Allergic contact dermatitis caused by naturally occurring quinones. Pharm Weekbl Sci 13:119–122
10. Hewitt M, Enoch SJ, Madden JC, Przybylak KR, Cronin MTD (2013) Hepatotoxicity: a scheme for generating chemical categories for read-across, structural alerts and insights into mechanism(s) of action. Crit Rev Toxicol 43:537–558
11. Gerner I, Barratt MD, Zinke S, Schlegel K, Schlede E (2004) Development and prevalidation of a list of structure-activity relationship rules to be used in expert systems for prediction of the skin-sensitising properties of chemicals. Altern Lab Anim ATLA 32:487–509
12. Milan C, Schifanella O, Roncaglioni A, Benfenati E (2011) Comparison and possible use of in silico tools for carcinogenicity within REACH legislation. J Environ Sci Health Part C 29:300–323
13. Ellison CM, Enoch SJ, Cronin MTD (2011) A review of the use of in silico methods to predict the chemistry of molecular initiating events related to drug toxicity. Expert Opin Drug Metab Toxicol 7:1481–1495
14. Bhatia S, Schultz T, Roberts D, Shen J, Kromidas L, Marie Api A (2015) Comparison of Cramer classification between Toxtree, the OECD QSAR Toolbox and expert judgment. Regul Toxicol Pharmacol 71:52–62
15. Hardy B, Douglas N, Helma C et al (2010) Collaborative development of predictive toxicology applications. J Cheminform 2:7
16. Gallegos-Saliner A, Poater A, Jeliazkova N, Patlewicz G, Worth AP (2008) Toxmatch - a chemical classification and activity prediction tool based on similarity measures. Regul Toxicol Pharmacol 52:77–84
17. Williams-DeVane CR, Wolf MA, Richard AM (2009) DSSTox chemical-index files for exposure-related experiments in ArrayExpress and Gene Expression Omnibus: enabling
tion by growth of DrugBank database. Brief Bioinform 17:bbv094
47. Campbell SJ, Gaulton A, Marshall J, Bichko D, Martin S, Brouwer C, Harland L (2012) Visualizing the drug target landscape. Drug Discov Today 17:S3–S15
48. Swamidass SJ (2011) Mining small-molecule screens to repurpose drugs. Brief Bioinform 12:327–335
49. Moriaud F, Richard SB, Adcock SA, Chanas-Martin L, Surgand J-S, Ben Jelloul M, Delfaud F (2011) Identify drug repurposing candidates by mining the Protein Data Bank. Brief Bioinform 12:336–340
50. Dobson CM (2004) Chemical space and biology. Nature 432:824–828
51. Vogt I, Mestres J (2010) Drug-target networks. Mol Inform 29:10–14
52. Forli S, Huey R, Pique ME, Sanner MF, Goodsell DS, Olson AJ (2016) Computational protein–ligand docking and virtual drug screening with the AutoDock suite. Nat Protoc 11:905–919
53. Xie L, Wang J, Bourne PE (2007) In silico elucidation of the molecular mechanism defining the adverse effect of selective estrogen receptor modulators. PLoS Comput Biol 3:2324–2332
54. Halperin I, Ma B, Wolfson H, Nussinov R (2002) Principles of docking: an overview of search algorithms and a guide to scoring functions. Proteins 47:409–443
55. Shoichet BK, McGovern SL, Wei B, Irwin JJ (2002) Lead discovery using molecular docking. Curr Opin Chem Biol 6:439–446
56. Cheng AC, Coleman RG, Smyth KT, Cao Q, Soulard P, Caffrey DR, Salzberg AC, Huang ES (2007) Structure-based maximal affinity model predicts small-molecule druggability. Nat Biotechnol 25:71–75
57. Wallach I, Jaitly N, Lilien R (2010) A structure-based approach for mapping adverse drug reactions to the perturbation of underlying biological pathways. PLoS One 5:e12063
58. Huey R, Morris GM, Olson AJ, Goodsell DS (2007) A semiempirical free energy force field with charge-based desolvation. J Comput Chem 28:1145–1152
59. Xia Z, Wu L-Y, Zhou X, Wong ST (2010) Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces. BMC Syst Biol 4:S6
60. van Laarhoven T, Nabuurs SB, Marchiori E (2011) Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics 27:3036–3043
61. Yabuuchi H, Niijima S, Takematsu H, Ida T, Hirokawa T, Hara T, Ogawa T, Minowa Y, Tsujimoto G, Okuno Y (2014) Analysis of multiple compound-protein interactions reveals novel bioactive molecules. Mol Syst Biol 7:472–472
62. Perlman L, Gottlieb A, Atias N, Ruppin E, Sharan R (2011) Combining drug and gene similarity measures for drug-target elucidation. J Comput Biol 18:133–145
63. Yamanishi Y, Pauwels E, Kotera M (2012) Drug side-effect prediction based on the integration of chemical and biological spaces. J Chem Inf Model 52:3284–3292
64. Cobanoglu MC, Liu C, Hu F, Oltvai ZN, Bahar I (2013) Predicting drug-target interactions using probabilistic matrix factorization. J Chem Inf Model 53:3399–3409
65. Mayr A, Klambauer G, Unterthiner T, Hochreiter S (2016) DeepTox: toxicity prediction using deep learning. Front Environ Sci 3:80
66. Min S, Lee B, Yoon S (2016) Deep learning in bioinformatics. Brief Bioinform 18(5):851–869
67. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444
68. Baldi P, Sadowski P, Whiteson D (2014) Searching for exotic particles in high-energy physics with deep learning. Nat Commun 5:ncomms5308
69. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542:115–118
70. Wu Y, Schuster M, Chen Z, Le QV, Norouzi M et al (2016) Google’s neural machine translation system: bridging the gap between human and machine translation, arXiv:1609.08144v2
71. Ma’ayan A, Jenkins SL, Goldfarb J, Iyengar R (2007) Network analysis of FDA approved drugs and their targets. Mt Sinai J Med 74:27–32
72. Lin H-H, Zhang L-L, Yan R, Lu J-J, Hu Y (2017) Network analysis of drug-target interactions: a study on FDA-approved new molecular entities between 2000 to 2015. Sci Rep 7:12230
73. McGary KL, Park TJ, Woods JO, Cha HJ, Wallingford JB, Marcotte EM (2010) Systematic discovery of nonobvious human disease models through orthologous phenotypes. Proc Natl Acad Sci U S A 107:6544–6549
74. Korcsmáros T, Szalay MS, Rovó P, Palotai R, Fazekas D, Lenti K, Farkas IJ, Csermely P,
evaluation of shannon entropy and local estimations of algorithmic complexity, arXiv:1609.00110v5
102. Burden FR, Winkler DA (2015) Relevance vector machines: sparse classification methods for QSAR. J Chem Inf Model 55:1529–1534
103. Svetnik V, Liaw A, Tong C, Christopher Culberson J, Sheridan RP, Feuston BP (2003) Random forest: a classification and regression tool for compound classification and QSAR modeling. https://doi.org/10.1021/CI034160G
104. Lei T, Li Y, Song Y, Li D, Sun H, Hou T (2016) ADMET evaluation in drug discovery: 15. Accurate prediction of rat oral acute toxicity using relevance vector machine and consensus modeling. J Cheminform 8:6
Chapter 26
Abstract
Chemoinformatic methods, such as multivariable explorative techniques and quantitative structure–activity
relationship (QSAR) modeling, allow for discovering relationships between the activity and the structure
of chemical compounds. These techniques can be applied as preliminary screening methods for designing
and/or selecting new compounds with defined activity.
Here we describe, step by step, how to preliminarily screen ionic liquids (a set of 13 ILs) and assess
their cytotoxic activity against the leukemia cell line IPC-81, as well as their potential to inhibit the
enzyme acetylcholinesterase, using the TRIC method (toxicity ranking index of cations) combined with the
QSAR approach.
Key words Ionic liquids, Multivariable explorative technique, TRIC, Quantitative structure–activity
relationship, Molecular descriptors
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_26, © Springer Science+Business Media, LLC, part of Springer Nature 2018
560 Anita Sosnowska et al.
2 Materials
3 Methods
3.1 Dataset
1. First, determine the set of ILs to be examined. Here we present the study for 13 ionic liquids with
known activity in order to verify the accuracy of the analysis:
IL1: 1-ethyl-3-methylimidazolium bis(pentafluoroethyl)phosphinate
IL2: butylethyldimethylammonium bis(trifluoromethylsulfonyl)amide
IL3: 1-(3-methoxypropyl)-3-methylimidazolium chloride
IL4: 1-butyl-3-methylimidazolium trifluoromethanesulfonate
IL5: 1-butyl-3-methylimidazolium 1-octylsulfate
IL6: 1-hexyl-3-methylimidazolium 1,1-dioxo-1,2-dihydrobenzo[d]isothiazol-3-onate
IL7: 4-(dimethylamino)-1-butylpyridinium bis(trifluoromethylsulfonyl)amide
IL8: 1-heptyl-3-methylimidazolium tetrafluoroborate
IL9: 3-methyl-1-octylimidazolium chloride
IL10: 3-methyl-1-nonylimidazolium tetrafluoroborate
IL11: 1-decyl-3-methylimidazolium chloride
IL12: 1-hexadecyl-3-methylimidazolium chloride
IL13: 3-methyl-1-octadecylimidazolium chloride
2. Arrange the data in a clear and manageable manner. We recommend creating a spreadsheet (e.g., in
Excel) and storing all the data as shown in Table 1 (see Note 1).
Table 1
Scheme presenting a convenient way of data preparation
Descriptors
3.3 TRIC Analysis
1. To screen the ILs and choose the least toxic ones, a screening tool should be used. For this purpose
we recommend the TRIC metric [18], which was created for various endpoints and is therefore capable of
predicting the toxic activity of ionic liquids against various organisms (see Note 9).
2. In order to use TRIC, simply multiply the vector of WHIM descriptors by the vector of loadings.
Here, we present the values of the loadings vector, so that the user does not have to look for them in
the literature: 0.172; 0.113; 0.107; 0.110; −0.098;
−0.057; −0.034; −0.049; −0.026; −0.126; 0.007; 0.168;
0.119; 0.113; 0.094; −0.082; −0.057; −0.033; −0.044;
0.069; 0.044; 0.087; 0.172; 0.114; 0.108; 0.110; −0.097;
−0.057; −0.036; −0.052; −0.011; −0.117; 0.017; 0.171;
0.113; 0.103; 0.106; −0.094; −0.063; −0.032; −0.049;
0.053; −0.087; 0.024; 0.172; 0.113; 0.108; 0.111; −0.098;
−0.057; −0.035; −0.048; −0.035; −0.124; 0.007; 0.176;
0.173; 0.176; 0.175; 0.176; 0.167; 0.164; 0.167; 0.165;
0.167; −0.069; −0.073; 0.111; 0.095; 0.110; 0.107; 0.111;
−0.091; 0.095; −0.067; −0.003; −0.091; 0.163; 0.164;
0.163; 0.163; 0.163 (see Note 10).
Chemoinformatic Approach to Assess Toxicity of Ionic Liquids 563
Fig. 1 Ionic liquids arranged by increasing value of TRIC metric. Blue color represents the ILs with the smallest
TRIC value selected for further analyses; red color represents all the other ILs
3. Arrange the ILs according to the values of the multiplication result. These values represent the
relative toxicity of each IL: higher values mean higher toxicity. The result for the tested
set of 13 ILs is presented in Fig. 1.
4. Choose the ILs with the lowest TRIC values (the most promising for eventual application) for
further analyses. In the studied case, these are IL1–IL5.
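The TRIC steps above (multiply, sort, select) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the descriptor values and the three-element loadings vector are made up so the arithmetic is easy to follow, the names `tric_score` and `rank_by_tric` are hypothetical, and any preprocessing of the WHIM descriptors that the original TRIC definition may require is not shown.

```python
def tric_score(descriptors, loadings):
    """Dot product of one IL's descriptor vector with the loadings vector."""
    if len(descriptors) != len(loadings):
        raise ValueError("descriptor and loadings vectors must have equal length")
    return sum(d * w for d, w in zip(descriptors, loadings))


def rank_by_tric(descriptor_table, loadings):
    """Return (name, score) pairs sorted by increasing TRIC value."""
    scores = {name: tric_score(vec, loadings) for name, vec in descriptor_table.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Toy, made-up numbers (NOT the published loadings or real WHIM descriptors).
toy_loadings = [0.5, -0.25, 0.1]
toy_descriptors = {
    "IL_a": [1.0, 2.0, 0.0],
    "IL_b": [2.0, 0.0, 5.0],
    "IL_c": [0.0, 4.0, 0.0],
}

ranking = rank_by_tric(toy_descriptors, toy_loadings)
# Keep the two candidates with the lowest scores for further analysis.
least_toxic = [name for name, _ in ranking[:2]]
```

In practice the loadings vector listed in step 2 would replace `toy_loadings`, and each descriptor vector would need the same length and ordering as that list.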
3.4 QSAR Predictions
1. This step covers the quantitative prediction of toxicity values for each IL. To do so, choose
appropriate QSAR models (see Note 11). In the studied case, we present the use of two models
capable of predicting ILs’ toxicity against IPC-81 cells in a viability test and in an
acetylcholinesterase inhibition test, respectively.
2. In order to determine ILs’ toxicity against IPC-81 leukemia
cells, substitute the descriptors’ values into the QSAR equation
(here we provide model’s equation to make the use of this
$h = x_k^{T}\left(X^{T}X\right)^{-1}x_k$
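The leverage h = x_k^T (X^T X)^(-1) x_k can be computed without any external library, as in the following minimal sketch. It assumes the usual reading of the symbols, which the surviving text does not spell out: X is the descriptor matrix of the training compounds (one row per compound) and x_k is the descriptor vector of the compound being checked. The helper names are hypothetical, and the linear system is solved by plain Gauss-Jordan elimination rather than by forming the inverse explicitly.

```python
def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverage(X, x_k):
    """Leverage of x_k with respect to the training descriptor matrix X."""
    p = len(x_k)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    z = solve(xtx, x_k)  # z = (X^T X)^(-1) x_k
    return sum(a * b for a, b in zip(x_k, z))


# Toy training matrix: three compounds, two descriptors (made-up values).
X_train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_values = [leverage(X_train, row) for row in X_train]
```

A useful sanity check is that the leverages of the training compounds sum to the number of descriptors (here 2).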
Fig. 2 Insubria plot showing the relation of five tested ILs with the AD QSAR
model for toxicity against IPC-81 leukemia cells
Model statistics: $R^2 = 0.76$; $RMSE_C = 0.31$; $Q^2_{CV} = 0.76$; $RMSE_{CV} = 0.33$; $Q^2_{F2} = 0.71$; $RMSE_{Ext} = 0.38$
Fig. 3 Insubria plot showing the relation of five tested ILs with the AD QSAR
model for inhibition of acetylcholinesterase
Table 2
Obtained results for five ILs with assigned ranks (green indicates the IL that shows the least
toxicity in the considered tests)
Observed Predicted Ranks
AChE Test test AChE Test test TRIC AChE Test test
7. Gather all predicted data (from the TRIC analysis and the QSAR predictions) in one table.
Create rules according to which you can choose the most appropriate IL for further use. In the
considered case, we assigned ranks to the qualitative predictions for the studied set of ionic liquids
(Table 2). We then chose the IL with the lowest mean rank:
IL3, 1-(3-methoxypropyl)-3-methylimidazolium chloride
(see Notes 15 and 16).
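The rank-and-average rule described here can be sketched as follows. The numbers are invented for illustration and are not the values of Table 2. The sketch also assumes that every criterion is oriented so that a larger value means higher toxicity and that there are no ties; the function names are hypothetical.

```python
def ranks(values):
    """Rank names by ascending value: the lowest value gets rank 1 (ties not handled)."""
    ordered = sorted(values, key=values.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


def mean_ranks(criteria):
    """Average each candidate's rank over all criteria."""
    per_criterion = [ranks(values) for values in criteria.values()]
    names = per_criterion[0].keys()
    return {n: sum(r[n] for r in per_criterion) / len(per_criterion) for n in names}


# Made-up scores for three candidates under three criteria.
toy_criteria = {
    "tric": {"A": 0.2, "B": 0.1, "C": 0.3},
    "test_1": {"A": 3.5, "B": 3.0, "C": 4.0},
    "test_2": {"A": 1.0, "B": 2.0, "C": 3.0},
}

averaged = mean_ranks(toy_criteria)
best = min(averaged, key=averaged.get)  # candidate with the lowest mean rank
```

If an endpoint is reported on a scale where lower values mean higher toxicity, its values would need to be negated (or ranked in descending order) before averaging.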
Fig. 4 Observed versus predicted values of toxicity for the QSAR models used: left panel, QSAR model for toxicity
against IPC-81 leukemia cells; right panel, QSAR model for inhibition of acetylcholinesterase. Blue color represents
the ILs with the smallest TRIC value selected for further analyses; red color represents all other analyzed ILs
To show that the chosen ILs were in fact those with the lowest toxicity, we present plots of the
experimental values versus the values predicted by the QSAR models for all 13 studied ionic liquids
in both the viability and inhibition tests (Fig. 4).
4 Notes
Fig. 5 Schematic Insubria plot for evaluating whether the compound is in the AD
of the QSAR model
Acknowledgments
References
23. Dragon Software for Molecular Descriptor Calculation hwtmi, Milano, 2014
24. Yap CW (2011) PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem 32:1466–1474
25. Stewart JJP (2012) Stewart computational chemistry, Colorado Springs, CO, USA
26. O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open babel: an open chemical toolbox. J Cheminform 3:33
27. Schaftenaar G, Noordik JH (2000) Molden: a pre- and post-processing program for molecular and electronic structures. J Comput Aided Mol Des 14:123–134
Chapter 27
Abstract
Quantitative structure–activity relationships (QSARs) for prediction of toxicological endpoints built up
with the CORAL software are discussed. Prejudices related to these QSAR models are listed. Possible ways
to improve the software are discussed.
Key words Toxicity, QSAR, SMILES, Monte Carlo technique, CORAL software, Mutagenicity,
Psychotropic drug toxicity, Drug inhibitor activity
1 Introduction
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1_27, © Springer Science+Business Media, LLC, part of Springer Nature 2018
574 Andrey A. Toropov et al.
4 CORAL Software
5 Prejudices
5.1 Prejudices Related to QSAR in General
(a) The prejudice that QSAR is a tool to describe only more or less nonspecific effects. In other
words, the opinion that the results of experiments related to toxic effects cannot be predicted
without direct experiments exists and remains very popular [47].
(b) Research works in the field of QSAR analysis are not standardized.
(c) The statistical quality of a QSAR model can be an artifact.
5.2 Prejudices Related to QSAR for Toxicity
(a) Any kind of toxicity is an ambiguous quantity that depends on many factors, some of which are
uncontrolled (weather conditions, health status of organisms, etc.).
(b) The accuracy of a QSAR prediction for any toxicity is not comparable with the accuracy of the
corresponding experiment. The modern paradigm holds that it is necessary to develop more
rapid tests. In addition, it is necessary to use primitive organisms (bacteria) in order to avoid
ethical problems. Unfortunately, this can lead to a reduction of the heuristic potential
of humanity in the field of toxicology [47].
6 Selection of Endpoints
6.1 Endpoint-1: Mutagenicity of Polyaromatic Amines
Data on the mutagenic potential of a set of 95 aromatic and heteroaromatic amines were taken from
the literature [48]. The mutagenic activity in Salmonella typhimurium TA98 + S9 microsomal
preparation is expressed as the natural logarithm of R, where R is the number of revertants per
nanomole [49].
6.2 Endpoint-2: Toxicity of Psychotropic Drugs
Numerical data on toxicity (oral LD50, mg/kg, mice) of psychotropic drugs (phenothiazines,
antidepressants, and anxiolytics) were taken from the published work [35]. These values were
converted into pLD50, i.e., the negative decimal logarithm of the LD50 expressed in mmol/kg.
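The conversion to pLD50 can be illustrated with a short sketch. It assumes that the change of units from mg/kg to mmol/kg is done by dividing by the molar mass in g/mol, which is dimensionally consistent but is not stated explicitly in the text; the function name and the example numbers are hypothetical.

```python
import math


def pld50(ld50_mg_per_kg, molar_mass_g_per_mol):
    """Negative decimal logarithm of LD50 after conversion to mmol/kg."""
    if ld50_mg_per_kg <= 0 or molar_mass_g_per_mol <= 0:
        raise ValueError("LD50 and molar mass must be positive")
    # mg/kg divided by g/mol gives mmol/kg.
    ld50_mmol_per_kg = ld50_mg_per_kg / molar_mass_g_per_mol
    return -math.log10(ld50_mmol_per_kg)


# Made-up compound: LD50 = 30 mg/kg, molar mass = 300 g/mol, i.e. 0.1 mmol/kg.
example = pld50(30.0, 300.0)
```

On this scale a more toxic compound (lower LD50) receives a larger pLD50.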
6.3 Endpoint-3: Inhibitor Activity Against Monoamine Oxidase B
Quantitative data (coded as yes = 1 and no = −1) for inhibitory activity against monoamine oxidase B
(human) for 1523 compounds were taken from the literature [50].
7 QSAR
7.1 QSAR Model for Mutagenicity of Polyaromatic Amines
The QSAR model for mutagenicity of polyaromatic amines is characterized by an unusual situation: the
statistical characteristics of the model for the training set are poorer than those for the calibration
and validation sets. However, this is the result of a “natural” process: the building up of the model
is stopped when the maximal quality for the calibration set is reached (Fig. 1).
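The stopping rule described above can be illustrated with a toy sketch: given a quality measure recorded for the calibration set after each optimization epoch, pick the epoch where it peaks. The per-epoch values are invented, the quality measure is left unspecified (a correlation coefficient would be one possibility), and the function name is hypothetical; this is not the CORAL implementation.

```python
def best_epoch(calibration_quality):
    """Return (epoch, quality) where calibration-set quality peaks (epochs are 1-based)."""
    if not calibration_quality:
        raise ValueError("no epochs recorded")
    index = max(range(len(calibration_quality)), key=calibration_quality.__getitem__)
    return index + 1, calibration_quality[index]


# Made-up per-epoch quality values for the two sets.
training = [0.50, 0.62, 0.70, 0.78, 0.84]
calibration = [0.55, 0.68, 0.80, 0.77, 0.71]

epoch, quality = best_epoch(calibration)
training_at_stop = training[epoch - 1]
```

With these numbers the model is frozen at the third epoch, where the calibration quality (0.80) exceeds the training quality (0.70), which mirrors the situation described for endpoint-1.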
7.2 QSAR Model for Toxicity of Psychotropic Drugs
The QSAR model for toxicity of psychotropic drugs is characterized by the usual situation: the
statistical characteristics of the model for the training set are better than those for the calibration
and validation sets (Fig. 2).
Prediction of Biochemical Endpoints by the CORAL Software: Prejudices, Paradoxes… 577
Fig. 1 The CORAL method used for endpoint-1 and its statistical characteristics
Fig. 2 The CORAL method used for endpoint-2 and its statistical characteristics
Table 1
Statistical quality of binary classification of endpoint-3
Fig. 3 The CORAL method used for inhibitor activity against monoamine oxidase B and statistical characteris-
tics for this model in terms of semicorrelations
7.3 Binary Classification for Inhibitor Activity Against Monoamine Oxidase B
Table 1 lists the statistical characteristics for the binary classification of endpoint-3 (inhibitor
activity against monoamine oxidase B). Figure 3 shows the statistical quality of the semicorrelation
used to build up the model.
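The body of Table 1 is not preserved in this text, so the exact statistics it reports cannot be confirmed. As an assumption, the sketch below computes four measures that are commonly reported for binary classification (sensitivity, specificity, accuracy, and the Matthews correlation coefficient) for labels coded as 1 and −1, as in endpoint-3. The example labels are invented and the function name is hypothetical.

```python
import math


def binary_stats(observed, predicted):
    """Confusion-matrix statistics for labels coded as 1 (active) and -1 (inactive)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == -1 and p == -1)
    fp = sum(1 for o, p in pairs if o == -1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == -1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Made-up labels for eight compounds.
observed = [1, 1, 1, -1, -1, -1, -1, 1]
predicted = [1, 1, -1, -1, -1, 1, -1, 1]
stats = binary_stats(observed, predicted)
```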
8 Paradoxes
9 Results
11 Conclusions
Acknowledgments
References
1. Toropov AA, Toropova AP, Lombardo A, Roncaglioni A, Benfenati E, Gini G (2011) CORAL: building up the model for bioconcentration factor and defining it’s applicability domain. Eur J Med Chem 46:1400–1403
2. Roncaglioni A, Toropov AA, Toropova AP, Benfenati E (2013) In silico methods to predict drug toxicity. Curr Opin Pharmacol 13:802–806
3. Toropova AP, Toropov AA, Lombardo A, Roncaglioni A, Benfenati E, Gini G (2012) CORAL: QSAR models for acute toxicity in fathead minnow (Pimephales promelas). J Comput Chem 33:1218–1223
4. Schleifer K-J (2013) Computational approaches in agricultural research. Chapter 2, In book: Jeschke P, Krämer W, Schirmer U, Witschel M (eds) Modern methods in crop protection research, pp 21–41. Wiley-VCH Verlag & Co. KGaA, Boschstr. 12, 69469 Weinheim, Germany. Print ISBN: 978-3-527-33175-8 ePub ISBN: 978-3-527-65592-2
5. Ghorbanzadeh M, Zhang J, Andersson PL (2016) Binary classification model to predict developmental toxicity of industrial chemicals in zebrafish. J Chemom 30:298–307
6. Toropov AA, Toropova AP, Marzo M, Dorne JL, Georgiadis N, Benfenati E (2017) QSAR models for predicting acute toxicity of pesticides in rainbow trout using the CORAL software and EFSA’s OpenFoodTox database. Environ Toxicol Pharmacol 53:158–163
7. Hisaki T, Kaneko MAN, Yamaguchi M, Sasa H, Kouzuki H (2015) Development of qsar models using artificial neural network analysis for risk assessment of repeated-dose, reproductive, and developmental toxicities of cosmetic ingredients. J Toxicol Sci 40:163–180
8. Burello E (2017) Review of (Q)SAR models for regulatory assessment of nanomaterials risks. NanoImpact 8:48–58
9. Gobbi M, Beeg M, Toropova MA, Toropov AA, Salmona M (2016) Monte Carlo method for predicting of cardiac toxicity: hERG blocker compounds. Toxicol Lett 250–251:42–46
10. Sokolović D, Aleksić D, Milenković V, Karaleić S, Mitić D, Kocić J, Mekić B, Veselinović JB, Veselinović AM (2016) QSAR modeling of bis-quinolinium and bis-isoquinolinium compounds as acetylcholine esterase inhibitors based on the Monte Carlo method—the implication for Myasthenia gravis treatment. Med Chem Res 25:2989–2998
11. Toropov AA, Toropova AP, Cappellini L, Benfenati E, Davoli E (2016) Odor threshold prediction by means of the Monte Carlo method. Ecotoxicol Environ Saf 133:390–394
12. Toropova AP, Toropov AA (2017) CORAL: binary classifications (active/inactive) for drug-induced liver injury. Toxicol Lett 268:51–57
13. Park H-G, Yeo M-K (2013) Ecotoxicity estimation of hazardous air pollutants emitted from semiconductor manufacturing processes utilizing QSAR. Bull Kor Chem Soc 34(12):3755–3761
14. Tong J, Li L, Bai M, Li K (2017) A new descriptor of amino acids-SVGER and its applications in peptide QSAR. Mol Inform 36(5):1501023
15. Algamal ZY, Lee MH (2017) A new adaptive L1-norm for optimal descriptor selection of high-dimensional QSAR classification model for anti-hepatitis C virus activity of thiourea derivatives. SAR QSAR Environ Res 28:75–90
16. Bigdeli A, Hormozi-Nezhad MR, Jalali-Heravi M, Abedini MR, Sharif-Bakhtiar F (2014) Towards defining new nano-descriptors: extracting morphological features from transmission electron microscopy images. RSC Adv 4:60135–60143
17. Masand VH, Rastija V (2017) PyDescriptor: a new PyMOL plugin for calculating thousands of easily understandable molecular descriptors. Chemom Intell Lab Syst 169:12–18
18. Basak SC (2017) The expanding landscape of graph theoretic molecular descriptors: development, gradual diversification of descriptor space, and applications in QSAR/QMSA and new drug discovery. Curr Comput Aided Drug Des 13:172–176
19. Cassano A, Robinson RLM, Palczewska A, Puzyn T, Gajewicz A, Tran L, Manganelli S, Cronin MTD (2016) Comparing the CORAL and random forest approaches for modelling the in vitro cytotoxicity of silica nanomaterials. ATLA Altern Lab Anim 44:533–556
20. Cronin MTD, Schultz TW (2003) Pitfalls in QSAR. J Mol Struct THEOCHEM 622:39–51
21. Schultz TW, Cronin MTD, Netzeva TI (2003) The present status of QSAR in toxicology. J Mol Struct THEOCHEM 622:23–38
22. Doweyko AM (2008) QSAR: dead or alive? J Comput Aided Mol Des 2:81–89
23. Gajewicz A (2017) What if the number of nanotoxicity data is too small for developing predictive Nano-QSAR models? An alternative read-across based approach for filling data gaps. Nanoscale 9:8435–8448
24. van Leeuwen K, Schultz TW, Henry T, Diderich B, Veith GD (2009) Using chemical categories to fill data gaps in hazard assessment. SAR QSAR Environ Res 20:207–220
25. Auerbach M, Macdougall I (2017) The available intravenous iron formulations: history, efficacy, and toxicology. Hemodial Int 21:S83–S92
26. Campbell ND (2016) Behavior within fortuitous environments: the entwined history of division 28 and the fields of behavioral pharmacology and toxicology. Exp Clin Psychopharmacol 24:209–213
27. Satoh T (2016) History of Japanese society of toxicology. J Toxicol Sci 41:SP1–SP9
28. Toropov AA, Rasulev BF, Leszczynski J (2007) QSAR modeling of acute toxicity for nitrobenzene derivatives towards rats: comparative analysis by MLRA and optimal descriptors. QSAR Comb Sci 26:686–693
29. Toropova AP, Toropov AA, Martyanov SE, Benfenati E, Gini G, Leszczynska D, Leszczynski J (2012) CORAL: QSAR modeling of toxicity of organic chemicals towards Daphnia magna. Chemom Intell Lab Syst 110:177–181
30. Gramatica P, Cassani S, Roy PP, Kovarich S, Yap CW, Papa E (2012) QSAR modeling is not “push a button and find a correlation”: a case study of toxicity of (Benzo-)triazoles on algae. Mol Inform 31:817–835
31. Toropov AA, Benfenati E (2007) SMILES as an alternative to the graph in QSAR modelling of bee toxicity. Comput Biol Chem 31:57–60
32. Wang X, Greene N (2012) Comparing measures of promiscuity and exploring their relationship to toxicity. Mol Inform 31:145–159
33. Venkatapathy R, Wang CY, Bruce RM, Moudgal C (2009) Development of quantitative structure-activity relationship (QSAR) models to predict the carcinogenic potency of chemicals. I. Alternative toxicity measures as an estimator of carcinogenic potency. Toxicol Appl Pharmacol 234:209–221
34. Gironés X, Carbó-Dorca R (2006) Modelling toxicity using molecular quantum similarity measures. QSAR Comb Sci 25:579–589
35. Gissi A, Toropov AA, Toropova AP, Nicolotti O, Carotti A, Benfenati E (2014) Building up QSAR model for toxicity of psychotropic drugs by the Monte Carlo method. Struct Chem 25:1067–1073
36. Sahoo S, Adhikari C, Kuanar M, Mishra BK (2016) A short review of the generation of molecular descriptors and their applications in quantitative structure property/activity relationships. Curr Comput Aided Drug Des 12:181–250
37. Raevsky OA, Razdolskii AN, Liplavskii YV, Raevskaya OE, Yarkov AV (2012) Molecular-biological problems of drug design and mechanism of drug action: acute toxicity evaluation upon intravenous injection into mice: interspecies correlations, lipophilicity parameters, and physicochemical descriptors. Pharm Chem J 46:69–74
38. Furtula B, Gutman I (2011) Relation between second and third geometric-arithmetic indices of trees. J Chemom 25:87–91
39. Mercader A, Castro EA, Toropov AA (2001) Calculation of total molecular electronic energies from correlation weighting of local graph invariants. J Mol Model 7:1–5
40. Morrill JA, Topczewski JJ, Lodge AM, Yasapala N, Quinn DM (2015) Development of quantitative structure activity relationships for the binding affinity of methoxypyridinium cations for human acetylcholinesterase. J Mol Graph Model 62:181–189
41. Mansouri K, Consonni V, Durjava MK, Kolar B, Öberg T, Todeschini R (2012) Assessing bioaccumulation of polybrominated diphenyl ethers for aquatic species by QSAR modeling. Chemosphere 89:433–444
42. Diaza RG, Manganelli S, Esposito A, Roncaglioni A, Manganaro A, Benfenati E (2015) Comparison of in silico tools for evaluating rat oral acute toxicity. SAR QSAR Environ Res 26:1–27
43. Basant N, Gupta S (2017) QSAR modeling for predicting mutagenic toxicity of diverse chemicals for regulatory purposes. Environ Sci Pollut Res 24:14430–14444
44. Toropov AA, Benfenati E (2008) Additive SMILES-based optimal descriptors in QSAR modelling bee toxicity: using rare SMILES attributes to define the applicability domain. Bioorg Med Chem 16:4801–4809
45. Toropov AA, Toropova AP, Benfenati E, Manganaro A (2009) QSPR modelling of enthalpies of formation for organometallic compounds by SMART-based optimal descriptors. J Comput Chem 30:2576–2582
46. Toropov AA, Toropova AP, Benfenati E (2010) QSAR-modeling of toxicity of organometallic compounds by means of the balance of correlations for InChI-based optimal descriptors. Mol Divers 14:183–192
47. Rotini A, Manfra L, Spanu F, Pisapia M, Cicero AM, Migliore L (2017) Ecotoxicological method with marine bacteria Vibrio anguillarum to evaluate the acute toxicity of environmental contaminants. J Vis Exp 123:e55211
48. Toropov AA, Toropova AP, Martyanov SE, Benfenati E, Gini G, Leszczynska D, Leszczynski J (2011) Comparison of SMILES and molecular graphs as the representation of the molecular structure for QSAR analysis for mutagenic potential of polyaromatic amines. Chemom Intell Lab Syst 109:94–100
49. Cash GG (2001) Prediction of the genotoxicity of aromatic and heteroaromatic amines using electrotopological state indices. Mutat Res Genet Toxicol Environ Mutagen 491:31–37
50. Lorenzo VP, Filho JMB, Scotti L, Scotti MT (2015) Combined structure- and ligand-based virtual screening to evaluate caulerpin analogs with potential inhibitory activity against monoamine oxidase B. Braz J Pharmacogn 25:690–697
51. IRFMN (2017) http://www.insilico.eu/coral. Accessed 14 Sept 2017
52. Toropova AP, Toropov AA, Diaza RG, Benfenati E, Gini G (2011) Analysis of the co-evolutions of correlations as a tool for QSAR-modeling of carcinogenicity: an unexpected good prediction based on a model that seems untrustworthy. Cent Eur J Chem 9:165–174
53. Toropova AP, Toropov AA, Benfenati E, Gini G (2011) Co-evolutions of correlations for QSAR of toxicity of organometallic and inorganic substances: an unexpected good prediction based on a model that seems untrustworthy. Chemom Intell Lab Syst 105:215–219
54. Pajares F, Hartley J, Valiante G (2001) Response format in writing self-efficacy assessment: greater discrimination increases prediction. Meas Eval Couns Dev 33:214–221
55. Goodarzi M, Freitas MP, Ferreira EB (2009) Influence of changes in 2-D chemical structure drawings and image formats on the prediction of biological properties using MIA-QSAR. QSAR Comb Sci 28:458–464
56. Achary PGR (2014) QSPR modelling of dielectric constants of π-conjugated organic compounds by means of the CORAL software. SAR QSAR Environ Res 25:507–526
57. Veselinović AM, Milosavljević JB, Toropov AA, Nikolić GM (2013) SMILES-based QSAR model for arylpiperazines as high-affinity 5-HT1A receptor ligands using CORAL. Eur J Pharm Sci 48:532–541
58. Toropov AA, Toropova AP, Benfenati E, Nicolotti O, Carotti A, Nesmerak K, Veselinovic AM, Veselinovic JB, Duchowicz PR, Bacelo DE, Castro EA, Rasulev BF, Leszczynska D, Leszczynski J (2015) QSPR/QSAR analyses by means of the CORAL software: results, challenges, perspectives. In: Quantitative structure-activity relationships in drug design, predictive toxicology, and risk assessment. IGI Global, Hershey, PA, pp 560–585
59. Rescifina A, Floresta G, Marrazzo A, Parenti C, Prezzavento O, Nastasi G, Dichiara M, Amata E (2017) Development of a Sigma-2 receptor affinity filter through a Monte Carlo based QSAR analysis. Eur J Pharm Sci 106:94–101
60. Rescifina A, Floresta G, Marrazzo A, Parenti C, Prezzavento O, Nastasi G, Dichiara M, Amata E (2017) Sigma-2 receptor ligands QSAR model dataset. Data Brief 13:514–535
61. Toropova MA, Raska I, Toropova AP, Raskova M (2017) CORAL software: analysis of impacts of pharmaceutical agents upon metabolism via the optimal descriptors. Curr Drug Metab 18:500–510
62. Sokolović D, Ranković J, Stanković V, Stefanović R, Karaleić S, Mekić B, Milenković V, Kocić J, Veselinović AM (2017) QSAR study of dipeptidyl peptidase-4 inhibitors based on the Monte Carlo method. Med Chem Res 26:796–804
63. Kumar A, Chauhan S (2017) QSAR differential model for prediction of SIRT1 modulation using Monte Carlo method. Drug Res 67:156–162
64. Kumar A, Chauhan S (2017) Use of the Monte Carlo method for OECD principles-guided QSAR modeling of SIRT1 inhibitors. Arch Pharm (Weinheim) 350(1):e1600268
65. Sokolović D, Stanković V, Toskić D, Lilić L, Ranković G, Ranković J, Nedin-Ranković G, Veselinović AM (2016) Monte Carlo-based QSAR modeling of dimeric pyridinium compounds and drug design of new potent acetylcholine esterase inhibitors for potential therapy of myasthenia gravis. Struct Chem 27:1511–1519
66. Aranda JF, Garro Martinez JC, Castro EA, Duchowicz PR (2016) Conformation-independent QSPR approach for the soil sorption coefficient of heterogeneous compounds. Int J Mol Sci 17(8):1247
67. Islam MA, Pillay TS (2016) Simplified molecular input line entry system-based descriptors in QSAR modeling for HIV-protease inhibitors. Chemom Intell Lab Syst 153:67–74
68. Ghaedi A (2015) Predicting the cytotoxicity of ionic liquids using QSAR model based on SMILES optimal descriptors. J Mol Liq 208:269–279
69. Fioressi SE, Bacelo DE, Cui WP, Saavedra LM, Duchowicz PR (2015) QSPR study on refractive indices of solvents commonly used in polymer chemistry using flexible molecular
Orazio Nicolotti (ed.), Computational Toxicology: Methods and Protocols, Methods in Molecular Biology, vol. 1800,
https://doi.org/10.1007/978-1-4939-7899-1, © Springer Science+Business Media, LLC, part of Springer Nature 2018