Environmental Toxicology and Chemistry, Vol. 35, No. 11, pp. 2691–2697, 2016
© 2016 SETAC
Printed in the USA

MONTE CARLO–BASED QUANTITATIVE STRUCTURE–ACTIVITY RELATIONSHIP MODELS FOR TOXICITY OF ORGANIC CHEMICALS TO DAPHNIA MAGNA

ALLA P. TOROPOVA,*† ANDREY A. TOROPOV,† ALEKSANDAR M. VESELINOVIĆ,‡ JOVANA B. VESELINOVIĆ,‡ DANUTA LESZCZYNSKA,§ and JERZY LESZCZYNSKI‖
†IRCCS-Istituto di Ricerche Farmacologiche Mario Negri, Milan, Italy
‡University of Nis, Faculty of Medicine, Department of Chemistry, Nis, Serbia
§Interdisciplinary Nanotoxicity Center, Department of Civil and Environmental Engineering, Jackson State University, Jackson, Mississippi, USA
‖Interdisciplinary Nanotoxicity Center, Department of Chemistry and Biochemistry, Jackson State University, Jackson, Mississippi, USA
(Submitted 29 March 2016; Returned for Revision 18 April 2016; Accepted 21 April 2016)
Abstract: Quantitative structure–activity relationships (QSARs) for toxicity of a large set of 758 organic compounds to Daphnia magna
were built up. The simplified molecular input-line entry system (SMILES) was used to represent the molecular structure. The Correlation
and Logic (CORAL) software was utilized as a tool to develop the QSAR models. These models are built up using the Monte Carlo
method and according to the principle “QSAR is a random event” if one checks a group of random distributions in the visible training set
and the invisible validation set. Three distributions of the data into the visible training, calibration, and invisible validation sets are
examined. The predictive potentials (i.e., statistical characteristics for the invisible validation set of the best model) are as follows: n = 87,
r² = 0.8377, root mean square error = 0.564. The mechanistic interpretations and the domain of applicability of built models are
suggested and discussed. Environ Toxicol Chem 2016;35:2691–2697. © 2016 SETAC
Keywords: Computational toxicology; Ecological risk assessment; Environmental toxicology; Aquatic toxicology; Organic contaminant
INTRODUCTION

Modern civilization has spawned a new and great concern for nature because massive productions of hazardous chemicals and pollutants have an enormous effect on the ecosystem. The global production of chemicals has increased from 1 million tons in 1930 to 400 million tons in the 21st century. In addition, every year newly synthesized compounds are introduced into the ecosystem. Aquatic ecological systems are important components of the natural world. Their pollution could introduce harmful and difficult-to-correct environmental outcomes. Unfortunately, relatively few chemical compounds have been subjected to adequate assessment for their perilous environmental properties [1]. Prevention is always a more efficient remedy than restoration. Such an approach involves a priori evaluation of environmental effects of considered classes of compounds.

For risk assessment, Daphnia magna, an important freshwater invertebrate species in aquatic food webs, has been used worldwide for many years as a representative test species for the ecotoxicological evaluation of industrial chemicals. Experimental investigations have been carried out to address this issue; however, this approach is expensive and time-consuming. An alternative involves development of theoretical (computational) methods to estimate toxic endpoints. Such methods could be applied for substances which have not been examined experimentally. They use the available experimental data on these endpoints for compounds belonging to the same group of chemicals [2–11]. Computational chemistry approaches using such methodology are referred to as quantitative structure–property relationships (QSPRs) and quantitative structure–activity relationships (QSARs) [12,13]. It is to be noted that the QSAR technique leads to a significant reduction of the use of test animals; this reduction is recommended by the Registration, Evaluation, Authorisation, and Restriction of Chemicals (REACH) regulation [14].

In QSAR modeling, defining molecular descriptors is 1 of the most important steps. There are various molecular descriptors calculated from different molecular features including 1 based on a molecular graph [15]. An attractive alternative for the representation of the molecular structure by graph is the simplified molecular input-line entry system (SMILES) [16–18]. The SMILES notation can be used for calculating the optimal descriptor, a molecular descriptor that depends on both the molecular structure and the property under analysis but does not explicitly depend on details from the molecular 3-dimensional geometry. For this reason the development of QSAR models where molecular descriptors based on SMILES notation are used is an attractive direction of research in the field of QSPR theory and applications because every QSPR model that includes geometry-dependent molecular descriptors usually includes a relatively difficult calculation of the optimum molecular geometry, with high computational costs and long development time.

The Correlation and Logic (CORAL) software can be used as a computational tool to carry out calculations to build up the QSPR/QSAR based on molecular descriptors from SMILES notation. All QSAR models are based on the Monte Carlo approach according to the principle “QSAR is a random event” [19–26]. The aim of the present study was to test the predictive potential of the CORAL models for prediction of toxicity of a large group of organic compounds to D. magna.

This article includes online-only Supplemental Data.
* Address correspondence to alla.toropova@marionegri.it
Published online 25 April 2016 in Wiley Online Library (wileyonlinelibrary.com).
DOI: 10.1002/etc.3466

MATERIALS AND METHODS

Data

The toxicity data on D. magna for 758 compounds were taken from the literature [27]. The negative logarithm of 48-h lethal concentration (millimoles per liter) is examined as the endpoint (–log 50% lethal concentration [pLC50]). The above-mentioned compounds were distributed into the training set (“builder of a model”), the invisible training set (“blocker of overtraining”), the calibration set (“estimator of predictive potential”), and the external validation set (final confirmation of the predictive potential of a model). These distributions obey the following principles: 1) they are random, and 2) the fractions of the compounds considered for each set are approximately 40%, approximately 40%, approximately 10%, and approximately 10% for the training, invisible training, calibration, and validation sets, respectively.

Optimal descriptors

The version of optimal descriptors suggested in the literature [28,29] was utilized in the present study:

$$\mathrm{DCW}(T^*, N^*) = \mathrm{CW}(\mathrm{BOND}) + \mathrm{CW}(\mathrm{NOSP}) + \mathrm{CW}(\mathrm{HALO}) + \mathrm{CW}(\mathrm{PAIR}) + \sum \mathrm{CW}(S_k) + \sum \mathrm{CW}(SS_k) + \sum \mathrm{CW}(SSS_k) \tag{1}$$

In Equation 1, CW(x) is the correlation weight of SMILES attribute x; BOND is a global SMILES descriptor which is a mathematical function of the presence or absence of double (=), triple (#), and stereochemical (@) bonds; NOSP is a global SMILES descriptor which is a mathematical function of the presence or absence of the chemical elements nitrogen (N), oxygen (O), sulfur (S), and phosphorus (P); HALO is a global SMILES descriptor which is a mathematical function of the presence or absence of the chemical elements fluorine (F), chlorine (Cl), bromine (Br), and iodine (I) (i.e., halogens); and PAIR represents the simultaneous presence of pairs of the above-mentioned (in parentheses) SMILES elements.

It is possible to calculate the correlation weights, which establish correlation between the optimal descriptor and the desired endpoint. The calculation can be done by the Monte Carlo method [19–29].

The T* and N* parameters are from the Monte Carlo optimization procedure. The former represents the threshold to classify all molecular features extracted from SMILES into 2 classes: rare (noise) and active. The model calculated with correlation weights includes solely active features, whereas correlation weights of rare features are defined as equal to 0. The latter is the number of epochs of the Monte Carlo optimization, which give the maximal correlation coefficient for the calibration set (Figure 1) [30].

Figure 1. Graphical scheme of the definition of the T* and N* parameters.

With numerical data on the correlation weights of all features involved in building up the model together with defined values
of T* and N*, a predictive model can be calculated using the training set:

$$\mathrm{pLC50} = C_0 + C_1 \times \mathrm{DCW}(T^*, N^*) \tag{2}$$

The next step involves validation of the developed model calculated with Equation 2. Its predictive potential should be checked with the validation set.

One can use 2 versions of the optimization for the correlation weights [28,29]. First, the balance of correlation, where the training set is arranged into 2 groups: the compounds from the first group are utilized to calculate correlation weights, and the compounds included in the second group are utilized for control of the absence of the overtraining (i.e., the ideal model for visible substances is accompanied by a poor model for substances which are not involved in the calculation of the optimal correlation weights). Second, the traditional model, where the training set does not include an invisible “passive” part: in fact, the invisible training set has considerably distinctive influence on the model in comparison with the case where these compounds are directly involved in the training set. In the present study, a comparison of these 2 cases is carried out.

The approach can be carried out utilizing the CORAL software. A detailed description of the practical use of the CORAL software is available in the literature [28].

Table 1. Statistical characteristics of quantitative structure–activity relationship models for 3 distributions of data into the training, invisible training, calibration, and validation sets(a)

Distribution  Scheme                   Set                 n     r²       q²       s        F     R²m      ΔR²m
1             Balance of correlations  Training            288   0.7399   0.7359   0.786    813
                                       Invisible training  284   0.7729   0.7697   0.770    960
                                       Calibration         87    0.7712   0.7519   0.711    286   0.6768   0.1672
                                       Validation          99*   0.7805*           0.668*
              Traditional scheme       Training            572   0.7815   0.7799   0.717    2039
                                       Calibration         87    0.7256   0.7011   0.835    225   0.6225   0.0721
                                       Validation          99    0.6605            0.870
2             Balance of correlations  Calibration         75    0.7979   0.7850   0.682    288   0.6578   0.1860
                                       Validation          84*   0.8229*           0.6276*
              Traditional scheme       Training            599   0.7738   0.7722   0.722    2042
                                       Calibration         75    0.7979   0.7857   0.683    288   0.6616   0.1847
                                       Validation          84    0.7203            0.800
3             Balance of correlations  Calibration         82    0.9053   0.8999   0.527    765   0.8633   0.0205
                                       Validation          87*   0.8377*           0.564*
              Traditional scheme       Training            589   0.7764   0.7748   0.726
                                       Calibration         82    0.8746   0.8661   0.590          0.8205   0.0081
                                       Validation          87    0.7426            0.787

(a) n is the number of compounds in a set; r² is the correlation coefficient between experimental and calculated –log 50% lethal concentration; q² is the leave-one-out cross-validated r²; s is the root mean squared error; F is the Fisher F ratio; R²m and ΔR²m are measures of predictability [33]:

$$r_m^2(x, y) = r^2 \left(1 - \sqrt{r^2 - r_0^2}\right)$$

$$R_m^2 = \frac{r_m^2(x, y) + r_m^2(y, x)}{2}$$

$$\Delta R_m^2 = \left| r_m^2(x, y) - r_m^2(y, x) \right|$$

where x and y are vectors of predicted and experimental endpoint values, respectively; r₀² is the correlation coefficient between experimental and predicted values calculated without intercept.
*indicates significance.

Domain of applicability

Various distributions of the data into the training, invisible training, calibration, and validation sets result in different prevalence rates of features in the training and calibration sets. The measure of influence of a feature, Fk, for possible predictive potential of a model can be estimated via a defect of Fk, defect(Fk) [31,32]:

$$\mathrm{defect}(F_k) = \frac{\left| P_T(F_k) - P_C(F_k) \right|}{N_T(F_k) + N_C(F_k)} \tag{3}$$

In Equation 3, PT(Fk) and PC(Fk) are probabilities of attribute Fk in the training and the calibration set, respectively; NT(Fk) and NC(Fk) are prevalence (frequency) of attribute Fk in the training set and the calibration set, respectively. The defect(Fk) = 1 if NC(Fk) = 0.

The defect of SMILES, d(SMILES), can be estimated via defects of Fk in the SMILES:

$$d(\mathrm{SMILES}) = \sum_{F_k \in \mathrm{SMILES}} \mathrm{defect}(F_k) \to \min \tag{4}$$

The defect of a distribution into the training, invisible training, calibration, and validation sets can be estimated via the sum of defects of SMILES from the training and calibration sets:

$$d(\mathrm{Distribution}) = \sum_{\mathrm{SMILES} \in \mathrm{Train\ \&\ Calib}} d(\mathrm{SMILES}) \tag{5}$$

Computational experiments have shown that the described models have preferable predictive potential if the d(Distribution) is minimal.

The statistically robust domain of applicability can be introduced via inequality

$$d(\mathrm{SMILES}) < 2 \times \overline{d}(\mathrm{SMILES}) \tag{6}$$

where $\overline{d}(\mathrm{SMILES})$ is the average defect of SMILES over the training and calibration sets.
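
The quantities defined in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CORAL implementation: the function names, the molecules, the attribute `X_rare`, and the coefficients passed to `predict_plc50` are hypothetical, and the rule "an attribute is rare when it occurs in fewer than T* training-set SMILES" is an assumption about how the threshold is applied. The attribute counts and correlation weights are taken from Table 3 (distribution 1, probe 1), and the set sizes from Table 1.

```python
def dcw(features, weights, train_counts, threshold):
    """Equation 1: sum of the correlation weights of the SMILES attributes.

    Assumption: an attribute is "rare" when it occurs in fewer than
    `threshold` training-set SMILES; rare attributes contribute 0.
    """
    total = 0.0
    for feature in features:
        if train_counts.get(feature, 0) >= threshold:
            total += weights.get(feature, 0.0)
    return total


def predict_plc50(dcw_value, c0, c1):
    """Equation 2: linear model on the optimal descriptor."""
    return c0 + c1 * dcw_value


def feature_defect(n_train, n_calib, size_train, size_calib):
    """Equation 3; the defect is defined as 1 when N_C(F_k) = 0."""
    if n_calib == 0:
        return 1.0
    p_train = n_train / size_train
    p_calib = n_calib / size_calib
    return abs(p_train - p_calib) / (n_train + n_calib)


def smiles_defect(features, defects):
    """Equation 4: sum of the defects of the attributes in one SMILES."""
    return sum(defects[feature] for feature in features)


def in_domain(d_smiles, mean_d):
    """Inequality 6: d(SMILES) < 2 * average d(SMILES)."""
    return d_smiles < 2.0 * mean_d


# (N_T, N_C) for distribution 1 from Table 3; set sizes from Table 1.
SIZE_TRAIN, SIZE_CALIB = 288, 87
counts = {
    "BOND10000000": (163, 47),
    "NOSP01000000": (105, 27),
    "S...": (44, 13),
    "X_rare": (2, 0),  # hypothetical attribute absent from the calibration set
}
defects = {
    feature: feature_defect(n_t, n_c, SIZE_TRAIN, SIZE_CALIB)
    for feature, (n_t, n_c) in counts.items()
}

# Hypothetical molecules, each described by a list of SMILES attributes.
molecules = {
    "mol_a": ["BOND10000000", "NOSP01000000"],
    "mol_b": ["BOND10000000", "S..."],
    "mol_c": ["NOSP01000000", "X_rare"],
}
d = {name: smiles_defect(feats, defects) for name, feats in molecules.items()}
mean_d = sum(d.values()) / len(d)  # stand-in for the training + calibration average
status = {name: in_domain(value, mean_d) for name, value in d.items()}

# Correlation weights from Table 3 (probe 1, distribution 1); X_rare is made up.
weights = {"BOND10000000": 4.50212, "NOSP01000000": 2.50275, "X_rare": 9.0}
train_counts = {feature: n_t for feature, (n_t, _) in counts.items()}
dcw_c = dcw(molecules["mol_c"], weights, train_counts, threshold=3)
```

With these inputs the defect of BOND10000000 rounds to 0.0001, the value listed for distribution 1 in Table 3, and only the molecule containing the attribute that is missing from the calibration set falls outside the domain defined by inequality 6.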
Mechanistic interpretation
The developed QSAR models allow for mechanistic
interpretation of the studied phenomena. Having the numerical
data on the correlation weights of features in several runs of the
Monte Carlo optimization, one can extract 3 categories of these
features [28–32]: 1) features which have a positive value of
the correlation weight in all runs (these are promoters of the
endpoint increase); 2) features which have a negative value of
the correlation weight in all runs (these are promoters of the
endpoint decrease); and 3) features which have both negative
and positive values of the correlation weight in different runs of
the optimization. These are features with an unclear role (one
cannot classify these features as promoters of an increase or a
decrease of the endpoint).
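
The three categories above amount to a sign check across runs. A minimal Python sketch follows; the function name and the attribute `X_mixed` are hypothetical, while the other correlation weights are the 3 probes reported for distribution 1 in Table 3.

```python
def classify_feature(weights_per_run):
    """Assign a feature to one of the 3 categories from its weights in several runs."""
    if not weights_per_run:
        raise ValueError("at least one Monte Carlo run is required")
    if all(w > 0 for w in weights_per_run):
        return "promoter of increase"
    if all(w < 0 for w in weights_per_run):
        return "promoter of decrease"
    # Mixed signs, or a zero weight in some run (e.g., a feature treated as rare).
    return "unclear"


runs = {
    "BOND10000000": [4.50212, 4.25292, 2.50098],
    "n...": [-2.50344, -2.75050, -3.99880],
    "X_mixed": [0.5, -1.2, 0.3],  # hypothetical attribute with unstable sign
}
roles = {name: classify_feature(ws) for name, ws in runs.items()}
```

Here `roles` maps BOND10000000 to a promoter of increase, `n...` to a promoter of decrease, and the made-up `X_mixed` to the unclear category.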
RESULTS

QSAR models

The QSAR models for toxicity to D. magna, which are calculated with the balance of correlations for 3 distributions of data into the training, invisible training, calibration, and validation sets, are as follows:

$$\mathrm{pLC50} = 0.8701\,(\pm 0.0084) + 0.09209\,(\pm 0.0002) \times \mathrm{DCW}(3, 10) \tag{7}$$

$$\mathrm{pLC50} = 0.0020\,(\pm 0.0099) + 0.09745\,(\pm 0.0002) \times \mathrm{DCW}(3, 10) \tag{8}$$

$$\mathrm{pLC50} = 0.0016\,(\pm 0.0097) + 0.09505\,(\pm 0.0002) \times \mathrm{DCW}(3, 10) \tag{9}$$

However, if one does not use the balance of correlation approach, the developed models are different. The QSAR models for toxicity to D. magna, which are calculated with the traditional scheme (without the invisible training set) for 3 distributions of the data into the training, calibration, and validation sets, are as follows:

$$\mathrm{pLC50} = 2.087\,(\pm 0.0028) + 0.08818\,(\pm 0.0001) \times \mathrm{DCW}(3, 10) \tag{10}$$

$$\mathrm{pLC50} = 2.301\,(\pm 0.0026) + 0.08368\,(\pm 0.0001) \times \mathrm{DCW}(3, 10) \tag{11}$$

$$\mathrm{pLC50} = 2.538\,(\pm 0.0023) + 0.09843\,(\pm 0.0001) \times \mathrm{DCW}(3, 10) \tag{12}$$

Table 1 contains the statistical characteristics of the models calculated with Equations 7 to 12. All of these models are satisfactory according to statistical criteria [33] represented in Table 1. However, it is to be noted that the balance of correlations approach gives a better prediction for all 3 random distributions. Figure 2 represents the models calculated with Equations 7 to 9 graphically.

Figure 2. Graphical representation of correlations between experimental and calculated –log 50% lethal concentration (pLC50) values.

Domain of applicability

The numbers of outliers according to inequality 6 for the balance of correlations are 10, 9, and 7 for distributions 1, 2, and 3, respectively. This changes when the traditional scheme of distribution is used. In such a case the numbers of outliers are 14, 12, and 5 for distributions 1, 2, and 3, respectively. Thus, distribution 3 seems to be preferable for both versions of the Monte Carlo optimization (balance of correlations and traditional scheme). In fact, the criterion expressed as inequality 6 is an indicator of compounds that have rare, atypical molecular features. In other words, these compounds are suspected to be outliers. However, the data on the number of these suspected compounds give an additional possibility to estimate different distributions of the data into the training, invisible training, calibration, and validation sets: “if the number of the above-mentioned outliers is smaller, then distribution is better.”

Mechanistic interpretation

Table 2 contains the lists of SMILES attributes which are revealed to be stable promoters of the pLC50 increase or decrease. The data on prevalence of these promoters in the training, invisible training, and calibration sets are displayed in Table 3.

Table 2. List of molecular features extracted from simplified molecular input-line entry system (SMILES), which are promoters of increase (or decrease) for –log 50% lethal concentration (pLC50)

No.  Feature, Fk      Comment                                                                        Impact for pLC50
1    BOND10000000     Presence of double bonds                                                       Increase
2    BOND00000000     Absence of double, triple, and stereochemical bonds                            Increase
3    NOSP01000000     Presence of oxygen together with absence of nitrogen, sulfur, and phosphorus   Increase
4    ++++N---B2==     Simultaneous presence in a molecule of nitrogen and double bond(s)             Increase
5    S...             Presence of sulfur atom(s)                                                     Increase
6    ++++S---B2==     Simultaneous presence of sulfur together with double bond(s)                   Decrease
7    C...C...(...)    Branching at pair of carbon atoms in sp3 state                                 Decrease
8    n...             Presence of nitrogen in sp2 state                                              Decrease

DISCUSSION

Comparison of statistical quality of QSAR for D. magna

Table 4 contains the statistical characteristics of QSAR models for D. magna available in the literature [34–37]. This provides a good summary of the already executed approaches. One can compare the previously published data with the statistical characteristics of the models developed in the present study. The data presented in Table 4 clearly indicate that the QSAR models calculated with Equations 7 to 9 are comparable with models suggested in the literature.

Advantages of the CORAL models

There are various benefits of application of the CORAL models. The approach described in the present study applies the QSAR model using solely experimental values of pLC50 together with data on the molecular structure. There is no need to use information on physicochemical parameters, 3-dimensional representation of the molecular systems, and quantum mechanics descriptors for the considered compounds. The method applied in the present study delivers QSAR models in accordance with Organisation for Economic Co-operation and Development principles [38].

Disadvantages of the CORAL models

Though CORAL models are efficient and reliable, there are also some drawbacks of such approaches: 1) there are SMILES attributes (this is related to SSk and SSSk involved in Equation 1) which have no physical interpretation, and 2) the Monte Carlo calculations take considerable time to execute, especially if the number of compounds is large (e.g., n > 500).

Availability of examined data

The Supplemental Data contain technical details: 1) SM1 contains correlation weights for calculations with Equations 7 to 9; 2) SM2 contains experimental and predicted pLC50 values calculated with Equations 7 to 9; 3) SM3 contains the correlation weights obtained in 3 probes of the Monte Carlo optimization where one can find the stable promoters of the pLC50 increase together with promoters of the pLC50 decrease; 4) SM4 contains 3 distributions into the training (i.e., the training, invisible training, and calibration sets) and external validation sets which were examined in the present study (these distributions can be checked with the CORAL software); and 5) SM5 contains the CORAL method used to build the described models.

CONCLUSIONS

Using the CORAL approach, predictions of toxicity of 758 organic compounds to D. magna were carried out. All QSAR
models are based on SMILES notation optimal descriptors and were developed with application of the Monte Carlo method. The predictive potential of the applied approach was tested with 3 random splits into the subtraining, calibration, test, and validation sets and with different statistical methods. All models considered in the present study are characterized by the following features: 1) every time, the best statistical characteristics for the calibration set are accompanied by satisfactory predictive potential (the statistical characteristics for the external validation set), and 2) the balance of correlation approach gives better predictions than the traditional scheme. Both features demonstrate that Monte Carlo method–based modeling incorporated in CORAL software is a very promising computational method in QSAR studies for risk assessment related to toxicity of organic chemicals to D. magna. The SMILES attributes (both local and global) that are promoters of toxicity increase or decrease were identified and defined. These structural features are related to organic chemical toxicity, and their identification helped to improve the understanding of organic chemical toxicity toward D. magna. The Monte Carlo calculations described in the present study can be reproduced using the CORAL software.

Supplemental Data: The Supplemental Data are available on the Wiley Online Library at DOI:10.1002/etc.3466.

Acknowledgment: A.A. Toropov and A.P. Toropova thank the European Commission project PeptiCAPS (project 686141). A.M. Veselinović and J.B. Veselinović acknowledge support from the Ministry of Education and Science, the Republic of Serbia (project 43012). D. Leszczynska and J. Leszczynski acknowledge support from the National Science Foundation (NSF/CREST HRD-0833178) and EPSCoR (362492-190200-01/NSFEPS-090378).

Data availability: Data were taken from Zhang et al. [27]. In addition, the Supplemental Data contain the data as Excel files.

Table 3. Correlation weights and prevalence of molecular features Fk extracted from simplified molecular input-line entry system (SMILES), which are promoters of increase (or decrease) for –log 50% lethal concentration

Feature, Fk      Distribution  CW(Fk) in probe 1  CW(Fk) in probe 2  CW(Fk) in probe 3  NT    NIT   NC   Defect(Fk)
BOND10000000     1             4.50212            4.25292            2.50098            163   138   47   0.0001
                 2             4.49822            7.49671            4.99822            178   146   29   0.0008
                 3             5.00259            3.50238            3.99991            163   158   34   0.0006
BOND00000000     1             7.75332            9.49539            10.50317           115   139   39   0.0003
                 2             9.99812            13.75023           9.24921            133   123   45   0.0010
                 3             8.24893            9.50242            8.00085            137   114   45   0.0006
NOSP01000000     1             2.50275            3.99775            3.49717            105   115   27   0.0004
                 2             3.25394            3.24730            2.50415            121   95    34   0.0005
                 3             3.49710            3.50430            3.49783            112   105   37   0.0006
++++N---B2==     1             0.99642            3.49920            2.50496            76    59    21   0.0002
                 2             4.49525            4.50001            1.49646            83    65    10   0.0014
                 3             1.00461            0.99860            1.75335            67    74    15   0.0004
S...             1             6.00091            5.49845            6.49918            44    36    13   0.0001
                 2             5.75436            5.74986            4.50392            47    40    2    0.0025
                 3             5.49772            5.25079            3.49748            40    40    9    0.0004
++++S---B2==     1             -3.75458           -4.49712           -3.49840           38    29    11   0.0001
                 2             -1.75444           -8.00161           -6.50246           40    34    1    0.0027
                 3             -2.75126           -4.75226           -4.75001           32    32    9    0.0001
C...C...(...)    1             -0.00259           -0.24797           -1.24937           87    88    27   0.0001
                 2             -1.24737           -1.24876           -0.25035           95    80    23   0.0001
                 3             -0.75343           -1.24625           -2.25130           89    92    19   0.0005
n...             1             -2.50344           -2.75050           -3.99880           12    17    3    0.0005
                 2             -0.75458           -1.24965           -0.99687           13    15    3    0.0000
                 3             -2.24872           -3.00371           -4.24579           15    15    3    0.0007

CW = correlation weight; defect(Fk) = defect of a feature, Fk, calculated with Equation 3; NT, NIT, and NC = numbers of SMILES attribute x in the training, invisible training, and calibration sets, respectively.

Table 4. Comparison of the statistical quality of quantitative structure–activity relationship models for toxicity to Daphnia magna

      Training set                      Validation set
No.   n     r²            s             n     r²      s            Reference
1     -     0.740–0.768   0.79–1.00     -     -       -            [34]
2     222   0.738                       75    0.721                [35]
3     97    0.77          0.39          -     -       0.34–0.44    [36]
4     149   0.70          1.04          89    0.768   0.88         [37]
5     307   0.739         0.80          87    0.838   0.564        Present study(a)

(a) Equation 9.

REFERENCES

1. Mackay D, Hubbarde J, Webster E. 2003. The role of QSARs and fate models in chemical hazard and risk assessment. QSAR Comb Sci 22:106–112.
2. Furtula B, Gutman I. 2011. Relation between second and third geometric-arithmetic indices of trees. J Chemom 25:87–91.
3. Afantitis A, Melagraki G, Koutentis PA, Sarimveis H, Kollias G. 2011. Ligand-based virtual screening procedure for the prediction and the identification of novel β-amyloid aggregation inhibitors using Kohonen maps and counterpropagation artificial neural networks. Eur J Med Chem 46:497–508.
4. Afantitis A, Melagraki G, Sarimveis H, Koutentis PA, Igglessi-Markopoulou O, Kollias G. 2010. A combined LS-SVM & MLR QSAR workflow for predicting the inhibition of CXCR3 receptor by quinazolinone analogs. Mol Divers 14:225–235.
5. Duchowicz PR, Comelli NC, Ortiz EV, Castro EA. 2012. QSAR study for carcinogenicity in a large set of organic compounds. Curr Drug Saf 7:282–288.
6. Ghaedi A. 2015. Predicting the cytotoxicity of ionic liquids using QSAR model based on SMILES optimal descriptors. J Mol Liq 208:269–279.
7. Li Q, Ding X, Si H, Gao H. 2014. QSAR model based on SMILES of inhibitory rate of 2,3-diarylpropenoic acids on AKR1C3. Chemometr Intell Lab Syst 139:132–138.
8. Masand VH, Toropov AA, Toropova AP, Mahajan DT. 2014. QSAR models for anti-malarial activity of 4-aminoquinolines. Curr Comput Aided Drug Des 10:75–82.
9. Scotti L, Lima EO, da Silva MS, Ishiki H, Lima IO, Pereira FO, Mendonça FJB Jr, Scotti MT. 2014. Docking and PLS studies on a set of thiophenes RNA polymerase inhibitors against Staphylococcus aureus. Curr Top Med Chem 14:64–80.
10. Scotti L, Ishiki H, Ferreira MJP, Francisco JBM Jr, De P Emerenciano V, Silva MS, Scotti MT. 2012. In silico methods applied in food chemistry: A short review with bitter and mutagenic compounds. Lett Drug Des Discov 9:527–534.
11. Speck-Planche A, Kleandrova VV, Luan F, Cordeiro MNDS. 2015. Computational modeling in nanomedicine: Prediction of multiple antibacterial profiles of nanoparticles using a quantitative structure–activity relationship perturbation model. Nanomedicine 10:193–204.
12. Torrens F, Castellano G. 2014. QSPR prediction of chromatographic retention times of pesticides: Partition and fractal indices. J Environ Sci Health B 49:400–407.
13. Torrens F, Castellano G. 2012. QSPR prediction of retention times of phenylurea herbicides by biological plastic evolution. Curr Drug Saf 7:262–268.
14. van der Jagt K, Munn S, Torslov J, de Bruijn J, eds. 2004. Alternative approaches can reduce the use of test animals under REACH. Addendum to: Assessment of additional testing needs under REACH effects of (Q)SARS, risk based testing and voluntary industry initiatives. IHCP report EUR 21405 EN. Joint Research Centre Institute for Health and Consumer Protection, European Commission, Ispra, Italy.
15. Ivanciuc O. 2013. Chemical graphs, molecular matrices and topological indices in chemoinformatics and quantitative structure-activity relationships. Curr Comput Aided Drug Des 9:153–163.
16. Weininger D. 1988. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31–36.
17. Weininger D, Weininger A, Weininger JL. 1989. SMILES. 2. Algorithm for generation of unique SMILES notation. J Chem Inf Comput Sci 29:97–101.
18. Weininger D. 1990. SMILES. 3. Depict. Graphical depiction of chemical structures. J Chem Inf Comput Sci 30:237–243.
19. Živković JV, Trutić NV, Veselinović JB, Nikolić GM, Veselinović AM. 2015. Monte Carlo method based QSAR modeling of maleimide derivatives as glycogen synthase kinase-3β inhibitors. Comput Biol Med 64:276–282.
20. Veselinović JB, Nikolić GM, Trutić NV, Živković JV, Veselinović AM. 2015. Monte Carlo QSAR models for predicting organophosphate inhibition of acetylcholinesterase. SAR QSAR Environ Res 26:449–460.
21. Veselinović AM, Veselinović JB, Živković JV, Nikolić GM. 2015. Application of SMILES notation based optimal descriptors in drug discovery and design. Curr Top Med Chem 15:1768–1779.
22. Achary PGR. 2014. QSPR modelling of dielectric constants of π-conjugated organic compounds by means of the CORAL software. SAR QSAR Environ Res 25:507–526.
23. Worachartcheewan A, Nantasenamat C, Isarankura-Na-Ayudhya C, Prachayasittikul V. 2014. QSAR study of H1N1 neuraminidase inhibitors from influenza A virus. Lett Drug Des Discov 11:420–427.
24. Achary PGR. 2014. Simplified molecular input line entry system-based optimal descriptors: QSAR modelling for voltage-gated potassium channel subunit Kv7.2. SAR QSAR Environ Res 25:73–90.
25. García J, Duchowicz PR, Rozas MF, Caram JA, Mirífico MV, Fernández FM, Castro EA. 2011. A comparative QSAR on 1,2,5-thiadiazolidin-3-one 1,1-dioxide compounds as selective inhibitors of human serine proteinases. J Mol Graph Model 31:10–19.
26. Mullen LMA, Duchowicz PR, Castro EA. 2011. QSAR treatment on a new class of triphenylmethyl-containing compounds as potent anticancer agents. Chemometr Intell Lab Syst 107:269–275.
27. Zhang X, Qin W, He J, Wen Y, Su L, Sheng L, Zhao Y. 2013. Discrimination of excess toxicity from narcotic effect: Comparison of toxicity of class-based organic chemicals to Daphnia magna and Tetrahymena pyriformis. Chemosphere 93:397–407.
28. Toropova AP, Toropov AA, Benfenati E, Leszczynska D, Leszczynski J. 2015. QSAR model as a random event: A case of rat toxicity. Bioorg Med Chem 23:1223–1230.
29. Toropova AP, Toropov AA, Benfenati E, Gini G, Leszczynska D, Leszczynski J. 2011. CORAL: Quantitative structure-activity relationship models for estimating toxicity of organic compounds in rats. J Comput Chem 32:2727–2733.
30. Toropova AP, Toropov AA, Veselinović JB, Veselinović AM. 2015. QSAR as a random event: A case of NOAEL. Environ Sci Pollut Res Int 22:8264–8271.
31. Toropova AP, Toropov AA, Rallo R, Leszczynska D, Leszczynski J. 2015. Optimal descriptor as a translator of eclectic data into prediction of cytotoxicity for metal oxide nanoparticles under different conditions. Ecotoxicol Environ Saf 112:39–45.
32. Toropova MA, Toropov AA, Raska I, Raskova M. 2015. Searching therapeutic agents for treatment of Alzheimer disease using the Monte Carlo method. Comput Biol Med 64:148–154.
33. Ojha PK, Mitra I, Das RN, Roy K. 2011. Further exploring rm2 metrics for validation of QSPR models. Chemometr Intell Lab Syst 107:194–205.
34. Vikas R. 2015. Exploring the role of quantum chemical descriptors in modeling acute toxicity of diverse chemicals to Daphnia magna. J Mol Graph Model 61:89–101.
35. Kar S, Roy K. 2010. QSAR modeling of toxicity of diverse organic chemicals to Daphnia magna using 2D and 3D descriptors. J Hazard Mater 177:344–351.
36. Cassani S, Kovarich S, Papa E, Roy PP, van der Wal L, Gramatica P. 2013. Daphnia and fish toxicity of (benzo)triazoles: Validated QSAR models, and interspecies quantitative activity–activity modeling. J Hazard Mater 258–259:50–60.
37. Toropova AP, Toropov AA, Martyanov SE, Benfenati E, Gini G, Leszczynska D, Leszczynski J. 2012. CORAL: QSAR modeling of toxicity of organic chemicals toward Daphnia magna. Chemometr Intell Lab Syst 110:177–181.
38. Organisation for Economic Co-operation and Development. 2007. Guidance document on the validation of (quantitative)structure–activity relationship [(Q)SAR] models. Series on Testing and Assessment, No. 69. ENV/JM/MONO(2007)2. Paris, France.
