
Molecule TOPKAT_Aerobic_Biodegradability

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Structural Similar Compounds
Name: Benzeneacetic_acid,_4-chloro-.alpha.-(4-chlorophenyl)-.alpha.-hydroxy-,_ethyl_ester | Methanone,__2-hydroxy-4-(octyloxy)phenyl_phenyl- | Rhodamine_B
Actual Endpoint: Non-Degradable | Non-Degradable | Non-Degradable
Predicted Endpoint: Non-Degradable | Non-Degradable | Non-Degradable
Distance: 0.617 | 0.659 | 0.690
Reference: Environmental Toxicology & Chemistry 18(9), 1763-1768, 1999. | Environmental Toxicology & Chemistry 18(9), 1763-1768, 1999. | Environmental Toxicology & Chemistry 18(9), 1763-1768, 1999.

Model Prediction
Prediction: Non-Degradable
Probability: 0.118
Enrichment: 0.271
Bayesian Score: -10.6
Mahalanobis Distance: 18.5
Mahalanobis Distance p-value: 4.07e-025

Prediction: Positive if the Bayesian score is above the estimated best cutoff value from minimizing the false positive and false negative rate.
Probability: The estimated probability that the sample is in the positive category. This assumes that the Bayesian score follows a normal distribution and is different from the prediction using a cutoff.
Enrichment: An estimate of enrichment, that is, the increased likelihood (versus random) of this sample being in the category.
Bayesian Score: The standard Laplacian-modified Bayesian score.
Mahalanobis Distance: The Mahalanobis distance (MD) is the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score Degradable in training set
SCFP_12 1242547645 0.501 4 out of 5

SCFP_12 392579710 0.422 3 out of 4

SCFP_12 136597326 0.36 179 out of 307

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Degradable in
training set
SCFP_12 10 -1.61 11 out of 145

SCFP_12 -1850396224 -1.45 0 out of 8

SCFP_12 384920865 -1.36 6 out of 65
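
The feature scores in the two tables above are consistent with the Laplacian-corrected estimate commonly described for this type of Bayesian model: for a feature seen in n training compounds, k of them in the positive (degradable) class, the score is ln((k + 1) / (n * p + 1)), where p is the overall positive fraction of the training set. The report states neither the formula nor p, so both are assumptions here; p = 0.406 was back-calculated from the listed rows. A minimal sketch that checks the six rows under those assumptions:

```python
import math

# Assumption: overall degradable fraction of the training set. This value is
# not stated in the report; it was back-calculated from the listed rows.
ASSUMED_BASE_RATE = 0.406


def feature_score(hits, total, base_rate=ASSUMED_BASE_RATE):
    """Laplacian-corrected log estimate for one fingerprint feature.

    hits  -- training compounds with the feature that are in the positive class
    total -- all training compounds with the feature
    """
    return math.log((hits + 1.0) / (total * base_rate + 1.0))


# (hits, total, score as printed in the report)
REPORTED_ROWS = [
    (4, 5, 0.501),
    (3, 4, 0.422),
    (179, 307, 0.36),
    (11, 145, -1.61),
    (0, 8, -1.45),
    (6, 65, -1.36),
]

recomputed = [feature_score(hits, total) for hits, total, _ in REPORTED_ROWS]

for (hits, total, printed), score in zip(REPORTED_ROWS, recomputed):
    print(f"{hits:>3} out of {total:<3}  reported {printed:>6}  recomputed {score:.3f}")
```

The +1 terms damp features that are rare in the training set, which is why the "0 out of 8" row has a finite score of -1.45 instead of an undefined logarithm of zero.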


Molecule TOPKAT_Ames_Mutagenicity
C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Structural Similar Compounds
Name: LUCANTHONE HYDROCHLORIDE | IA-4 | 43047-59-2
Actual Endpoint: Mutagen | Mutagen | Mutagen
Predicted Endpoint: Mutagen | Mutagen | Mutagen
Distance: 0.527 | 0.533 | 0.545
Reference: EMIC | EMIC | Kazius et al., J. Med. Chem. (2005) 48, 312-320

Model Prediction
Prediction: Non-Mutagen
Probability: 0.609
Enrichment: 1.09
Bayesian Score: -4.78
Mahalanobis Distance: 7.13
Mahalanobis Distance p-value: 1

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score Mutagen in training set
SCFP_12 -161181165 0.442 37 out of 41

SCFP_12 -1798344807 0.437 81 out of 91

SCFP_12 1007749707 0.388 3 out of 3

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Mutagen in training
set
SCFP_12 1987936786 -2.2 0 out of 14

SCFP_12 2033992841 -1.32 4 out of 31

SCFP_12 1498989769 -0.998 0 out of 3
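
The Mahalanobis distance reported for this model (7.13) measures how far the query lies from the center of the training data, in units that account for the spread and correlation of the descriptors. The report does not list the descriptors, their covariance, or the number of dimensions, so the sketch below uses a made-up two-descriptor example purely for illustration. In two dimensions the squared distance of normally distributed data follows a chi-square distribution with 2 degrees of freedom, which gives the closed-form p-value exp(-MD^2 / 2); the report's own p-values presumably come from a model with a different number of dimensions and will not match this formula.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point from the center of the training data."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of the 2x2 covariance matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    md_squared = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
                  + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(md_squared)


def md_p_value_2d(md):
    """Fraction of normally distributed 2-D data with a distance >= md."""
    return math.exp(-md * md / 2.0)


# Hypothetical training statistics: variances 4 and 1, no correlation.
MEAN = (0.0, 0.0)
COV = ((4.0, 0.0), (0.0, 1.0))

near = mahalanobis_2d((2.0, 0.0), MEAN, COV)  # one SD along the first axis
far = mahalanobis_2d((0.0, 3.0), MEAN, COV)   # three SDs along the second axis

print(f"near: MD = {near:.2f}, p = {md_p_value_2d(near):.4f}")
print(f"far:  MD = {far:.2f}, p = {md_p_value_2d(far):.4f}")
```

A larger distance always gives a smaller p-value, which matches the report's reading that both a large MD and a small p-value make a prediction less trustworthy.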


Molecule TOPKAT_Developmental_Toxicity_Potential
C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Structural Similar Compounds
Name: Pirmenol .HCl (Free base form) | Perphenazine | Bromperidol
Actual Endpoint: Non-Toxic | Toxic | Toxic
Predicted Endpoint: Non-Toxic | Toxic | Toxic
Distance: 0.566 | 0.605 | 0.612
Reference: Toxicol Appl Pharmacol 56:294-301; 1980 | Toxicol Appl Pharmacol 21(2):230-6; 1972 | Journal of Toxic Sciences 9:109-126; 1984

Model Prediction
Prediction: Toxic
Probability: 0.593
Enrichment: 1.13
Bayesian Score: 1.17
Mahalanobis Distance: 10.1
Mahalanobis Distance p-value: 0.0145

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score Toxic in training set
SCFP_6 -538866216 0.478 4 out of 4

SCFP_6 2033992841 0.441 3 out of 3

SCFP_6 -1971137145 0.431 7 out of 8

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Toxic in training
set
SCFP_6 1684592399 -0.718 0 out of 2

SCFP_6 2010506287 -0.422 0 out of 1

SCFP_6 -1798553344 -0.358 3 out of 9


Molecule TOPKAT_Mouse_Female_FDA_None_vs_Carcinogen
C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Structural Similar Compounds
Name: Indomethacin | Simvastatin | Haloperidol
Actual Endpoint: Non-Carcinogen | Carcinogen | Carcinogen
Predicted Endpoint: Non-Carcinogen | Carcinogen | Carcinogen
Distance: 0.629 | 0.635 | 0.635
Reference: US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997 | US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997 | US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997

Model Prediction
Prediction: Non-Carcinogen
Probability: 0.206
Enrichment: 0.644
Bayesian Score: -7.44
Mahalanobis Distance: 11.9
Mahalanobis Distance p-value: 0.0143

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.
2. Unknown ECFP_2 feature: -1305021906: [*]['?']
3. Unknown ECFP_2 feature: 1335702447: ['?'][c](:[*]):[c](C=[*]):[cH]:[*]
4. Unknown ECFP_2 feature: -1659020767: [*][c](:[*]):[c](['?']):[c]([*]):[*]
5. Unknown ECFP_2 feature: 432684389: ['?'][c](:[*]):[*]
6. Unknown ECFP_2 feature: -176483725: [*]=C[c](:[cH]:[*]):[cH]:[*]

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score Carcinogen in training set
ECFP_6 -1925046727 0.391 11 out of 23

ECFP_6 -938530932 0.0661 8 out of 24

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Carcinogen in
training set
ECFP_6 2077607946 -1.15 0 out of 7

ECFP_6 -219423964 -0.935 0 out of 5

ECFP_6 -1831055759 -0.805 0 out of 4


Molecule TOPKAT_Mouse_Female_NTP
C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Structural Similar Compounds
Name: Chlorobenzilate | Rhodamine 6G | Tricresyl Phosphate
Actual Endpoint: Carcinogen | Non-Carcinogen | Non-Carcinogen
Predicted Endpoint: Carcinogen | Non-Carcinogen | Non-Carcinogen
Distance: 0.603 | 0.640 | 0.651
Reference: NTP/TR-075 | NTP/TR-364 | NTP/TR-433

Model Prediction
Prediction: Carcinogen
Probability: 0.71
Enrichment: 1.8
Bayesian Score: 3.2
Mahalanobis Distance: 11.8
Mahalanobis Distance p-value: 1.8e-006

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. OPS PC16 out of range. Value: -4.4952. Training min, max, SD, explained variance: -3.8995, 2.937, 1.095, 0.0199.
2. Unknown ECFP_2 feature: -1305021906: [*]['?']
3. Unknown ECFP_2 feature: -428002189: [*]:[cH]:[c](:n:[*])[c](:[*]):[*]
4. Unknown ECFP_2 feature: -1659020767: [*][c](:[*]):[c](['?']):[c]([*]):[*]
5. Unknown ECFP_2 feature: 432684389: ['?'][c](:[*]):[*]

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score Carcinogen in training set
ECFP_8 -1734834311 0.544 2 out of 2

ECFP_8 -1925046727 0.496 25 out of 40

ECFP_8 1680623188 0.422 9 out of 15

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Carcinogen in
training set
ECFP_8 1335702447 -0.748 0 out of 3

ECFP_8 -992306215 -0.555 0 out of 2

ECFP_8 734603939 -0.104 57 out of 171
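
The Model Applicability list for this model flags an OPS component (PC16) that falls outside the range seen in training. One way to judge how far outside it is, using only the numbers printed in the report, is to express the overshoot in units of the training standard deviation. The helper below is a hypothetical illustration and not part of TOPKAT; its rows are the four out-of-range components reported in this document.

```python
def sds_outside_range(value, train_min, train_max, train_sd):
    """Return how many training SDs `value` lies outside [train_min, train_max].

    Returns 0.0 when the value is inside the training range.
    """
    if value < train_min:
        return (train_min - value) / train_sd
    if value > train_max:
        return (value - train_max) / train_sd
    return 0.0


# (model, component, value, training min, training max, training SD) as reported
REPORTED = [
    ("TOPKAT_Mouse_Female_NTP", "PC16", -4.4952, -3.8995, 2.937, 1.095),
    ("TOPKAT_Mouse_Male_NTP", "PC15", 3.1966, -3.6063, 2.7452, 1.167),
    ("TOPKAT_Rat_Female_FDA_None_vs_Carcinogen", "PC19", 3.9096, -4.1681, 3.4693, 1.279),
    ("TOPKAT_Rat_Female_NTP", "PC11", 3.8229, -2.3897, 3.1905, 1.314),
]

overshoot = {
    model: sds_outside_range(value, lo, hi, sd)
    for model, _component, value, lo, hi, sd in REPORTED
}

for model, component, *_ in REPORTED:
    print(f"{model} {component}: {overshoot[model]:.2f} SD outside the training range")
```

Under this measure, each of the four flagged components lies roughly 0.3 to 0.55 standard deviations beyond the nearest training limit.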


Molecule TOPKAT_Mouse_Male_FDA_None_vs_Carcinogen
C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Structural Similar Compounds
Name: Indomethacin | Nafenopin | Haloperidol
Actual Endpoint: Non-Carcinogen | Carcinogen | Carcinogen
Predicted Endpoint: Non-Carcinogen | Carcinogen | Carcinogen
Distance: 0.594 | 0.610 | 0.613
Reference: US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997 | US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997 | US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997

Model Prediction
Prediction: Non-Carcinogen
Probability: 0.228
Enrichment: 0.775
Bayesian Score: -3.37
Mahalanobis Distance: 12.6
Mahalanobis Distance p-value: 0.000933

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score Carcinogen in training set
FCFP_6 451371068 0.439 3 out of 6

FCFP_6 1115242110 0.38 2 out of 4

FCFP_6 -1151884458 0.348 6 out of 15

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Carcinogen in
training set
FCFP_6 -497728148 -0.96 2 out of 26

FCFP_6 -1038421835 -0.719 0 out of 4

FCFP_6 1028934530 -0.596 1 out of 10


Molecule TOPKAT_Mouse_Male_NTP
C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Structural Similar Compounds
Name: CHLORBENZILATE | RHODAMINE 6G | Tricresyl Phosphate
Actual Endpoint: Carcinogen | Non-Carcinogen | Non-Carcinogen
Predicted Endpoint: Carcinogen | Non-Carcinogen | Non-Carcinogen
Distance: 0.595 | 0.622 | 0.628
Reference: NTP/TR-75 | NTP/TR-364 | NTP433

Model Prediction
Prediction: Carcinogen
Probability: 0.718
Enrichment: 1.82
Bayesian Score: 3.05
Mahalanobis Distance: 12.1
Mahalanobis Distance p-value: 1.03e-006

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. OPS PC15 out of range. Value: 3.1966. Training min, max, SD, explained variance: -3.6063, 2.7452, 1.167, 0.0219.

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score Carcinogen in training set
SCFP_12 392579710 0.638 3 out of 3

SCFP_12 2033992841 0.377 1 out of 1

SCFP_12 1987936786 0.377 1 out of 1

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Carcinogen in
training set
SCFP_12 367120510 -0.555 0 out of 2

SCFP_12 1498989769 -0.316 0 out of 1

SCFP_12 -534368618 -0.316 0 out of 1


Molecule TOPKAT_Ocular_Irritancy_Mild_vs_Moderate_Severe
C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Structural Similar Compounds
Name: BENZILIC ACID; 4;4'-DICHLORO-; ISOPROPYL ESTER | 1-BENZOYLAMINO-4-METHOXY-5-CHLORANTHRAQUINONE | Benzophenone; 4-(2-ethylhexyloxy)-2-hydroxy-
Actual Endpoint: Moderate_Severe | Mild | Mild
Predicted Endpoint: Moderate_Severe | Mild | Mild
Distance: 0.577 | 0.621 | 0.625
Reference: CIGET* -;-;77 | 28ZPAK-;90;72 | Prehled Prumyslove Toxikologie; Organicke Latky; Marhold; J. pp 648;86

Model Prediction
Prediction: Mild
Probability: 0.543
Enrichment: 0.788
Bayesian Score: -5.69
Mahalanobis Distance: 7.12
Mahalanobis Distance p-value: 0.997

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.
2. Unknown FCFP_2 feature: -1151884458: ['?'][c](:[*]):[c](N):n:[*]

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score Moderate_Severe in training set
FCFP_10 -497728148 0.356 24 out of 25

FCFP_10 1028934530 0.256 2 out of 2

FCFP_10 136120670 0.206 53 out of 65

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Moderate_Severe
in training set
FCFP_10 -1748394506 -0.842 0 out of 2

FCFP_10 1771501788 -0.842 0 out of 2

FCFP_10 -1431222223 -0.842 0 out of 2


Molecule TOPKAT_Ocular_Irritancy_None_vs_Irritant
C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Structural Similar Compounds
Name: BENZILIC ACID; 4;4'-DICHLORO-; ISOPROPYL ESTER | 1-BENZOYLAMINO-4-METHOXY-5-CHLORANTHRAQUINONE | Benzophenone; 4-(2-ethylhexyloxy)-2-hydroxy-
Actual Endpoint: Irritant | Irritant | Irritant
Predicted Endpoint: Irritant | Irritant | Non-Irritant
Distance: 0.573 | 0.613 | 0.616
Reference: CIGET* -;-;77 | 28ZPAK-;90;72 | Prehled Prumyslove Toxikologie; Organicke Latky; Marhold; J. pp 648;86

Model Prediction
Prediction: Irritant
Probability: 0.976
Enrichment: 1.15
Bayesian Score: 0.142
Mahalanobis Distance: 6.84
Mahalanobis Distance p-value: 0.999

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.
2. Unknown FCFP_2 feature: -1151884458: ['?'][c](:[*]):[c](N):n:[*]

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score Irritant in training set
FCFP_12 1747237384 0.208 44 out of 44

FCFP_12 17 0.189 48 out of 49

FCFP_12 136120670 0.151 65 out of 69

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Irritant in training
set
FCFP_12 -1078052987 -0.344 2 out of 4

FCFP_12 690511177 -0.268 1 out of 2

FCFP_12 451371068 -0.167 6 out of 9
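
In the Structural Similar Compounds table for this model, the third neighbor is an irritant experimentally but is predicted Non-Irritant. A small sketch of tallying that agreement, with the rows copied from the table (the dictionary layout and function names are illustrative choices, not part of the report):

```python
# Rows copied from the Structural Similar Compounds table of this model.
NEIGHBORS = [
    {"name": "BENZILIC ACID; 4;4'-DICHLORO-; ISOPROPYL ESTER",
     "actual": "Irritant", "predicted": "Irritant", "distance": 0.573},
    {"name": "1-BENZOYLAMINO-4-METHOXY-5-CHLORANTHRAQUINONE",
     "actual": "Irritant", "predicted": "Irritant", "distance": 0.613},
    {"name": "Benzophenone; 4-(2-ethylhexyloxy)-2-hydroxy-",
     "actual": "Irritant", "predicted": "Non-Irritant", "distance": 0.616},
]


def concordance(rows):
    """Return (rows where predicted == actual, total rows)."""
    agree = sum(1 for row in rows if row["actual"] == row["predicted"])
    return agree, len(rows)


def mispredicted(rows):
    """Names of the neighbors whose predicted endpoint differs from the actual one."""
    return [row["name"] for row in rows if row["actual"] != row["predicted"]]


agree, total = concordance(NEIGHBORS)
print(f"{agree} of {total} nearest neighbors predicted correctly")
for name in mispredicted(NEIGHBORS):
    print("mispredicted:", name)
```

With only three neighbors this is a rough local check rather than a measure of overall model accuracy.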


Molecule TOPKAT_Rat_Female_FDA_None_vs_Carcinogen
C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Structural Similar Compounds
Name: Indomethacin | Simvastatin | Haloperidol
Actual Endpoint: Non-Carcinogen | Carcinogen | Non-Carcinogen
Predicted Endpoint: Non-Carcinogen | Carcinogen | Non-Carcinogen
Distance: 0.641 | 0.645 | 0.646
Reference: US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997 | US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997 | US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997

Model Prediction
Prediction: Non-Carcinogen
Probability: 0.207
Enrichment: 0.644
Bayesian Score: -6.36
Mahalanobis Distance: 11.4
Mahalanobis Distance p-value: 0.0256

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. OPS PC19 out of range. Value: 3.9096. Training min, max, SD, explained variance: -4.1681, 3.4693, 1.279, 0.0172.
2. Unknown ECFP_2 feature: -1305021906: [*]['?']
3. Unknown ECFP_2 feature: 1335702447: ['?'][c](:[*]):[c](C=[*]):[cH]:[*]
4. Unknown ECFP_2 feature: -1659020767: [*][c](:[*]):[c](['?']):[c]([*]):[*]
5. Unknown ECFP_2 feature: 432684389: ['?'][c](:[*]):[*]
6. Unknown ECFP_2 feature: -176483725: [*]=C[c](:[cH]:[*]):[cH]:[*]

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score Carcinogen in training set
ECFP_12 -1925046727 0.407 16 out of 33

ECFP_12 -428002189 0.208 1 out of 2

ECFP_12 734603939 0.0966 92 out of 267

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Carcinogen in
training set
ECFP_12 2077607946 -1.25 0 out of 8

ECFP_12 767488533 -0.941 0 out of 5


ECFP_12 -468366781 -0.941 0 out of 5


Molecule TOPKAT_Rat_Female_NTP
C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Structural Similar Compounds
Name: Rhodamine 6G | Tricresyl Phosphate | 1-TRANS-DELTA(9)-TETRAHYDROCANNABINOL
Actual Endpoint: Carcinogen | Non-Carcinogen | Non-Carcinogen
Predicted Endpoint: Carcinogen | Non-Carcinogen | Non-Carcinogen
Distance: 0.622 | 0.626 | 0.673
Reference: NTP364 | NTP433 | TR-446

Model Prediction
Prediction: Non-Carcinogen
Probability: 0.438
Enrichment: 0.961
Bayesian Score: -2.82
Mahalanobis Distance: 10.9
Mahalanobis Distance p-value: 6.13e-005

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. OPS PC11 out of range. Value: 3.8229. Training min, max, SD, explained variance: -2.3897, 3.1905, 1.314, 0.0302.

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score Carcinogen in training set
FCFP_12 -1861645784 0.387 6 out of 9

FCFP_12 342000123 0.344 1 out of 1

FCFP_12 -620155118 0.324 5 out of 8

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Carcinogen in
training set
FCFP_12 1771501788 -0.607 0 out of 2

FCFP_12 907007053 -0.487 5 out of 21

FCFP_12 451371068 -0.461 2 out of 9


Molecule TOPKAT_Rat_Male_FDA_None_vs_Carcinogen
C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Structural Similar Compounds
Name: Indomethacin | Nafenopin | Haloperidol
Actual Endpoint: Non-Carcinogen | Carcinogen | Non-Carcinogen
Predicted Endpoint: Non-Carcinogen | Carcinogen | Non-Carcinogen
Distance: 0.599 | 0.611 | 0.612
Reference: US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997 | US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997 | US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997

Model Prediction
Prediction: Non-Carcinogen
Probability: 0.327
Enrichment: 0.979
Bayesian Score: -1.05
Mahalanobis Distance: 13.3
Mahalanobis Distance p-value: 0.00056

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score Carcinogen in training set
SCFP_6 -1971137145 0.434 5 out of 9

SCFP_6 392579710 0.425 2 out of 3

SCFP_6 2010506287 0.415 1 out of 1

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Carcinogen in
training set
SCFP_6 -1211866396 -1.1 2 out of 25

SCFP_6 -1272709286 -0.459 12 out of 61

SCFP_6 -1850396224 -0.278 0 out of 1


Molecule TOPKAT_Rat_Male_NTP
C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Structural Similar Compounds
Name: Rhodamine 6G | Tricresyl Phosphate | Lithocholic Acid
Actual Endpoint: Carcinogen | Non-Carcinogen | Non-Carcinogen
Predicted Endpoint: Carcinogen | Non-Carcinogen | Non-Carcinogen
Distance: 0.639 | 0.656 | 0.693
Reference: NTP364 | NTP/TR-433 | NTP/TR-175

Model Prediction
Prediction: Non-Carcinogen
Probability: 0.513
Enrichment: 1.01
Bayesian Score: -2.43
Mahalanobis Distance: 9.92
Mahalanobis Distance p-value: 0.0013

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.
2. Unknown ECFP_2 feature: -1305021906: [*]['?']
3. Unknown ECFP_2 feature: -428002189: [*]:[cH]:[c](:n:[*])[c](:[*]):[*]
4. Unknown ECFP_2 feature: -1659020767: [*][c](:[*]):[c](['?']):[c]([*]):[*]
5. Unknown ECFP_2 feature: 432684389: ['?'][c](:[*]):[*]

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score Carcinogen in training set
ECFP_12 -219423964 0.575 7 out of 7

ECFP_12 -181568884 0.511 9 out of 10

ECFP_12 -468366781 0.405 2 out of 2

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Carcinogen in
training set
ECFP_12 1335702447 -0.916 0 out of 3

ECFP_12 -1831055759 -0.693 1 out of 6

ECFP_12 -176483725 -0.693 1 out of 6
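
The per-feature scores in the two tables above can be reproduced with a Laplacian-corrected estimator of the form log((A + 1) / (N * p + 1)), where A is the number of carcinogens containing the feature, N is the number of training compounds containing it, and p is the overall fraction of carcinogens in the training set. The sketch below is an illustration under assumptions, not the program's actual implementation: the formula is the form usually given for Laplacian-modified Bayesian models, the function name is hypothetical, and the base rate of 0.5 is inferred from the printed numbers rather than stated anywhere in the report.

```python
import math


def laplacian_feature_score(hits: int, total: int, base_rate: float) -> float:
    """Laplacian-corrected weight for one fingerprint feature (assumed form).

    hits      -- positive training compounds that contain the feature
    total     -- all training compounds that contain the feature
    base_rate -- fraction of positives in the whole training set
    """
    return math.log((hits + 1) / (total * base_rate + 1))


# (positives, total, score printed in the report) for TOPKAT_Rat_Male_NTP
reported = [
    (7, 7, 0.575),
    (9, 10, 0.511),
    (2, 2, 0.405),
    (0, 3, -0.916),
    (1, 6, -0.693),
]

# Assumption: inferred from the rows above, not given in the report.
BASE_RATE = 0.5

scores = [laplacian_feature_score(h, n, BASE_RATE) for h, n, _ in reported]
for (h, n, printed), s in zip(reported, scores):
    print(f"{h} out of {n}: computed {s:.3f}, reported {printed}")
```

The correction pulls features seen in only a handful of compounds toward zero, which is why "2 out of 2" scores lower than "7 out of 7" even though both are 100% carcinogens.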


Molecule TOPKAT_Skin_Irritancy_Mild_vs_Moderate_Severe

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: Mild
Probability: 0.179
Enrichment: 0.485
Bayesian Score: -5.75
Mahalanobis Distance: 6.89
Mahalanobis Distance p-value: 0.992

Prediction: Positive if the Bayesian score is above the estimated best cutoff value from minimizing the false positive and false negative rate.
Probability: The estimated probability that the sample is in the positive category. This assumes that the Bayesian score follows a normal distribution and is different from the prediction using a cutoff.
Enrichment: An estimate of enrichment, that is, the increased likelihood (versus random) of this sample being in the category.
Bayesian Score: The standard Laplacian-modified Bayesian score.
Mahalanobis Distance: The Mahalanobis distance (MD) is the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: Aniline, 2,4-bis(o-methylphenoxy)-
Actual Endpoint: Mild
Predicted Endpoint: Mild
Distance: 0.634
Reference: 85JCAE "Prehled Prumyslove Toxikologie; Organicke Latky," Marhold, J., Prague, Czechoslovakia, Avicenum, 1986. Volume(issue)/page/year: -,725,1986

Name: Benzophenone, 4-(2-ethylhexyloxy)-2-hydroxy-
Actual Endpoint: Mild
Predicted Endpoint: Mild
Distance: 0.662
Reference: 85JCAE "Prehled Prumyslove Toxikologie; Organicke Latky," Marhold, J., Prague, Czechoslovakia, Avicenum, 1986. Volume(issue)/page/year: -,648,1986

Name: Anthraquinone, 1-(2,4,6-trimethylphenylamino)-
Actual Endpoint: Mild
Predicted Endpoint: Mild
Distance: 0.686
Reference: 28ZPAK "Sbornik Vysledku Toxixologickeho Vysetreni Latek A Pripravku," Marhold, J.V., Institut Pro Vychovu Vedoucicn Pracovniku Chemickeho Prumyclu Praha, Czechoslovakia, 1972. Volume(issue)/page/year: -,242,1

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.

Feature Contribution
Top features for positive contribution

Fingerprint Bit/Smiles Feature Structure Score Moderate_Severe in training set
FCFP_12 -1977641857 0.416 18 out of 32

FCFP_12 -1151884458 0.385 1 out of 1

FCFP_12 342000123 0.385 1 out of 1

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Moderate_Severe
in training set
FCFP_12 1771501788 -1.15 0 out of 6

FCFP_12 -1748394506 -0.893 0 out of 4


FCFP_12 -1431222223 -0.893 0 out of 4
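
The Mahalanobis distance and its p-value, as defined in the report's glossary, can be illustrated on a toy data set. The sketch below is a minimal two-dimensional example in plain Python, not the descriptor space the models actually use (which the report does not describe). The function names and the four training points are hypothetical. The p-value uses the chi-square tail for two degrees of freedom, exp(-MD^2 / 2), which holds only for normally distributed two-dimensional data.

```python
import math


def mahalanobis_2d(point, data):
    """Distance from `point` to the centre of 2-D `data`, scaled by its covariance."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = v^T S^-1 v, with the 2x2 inverse written out explicitly
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


def md_p_value_2d(md):
    """Fraction of 2-D normal data lying at least `md` from the centre."""
    return math.exp(-md * md / 2)


# Hypothetical training set: the corners of a square centred on (1, 1).
training = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]

centre = mahalanobis_2d((1.0, 1.0), training)
far = mahalanobis_2d((5.0, 5.0), training)
print(centre, md_p_value_2d(centre))
print(far, md_p_value_2d(far))
```

A query at the centre of the training data has distance 0 and p-value 1; moving away raises the distance and shrinks the p-value, which is the pattern the glossary describes as "less trustworthy".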

Molecule TOPKAT_Skin_Irritancy_None_vs_Irritant

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: Irritant
Probability: 0.973
Enrichment: 1.06
Bayesian Score: -0.742
Mahalanobis Distance: 6.82
Mahalanobis Distance p-value: 0.998

Prediction: Positive if the Bayesian score is above the estimated best cutoff value from minimizing the false positive and false negative rate.
Probability: The estimated probability that the sample is in the positive category. This assumes that the Bayesian score follows a normal distribution and is different from the prediction using a cutoff.
Enrichment: An estimate of enrichment, that is, the increased likelihood (versus random) of this sample being in the category.
Bayesian Score: The standard Laplacian-modified Bayesian score.
Mahalanobis Distance: The Mahalanobis distance (MD) is the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: 1-Piperazineacetic acid, 4-(2-hydroxyethyl)-alpha-phenyl-, 2,6-xylyl ester, monohydrochloride
Actual Endpoint: Irritant
Predicted Endpoint: Irritant
Distance: 0.608
Reference: BCFAAI Bollettino Chimico Farmaceutico. (Societa Editoriale Farmaceutica, Via Ausonio 12, 20123 Milan, Italy) V.33- 1894- Volume(issue)/page/year: 107,310,1968

Name: Aniline, 2,4-bis(o-methylphenoxy)-
Actual Endpoint: Irritant
Predicted Endpoint: Non-Irritant
Distance: 0.642
Reference: 85JCAE "Prehled Prumyslove Toxikologie; Organicke Latky," Marhold, J., Prague, Czechoslovakia, Avicenum, 1986. Volume(issue)/page/year: -,725,1986

Name: Benzophenone, 4-(2-ethylhexyloxy)-2-hydroxy-
Actual Endpoint: Irritant
Predicted Endpoint: Non-Irritant
Distance: 0.669
Reference: 85JCAE "Prehled Prumyslove Toxikologie; Organicke Latky," Marhold, J., Prague, Czechoslovakia, Avicenum, 1986. Volume(issue)/page/year: -,648,1986

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.

Feature Contribution
Top features for positive contribution

Fingerprint Bit/Smiles Feature Structure Score Irritant in training set
FCFP_12 -1038421835 0.0795 9 out of 9

FCFP_12 1771501788 0.0772 7 out of 7

FCFP_12 -1405834164 0.0734 5 out of 5

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Irritant in training
set
FCFP_12 1069584379 -0.439 38 out of 65

FCFP_12 -1861645784 -0.125 12 out of 15


FCFP_12 -620155118 -0.125 12 out of 15

Molecule TOPKAT_Skin_Sensitization_None_vs_Sensitizer

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: Sensitizer
Probability: 0.884
Enrichment: 1.29
Bayesian Score: 2.37
Mahalanobis Distance: 6.68
Mahalanobis Distance p-value: 0.454

Prediction: Positive if the Bayesian score is above the estimated best cutoff value from minimizing the false positive and false negative rate.
Probability: The estimated probability that the sample is in the positive category. This assumes that the Bayesian score follows a normal distribution and is different from the prediction using a cutoff.
Enrichment: An estimate of enrichment, that is, the increased likelihood (versus random) of this sample being in the category.
Bayesian Score: The standard Laplacian-modified Bayesian score.
Mahalanobis Distance: The Mahalanobis distance (MD) is the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: 4;4'-Isopropylidene diphenol
Actual Endpoint: Sensitizer
Predicted Endpoint: Sensitizer
Distance: 0.655
Reference: Howard I Maibach (priv comm)

Name: 4-tert-butyl-4'-methoxybenzoylmethane
Actual Endpoint: Sensitizer
Predicted Endpoint: Sensitizer
Distance: 0.670
Reference: SAR and QSAR in Env Res (1994) 2:159

Name: 2-Hydroxy-2-n-octoxybenzophenone
Actual Endpoint: Sensitizer
Predicted Endpoint: Sensitizer
Distance: 0.680
Reference: SAR and QSAR in Env Res (1994) 2:159

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.
2. Unknown FCFP_2 feature: -1861645784: [*]:[cH]:[c](:[cH]:[*])[c](:[*]):[*]
3. Unknown FCFP_2 feature: 690511177: [*]:[cH]:[c](:n:[*])[c](:[*]):[*]

Feature Contribution
Top features for positive contribution

Fingerprint Bit/Smiles Feature Structure Score Sensitizer in training set
FCFP_12 -497728148 0.304 15 out of 15

FCFP_12 -1038421835 0.266 5 out of 5

FCFP_12 907007053 0.254 17 out of 18

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Sensitizer in
training set
FCFP_12 136120670 -0.245 15 out of 27

FCFP_12 3 -0.0947 89 out of 136

FCFP_12 136597326 -0.092 128 out of 195


Molecule TOPKAT_Skin_Sensitization_Weak_vs_Strong

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: Strong-Sensitizer
Probability: 0.992
Enrichment: 1.28
Bayesian Score: 4.26
Mahalanobis Distance: 6.5
Mahalanobis Distance p-value: 0.243

Prediction: Positive if the Bayesian score is above the estimated best cutoff value from minimizing the false positive and false negative rate.
Probability: The estimated probability that the sample is in the positive category. This assumes that the Bayesian score follows a normal distribution and is different from the prediction using a cutoff.
Enrichment: An estimate of enrichment, that is, the increased likelihood (versus random) of this sample being in the category.
Bayesian Score: The standard Laplacian-modified Bayesian score.
Mahalanobis Distance: The Mahalanobis distance (MD) is the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: 4-tert-butyl-4'-methoxybenzoylmethane
Actual Endpoint: Strong-Sensitizer
Predicted Endpoint: Strong-Sensitizer
Distance: 0.663
Reference: SAR and QSAR in Env Res (1994) 2:159

Name: 4;4'-Isopropylidene diphenol
Actual Endpoint: Strong-Sensitizer
Predicted Endpoint: Strong-Sensitizer
Distance: 0.675
Reference: Howard I Maibach (priv comm)

Name: Sodium 3;5;5-Trimethylhexanoyloxybenzene sulfonate
Actual Endpoint: Strong-Sensitizer
Predicted Endpoint: Strong-Sensitizer
Distance: 0.708
Reference: SAR and QSAR in Env Res (1994) 2:159

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.
2. Unknown FCFP_2 feature: -1861645784: [*]:[cH]:[c](:[cH]:[*])[c](:[*]):[*]
3. Unknown FCFP_2 feature: 690511177: [*]:[cH]:[c](:n:[*])[c](:[*]):[*]

Feature Contribution
Top features for positive contribution

Fingerprint Bit/Smiles Feature Structure Score Strong-Sensitizer in training set
FCFP_12 16 0.232 165 out of 165

FCFP_12 1618154665 0.232 164 out of 164

FCFP_12 203677720 0.232 139 out of 139

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Strong-Sensitizer
in training set
FCFP_12 136597326 -0.239 74 out of 119

FCFP_12 3 -0.131 61 out of 88

FCFP_12 0 0 186 out of 244


Molecule TOPKAT_Weight_of_Evidence_Rodent_Carcinogenicity

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: Non-Carcinogen
Probability: 0.429
Enrichment: 0.833
Bayesian Score: -2.91
Mahalanobis Distance: 6.39
Mahalanobis Distance p-value: 0.782

Prediction: Positive if the Bayesian score is above the estimated best cutoff value from minimizing the false positive and false negative rate.
Probability: The estimated probability that the sample is in the positive category. This assumes that the Bayesian score follows a normal distribution and is different from the prediction using a cutoff.
Enrichment: An estimate of enrichment, that is, the increased likelihood (versus random) of this sample being in the category.
Bayesian Score: The standard Laplacian-modified Bayesian score.
Mahalanobis Distance: The Mahalanobis distance (MD) is the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: Chlorbenzilate
Actual Endpoint: Carcinogen
Predicted Endpoint: Carcinogen
Distance: 0.608
Reference: NCI/NTP TR-75

Name: Indomethacin
Actual Endpoint: Non-Carcinogen
Predicted Endpoint: Non-Carcinogen
Distance: 0.608
Reference: US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997

Name: Nafenopin
Actual Endpoint: Carcinogen
Predicted Endpoint: Carcinogen
Distance: 0.620
Reference: US FDA (Centre for Drug Eval.& Res./Off. Testing & Res.) Sept. 1997

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.

Feature Contribution
Top features for positive contribution

Fingerprint Bit/Smiles Feature Structure Score Carcinogen in training set
SCFP_8 384920865 0.332 37 out of 55

SCFP_8 392579710 0.318 3 out of 4

SCFP_8 -538866216 0.303 1 out of 1

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score Carcinogen in
training set
SCFP_8 -1211866396 -0.685 6 out of 27

SCFP_8 -1850396224 -0.67 0 out of 2

SCFP_8 1242547645 -0.52 3 out of 12


Molecule TOPKAT_Carcinogenic_Potency_TD50_Mouse

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: 4.46
Unit: mg/kg_body_weight/day
Mahalanobis Distance: 12.7
Mahalanobis Distance p-value: 5.21e-007

Mahalanobis Distance: The Mahalanobis distance (MD) is a generalization of the Euclidean distance that accounts for correlations among the X properties. It is calculated as the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: Clobuzarit
Actual Endpoint (-log C): 3.29645
Predicted Endpoint (-log C): 3.52771
Distance: 0.684
Reference: CPDB

Name: s 2,5-Dimethoxy-4´-aminostilbene
Actual Endpoint (-log C): 3.42525
Predicted Endpoint (-log C): 3.51812
Distance: 0.687
Reference: CPDB

Name: 865
Actual Endpoint (-log C): 3.42525
Predicted Endpoint (-log C): 3.51812
Distance: 0.687
Reference: CPDB

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. OPS PC16 out of range. Value: 5.4505. Training min, max, SD, explained variance: -3.1026, 4.016, 1.245, 0.0193.
2. Unknown ECFP_2 feature: -1305021906: [*]['?']
3. Unknown ECFP_2 feature: -1659020767: [*][c](:[*]):[c](['?']):[c]([*]):[*]
4. Unknown ECFP_2 feature: 432684389: ['?'][c](:[*]):[*]

Feature Contribution
Top features for positive contribution

Fingerprint Bit/Smiles Feature Structure Score
ECFP_6 655739385 0.229

ECFP_6 1572579716 0.225

ECFP_6 1559650422 0.203

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score
ECFP_6 1996767644 -0.251

ECFP_6 642810091 -0.247

ECFP_6 -182236392 -0.232
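
The "out of range" entries under Model Applicability compare a descriptor value for the query molecule with the minimum and maximum seen in the training set. The sketch below restates that check in Python using figures copied from the applicability lists in this report. The class and function names are hypothetical, and expressing the overshoot in training standard deviations is an added illustration, not something the report computes.

```python
from typing import NamedTuple


class RangeCheck(NamedTuple):
    model: str
    descriptor: str
    value: float
    train_min: float
    train_max: float
    train_sd: float


def is_out_of_range(c: RangeCheck) -> bool:
    """True when the query value lies outside the training min/max."""
    return not (c.train_min <= c.value <= c.train_max)


def overshoot_in_sd(c: RangeCheck) -> float:
    """How far past the nearest training bound the value lies, in SD units."""
    if c.value > c.train_max:
        return (c.value - c.train_max) / c.train_sd
    if c.value < c.train_min:
        return (c.train_min - c.value) / c.train_sd
    return 0.0


# Values copied from the report's Model Applicability lists.
checks = [
    RangeCheck("TD50_Mouse", "OPS PC16", 5.4505, -3.1026, 4.016, 1.245),
    RangeCheck("TD50_Rat", "OPS PC19", -3.4242, -2.9709, 5.6065, 1.282),
    RangeCheck("TD50_Rat", "OPS PC20", -4.2597, -3.9266, 5.5565, 1.236),
]

for c in checks:
    print(f"{c.model} {c.descriptor}: out of range = {is_out_of_range(c)}, "
          f"overshoot = {overshoot_in_sd(c):.2f} SD")
```

Any flagged descriptor means the query molecule sits outside the region the model was trained on, so the corresponding prediction is an extrapolation.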


Molecule TOPKAT_Carcinogenic_Potency_TD50_Rat

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: 0.726
Unit: mg/kg_body_weight/day
Mahalanobis Distance: 15.9
Mahalanobis Distance p-value: 5.61e-014

Mahalanobis Distance: The Mahalanobis distance (MD) is a generalization of the Euclidean distance that accounts for correlations among the X properties. It is calculated as the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: Indomethacin
Actual Endpoint (-log C): 5.49293
Predicted Endpoint (-log C): 4.9569
Distance: 0.563
Reference: CPDB

Name: 3-(Cyclopentyloxy)-N-(3,5-di-chloro-4-pyridyl)-4-methoxy-benzamide
Actual Endpoint (-log C): 5.39369
Predicted Endpoint (-log C): 4.27874
Distance: 0.564
Reference: CPDB

Name: FD & C violet no. 1
Actual Endpoint (-log C): 2.8543
Predicted Endpoint (-log C): 3.40838
Distance: 0.629
Reference: CPDB

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. OPS PC19 out of range. Value: -3.4242. Training min, max, SD, explained variance: -2.9709, 5.6065, 1.282, 0.0158.
2. OPS PC20 out of range. Value: -4.2597. Training min, max, SD, explained variance: -3.9266, 5.5565, 1.236, 0.0147.

Feature Contribution
Top features for positive contribution

Fingerprint Bit/Smiles Feature Structure Score
FCFP_6 136627117 0.69

FCFP_6 -1861645784 0.359

FCFP_6 690511177 0.293

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score
FCFP_6 16 -0.354

FCFP_6 1674451008 -0.233

FCFP_6 17 -0.149

Molecule TOPKAT_Chronic_LOAEL

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: 0.00717
Unit: g/kg_body_weight
Mahalanobis Distance: 27
Mahalanobis Distance p-value: 3.3e-019

Mahalanobis Distance: The Mahalanobis distance (MD) is a generalization of the Euclidean distance that accounts for correlations among the X properties. It is calculated as the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: ISOXABEN
Actual Endpoint (-log C): 3.81665
Predicted Endpoint (-log C): 4.42315
Distance: 0.598
Reference: EPA COVER SHEET 0339;881201;(1)

Name: ASSURE
Actual Endpoint (-log C): 5.00328
Predicted Endpoint (-log C): 4.27671
Distance: 0.600
Reference: EPA COVER SHEET 0335;891001;(1)

Name: RHODAMINE 6G
Actual Endpoint (-log C): 4.54906
Predicted Endpoint (-log C): 4.6787
Distance: 0.626
Reference: NTP 364 39

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.
2. Unknown ECFP_6 feature: -1305021906: [*]['?']
3. Unknown ECFP_6 feature: -181568884: [*]:[cH]:[c](:[cH]:[*])[c](:[*]):[*]
4. Unknown ECFP_6 feature: -428002189: [*]:[cH]:[c](:n:[*])[c](:[*]):[*]
5. Unknown ECFP_6 feature: 1335702447: ['?'][c](:[*]):[c](C=[*]):[cH]:[*]
6. Unknown ECFP_6 feature: -1734834311: ['?'][c](:[*]):[c](N):n:[*]
7. Unknown ECFP_6 feature: -1659020767: [*][c](:[*]):[c](['?']):[c]([*]):[*]
8. Unknown ECFP_6 feature: 432684389: ['?'][c](:[*]):[*]
9. Unknown ECFP_6 feature: -938530932: [*]:[c](:[*])N
10. Unknown ECFP_6 feature: 767488533: [*]:[c](:[*])CC
11. Unknown ECFP_6 feature: -1831055759: [*]\C=C\[c](:[*]):[*]
12. Unknown ECFP_6 feature: -176483725: [*]=C[c](:[cH]:[*]):[cH]:[*]
13. Unknown ECFP_6 feature: 1307307440: [*]:[c](:[*])OC

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score
ECFP_6 1559650422 0.129

FCFP_6 3 0.0924

ECFP_6 -1925046727 0.0915

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score
FCFP_6 1 -0.102

FCFP_6 -453677277 -0.0906


FCFP_6 136597326 -0.0815

Molecule TOPKAT_Daphnia_EC50

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: 0.655
Unit: mg/l
Mahalanobis Distance: 32.7
Mahalanobis Distance p-value: 8.12e-041

Mahalanobis Distance: The Mahalanobis distance (MD) is a generalization of the Euclidean distance that accounts for correlations among the X properties. It is calculated as the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: Tricresyl phosphate
Actual Endpoint (-log C): 5.0348
Predicted Endpoint (-log C): 8.24599
Distance: 0.586
Reference: EPA EcoTox Database

Name: Pyriproxyfen
Actual Endpoint (-log C): 5.905
Predicted Endpoint (-log C): 5.61364
Distance: 0.599
Reference: Toropov and Benfenati, 2006, Bioorganic & Medicinal Chemistry, 14(8), 2779-2788

Name: Fenoxaprop-ethyl
Actual Endpoint (-log C): 5.056
Predicted Endpoint (-log C): 6.30845
Distance: 0.610
Reference: Toropov and Benfenati, 2006, Bioorganic & Medicinal Chemistry, 14(8), 2779-2788

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.
2. Unknown ECFP_6 feature: -1305021906: [*]['?']
3. Unknown ECFP_6 feature: -181568884: [*]:[cH]:[c](:[cH]:[*])[c](:[*]):[*]
4. Unknown ECFP_6 feature: -428002189: [*]:[cH]:[c](:n:[*])[c](:[*]):[*]
5. Unknown ECFP_6 feature: 1335702447: ['?'][c](:[*]):[c](C=[*]):[cH]:[*]
6. Unknown ECFP_6 feature: -1734834311: ['?'][c](:[*]):[c](N):n:[*]
7. Unknown ECFP_6 feature: -1659020767: [*][c](:[*]):[c](['?']):[c]([*]):[*]
8. Unknown ECFP_6 feature: 432684389: ['?'][c](:[*]):[*]
9. Unknown ECFP_6 feature: 767488533: [*]:[c](:[*])CC
10. Unknown ECFP_6 feature: -1831055759: [*]\C=C\[c](:[*]):[*]
11. Unknown ECFP_6 feature: -176483725: [*]=C[c](:[cH]:[*]):[cH]:[*]
12. Unknown ECFP_6 feature: 1307307440: [*]:[c](:[*])OC

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score
ECFP_6 -1059365320 0.165

ECFP_6 642810091 0.148

ECFP_6 1572579716 0.114

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score
FCFP_6 0 -0.202

FCFP_6 136597326 -0.193


FCFP_6 17 -0.189

Molecule TOPKAT_Fathead_Minnow_LC50

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: 0.000386
Unit: g/l
Mahalanobis Distance: 15.8
Mahalanobis Distance p-value: 9.55e-023

Mahalanobis Distance: The Mahalanobis distance (MD) is a generalization of the Euclidean distance that accounts for correlations among the X properties. It is calculated as the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: Diphenylphthalate
Actual Endpoint (-log C): 6.6
Predicted Endpoint (-log C): 6.4642
Distance: 0.621
Reference: ATOCFM Volume 2

Name: Triphenyl phosphate
Actual Endpoint (-log C): 5.57512
Predicted Endpoint (-log C): 6.43312
Distance: 0.662
Reference: DSSTox/EPAFHM

Name: Fenvalerate (test 2)
Actual Endpoint (-log C): 9
Predicted Endpoint (-log C): 7.86616
Distance: 0.756
Reference: ATOCFM Volume 4

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.

Feature Contribution
Top features for positive contribution

Fingerprint Bit/Smiles Feature Structure Score
FCFP_2 1036089772 0.119

FCFP_2 1069584379 0.105

FCFP_2 136627117 0.0814

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score
FCFP_2 1 -0.306

FCFP_2 0 -0.275

FCFP_2 136120670 -0.214


Molecule TOPKAT_Rat_Inhalational_LC50

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: 1.49e+004
Unit: mg/m3/h
Mahalanobis Distance: 13.1
Mahalanobis Distance p-value: 3.83e-009

Mahalanobis Distance: The Mahalanobis distance (MD) is a generalization of the Euclidean distance that accounts for correlations among the X properties. It is calculated as the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: 1H-Benzimidazole; 5-chloro-6-(2;3-dichlorophenoxy)-2-(methylthio)-
Actual Endpoint (-log C): 2.2548
Predicted Endpoint (-log C): 1.69815
Distance: 0.620
Reference: MDACAP Medicamentos de Actualidad. (J.R. Prous; S.A.; Apartado de Correos 540; 08080 Barcelona; Spain) V.1- 1965- Volume(issue)/page/year: 21;227;1985

Name: 1H-1;2;4-Triazole-1-propanenitrile; alpha-(2-(4-chlorophenyl)ethyl)-alpha-phenyl-
Actual Endpoint (-log C): 1.6031
Predicted Endpoint (-log C): 2.17352
Distance: 0.683
Reference: PEMNDP Pesticide Manual. (The British Crop Protection Council; 20 Bridport Rd.; Thornton Heath CR4 7QG; UK) V.1- 1968- Volume(issue)/page/year: 9;157;1991

Name: Ethanone; 2-((4-(2;4-dichloro-3-methylbenzoyl)-1;3-dimethyl-1H-pyrazol-5-yl)oxy)-1-(4-methylphenyl)-
Actual Endpoint (-log C): 1.7472
Predicted Endpoint (-log C): 2.15944
Distance: 0.685
Reference: NNGADV Nippon Noyaku Gakkaishi. Journal of the Pesticide Science Society of Japan. (Nippon Noyaku Gakkai; 1-43-11; Komagome; Toshima-ku; Tokyo 170; Japan) V.1- 1976- Volume(issue)/page/year: 15;125;1990

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. OPS PC21 out of range. Value: -3.4664. Training min, max, SD, explained variance: -3.0247, 4.4972, 1.058, 0.0155.
2. Unknown ECFP_2 feature: -1305021906: [*]['?']
3. Unknown ECFP_2 feature: -428002189: [*]:[cH]:[c](:n:[*])[c](:[*]):[*]
4. Unknown ECFP_2 feature: -1734834311: ['?'][c](:[*]):[c](N):n:[*]
5. Unknown ECFP_2 feature: -1659020767: [*][c](:[*]):[c](['?']):[c]([*]):[*]
6. Unknown ECFP_2 feature: 432684389: ['?'][c](:[*]):[*]

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score
ECFP_2 642810091 0.214

ECFP_2 1572579716 0.159

ECFP_2 1996767644 0.127

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score
ECFP_2 863188371 -0.338
ECFP_2 734603939 -0.302

ECFP_2 655739385 -0.217


Molecule TOPKAT_Rat_Maximum_Tolerated_Dose_Feed

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: 0.0515
Unit: g/kg_body_weight
Mahalanobis Distance: 9.86
Mahalanobis Distance p-value: 7.42e-005

Mahalanobis Distance: The Mahalanobis distance (MD) is a generalization of the Euclidean distance that accounts for correlations among the X properties. It is calculated as the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: CHLORBENZILATE
Actual Endpoint (-log C): 3.38252
Predicted Endpoint (-log C): 3.27894
Distance: 0.631
Reference: NCI/NTP TR-75

Name: C.I.PIGMENT RED 3
Actual Endpoint (-log C): 2.65635
Predicted Endpoint (-log C): 2.97957
Distance: 0.648
Reference: NCI/NTP TR-407

Name: ROTENONE
Actual Endpoint (-log C): 5.06769
Predicted Endpoint (-log C): 4.11907
Distance: 0.680
Reference: NCI/NTP TR-320

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. OPS PC14 out of range. Value: 3.4894. Training min, max, SD, explained variance: -2.0656, 3.3808, 1.011, 0.0231.

Feature Contribution
Top features for positive contribution

Fingerprint Bit/Smiles Feature Structure Score
FCFP_2 136627117 0.173

FCFP_2 1036089772 0.0749

FCFP_2 3 0.0737

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score
FCFP_2 203677720 -0.0829

FCFP_2 1 -0.0796

FCFP_2 16 -0.0512

Molecule TOPKAT_Rat_Maximum_Tolerated_Dose_Gavage

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: 0.00585
Unit: g/kg_body_weight
Mahalanobis Distance: 8.61
Mahalanobis Distance p-value: 0.000381

Mahalanobis Distance: The Mahalanobis distance (MD) is a generalization of the Euclidean distance that accounts for correlations among the X properties. It is calculated as the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: PHENYLBUTAZONE
Actual Endpoint (-log C): 3.48909
Predicted Endpoint (-log C): 3.17333
Distance: 0.836
Reference: NCI/NTP TR-367

Name: PROMETHAZINE.HCL
Actual Endpoint (-log C): 3.93152
Predicted Endpoint (-log C): 4.72433
Distance: 0.883
Reference: NCI/NTP TR-425

Name: 8-METHOXYPSORALEN
Actual Endpoint (-log C): 3.45978
Predicted Endpoint (-log C): 4.14745
Distance: 0.933
Reference: NCI/NTP TR-359

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. Num_AromaticRings out of range. Value: 3. Training min, max, mean, SD: 0, 2, 0.5625, 0.693.
2. OPS PC6 out of range. Value: -3.6356. Training min, max, SD, explained variance: -2.4321, 2.9885, 1.256, 0.0488.
3. Unknown FCFP_2 feature: -1861645784: [*]:[cH]:[c](:[cH]:[*])[c](:[*]):[*]
4. Unknown FCFP_2 feature: 690511177: [*]:[cH]:[c](:n:[*])[c](:[*]):[*]

Feature Contribution
Top features for positive contribution

Fingerprint Bit/Smiles Feature Structure Score
FCFP_2 332760439 0.672

FCFP_2 1 0.511

FCFP_2 3 0.104

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score
FCFP_2 136597326 -0.489

FCFP_2 203677720 -0.406

FCFP_2 0 -0.29

Molecule TOPKAT_Rat_Oral_LD50

C23H23N2O2[?]
Molecular Weight: 359.44091
ALogP: 5.163
Rotatable Bonds: 6
Acceptors: 4
Donors: 1

Model Prediction
Prediction: 0.949
Unit: g/kg_body_weight
Mahalanobis Distance: 18.1
Mahalanobis Distance p-value: 0.000435

Mahalanobis Distance: The Mahalanobis distance (MD) is a generalization of the Euclidean distance that accounts for correlations among the X properties. It is calculated as the distance to the center of the training data. The larger the MD, the less trustworthy the prediction.
Mahalanobis Distance p-value: The p-value gives the fraction of training data with an MD greater than or equal to the one for the given sample, assuming normally distributed data. The smaller the p-value, the less trustworthy the prediction. For highly non-normal X properties (e.g., fingerprints), the MD p-value is wildly inaccurate.

Structural Similar Compounds

Name: BUFEZOLAC
Actual Endpoint (-log C): 3.862
Predicted Endpoint (-log C): 2.51751
Distance: 0.512
Reference: DRFUD4 5;21;80

Name: INDOMETHAZINE
Actual Endpoint (-log C): 5.17
Predicted Endpoint (-log C): 3.33605
Distance: 0.568
Reference: ARZNAD 25;1526;75

Name: DIFENAMIZOLE
Actual Endpoint (-log C): 2.431
Predicted Endpoint (-log C): 2.62562
Distance: 0.577
Reference: KSRNAM 6;168;72

Model Applicability
Unknown features are fingerprint features in the query molecule, but not found or appearing too infrequently in the training set.
1. All properties and OPS components are within expected ranges.
2. Unknown ECFP_2 feature: -1305021906: [*]['?']
3. Unknown ECFP_2 feature: -1659020767: [*][c](:[*]):[c](['?']):[c]([*]):[*]
4. Unknown ECFP_2 feature: 432684389: ['?'][c](:[*]):[*]
5. Unknown FCFP_6 feature: 16: [*]:[cH]:[*]
6. Unknown FCFP_6 feature: 1618154665: [*][c](:[*]):[cH]:[cH]:[*]
7. Unknown FCFP_6 feature: -1861645784: [*]:[cH]:[c](:[cH]:[*])[c](:[*]):[*]
8. Unknown FCFP_6 feature: 690511177: [*]:[cH]:[c](:n:[*])[c](:[*]):[*]
9. Unknown FCFP_6 feature: 1747237384: [*][c](:[*]):n:[c]([*]):[*]
10. Unknown FCFP_6 feature: -1151884458: ['?'][c](:[*]):[c](N):n:[*]
11. Unknown FCFP_6 feature: 1069584379: [*]:[c](:[*])N
12. Unknown FCFP_6 feature: 451371068: [*]\C=C\[c](:[*]):[*]

Feature Contribution
Top features for positive contribution
Fingerprint Bit/Smiles Feature Structure Score
ECFP_6 642810091 0.281

FCFP_6 136627117 0.17

FCFP_6 136120670 0.103

Top Features for negative contribution


Fingerprint Bit/Smiles Feature Structure Score
FCFP_6 1676877079 -0.254

ECFP_6 2077607946 -0.252


ECFP_6 655739385 -0.239
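
The regression models list neighbour endpoints as "-log C" but report the prediction for the query molecule in mass units. The report does not say what C is. The sketch below assumes C is a molar quantity (mol per kg of body weight here), so that -log C = -log10(mass in g / molecular weight). That reading is an assumption, and the function names are hypothetical. Under it, the Rat Oral LD50 prediction of 0.949 g/kg lands on the same scale as the neighbours' predicted endpoints (about 2.5 to 3.3).

```python
import math

# Molecular weight of the query molecule, from the report's property block.
MOLECULAR_WEIGHT = 359.44091  # g/mol


def mass_to_neg_log_c(mass_g: float, mw: float = MOLECULAR_WEIGHT) -> float:
    """Convert a dose in grams to -log10 of the molar amount (assumed meaning of -log C)."""
    return -math.log10(mass_g / mw)


def neg_log_c_to_mass(neg_log_c: float, mw: float = MOLECULAR_WEIGHT) -> float:
    """Inverse conversion: -log C back to a dose in grams."""
    return 10 ** (-neg_log_c) * mw


# Rat Oral LD50 prediction from the report: 0.949 g/kg body weight.
ld50_neg_log_c = mass_to_neg_log_c(0.949)
print(round(ld50_neg_log_c, 3))
print(round(neg_log_c_to_mass(ld50_neg_log_c), 3))
```

The same conversion applies to the other regression endpoints once the printed unit (mg or g, per kg, per litre) has been brought to grams.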
