
Supporting Information for

Learning Medicinal Chemistry Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) rules from Cross-company Matched Molecular Pairs Analysis (MMPA)
Christian Kramer§, Attilla Ting•, Hao Zheng‡, Jerome Hert§, Torsten Schindler§, Martin Stahl§,
Graeme Robb•, James J. Crawford‡, Jeff Blaney‡, Shane Montague†, Andrew G. Leach†, Al. G.
Dossetter†, Ed J. Griffen†

§Roche Pharma Research and Early Development, Roche Innovation Center Basel, Switzerland
‡Genentech Inc, 1 DNA Way, South San Francisco, CA 94080
•AstraZeneca plc, Milton Rd, Milton, Cambridge, CB4 0FZ
†MedChemica Ltd, Biohub Alderley Park, Macclesfield, Cheshire SK10 4TG

Contents
- Additional detail and diagrams showing the capture of the local chemical environment round a point of change in a transformation
- The data merging approach and flowchart showing the method for assigning rules as significant
- Summary: Rules per ADMET endpoint
- Importance of capturing the chemical environment
- Solubility vs logD plots, confusion tables, and unexpected rules
- Clearance: unexpected rules
- PPB: Comparison with logD confusion matrix and unexpected rules
- Comparison with Papadatos 2010 paper
- Single company vs GRD confusion tables

MMP Generation
Examples of encoding different levels of environments for functional group inter-conversions

are shown in Figure S1.

Molecule pair: Molecule A → Molecule B

1 bond environment

[c:1]([H])>>[c:1][C]([H])([H])([H])

2 bond environment

[c:1]([H])[c:2]([H])[c:3]([H])>>[c:1]([H])[c:2]([c:3]([H]))[C]([H])([H])([H])

3 bond environment

[c:1]([H])[c:2]([H])[c:3]([H])[c:4]([H])[n:5]>>[c:1]([H])[c:2]([H])[c:3]([c:4]([H])[n:5])[C]([H])([H])([H])

4 bond environment

[c:1]1([H])[c:2]([H])[c:3]([H])[n:4][c:5][c:6]1([H])>>[C]([H])([H])([H])[c:2]1[c:1]([H])[c:6]([H])[c:5][n:4][c:3]1([H])

Figure S1: Functional group transformations with different levels of environment specification. The green groups are those changed in the transformation; the red portions show the environment, which is specified to an increasing depth in all directions from the point of change. The blue atom numbers correspond to the atom mappings in the SMIRKS transformation encoding; these change with environment because the SMIRKS are canonicalized.
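As an illustration of how the atom mappings grow with environment depth, the following minimal sketch extracts the mapped atoms from the four SMIRKS of Figure S1. It is a plain regular-expression reading of the strings, not a SMIRKS parser and not the tooling used to build the database; the helper name `mapped_atoms` is hypothetical.

```python
import re

def mapped_atoms(pattern):
    """Return the set of atom-map numbers (the ':n' labels) found in a SMIRKS side."""
    return {int(n) for n in re.findall(r":(\d+)\]", pattern)}

# SMIRKS copied from Figure S1, keyed by the number of bonds of environment captured.
figure_s1 = {
    1: "[c:1]([H])>>[c:1][C]([H])([H])([H])",
    2: "[c:1]([H])[c:2]([H])[c:3]([H])>>[c:1]([H])[c:2]([c:3]([H]))[C]([H])([H])([H])",
    3: "[c:1]([H])[c:2]([H])[c:3]([H])[c:4]([H])[n:5]>>"
       "[c:1]([H])[c:2]([H])[c:3]([c:4]([H])[n:5])[C]([H])([H])([H])",
    4: "[c:1]1([H])[c:2]([H])[c:3]([H])[n:4][c:5][c:6]1([H])>>"
       "[C]([H])([H])([H])[c:2]1[c:1]([H])[c:6]([H])[c:5][n:4][c:3]1([H])",
}

for depth, smirks in figure_s1.items():
    left, right = smirks.split(">>")
    # Mapped atoms are the conserved environment, so both sides must carry the same labels.
    assert mapped_atoms(left) == mapped_atoms(right)
    print(depth, sorted(mapped_atoms(left)))
```

The unmapped atoms (the hydrogen on the left and the methyl group on the right) are the part that changes, while the mapped atoms describe the conserved environment.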

The encodings of a linker exchange (see Figure S2) and a core replacement (see Figure S3) are captured with environments analogous to those of the functional group inter-conversion. In the example shown with the amide to sulfonamide conversions, the SMARTS specification used in the Hussain and Rea method avoids cutting the amide N-C(=O) single bond or the sulfonamide N-SO2 single bond, so the resulting transformation is encoded as an amide to sulfonamide exchange rather than a carbonyl to sulfonyl exchange.

Molecule A → Molecule B

[c:1][C](=[O])[N]([H])[C:2]([H])>>[c:1][S](=[O])(=[O])[N]([H])[C:2]([H])

Figure S2: Linker exchange

Molecule A → Molecule B

[C:1]([H])([H])[C]1([H])[C]([H])([C](=[O])[N]1[C:2])[C:3]([H])>>[C:1]([H])([H])[C]1([H])[C]([H])([S](=[O])(=[O])[N]1[C:2])[C:3]([H])

Figure S3: Core replacement

Data Merging
The data for each endpoint was aggregated and filtered according to the procedure shown

in Figure S4.

Figure S4: Rule selection process

Four classes of rule are defined: increase, decrease, neutral and No Effect Determined

(NED). The latter indicates that although there was sufficient data, there was not sufficient

statistical signal to support an assignment of the direction of the transformation.

Figure S5: Rule classes

The neutral rules were those that reproducibly made very little change to an endpoint. One of the principles of our analysis was not to assume in advance any distribution for the underlying data; therefore, a simple binomial test was used to indicate whether a transformation had statistical support for significantly increasing or decreasing a property. This has the considerable advantage of easily overcoming the challenge of censored or “out-of-range” data. Censored data is of particular importance in matched pair analysis, because a transformation that takes a compound from an in-range to an out-of-range measurement, such as from measurable hERG inhibition to inactive (out of range), is highly valuable knowledge. A consequence of choosing the binomial test and a p value of 0.05 is that all transformations require a minimum of 6 supporting examples, as this is the first point at which, if every example pair is in the same direction, there is a ≤5% chance of the set of pairs being seen at random.
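The minimum of 6 pairs can be reproduced with a short calculation. The sketch below assumes a two-sided binomial (sign) test with a null probability of 0.5 for each direction; the source does not state sidedness, but this assumption is the one that yields 6 rather than 5. Function names are illustrative only.

```python
from math import comb

def two_sided_sign_test_p(n_up, n_down):
    """Exact two-sided binomial p-value under the null that up and down are equally likely."""
    n = n_up + n_down
    k = max(n_up, n_down)
    # Probability of seeing k or more pairs in the majority direction, for one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def min_pairs_all_same_direction(alpha=0.05):
    """Smallest number of pairs, all in one direction, that reaches p <= alpha."""
    n = 1
    while two_sided_sign_test_p(n, 0) > alpha:
        n += 1
    return n

print(two_sided_sign_test_p(5, 0))        # 0.0625, not significant at 0.05
print(two_sided_sign_test_p(6, 0))        # 0.03125, significant at 0.05
print(min_pairs_all_same_direction(0.05)) # 6
```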

In using the rules extracted, care needs to be exercised in how NED rules are interpreted. For the comparison of sets of rules between assays, for example in relating human liver microsomes to human hepatocytes, it is valid to explore whether transformations that cause an increase in one assay are more variable in another assay and so yield a NED result. However, when exploring the effect of a specific transformation, a NED result in one assay system means that the current data are too variable for the change to be distinguished from random variation with greater than 95% certainty. Under these circumstances any rule that is classed as a NED should be treated with caution when used in designing new molecules.
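To make the distinction between a directional rule and a NED rule concrete, here is a minimal sketch of a direction classifier based only on the sign of each pair's change. This is why censored values pose no problem: an in-range to out-of-range pair still has a known direction. It is not the full procedure of Figure S4: the neutral class is omitted because its criterion is not given here, dropping ties is an assumption, and the label "insufficient data" and all names are hypothetical.

```python
from math import comb

def classify_direction(deltas, alpha=0.05, min_pairs=6):
    """Classify a transformation from the signs of its per-pair changes."""
    deltas = list(deltas)
    n_up = sum(1 for d in deltas if d > 0)
    n_down = sum(1 for d in deltas if d < 0)
    n = n_up + n_down                      # assumption: pairs with zero change are dropped
    if n < min_pairs:
        return "insufficient data"
    k = max(n_up, n_down)
    # Two-sided exact binomial p-value with equal null probability for each direction.
    p = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n)
    if p > alpha:
        return "NED"
    return "increase" if n_up > n_down else "decrease"

# A censored pair (e.g. measured value to "greater than assay limit") only needs a sign.
print(classify_direction([0.4, 0.1, 1.0, 0.3, 0.2, 0.7]))              # increase
print(classify_direction([0.4, -0.1, 1.0, -0.3, 0.2, 0.7, -0.2, 0.5])) # NED
print(classify_direction([0.4, 0.1, 1.0]))                             # insufficient data
```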

Rules per ADMET endpoint


A summary of the statistically significant rules derived is shown in Table S1.

Dataset(s) | Number of increase / decrease / neutral rules
logD7.4 | 153,449
Solubility | 46,655
In vitro microsomal clearance: human, rat, mouse, cynomolgus monkey, dog | 88,423
In vitro hepatocyte clearance: human, rat, mouse, cynomolgus monkey, dog | 26,627
MDCK permeability A-B / B-A efflux | 1,852
Cytochrome P450 inhibition: 2C9, 2D6, 3A4, 2C19, 1A2 | 40,605
Cardiac ion channels: NaV 1.5, hERG ion channel inhibition | 15,636
Glutathione stability | 116
Plasma protein binding: human, rat, mouse, cynomolgus monkey, dog | 64,622

Table S1: Number of rules found per target property

The importance of capturing environment

There are some functional group exchanges that have been tried in a very large number of environments. These allow us to explore the effect that chemical environment exerts upon the effect of chemical modifications. For example, the replacement H→Me has been explored in 1,550 different local chemical environments as stored in the GRD. For the 225 different transformations where there is a statistically significant effect of H→Me on solubility, the average change in solubility can differ by as much as 3.4 log units, depending on the environment. As shown in Figure S7, the median change in solubility for H→Me is -0.22 log units; however, the range is more than 3.4 log units. Even discounting the transformations that involve methylation of an anion, i.e. CO2H→CO2Me (Figure S6, left), there is still a 2.5 log range in the effect on solubility (Figure S6, right).

Figure S6: H→Me transformations effect on solubility

The benefit of specificity of the environments is also shown by the histograms below (the overall median effect is shown as a solid vertical line). The larger magnitude changes in solubility (environment level 4) are associated with the more specific rules. Blending these examples together in the less specific rules (environment levels 1 and 2) would be misleading, as it would miss the opportunities and risks inherent to specific environments. This form of analysis starts to unravel the “magic” in “magic methyl” effects.

Figure S7: Deeper environment level specification for H→Me transformations reveals rules with very different effects.

Three of the examples of H→Me transformations giving large increases in solubility are

shown in Figure S8 and are consistent with the importance of conformational changes.

[c:1]([H])[c:2]([c:3])[N:4]([H])[C:5](=[O:6])[C:7]([H])([H])>>[c:1]([H])[c:2]([c:3])[N:4]([C]([H])([H])([H]))[C:5](=[O:6])[C:7]([H])([H])

Delta log(sol) = +0.975

[C:1][c:2]1[n:3]([H])[c:4]([n:5])[c:6][n:7]1>>[C]([H])([H])([H])[n:3]1[c:2]([n:7][c:6][c:4]1[n:5])[C:1]

Delta log(sol) = +1.456

[c:1]1([H])[c:2]([c:3][n:4][n:5]1)[N:6]([H])>>[C]([H])([H])([H])[c:1]1[c:2]([c:3][n:4][n:5]1)[N:6]([H])

Delta log(sol) = +0.508

Figure S8: H→Me transformation examples that give a large increase in solubility

For the same transformation, 278 different environments are observed that give a

statistically significant change in human microsomal metabolism. These show a broader

distribution around a median increase of 0.28 log Clint, but again with a few environments

where the H→Me transformation gives a consistent decrease in metabolic clearance. Where the

H→Me transformation involves methylating an acid, there is a consistent increase in clearance,

presumably associated with the ester hydrolysis or demethylation pathways now becoming

available.

Figure S9: H→Me transformations effect on human microsomal clearance

The highly popular functional group replacement of H→F as a “metabolic fix” has 37

different environments where there is sufficient statistical support to comment. Although the

median change is precisely 0, implying in general no effect on metabolism, there are specific

environments where a H→F replacement is either highly effective or highly deleterious (see

Figures S10, S11). This may explain the continuing popularity with medicinal chemists of trying

H→F, in that it works well under highly specific circumstances, modestly under more, but

generally is a disappointment. The human bias to remember the successful cases can be

tempered by a better understanding of when such a change is most likely to be successful.

Figure S10: H→F substitution effects on human microsomal clearance

[c:1]1([H])[c:2]([H])[c:3][c:4]([H])[c:5]([c:6]1([H]))[C:7](=[O:8])[N:9]([H])>>[c:2]1([H])[c:1]([H])[c:6]([c:5]([c:4]([H])[c:3]1)[C:7](=[O:8])[N:9]([H]))[F]

Delta log(Clint) = -0.33

[c:1][N:2]([H])[c:3]1[c:4]([H])[c:5][c:6]([H])[c:7]([c:8]1([H]))[C:9]([H])([F:10])[F:11]>>[c:1][N:2]([H])[c:3]1[c:4]([H])[c:5][c:6]([H])[c:7]([c:8]1[F])[C:9]([H])([F:10])[F:11]

Delta log(Clint) = +0.36

Figure S11: H→F transformations have very different clearance effects in different environments

Solubility

Figure S12: Solubility vs logD effects, min. 50 pairs per rule. Red line = line of equality. Blue line = linear fit (R² = 0.70, slope = -1.4). Green density ellipses at 50% and 99% of the data.

Figure S13: Solubility vs logD effects, min. 20 pairs per rule. Black line = line of equality. Linear fit: R² = 0.66, slope = -1.19.

logD \ Solubility | decrease | increase | NED
decrease | 39 | 793 | 282
increase | 793 | 39 | 282
NED | 33 | 33 | 128

Table S2: Confusion Matrix for Solubility-logD increase/neutral/decrease/NED rules for all rules based on more than 50 pairs on both properties.
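A quick worked calculation on the counts of Table S2 shows how strongly the two properties move in opposite directions. The sketch below only restates the table as a nested dictionary (rows are logD, columns are solubility) and computes, among rules that are directional on both properties, the share where the directions are opposite. The variable names are illustrative.

```python
# Counts from Table S2: outer key = logD class, inner key = solubility class.
table_s2 = {
    "decrease": {"decrease": 39, "increase": 793, "NED": 282},
    "increase": {"decrease": 793, "increase": 39, "NED": 282},
    "NED": {"decrease": 33, "increase": 33, "NED": 128},
}

directional = ("decrease", "increase")

# Rules with a significant direction on both logD and solubility.
both_directional = sum(table_s2[r][c] for r in directional for c in directional)

# Rules where logD goes down and solubility goes up, or the reverse.
opposite = table_s2["decrease"]["increase"] + table_s2["increase"]["decrease"]

fraction_opposite = opposite / both_directional
print(both_directional, opposite, round(fraction_opposite, 3))  # 1664 1586 0.953
```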

logD \ Solubility | decrease | neutral | increase | NED
decrease | 66 | 0 | 1464 | 1038
neutral | 5 | 4 | 5 | 30
increase | 1464 | 0 | 66 | 1038
NED | 144 | 0 | 144 | 972

Table S3: Confusion Matrix for Solubility-logD increase/neutral/decrease/NED rules for all rules based on 20 to 50 pairs on both properties.

logD \ Solubility | decrease | neutral | increase | NED
decrease | 141 | 0 | 3428 | 2046
neutral | 7 | 12 | 7 | 48
increase | 3428 | 0 | 141 | 2046
NED | 299 | 0 | 299 | 1551

Table S4: Confusion Matrix for Solubility-logD increase/neutral/decrease/NED rules for all rules based on minimum 20 pairs on both properties.

Figure S14: Effect of change in Donor Count on logD/ Solubility (number of acceptors kept

constant). Increase in Donors = orange, decrease in donors = blue. Outliers are almost

exclusively related to changing a tertiary amine (unprotonated, does not count as donor in our

analysis) into something with more donors, e.g. hydroxyl group. Only rules based on more than

20 pairs are shown.

Figure S15: Effect of change in acceptor count on logD/ Solubility (number of donors kept

constant). Increase in acceptors = orange, decrease in acceptors = blue. Only rules based on

more than 20 pairs are shown.

logD neutral rules that improve solubility


Overall, changes in logD are highly correlated with changes in solubility. However, there are

some exceptions. Table S5 shows five transformations which on average are logD neutral but

significantly improve solubility.

Transformation | ΔlogD ± std (nPairs) | ΔlogSol ± std (nPairs)
[C:1]([H])([H])[C:2]([H])([H])[N]1[C]([H])([H])[C]([H])([H])[O][C]([H])([H])[C]([H])([H])1>>[C:1]([H])([H])[C:2]([H])([H])[N]1[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])1 | 0.00 ± 0.67 (91) | 0.73 ± 0.72 (87)
[c:1]([H])[c:2]([n:3])[C]#[N]>>[c:1]([H])[c:2]([H])[n:3] | -0.10 ± 0.83 (83) | 0.65 ± 0.96 (69)
[c:1][N]1[C]([H])([H])[C]([H])([H])[N]([C]([H])([H])[C]([H])([H])1)[C](=[O])[C]([H])([H])([H])>>[c:1][N]1[C]([H])([H])[C]([H])([H])[N]([C]([H])([H])[C]([H])([H])1)[C]([H])([H])([H]) | 0.07 ± 0.41 (72) | 0.56 ± 0.91 (52)
[c:1][S](=[O])(=[O])[C]([H])([H])([H])>>[c:1][C](=[O])[N]([C]([H])([H])([H]))[C]([H])([H])([H]) | 0.07 ± 0.50 (108) | 0.52 ± 0.77 (80)
[c:1][C]([F])([F])[F]>>[c:1][C]([H])([H])[C]([H])([H])([H]) | -0.10 ± 0.54 (208) | 0.40 ± 0.78 (115)

Table S5: Transformations that on average keep logD constant and significantly increase solubility. Note that although the std. of the effects can be rather large, the mean is estimated with high certainty, because there are more than 50 pairs for each transformation.
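The remark on certainty can be illustrated with the standard error of the mean, std / sqrt(n). The sketch below applies it to the ΔlogSol columns of Table S5; it assumes the pairs behave as independent observations, which is a simplification, and the function name is illustrative.

```python
from math import sqrt

def standard_error(std, n_pairs):
    """Standard error of the mean for n_pairs observations with the given std."""
    return std / sqrt(n_pairs)

# (mean ΔlogSol, std, nPairs) for the five rows of Table S5.
table_s5_solubility = [
    (0.73, 0.72, 87),
    (0.65, 0.96, 69),
    (0.56, 0.91, 52),
    (0.52, 0.77, 80),
    (0.40, 0.78, 115),
]

for mean, std, n in table_s5_solubility:
    sem = standard_error(std, n)
    # Each mean is several standard errors away from zero despite the large std.
    print(f"mean={mean:+.2f}  std={std:.2f}  n={n}  sem={sem:.3f}  mean/sem={mean / sem:.1f}")
```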

There are also transformations that significantly reduce logD without affecting solubility on

average. Five of those are shown in Table S6.

Transformation | ΔlogD ± std (nPairs) | ΔlogSol ± std (nPairs)
[C]([H])([H])([H])[C]([C]([H])([H])([H]))([C:1]([H])([H]))[O:2]([H])>>[C:1]([H])([H])[C]([H])([H])[O:2]([H]) | -0.59 ± 0.49 (82) | 0.03 ± 0.72 (98)
[c:1]([H])[c:2]([H])[c:3]([c:4]([H])[c:5])[C]([H])([H])([H])>>[c:1]([H])[c:2]([H])[c:3]([c:4]([H])[c:5])[C]#[N] | -0.58 ± 0.48 (179) | 0.01 ± 0.84 (86)
[c:1][C](=[O])[N]([H])[C]([H])([C]([H])([H])([H]))[C]([H])([H])([H])>>[c:1][C](=[O])[N]([H])[C]1([H])[C]([H])([H])[C]([H])([H])[O][C]([H])([H])[C]([H])([H])1 | -0.55 ± 0.40 (131) | 0.00 ± 0.63 (74)
[c:1][N]([H])[C]([H])([H])([H])>>[c:1][N]([H])[C]1([H])[C]([H])([H])[O][C]([H])([H])1 | -0.47 ± 0.90 (53) | 0.02 ± 0.68 (57)
[c:1]([H])[c:2]([c:3])[F]>>[c:1]([H])[c:2]([c:3])[C]#[N] | -0.43 ± 0.63 (303) | -0.02 ± 0.80 (207)

Table S6: Transformations that on average keep solubility constant and significantly decrease logD.

Among all rules based on more than 50 pairs, there is only one transformation that on

average increases both logD and solubility by more than 0.3 log units. This is shown in Table

S7.

Transformation | ΔlogD ± std (nPairs) | ΔlogSol ± std (nPairs)
[c:1]1([H])[c:2]([c:3]([H])[c:4]([c:5][c:6]1[Cl:7])[Cl:8])[C]#[N]>>[c:2]1([H])[c:1]([H])[c:6]([c:5][c:4]([c:3]1([H]))[Cl:8])[Cl:7] | 0.45 ± 0.64 (50) | 0.46 ± 1.02 (65)

Table S7: Transformation that on average increases solubility and logD.
Clearance
ID | Transformation | Hu LM clearance: median change ± std (nPairs, percentage of changes in indicated direction), direction | Hepatocyte clearance: median change ± std (nPairs, percentage of changes in indicated direction), direction
1 | [c:1][N]([H])[c]1[c]([H])[n][n]([c]([H])1)[C]([H])([H])([H])>>[c:1][N]([H])[c]1[c]([H])[c]([H])[n][n]1[C]([H])([H])([H]) | -1.18±0.81 (37,89) decrease | -0.92±0.44 (33,96) decrease
2 | [c:1][C](=[O])[O][C]([H])([H])([H])>>[c:1][C](=[O])[O]([H]) | -0.97±0.65 (166,93) decrease | -0.56±0.35 (6,100) decrease
3 | [c:1][S](=[O])(=[O])[N]([C]([H])([H])([H]))[C]([H])([H])([H])>>[c:1][S](=[O])(=[O])[C]([H])([H])([H]) | -0.71±0.48 (48,95) decrease | -0.61±0.33 (8,100) decrease
4 | [c:1][c]1[c]([H])[n][c]([H])[n][c]([H])1>>[c:1][C](=[O])[N]([H])[C]([H])([H])([H]) | -0.68±0.83 (28,78) decrease | -0.48±0.44 (12,91) decrease
5 | [c:1][C]([H])([H])[O]([H])>>[c:1][C](=[O])[O]([H]) | -0.61±0.52 (80,85) decrease | -0.95±0.28 (7,100) decrease
6 | [c:1][O][C]([H])([H])[c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[c:1][O][C]([H])([H])[C]([H])([H])([H]) | -0.52±0.51 (33,84) decrease | -0.73±0.39 (11,100) decrease
7 | [c:1][C]([H])([H])([H])>>[c:1][S](=[O])(=[O])[C]([H])([H])([H]) | -0.51±0.61 (112,80) decrease | -0.6±0.32 (12,91) decrease
8 | [c:1][N]([H])[C](=[O])[C]1([H])[C]([H])([H])[C]([H])([H])1>>[c:1][N]([H])[C](=[O])[N]([H])[C]([H])([H])([H]) | -0.5±0.63 (28,85) decrease | -0.47±0.39 (6,100) decrease
9 | [c:1][O][C]([H])([H])([H])>>[c:1][C](=[O])[N]([H])([H]) | -0.5±0.57 (93,82) decrease | -0.61±0.57 (12,83) decrease
10 | [C]([H])([H])([H])[C:1]>>[C:1][O]([H]) | -0.47±0.75 (85,81) decrease | -0.39±0.38 (19,89) decrease
11 | [c:1][C](=[O])[C]([H])([H])([H])>>[c:1][C](=[O])[N]([H])[C]([H])([H])([H]) | -0.44±0.52 (30,86) decrease | -0.16±0.42 (9,88) decrease
12 | [c:1][C]([H])([H])[N]1[C]([H])([H])[C]([H])([H])[O][C]([H])([H])[C]([H])([H])1>>[c:1][C]([H])([H])[N]1[C]([H])([H])[C]([H])([H])[N]([C]([H])([H])[C]([H])([H])1)[C]([H])([H])([H]) | -0.43±0.38 (69,88) decrease | -0.29±0.31 (10,90) decrease
13 | [c:1][N]1[C]([H])([H])[C]([H])([H])[N]([C]([H])([H])[C]([H])([H])1)[C]([H])([H])([H])>>[c:1][N]1[C]([H])([H])[C]([H])([H])[N]([H])[C]([H])([H])[C]([H])([H])1 | -0.41±0.36 (98,87) decrease | -0.54±0.33 (13,92) decrease
14 | [c:1]SC([H])([H])([H])>>[c:1]OC([H])([H])([H]) | -0.39±0.48 (116,85) decrease | -0.61±0.26 (30,100) decrease
15 | [C]([H])([H])([H])[O][C:1]>>[C:1][O]([H]) | -0.38±0.6 (153,81) decrease | -0.59±0.45 (34,94) decrease
16 | [c:1][O][C]([H])([H])([H])>>[c:1][S](=[O])(=[O])[C]([H])([H])([H]) | -0.38±0.62 (169,76) decrease | -0.55±0.63 (15,86) decrease
17 | [C]([H])([H])([H])[C]([H])([C]([H])([H])([H]))[C]([H])([H])[C:1]>>[C]([H])([H])([H])[C:1] | -0.36±0.65 (33,75) decrease | -0.5±0.17 (14,100) decrease
18 | [c:1][O][C]([H])([H])([H])>>[c:1][N]([H])([H]) | -0.36±0.56 (155,80) decrease | -0.23±0.55 (12,83) decrease
19 | [H][C@@]1([C]([H])([H])[O][C]([H])([H])[C]([H])([H])[N]1[c:1])[C]([H])([H])([H])>>[c:1][N]1[C]([H])([H])[C]([H])([H])[O][C]([H])([H])[C]([H])([H])1 | -0.34±0.52 (121,79) decrease | -0.37±0.36 (34,91) decrease
20 | [C:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[C:1][C]1([H])[C]([H])([H])[C]([H])([H])[O][C]([H])([H])[C]([H])([H])1 | -0.32±0.6 (34,73) decrease | -0.33±0.11 (8,100) decrease

Table S8: The top 20 transformations that significantly decrease both human liver microsomal and hepatocyte clearance. The rules were selected by the following criteria: 1. the rule has more than 20 examples for human liver microsomal clearance; 2. the rule decreases both human liver microsomal clearance and hepatocyte clearance; 3. the rule has example pairs from multiple companies; 4. only single atom environments were included, to balance generality and specificity for the rule; 5. very similar rules were removed to maximize the diversity of chemical matter.

ID | Transformation | Hu LM clearance: median change ± std (nPairs), direction | logD: median change ± std (nPairs), direction
1 | [C:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[C:1][c]1[c]([H])[c]([H])[c]([c]([H])[c]([H])1)[Cl] | 0.0±0.44 (37) NED | 0.75±0.57 (83) increase
2 | [C:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[C:1][c]1[c]([H])[c]([H])[c]([H])[c]([c]([H])1)[Cl] | 0.03±0.3 (45) NED | 0.64±0.43 (67) increase
3 | [C:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[C:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[Cl] | 0.11±0.32 (31) increase | 0.17±0.47 (66) increase
4 | [c:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[c:1][c]1[c]([H])[c]([H])[c]([c]([H])[c]([H])1)[Cl] | -0.32±0.51 (53) decrease | 0.7±0.74 (117) increase
5 | [c:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[c:1][c]1[c]([H])[c]([H])[c]([H])[c]([c]([H])1)[Cl] | -0.06±0.52 (49) NED | 0.57±0.5 (81) increase
6 | [c:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[c:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[Cl] | 0.0±0.47 (51) NED | 0.42±0.45 (85) increase
7 | [C:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[C:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[F] | 0.16±0.31 (28) NED | -0.08±0.48 (76) NED
8 | [C:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[C:1][c]1[c]([H])[c]([H])[c]([H])[c]([c]([H])1)[F] | 0.0±0.21 (50) NED | 0.13±0.45 (76) increase
9 | [C:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[C:1][c]1[c]([H])[c]([H])[c]([c]([H])[c]([H])1)[F] | 0.0±0.38 (74) NED | 0.15±0.65 (146) increase
10 | [c:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[c:1][c]1[c]([H])[c]([H])[c]([c]([H])[c]([H])1)[F] | -0.0±0.39 (74) NED | 0.17±0.42 (223) increase
11 | [c:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[c:1][c]1[c]([H])[c]([H])[c]([H])[c]([c]([H])1)[F] | 0.0±0.32 (47) NED | 0.2±0.44 (92) increase
12 | [c:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[c:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[F] | -0.03±0.35 (106) NED | 0.02±0.65 (116) NED

Table S9: Summary of the change in in vitro human microsomal clearance for ortho, meta and para substitution of a phenyl ring with Cl and F.

ID | Transformation | Microsomal clearance: median change ± std (nPairs), direction | logD: median change ± std (nPairs), direction
1 | [c:1][C](=[O])[O][C]([H])([H])([H])>>[c:1][O][C]([F])([F])[F] | -0.78±0.53 (11) decrease | 0.91±1.11 (18) increase
2 | [c:1][c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]([H])1>>[c:1][c]1[c]([H])[c]([H])[c]([c]([H])[c]([H])1)[Cl] | -0.32±0.51 (53) decrease | 0.7±0.74 (117) increase
3 | [c:1][c]1[c]([H])[c]([H])[n][c]([H])[c]([H])1>>[c:1][c]1[c]([H])[c]([H])[c]([c]([H])[c]([H])1)[C]#[N] | -0.23±0.36 (18) decrease | 0.73±0.61 (26) increase
4 | [c:1][c]1[c]([H])[n]([H])[n][c]([H])1>>[c:1][c]1[c]([H])[n][n]([c]([H])1)[C]([H])([H])[C]([H])([H])([H]) | -0.34±0.71 (13) decrease | 0.35±0.45 (15) increase
5 | [c:1][C]([H])([H])[N]1[C]([H])([H])[C]([H])([C]([H])([H])1)[C:2]([H])([H])>>[c:1][C]([H])([H])[N]1[C]([H])([H])[C]([H])([H])[C]([H])([C]([H])([H])[C]([H])([H])1)[C:2]([H])([H]) | -0.69±0.42 (8) decrease | 0.76±0.59 (7) increase

Table S10: Five selected transformations that decrease human microsomal clearance while increasing logD.

ID | Transformation | Microsomal clearance: median change ± std (nPairs), direction | logD: median change ± std (nPairs), direction
1 | [c:1][N]([H])[c]1[c]([H])[c]([H])[n][n][c]1[C]([H])([H])([H])>>[c:1][N]([H])[c]1[c]([H])[c]([H])[n][n]1[C]([H])([H])([H]) | -0.34±0.22 (9) decrease | -0.1±0.08 (9) neutral
2 | [C]([H])([H])([H])[C]([H])([H])[N]1[C]([H])([H])[C]([H])([H])[N]([C]([H])([H])[C]([H])([H])1)[C]([H])([H])[C:1]>>[C]([H])([H])([H])[N]1[C]([H])([H])[C]([H])([H])[N]([C]([H])([H])[C]([H])([H])1)[C]([H])([H])[C:1] | -0.34±0.22 (6) decrease | -0.07±0.07 (6) neutral
3 | [H][C@]1([C]([H])([H])[C]([H])([H])[N]([C]([H])([H])1)[C]([H])([H])[c:1])[N:2]([H])>>[c:1][C]([H])([H])[N]1[C]([H])([H])[C]([H])([H])[C]([H])([C]([H])([H])[C]([H])([H])1)[N:2]([H]) | -0.39±0.12 (14) decrease | 0.1±0.65 (14) neutral
4 | [c:1][S](=[O])(=[O])[C]([H])([H])[C]([H])([H])[O][C]([H])([H])([H])>>[c:1][S](=[O])(=[O])[C]([H])([H])([H]) | -0.59±0.38 (14) decrease | 0.0±0.11 (19) neutral
5 | [C:1][c]1[c]([H])[c]([H])[c]([H])[c]([c]([H])1)[C]#[N]>>[C:1][c]1[c]([H])[c]([H])[c]([c]([H])[c]([H])1)[C]#[N] | -0.24±0.38 (8) decrease | 0.01±0.16 (25) neutral

Table S11: Five selected transformations that decrease human microsomal clearance with a neutral logD change.

ID | Transformation | Microsomal clearance: median change ± std (nPairs), direction | logD: median change ± std (nPairs), direction
1 | [c:1][N]([H])[c]1[c]([H])[n][n]([c]([H])1)[C]([H])([H])([H])>>[c:1][N]([H])[c]1[c]([H])[c]([H])[n][n]1[C]([H])([H])([H]) | -1.18±0.81 (37) decrease | -0.26±0.5 (43) decrease
2 | [c:1][C](=[O])[N]([H])[S](=[O])(=[O])[N]([H])[C]([H])([H])([H])>>[c:1][C](=[O])[N]([H])[S](=[O])(=[O])[C]([H])([H])[C]([H])([H])[O][C]([H])([H])([H]) | -1.14±0.6 (14) decrease | -0.33±0.59 (13) decrease
3 | [c:1][N]([C]([H])([H])([H]))[S](=[O])(=[O])[C]([H])([H])([H])>>[c:1][N]([H])[S](=[O])(=[O])[C]([H])([H])([H]) | -0.76±0.47 (11) decrease | -0.13±0.76 (15) decrease
4 | [c:1][N]([H])[c]1[c]([H])[c]([H])[n][c]([H])[n]1>>[c:1][N]([H])[c]1[c]([H])[c]([H])[n][n]1[C]([H])([H])([H]) | -0.87±0.37 (15) decrease | -0.34±0.17 (14) decrease
5 | [c:1][C]([H])([H])[C]#[N]>>[c:1][C](=[O])[N]([H])[C]([H])([H])([H]) | -0.94±0.6 (13) decrease | -0.38±0.4 (18) decrease

Table S12: Five selected transformations with the largest difference between change in microsomal clearance and logD.
Plasma Protein Binding

logD \ human plasma protein log(free/bound) | decrease | neutral | increase | NED
decrease | 103 | 17 | 2957 | 737
neutral | 15 | 42 | 15 | 46
increase | 2957 | 17 | 103 | 737
NED | 207 | 18 | 207 | 696

Table S13: Confusion Matrix for human PPB - logD increase/neutral/decrease/NED rules for all rules based on more than 20 pairs on both properties.

Transformation | ΔlogD ± std (nPairs) | Δlog Hu PPB ± std (nPairs)
[c:1][c:2]1[c:3]([H])[c:4]([c:5]([H])[n:6][c:7]1([H]))[C](=[O])[O]([H])>>[c:1][c:2]1[c:3]([H])[c:4]([c:5]([H])[n:6][c:7]1([H]))[C](=[O])[N]([H])([H]) | 2.54 ± 0.79 (33) | 0.33 ± 0.29 (21)
[C:1]([H])([H])[C]([H])([H])[C](=[O])[O]([H])>>[C:1]([H])([H])[C]([H])([H])[C](=[O])[N]([H])([H]) | 1.84 ± 0.52 (20) | 0.57 ± 0.33 (13)
[c:1][c]1[n]([H])[n][n][n]1>>[c:1][C](=[O])[N]([H])([H]) | 1.56 ± 0.76 (60) | 0.93 ± 0.56 (30)
[c:1][C](=[O])[O]([H])>>[c:1][C]([H])([H])[O]([H]) | 2.32 ± 0.90 (151) | 0.35 ± 0.54 (74)
[c:1][c]1[c]([H])[c]([c]([H])[n][c]([H])1)[C](=[O])[O]([H])>>[c:1][c]1[c]([H])[n][c]([H])[n][c]([H])1 | 2.36 ± 0.62 (64) | 0.35 ± 0.24 (38)

Table S14: Five selected rules that increase logD and increase the log(free/bound) for plasma protein binding.

Transformation | ΔlogD ± std (nPairs) | Δlog Hu PPB ± std (nPairs)
[c:1]CCNc1ccc(cc1)[c:2]>>[c:1]CCNC(=O)c1ccc(cc1)[c:2] | 0 ± 0.14 (11) | 0.4 ± 0.2 (11)
[c]1([H])[c]([c]([c]([H])[c]([c]1[F])[Cl])[F])[N:1]>>[c]1([H])[c]([H])[c]([c]([c]([H])[c]1[C]([F])([F])[F])[N:1])[F] | 0.02 ± 0.06 (7) | 0.34 ± 0.2 (7)
[c:1][N:2]([C:3]([H])([H])([H]))[c]1[c]([H])[c]([H])[c]([c]([c]([H])1)[C]([H])([H])[O]([H]))[C]([H])([H])([H])>>[c:1][N:2]([C:3]([H])([H])([H]))[c]1[c]([H])[c]([c]([H])[c]([H])[c]1[C]([H])([H])([H]))[C]([H])([H])[O]([H]) | -0.1 ± 0.11 (21) | 0.62 ± 0.19 (7)
[c:1][c:2]([n:3])[N]([H])[c]1[c]([H])[c]([H])[c]([H])[c]([H])[c]1[F]>>[c:1][c:2]([n:3])[N]([H])[C]([H])([H])[C]1([H])[C]([H])([H])[C]([H])([H])1 | -0.10 ± 0.14 (11) | 0.40 ± 0.36 (9)
[c:1][n:2]([c:3])[C]1([C]([H])([H])[C]([H])([H])[O][C]([H])([H])[C]([H])([H])1)[C]([H])([H])([H])>>[H][C@]1([C]([H])([H])[C@@]([C]([H])([H])1)([O][C]([H])([H])([H]))[H])[n:2]([c:1])[c:3] | -0.10 ± 0.11 (21) | 0.38 ± 0.44 (21)

Table S15: Five selected rules that are logD neutral but increase the free proportion in human plasma.

Transformation | ΔlogD ± std (nPairs) | Δlog Hu PPB ± std (nPairs)
[c:1][N:2]([H])[S:3]>>[c:1][N:2]([C]([H])([H])([H]))[S:3] | 0.35 ± 1.14 (80) | 0.44 ± 0.46 (38)
[c:1][O][c]1[c]([H])[c]([H])[c]([c]([H])[c]([H])1)[C:2]([H])>>[c:1][C]([H])([H])[c]1[c]([H])[c]([H])[c]([c]([H])[c]([H])1)[C:2]([H]) | 0.48 ± 0.29 (11) | 0.42 ± 0.28 (12)
[c:1][c]1[c]([H])[c]([c]([n][c]([H])1)[O][C:2]([H])([H])([H]))[C:3]>>[c:1][c]1[c]([H])[c]([c](=[O])[n]([c]([H])1)[C:2]([H])([H])([H]))[C:3] | 0.69 ± 0.63 (22) | 0.36 ± 0.36 (21)
[c:1][n:2]([c:3])[C]1([C]([H])([H])[C]([H])([H])[O][C]([H])([H])[C]([H])([H])1)[C]([H])([H])([H])>>[H][C@@]1([C]([H])([H])[C]([H])([H])[C@]([C]([H])([H])1)([H])[O][C]([H])([H])([H]))[n:2]([c:1])[c:3] | 0.33 ± 0.14 (11) | 0.32 ± 0.34 (11)

Table S16: Four selected rules that increase logD but also increase the log(free/bound) for human PPB.
Comparison with rules published in the Papadatos 2010 paper

In 2010, Papadatos et al. published the first MMP paper highlighting the importance of the chemical context within which transformations take place. In Table S17, we compare some of the rules they found using their specific context description with our findings. Note that Papadatos et al. use a different scheme for encoding the chemical context, so it is not possible to directly compare the rule statistics from GRDv3 with their rule statistics. Nevertheless, Table S17 shows that the rules we find are in qualitative agreement with the rules found by Papadatos et al. There is very good qualitative agreement on the effect on solubility of the piperidine to morpholine substitution. Within an aliphatic environment, replacing a piperidine with a morpholine has a detrimental effect on solubility, whereas within an aromatic environment it has a beneficial effect. The structural reason for this is most probably the change of the pKa within the aliphatic environment. This is the strongest context dependence seen, and there is very good agreement between the data of Papadatos et al. and ours. Where the effects are weaker (the other two examples in Table S17), the agreement is still qualitatively good, but it is hard to judge more details because of the different encoding of the environment and the absence of quantitative data in the paper of Papadatos et al.

Property | Transformation | Environment | GRDv3 env | Papadatos value | GRDv3 median value ± std (nPairs)
hERG | H >> OCH3 | Aliphatic linker (-CH2-CH2-) | [C:1]([H])([H])([H])[C:2]([H])([H])>>[C]([H])([H])([H])[O][C:1]([H])([H])[C:2]([H])([H]) | decrease | -0.07 ± 0.30 (90)
hERG | H >> OCH3 | Hydrophobic aromatic ring (p-phenyl) | [c:1]1([H])[c:2]([H])[c:3]([H])[c:4][c:5]([H])[c:6]1([H])>>[C]([H])([H])([H])[O][c:1]1[c:2]([H])[c:3]([H])[c:4][c:5]([H])[c:6]1([H]) | neutral | 0.01 ± 0.29 (137)
hERG | H >> OCH3 | Polar aromatic ring (p-2-N-pyridine) | [c:1]1([H])[c:2]([H])[c:3][c:4]([H])[n:5][c:6]1([H])>>[C]([H])([H])([H])[O][c:6]1[c:1]([H])[c:2]([H])[c:3][c:4]([H])[n:5]1 | increase | 0.03 ± 0.51 (8)
Solubility | Piperidine >> Morpholine | Polar aromatic ring (aromatic ring) | [c:1][N]1[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])1>>[c:1][N]1[C]([H])([H])[C]([H])([H])[O][C]([H])([H])[C]([H])([H])1 | increase | 0.78 ± 0.87 (62)
Solubility | Piperidine >> Morpholine | Aliphatic ring positively ionizable (aliphatic linker) | [C:1]([H])([H])[N]1[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])[C]([H])([H])1>>[C:1]([H])([H])[N]1[C]([H])([H])[C]([H])([H])[O][C]([H])([H])[C]([H])([H])1 | decrease | -0.56 ± 0.79 (127)
logD | H >> CN | Aromatic ring with H-bond acceptor (p-3-N-pyridine) | [c:1]1([H])[c:2]([H])[c:3]([H])[n:4][c:5][c:6]1([H])>>[c:6]1([H])[c:1]([H])[c:2]([c:3]([H])[n:4][c:5]1)[C]#[N] | Neutral - increase | -0.10 ± 0.33 (43)
logD | H >> CN | Aliphatic linker (-CH2-CH2-) | [C:1]([H])([H])([H])[C:2]([H])([H])>>[C:2]([H])([H])[C:1]([H])([H])[C]#[N] | decrease | -0.69 ± 0.77 (40)

Table S17: Comparison with context-specific MMP rules previously identified by Papadatos et al. SMILES for searching transformations in the GRD are given.
Single Company versus GRDv3 rules

GRD \ Roche | increase | neutral | decrease | NED
increase | 20371 | 0 | 2 | 229
neutral | 204 | 7041 | 204 | 967
decrease | 2 | 0 | 20371 | 229
NED | 1840 | 20 | 1840 | 27651

Table S18: Qualitative change in rule direction for logD rules based on Roche pairs alone and joint GRDv3 pairs.
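As a worked reading of Table S18, the sketch below tallies how often the classification is unchanged and how often the direction flips between the two data sets. It assumes, from the table layout, that rows are the joint GRD classification and columns the Roche-only classification; variable names are illustrative.

```python
# Counts from Table S18: outer key = GRD class (rows), inner key = Roche class (columns).
table_s18 = {
    "increase": {"increase": 20371, "neutral": 0, "decrease": 2, "NED": 229},
    "neutral": {"increase": 204, "neutral": 7041, "decrease": 204, "NED": 967},
    "decrease": {"increase": 2, "neutral": 0, "decrease": 20371, "NED": 229},
    "NED": {"increase": 1840, "neutral": 20, "decrease": 1840, "NED": 27651},
}

total_rules = sum(sum(row.values()) for row in table_s18.values())

# Diagonal cells: same class from Roche pairs alone and from the joint data.
unchanged = sum(table_s18[c][c] for c in table_s18)

# Off-diagonal cells where a significant direction is reversed.
flipped = table_s18["increase"]["decrease"] + table_s18["decrease"]["increase"]

# NED on Roche pairs alone, but directional once the joint data are used.
ned_resolved = table_s18["increase"]["NED"] + table_s18["decrease"]["NED"]

print(total_rules, unchanged, flipped, ned_resolved)
print(round(unchanged / total_rules, 3))
```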

GRDv3 \ Roche | increase | neutral | decrease | NED
increase | 2395 | 25 | 2 | 103
neutral | 0 | 80 | 0 | 0
decrease | 2 | 25 | 2395 | 103
NED | 15 | 42 | 15 | 799

Table S19: Qualitative change in rule direction for logD rules based on Roche pairs alone and joint GRDv3 pairs based on rules with at least 20 pairs.

GRD \ Roche | increase | neutral | decrease | NED
increase | 185 | 33 | 1 | 190
neutral | 3 | 1042 | 3 | 248
decrease | 1 | 33 | 185 | 190
NED | 47 | 513 | 47 | 1140

Table S20: Qualitative change in rule direction for human microsomal clearance rules based on Roche pairs alone and joint GRDv3 pairs.
