A. Chemogenomics in Drug Discovery - The Druggable Genome
4.17.1 Introduction
Over the past 100 years, since Paul Ehrlich’s first systematic search for drugs led to the discovery of arsphenamine (Salvarsan),
medicinal chemistry has continuously sought more effective means to navigate the vastness of chemical space in the
search for new therapies. Arguably the greatest contributions to the changing practice of medicinal chemistry in recent
decades have come from the influence of molecular biology and protein crystallography. Advances in molecular biology,
culminating in whole genome sequencing, provide modern drug discoverers with the entire palette of proteins that are
the past and future drug targets. From the genomic scale to the atomic scale, insights from protein crystallography
enable drug designers to observe at atomic resolution the details of the interaction between ligands and drug targets.
Modern medicinal chemistry is now capable of synthesizing knowledge from structure–activity relationships (SARs),
large-scale screening campaigns, and insights from structure-based drug design to find the intersections between protein
sequences and chemical structure. Chemogenomics attempts to integrate chemical space with biology on a genome
scale. In this chapter we outline how insights from chemogenomics can be directly applied in medicinal
chemistry in the target discovery and lead discovery stages.
422 Chemogenomics in Drug Discovery – The Druggable Genome and Target Class Properties
Gene taxonomy         Human targets   Human targets   Human targets   Human targets
                      at <10 mM       at <1 mM        at <10 mM       at <100 mM
                                                      Ro5, n>1        Ro5, n>1
Peptide GPCRs         63              59              59              42
Transferases          49              42              36              24
Aminergic GPCRs       35              35              35              35
Oxidoreductases       40              36              38              25
Metalloproteases      44              41              41              35
Hydrolases            36              29              30              21
Serine proteases      30              30              28              21
PDEs                  19              19              19              18
Cysteine proteases    16              16              14              13
GPCRs Class C         10              10              10              6
Kinases others        12              9               11              5
GPCRs Class B         7               7               4               3
Aspartyl proteases    7               7               4               4
Since our current data on the properties of drugs point to a range of molecular properties within which the
likelihood of a compound becoming an oral drug is increased, it is interesting to ask how the range of properties of
ligands binding to specific targets overlaps with ‘druglike’ space. Paolini et al. have investigated the relationship between
target class and the physicochemical properties of ligands by calculating a set of physicochemical descriptors for over a
quarter of a million biologically active compounds, across more than 1300 targets, where the protein sequences assigned
to each of the pharmacological targets were classified into gene families. Distinct differences in the distribution
of molecular properties between sets of compounds active against different gene families were observed (Table 2,
Figure 1). For example, ligands for the nuclear hormone receptors are significantly more lipophilic, as measured by
ClogP, mirroring the properties of steroids. In comparison, the mean molecular weight (MW) of ligands binding to
aminergic G protein-coupled receptors (GPCRs) is 378 Da (SD = 93 Da), close to the mean MW of approved drugs
(383 Da, SD = 155 Da), while the mean MW of peptide GPCR ligands is greater at 514 Da, with a wider spread
(SD = 202 Da), which is over Lipinski’s ‘rule-of-five’ limit of 500 Da.
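The comparison with the rule-of-five can be made concrete with a small check. The sketch below is illustrative only: it applies the four familiar Lipinski cutoffs (MW 500 Da, ClogP 5, 5 hydrogen bond donors, 10 hydrogen bond acceptors) to the class mean values quoted in the text and in Table 2. The names `Descriptors` and `ro5_violations` are introduced here for illustration, and a class mean is of course not a real molecule.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mw: float     # molecular weight, Da
    clogp: float  # calculated logP
    hbd: float    # number of hydrogen bond donors
    hba: float    # number of hydrogen bond acceptors


# Lipinski's rule-of-five cutoffs (ref. 4).
RO5_LIMITS = {"mw": 500.0, "clogp": 5.0, "hbd": 5.0, "hba": 10.0}


def ro5_violations(d: Descriptors) -> list:
    """Return the names of the descriptors that exceed their rule-of-five cutoff."""
    values = {"mw": d.mw, "clogp": d.clogp, "hbd": d.hbd, "hba": d.hba}
    return [name for name, limit in RO5_LIMITS.items() if values[name] > limit]


# Class means taken from the text and Table 2 (averages, not real compounds).
peptide_gpcr_mean = Descriptors(mw=514, clogp=4.3, hbd=2, hba=5)
protein_kinase_mean = Descriptors(mw=407, clogp=3.8, hbd=2, hba=5)

print("Peptide GPCRs:", ro5_violations(peptide_gpcr_mean))
print("Protein kinases:", ro5_violations(protein_kinase_mean))
```

On these averages only the peptide GPCR class exceeds a cutoff (MW), in line with the observation above; the wide spread (SD = 202 Da) means individual ligands fall on both sides of the limit.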
Development of the ideas in druglikeness has led to the proposal of the concept of ‘degrees of druggability.’
This concept proposes that druggability and druglikeness can be measured as a probabilistic continuum, where
two protein targets may both be classified as druggable but may exhibit differences in the probabilities of success,
due to the physicochemical properties of their respective ligands. One proposed measure of the degree of druggability
is the distance between the centroid in reduced chemical space (e.g., MW, ClogP, number of hydrogen bond
donors, and number of hydrogen bond acceptors) of all of the potent actives (i.e., binding affinities <100 nM)
associated with each target and the centroid of the probabilistic clustering of approved oral drugs. Over 65% of
targets for oral drugs are within a distance of 0.4 from the centroid of oral drug space and 87% of oral drug targets are
within a distance of 0.6. Within these degrees of druggability, approximately 200 human targets with potent leads,
including the current drug targets, are within a distance of 0.4 from the oral drug centroid but have yet to produce
approved drugs.
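The centroid-distance measure can be sketched in a few lines. The text does not state how the descriptors are scaled or how the oral drug centroid is obtained from the probabilistic clustering, so the sketch below makes two loud assumptions: each axis is simply divided by a fixed scale factor, and the distance is Euclidean. The oral drug centroid, the scale factors, and the three example compounds are all hypothetical values chosen for illustration.

```python
import math

DESCRIPTORS = ("mw", "clogp", "hbd", "hba")


def centroid(compounds):
    """Mean of each descriptor over a list of descriptor dicts."""
    n = len(compounds)
    return {k: sum(c[k] for c in compounds) / n for k in DESCRIPTORS}


def scaled_distance(a, b, scale):
    """Euclidean distance between two centroids after dividing each axis by its scale."""
    return math.sqrt(sum(((a[k] - b[k]) / scale[k]) ** 2 for k in DESCRIPTORS))


# HYPOTHETICAL oral drug centroid and axis scaling (not taken from the source).
oral_drug_centroid = {"mw": 383.0, "clogp": 2.5, "hbd": 2.0, "hba": 5.0}
axis_scale = {"mw": 500.0, "clogp": 5.0, "hbd": 5.0, "hba": 10.0}

# HYPOTHETICAL potent actives (<100 nM) for a single target.
potent_actives = [
    {"mw": 350.0, "clogp": 3.0, "hbd": 1, "hba": 4},
    {"mw": 410.0, "clogp": 3.5, "hbd": 2, "hba": 5},
    {"mw": 380.0, "clogp": 2.5, "hbd": 1, "hba": 6},
]

target_centroid = centroid(potent_actives)
distance = scaled_distance(target_centroid, oral_drug_centroid, axis_scale)
print(f"distance to oral drug centroid: {distance:.2f}")
print("within 0.4 of oral drug space:", distance <= 0.4)
```

With a real data set, the same two functions would be applied per target, and the resulting distances compared against the 0.4 and 0.6 thresholds quoted above.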
Gene taxonomy               MW (Da)                               ClogP
                            Mean   SD    Median   90% limit      Mean   SD    Median   90% limit
Ion channels ligand gated   359    91    362      430            3.0    1.8   3.2      4.7
Peptide GPCRs               514    202   477      752            4.3    2.3   4.6      6.5
Protein kinases             407    109   402      505            3.8    1.8   3.9      5.7
Serine proteases            467    145   463      572            2.7    2.1   2.7      4.8

Gene taxonomy               No. of hydrogen bond acceptors        No. of hydrogen bond donors
                            Mean   SD    Median   90% limit      Mean   SD    Median   90% limit
Aminergic GPCRs             4      2     4        6              1      1     1        2
Metalloproteases            6      2     6        8              3      1     2        4
Nuclear hormone receptors   4      2     4        6              1      1     1        2
Peptide GPCRs               5      4     4        10             2      3     1        8
Phosphodiesterases          6      2     6        8              1      1     1        2
Protein kinases             5      2     5        7              2      1     2        4
Serine proteases            5      3     5        8              3      2     2        4

Gene taxonomy               No. of rotatable bonds                Ligand efficiency (kcal mol⁻¹ per non-H atom)
                            Mean   SD    Median   90% limit      Mean   SD     Median
Aminergic GPCRs             6      3     6        8              0.4    0.08   0.4
Nuclear hormone receptors   6      3     6        10             0.3    0.06   0.3
Figure 1 Distinct differences in the distribution of molecular properties between sets of compounds active against different
gene families by (a) molecular weight, (b) ClogP, (c) number of rotatable bonds, (d) number of hydrogen bond acceptors,
(e) number of hydrogen bond donors. Each panel plots the proportion of gene family compounds. Gene families shown:
aminergic GPCRs, peptide GPCRs, serine proteases, phosphodiesterases, protein kinases, metalloproteases, ligand-gated
ion channels, and nuclear hormone receptors.
Figure 2 Number of protein targets with small-molecule leads reported in the medicinal chemistry literature per year.
Axes: number of targets (0–1000) against year (1980–2003). Series: all reported targets; reported targets with actives
<100 nM; new targets per year with actives <100 nM.
Figure 3 Changes in the pharmaceutical industry’s portfolio of target classes (as disclosed in the medicinal chemistry
literature per year). Axes: proportion of published targets (0–80%) against year of publication (1980–2003).
Figure 4 Steady rise in the median MW of reported medicinal chemistry compounds over time.2 Axes: MW (200–700)
against year of publication (1986–2004). Series: aminergic GPCRs; peptide GPCRs; all literature compounds.
Figure 5 Decline in MW of drugs in development. Median MW for aminergic GPCRs, peptide GPCRs, and all
compounds through subsequent stages of clinical development. The number of compounds for each class at each stage is
labeled.2 Axes: MW (Da, 200–650) against clinical phase (pre-clinical, Phase I, Phase II, Phase III, approved).
Figure 6 Linear correlation between molecular surface area and molecular weight. Analysis of 49 456 biologically active,
druglike compounds with IC50 ≤ 100 nM. Molecular weight was calculated from the chemical structures represented as
desalted, canonical SMILES strings. The molecular surface area of N, O, P, and S atoms was estimated using the fast
Ertl method34 with a 2D approximation. All other atom types (excluding hydrogen atoms) were estimated using an overlapping
spheres method. Axes: molecular surface area (Å², 0–1000) against molecular weight (Da, 0–1000).
the ‘magic methyl,’ where a single methyl group, placed in the correct position, can increase ligand affinity tenfold.
The accessible hydrophobic surface area of a methyl group is approximately 46 Å² (if one assumes that all of the
hydrophobic surface area is encapsulated by the protein binding site and thus forms full contact with the protein). With
a hydrophobic effect of 0.03 kcal mol⁻¹ Å⁻², this amounts to approximately 1.36 kcal mol⁻¹, equivalent to the observed tenfold
affinity increase and approximately the maximal affinity per non-hydrogen atom.22 In addition to the predominantly
hydrophobic contribution to the binding of many drugs, ionic interactions, such as those found in zinc proteases (for
example, with angiotensin-converting enzyme (ACE) inhibitors), contribute to the binding energy. The attraction of
complementary polar groups contributes up to 0.1 kcal mol⁻¹ Å⁻², with ionic salt bridges contributing approximately three
times more, allowing low molecular weight compounds to bind strongly. Unlike hydrophobic interactions,
complementary polar interactions are dependent on the correct geometry.
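The ‘magic methyl’ arithmetic can be checked directly. The sketch below multiplies the surface area and hydrophobic effect quoted in the text, then converts the resulting free energy into an affinity ratio using the standard relation ratio = exp(ΔG/RT). The temperature of 298.15 K is an assumption made here; the text does not state one.

```python
import math

R_KCAL = 1.987e-3       # gas constant, kcal mol^-1 K^-1
TEMPERATURE_K = 298.15  # ASSUMED temperature; not stated in the text

methyl_area = 46.0            # accessible hydrophobic surface area of a methyl group, Å^2 (from the text)
hydrophobic_per_area = 0.03   # hydrophobic effect, kcal mol^-1 Å^-2 (from the text)


def fold_change(delta_g_kcal, temperature_k=TEMPERATURE_K):
    """Affinity ratio corresponding to a gain in binding free energy of delta_g_kcal."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


delta_g = methyl_area * hydrophobic_per_area            # fully buried methyl group
tenfold_energy = R_KCAL * TEMPERATURE_K * math.log(10)  # energy for exactly tenfold

print(f"binding energy gained by a buried methyl: {delta_g:.2f} kcal/mol")
print(f"corresponding affinity fold change:       {fold_change(delta_g):.1f}")
print(f"energy equivalent to exactly tenfold:     {tenfold_energy:.2f} kcal/mol")
```

The product of the two quoted figures is 1.38 kcal mol⁻¹, slightly above the roughly 1.36 kcal mol⁻¹ that corresponds to a tenfold change at room temperature, which is why the text describes the two as approximately equal.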
accessible hydrophobic surface area of at least 350 Å².31 Encapsulated cavities maximize the ratio of the surface area to
the volume and are thus capable of binding low molecular weight compounds with high affinities.
The hypothesis that the physicochemical properties of cavities on protein structures can be analyzed a priori to
predict the druggability of a protein has been developed further into automatic algorithms to assess the protein
structures in the Protein Data Bank (PDB) and the stream of novel structures determined by the structural genomics
initiatives.29,30–33 Empirical druggability predictions have been explored experimentally at Abbott using heteronuclear
nuclear magnetic resonance (NMR) to identify and characterize the binding surfaces on proteins by screening ~10 000
low molecular weight molecules (average MW 220, average ClogP 1.5).31,32 In a small sample of 23 proteins, the
screening results reveal that about 90% of the ligands bind to sites known to be small-molecule ligand binding sites.
Distinct, uncompetitive new binding sites were discovered in only three of the 23 proteins. In this relatively small
sample of proteins, Hajduk et al. noted a high correlation between experimental NMR hit rates and the ability
to find high-affinity ligands. From the experimental screening hit rates, Hajduk et al. constructed a simple model that
included physicochemical property descriptors such as cavity dimensions, surface complexity, and polar and apolar
surface area, and that predicts the experimental screening hit rates with an R² of 0.72 and an adjusted R² of 0.65.
A decision-tree approach to assessing the druggability of protein structures has been developed at Inpharmatica by
Al-Lazikani and Overington.29 A range of physicochemical properties of the identified binding sites and cavities was
calculated from the protein structures, including volume, depth, curvature, accessibility, hydrophobic surface area, and
polar surface area. The algorithm was trained against a set of 400 protein complexes binding small-molecule,
‘rule-of-five’-compliant ligands. From this analysis a decision tree was derived to predict the druggability of a binding
site or cavity from calculated physicochemical properties. The decision tree predicts whether a cavity is druggable
within the statistical confidence of the tree. A success rate of 91% when predicting druggability on the protein drug
targets has been claimed for this approach.29 The method requires either an experimentally derived structure or a
high-quality homology model. Ideally, because of the inherent flexibility of many protein–ligand binding sites, a sample of
multiple conformations is preferred. The decision-tree method was applied to the entire PDB (December 2004
release). Following a clean-up process, 27 409 files were suitable for analysis, further classified into 76 322 structural
domains (using SCOP33), of which 28% (21 522) were found to have at least one site predicted to have some degree of
druggability. From this analysis a nonredundant set of 427 human proteins was predicted to contain a druggable
binding site, of which 281 had no prior known compounds or drugs developed against them. In a similar analysis Hajduk
et al. calculated the druggability of 1000 nonredundant human proteins derived from the PDB, of which 35% of entries
contained at least one site predicted to be highly druggable, slightly higher than but comparable with Al-Lazikani’s
prediction.
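To make the idea of a decision tree over cavity descriptors concrete, the sketch below hand-codes a tiny tree using four of the properties listed above. It is not the published model: the source does not give the learned splits, so every threshold here is hypothetical (the 350 Å² value merely echoes the hydrophobic surface area figure quoted earlier), and the names `Cavity` and `predict_druggable` are introduced for illustration. A real tree would be derived from the training complexes rather than written by hand.

```python
from dataclasses import dataclass


@dataclass
class Cavity:
    volume: float            # cavity volume, Å^3
    depth: float             # cavity depth, Å
    hydrophobic_area: float  # hydrophobic surface area, Å^2
    polar_area: float        # polar surface area, Å^2


def predict_druggable(c: Cavity) -> bool:
    """Hand-written decision tree; every threshold below is HYPOTHETICAL."""
    if c.hydrophobic_area < 350.0:   # illustrative split on apolar surface
        return False
    if c.volume < 300.0:             # illustrative split on pocket size
        return False
    if c.depth < 5.0:                # illustrative split on pocket depth
        return c.polar_area < 100.0  # shallow sites pass only if mostly apolar
    return True


# HYPOTHETICAL cavities, one per leaf of the tree.
examples = {
    "deep apolar pocket": Cavity(volume=500, depth=8, hydrophobic_area=400, polar_area=120),
    "little apolar surface": Cavity(volume=500, depth=8, hydrophobic_area=200, polar_area=50),
    "shallow polar groove": Cavity(volume=400, depth=3, hydrophobic_area=400, polar_area=150),
    "shallow apolar groove": Cavity(volume=400, depth=3, hydrophobic_area=400, polar_area=50),
}

for name, cavity in examples.items():
    print(f"{name}: {'druggable' if predict_druggable(cavity) else 'not druggable'}")
```

Because flexible binding sites can change shape, one reasonable use of such a function is to evaluate it over several conformations of the same site and report how often the site is classed as druggable.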
4.17.8 Conclusions
The palette of potential drug targets for modern medicinal chemistry can now be efficiently derived from searching
entire genomes. Knowledge-based methods enable the mapping of chemical space to protein structure and protein
sequences to predict druggable targets. The observed relationships between the physicochemical properties of ligands
and the targets to which they bind identify not only potential druggable targets but also the degree of druggability, a means
of assessing their probabilistic likelihood of success. Understanding the degrees of druggability between protein targets
can aid the medicinal chemist in selecting a portfolio of drug targets, in designing the screening strategy, in
identifying the region of chemical space in which target ligands are likely to reside, and in estimating the probability of success through
clinical development relative to disease indication. Advances in molecular biology and structural biology have had great
impact on the practice of modern medicinal chemistry by enabling a detailed understanding of the atomic basis of SARs
on individual protein targets. The next wave of advances in modern medicinal chemistry is likely to benefit from the
effective integration of pharmacological and chemogenomic knowledge gained over the past 100 years into a collective
whole to aid the practice of target discovery and compound design.
References
1. Overington, J.; Al-Lazikani, B.; Hopkins, A. L. Nat. Rev. Drug Disc. 2006, 5, in press.
2. Paolini, G. V.; Shapland, R. H. B.; Van Hoorn, W. P.; Mason, J. S.; Hopkins, A. L. Nat. Biotech. 2006, 24, 805–815.
3. Overington, J.; Al-Lazikani, B. Inpharmatica, StARlite database; Inpharmatica Ltd: London, 2005.
4. Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Adv. Drug Deliv. Rev. 1997, 23, 3–25.
5. Ajay, A.; Walters, W. P.; Murcko, M. A. J. Med. Chem. 1998, 41, 3314–3324.
6. Wang, J.; Ramnarayan, K. J. Combin. Chem. 1999, 1, 524–533.
7. Walters, W. P.; Ajay, A.; Murcko, M. A. Curr. Opin. Chem. Biol. 1999, 3, 384–387.
8. Lipinski, C. A. J. Pharmacol. Toxicol. Methods 2000, 44, 3–25.
9. Podlogar, B. L.; Muegge, I.; Brice, L. J. Curr. Opin. Drug Disc. Dev. 2001, 4, 102–109.
10. Muegge, I.; Heald, S. L.; Brittelli, D. J. Med. Chem. 2001, 44, 1841–1846.
11. Veber, D. F.; Johnson, S. R.; Cheng, H. Y.; Smith, B. R.; Ward, K. W.; Kopple, K. D. J. Med. Chem. 2002, 45, 2615–2623.
12. Proudfoot, J. R. Bioorg. Med. Chem. Lett. 2002, 12, 1647–1650.
13. Walters, W. P.; Murcko, M. A. Adv. Drug Deliv. Rev. 2002, 54, 255–271.
14. Egan, W. J.; Walters, W. P.; Murcko, M. A. Curr. Opin. Drug Disc. Dev. 2002, 5, 540–549.
15. Muegge, I. Med. Res. Rev. 2003, 23, 302–321.
16. Lajiness, M. S.; Vieth, M.; Erickson, J. Curr. Opin. Drug Disc. Dev. 2004, 7, 470–477.
17. Vieth, M.; Siegel, M. G.; Higgs, R. E.; Watson, I. A.; Robertson, D. H.; Savin, K. A.; Durst, G. L.; Hipskind, P. A. J. Med. Chem. 2004, 47,
224–232.
18. Wenlock, M. C.; Austin, R. P.; Barton, P.; Davis, A. M.; Leeson, P. D. J. Med. Chem. 2003, 46, 1250–1256.
19. Blake, J. F. Biotechniques 2003, June, 16–20.
20. Hopkins, A. L.; Groom, C. R.; Alex, A. Drug Disc. Today 2004, 9, 430–431.
21. Abad-Zapatero, C.; Metz, J. T. Drug Disc. Today 2005, 10, 464–469.
22. Kuntz, I. D.; Chen, K.; Sharp, K. A.; Kollman, P. A. Proc. Natl. Acad. Sci. USA 1999, 96, 9997–10002.
23. International Human Genome Sequencing Consortium. Nature 2001, 409, 860–921.
24. Venter, J. C.; Adams, M. D.; Myers, E. W.; Li, P. W.; Mural, R. J.; Sutton, G. G.; Smith, H. O.; Yandell, M.; Evans, C. A.; Holt, R. A. et al.
Science 2001, 291, 1304–1351.
25. Hopkins, A. L.; Groom, C. R. Nat. Rev. Drug Disc. 2002, 1, 727–730.
26. Orth, A. P.; Batalov, S.; Perrone, M.; Chanda, S. K. Expert Opin. Ther. Targets 2004, 8, 587–596.
27. Russ, A. P.; Lampel, S. Drug Disc. Today 2005, 10, 1607–1610.
28. Lander, E. S.; Linton, L. M.; Birren, B.; Nusbaum, C.; Zody, M. C.; Baldwin, J.; Devon, K.; Dewar, K.; Doyle, M.; FitzHugh, W. et al. Nature
2004, 409, 860–921.
29. Al-Lazikani, B.; Gaulton, A.; Paolini, G.; Lanfear, J.; Overington, J.; Hopkins, A. Chemical Biology. In From Small Molecules to Systems Biology and
Drug Design; Schreiber, S. L., Kapoor, T., Wess, G., Eds.; Wiley: New York, 2007, pp 1–20.
30. Hopkins, A. L.; Groom, C. R. Ernst Schering Res. Found. Workshop 2003, 42, 11–17.
31. Hajduk, P. J.; Huth, J. R.; Tse, C. Drug Disc. Today 2005, 10, 1675–1682.
32. Hajduk, P. J.; Huth, J. R.; Fesik, S. W. J. Med. Chem. 2005, 48, 2518–2525.
33. Murzin, A. G.; Brenner, S. E.; Hubbard, T.; Chothia, C. J. Mol. Biol. 1995, 247, 536–540.
34. Ertl, P.; Rohde, B.; Selzer, P. J. Med. Chem. 2000, 43, 3714–3717.
35. Finn, R. D.; Mistry, J.; Schuster-Bockler, B.; Griffiths-Jones, S.; Hollich, V.; Lassman, T.; Moxon, S.; Marshall, M.; Khanna, A.; Durbin, R. et al.
Nucleic Acids Res. 2006, 34, D247–D251.
Biographies
Andrew L Hopkins is presently Associate Research Fellow and Head of Chemogenomics at the Sandwich site of
Pfizer Global Research and Development. He joined Pfizer in 1998 at Sandwich, Kent, UK. Over the years, he has
established various new functions for Pfizer, including Target Analysis in 1999, Indications Discovery in 2001, and most
recently Knowledge Discovery in 2004. He won a British Steel scholarship to attend the University of Manchester, from
where he graduated with first class honours in 1993 with a BSc in Chemistry. Following a brief spell in the steel industry
he won a Wellcome studentship to attend the University of Oxford, working with Prof David I Stuart FRS. He
received his DPhil in Structural Biology from the University of Oxford in 1998. During his doctoral research
Dr Hopkins designed a new class of anti-HIV agents which were developed into drug candidates by Glaxo-Wellcome.
Following his interest in drug discovery he then joined Pfizer directly after graduating from Oxford. At Pfizer,
Dr Hopkins’ research involves combining chemical and biological knowledge to identify new targets or other new
opportunities for medicines. His work has involved the design and construction of major informatics systems, including
a literature-mining system and a large-scale chemogenomics knowledge base. He is the author of over 6 patents and 25
scientific publications, two of which have been cited as Hot Papers by the Thomson ISI citation index. Dr Hopkins lives
in Canterbury, Kent, UK.
Gaia Paolini is Senior Principal Scientist at the Sandwich site of Pfizer Global Research and Development. Gaia joined
Pfizer in 2002 at Sandwich, Kent, UK. Gaia received her degree (laurea) in Physics at the University of Rome, with a
research thesis on statistical mechanics computer simulation of condensed matter systems. In her career, Gaia has held
a number of positions in industry and academia, combining roles of research scientist, software specialist, and business
systems analyst. At Pfizer, Gaia designed and developed a LIMS system for structural biology, and, in her current role,
led the design, development and mining of a large chemogenomics knowledge base. For her contribution she won a
Pfizer Achievement Award in 2006. Gaia is the author of 18 peer-reviewed scientific publications, spanning the fields of
applied mathematics, materials science, classical density functional theory, and chemogenomics. Gaia currently lives in
Canterbury, Kent, UK.
© 2007 Elsevier Ltd. All Rights Reserved
No part of this publication may be reproduced, stored in any retrieval system or transmitted
in any form by any means electronic, electrostatic, magnetic tape, mechanical, photocopying,
recording or otherwise, without permission in writing from the publishers
Comprehensive Medicinal Chemistry II
ISBN (set): 0-08-044513-6
ISBN (Volume 4) 0-08-044517-9; pp. 421–433