
4.17 Chemogenomics in Drug Discovery – The Druggable Genome and Target Class Properties
A L Hopkins and G V Paolini, Pfizer Global Research and Development, Sandwich, UK
© 2007 Published by Elsevier Ltd.

4.17.1 Introduction
4.17.2 Pharmacological Target Space
4.17.3 Chemical Properties of Drugs and Leads
4.17.4 Trends in Molecular Properties
4.17.5 Identifying the Druggable Genome
4.17.6 Molecular Recognition Basis for Druggability
4.17.7 Analysis of Protein Structures as Drug Targets
4.17.8 Conclusions
References

4.17.1 Introduction
Over the past 100 years since Paul Ehrlich’s first systematic search for drugs to discover arsphenamine (Salvarsan),
medicinal chemistry has continuously sought more effective means to navigate the vastness of chemical space in the
search for new therapies. Arguably the greatest contributions to the changing practice of medicinal chemistry in recent
decades have come from the influence of molecular biology and protein crystallography. Advances in molecular biology,
culminating in whole genome sequencing, provide modern drug discoverers with the entire palette of proteins that are
the past and future drug targets. From the genomics scale to the atomic scale, insights from protein crystallography
enable drug designers to observe in atomic resolution the details of the interaction between ligands and drug targets.
Modern medicinal chemistry is now capable of synthesizing knowledge from structure–activity relationships (SARs),
large-scale screening campaigns, and insights from structure-based drug design to find the intersections between protein
sequences and chemical structure. Chemogenomics attempts to integrate chemical space with biology on a genome
scale. In this chapter we outline how insights from chemogenomics can be directly applied in medicinal
chemistry in the target discovery and lead discovery stages.

4.17.2 Pharmacological Target Space


One of the key questions for molecular approaches to medicinal chemistry is: which proteins do current leads and drugs act upon? The list of molecular targets for which small-molecule chemical matter has been discovered has been difficult to ascertain because of the lack of integrated and accessible databases of pharmacological information. From a comprehensive survey of the literature, Overington et al.1 identify 196 human protein drug targets on which the current pharmacopoeia of US Food and Drug Administration (FDA) approved small-molecule drugs acts. In terms of the number of protein targets for which lead matter has been identified, Paolini et al. have attempted the large-scale integration of proprietary and published screening data to identify the number of unique molecular targets for which chemical tools, leads, or drugs have been discovered.2 Their global survey of data from Pfizer, Warner-Lambert, and Pharmacia, integrated with a large body of medicinal chemistry SAR results published in the literature (J. Med. Chem. 1980–2003 and Bioorg. Med. Chem. Lett. 1990–2003),3 identified 1306 proteins from 55 organisms with biologically active chemical matter. These include a nonredundant list of 836 genes in the human genome for which small-molecule chemical tools have been discovered, of which 727 human targets have at least one compound with binding affinity below 10 μM that complies with Lipinski’s ‘rule-of-five’ criteria for oral drug absorption4 and 529 human targets have at least one ‘rule-of-five’ compound below 100 nM (Table 1).


Table 1 Pharmacological target space2

Gene taxonomy                Human targets   Human targets   Human targets    Human targets
                             at <10 μM       at <1 μM        at <10 μM,       at <100 nM,
                                                             Ro5a, n > 1      Ro5a, n > 1
Protein kinases              105             99              98               83
Peptide GPCRs                63              59              59               42
Transferases                 49              42              36               24
Aminergic GPCRs              35              35              35               35
GPCRs Class A others         44              44              40               32
Oxidoreductases              40              36              38               25
Metalloproteases             44              41              41               35
Hydrolases                   36              29              30               21
Ion channels ligand gated    29              28              24               22
Nuclear hormone receptors    24              24              22               19
Serine proteases             30              30              28               21
Ion channels others          18              16              16               11
PDEs                         19              19              19               18
Cysteine proteases           16              16              14               13
GPCRs Class C                10              10              10               6
Kinases others               12              9               11               5
GPCRs Class B                7               7               4                3
Aspartyl proteases           7               7               4                4
Others                       139             119             108              63
Enzymes others               109             97              90               47
Total                        836             767             727              529

a Compounds passing Lipinski’s ‘rule-of-five’ criteria.4

4.17.3 Chemical Properties of Drugs and Leads


Essential to the design of a drug are the physicochemical characteristics of the lead compound. A balance of solubility and polar/hydrophobic properties is crucial for specific routes of absorption and for crossing the membranes and other biological barriers that a drug must penetrate to reach the desired site of action and affect the biological equilibrium of a whole organism. The presence of such biological barriers limits the range of molecular properties, and thus the chemical space within which medicinal chemists can design.
Lipinski’s analysis of the Derwent World Drug Index introduced the concept of physicochemical property limits with respect to the solubility and permeability of drugs. Lipinski et al. demonstrated that orally administered drugs are far more likely to reside in areas of chemical space defined by a limited range of molecular properties. Lipinski’s4 analysis showed that drugs with a molecular weight no greater than 500 Da, no more than 5 hydrogen-bond donors (counted as the sum of OH and NH groups), no more than 10 hydrogen-bond acceptors (counted as the sum of nitrogen and oxygen atoms), and a calculated log P (ClogP) no greater than 5 were far more likely to be orally absorbed. The multiples of five observed in the molecular properties of drugs led to the coining of the term Lipinski’s ‘rule-of-five’ (Ro5). Several methods of predicting ‘druglikeness’ have been proposed, in which a defined range of molecular properties and physicochemical descriptors can discriminate between drugs and nondrugs for such characteristics as oral absorption, aqueous solubility, and permeability.4–16
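The four property limits can be expressed as a short filter. The sketch below is illustrative only: it assumes the descriptors have already been calculated elsewhere, treats each limit as inclusive (so only values above 500 Da, 5 donors, 10 acceptors, or ClogP 5 count as violations), and uses a hypothetical `Descriptors` container and function names that do not come from the cited work.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float      # molecular weight in Da
    h_bond_donors: int     # sum of OH and NH groups
    h_bond_acceptors: int  # sum of N and O atoms
    clogp: float           # calculated log P


def ro5_violations(d: Descriptors) -> list[str]:
    """Return the names of the rule-of-five limits that a compound exceeds."""
    checks = {
        "MW > 500": d.mol_weight > 500,
        "HBD > 5": d.h_bond_donors > 5,
        "HBA > 10": d.h_bond_acceptors > 10,
        "ClogP > 5": d.clogp > 5,
    }
    return [name for name, violated in checks.items() if violated]


def passes_ro5(d: Descriptors, max_violations: int = 0) -> bool:
    """True if the compound exceeds at most `max_violations` of the four limits."""
    return len(ro5_violations(d)) <= max_violations


# Illustrative inputs only: a 378 Da compound passes, a 514 Da compound fails on MW.
print(passes_ro5(Descriptors(378, 1, 4, 3.8)))
print(ro5_violations(Descriptors(514, 2, 5, 4.3)))
```

The `max_violations` parameter is an added convenience for callers who tolerate a single exceeded limit; the default applies all four limits strictly.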

Since current data on the properties of drugs point to a range of molecular properties within which the likelihood of a compound becoming an oral drug is increased, it is interesting to ask how the range of properties of ligands binding to specific targets overlaps with ‘druglike’ space. Paolini et al. have investigated the relationship between target class and the physicochemical properties of ligands by calculating a set of physicochemical descriptors for over a quarter of a million biologically active compounds, across over 1300 targets, where the protein sequences assigned to each of the pharmacological targets were classified into gene families. Distinct differences in the distribution of molecular properties between sets of compounds active against different gene families were observed (Table 2, Figure 1). For example, ligands for the nuclear hormone receptors are significantly more lipophilic, as measured by ClogP, than those of the other gene families, mirroring the properties of steroids. In comparison, the mean molecular weight (MW) of ligands binding to aminergic G protein-coupled receptors (GPCRs) is 378 Da (SD = 93 Da), close to the mean MW of approved drugs (383 Da, SD = 155 Da), while the mean MW of peptide GPCR ligands is greater at 514 Da, with a wider spread (SD = 202 Da), and lies above Lipinski’s ‘rule-of-five’ limit of 500 Da.
Development of the ideas behind druglikeness has led to the concept of ‘degrees of druggability.’ This concept proposes that druggability and druglikeness can be measured as a probabilistic continuum, where two protein targets may both be classified as druggable but may exhibit different probabilities of success, owing to the physicochemical properties of their respective ligands. One proposed measure of the degree of druggability is the distance between the centroid, in reduced chemical space (e.g., MW, ClogP, number of hydrogen bond donors, and number of hydrogen bond acceptors), of all of the potent actives (i.e., binding affinities <100 nM) associated with a target and the centroid of the probabilistic clustering of approved oral drugs. Over 65% of targets for oral drugs are within a distance of 0.4 from the centroid of oral drug space and 87% of oral drug targets are within a distance of 0.6. Within these degrees of druggability, approximately 200 human targets with potent leads, including the current drug targets, are within a distance of 0.4 from the oral drug centroid but have yet to produce approved drugs.
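A centroid-to-centroid distance of this kind can be sketched in a few lines. The example below is a simplified illustration, not the published method: it replaces the probabilistic clustering with a plain arithmetic centroid, and the scaling constants in `SCALES` are hypothetical choices made only to put the four properties on a comparable footing. Distances computed this way are therefore not comparable with the 0.4 and 0.6 values quoted above.

```python
import math
from statistics import mean

# Hypothetical scaling constants for (MW, ClogP, HBD, HBA); the source does not
# specify how the properties are normalized, so these are assumptions.
SCALES = (500.0, 5.0, 5.0, 10.0)


def centroid(compounds):
    """Mean of each scaled property over a list of (MW, ClogP, HBD, HBA) tuples."""
    scaled = [[value / scale for value, scale in zip(c, SCALES)] for c in compounds]
    return [mean(column) for column in zip(*scaled)]


def druggability_distance(target_actives, oral_drugs):
    """Euclidean distance between the two centroids in scaled property space."""
    return math.dist(centroid(target_actives), centroid(oral_drugs))


# Made-up property tuples for illustration only.
actives = [(420.0, 4.1, 2, 5), (380.0, 3.9, 2, 5)]
drugs = [(300.0, 4.0, 2, 5)]
print(druggability_distance(actives, drugs))
```

A smaller distance indicates that the potent ligands of a target sit closer to oral drug space under the chosen scaling.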

4.17.4 Trends in Molecular Properties


Over the past two decades the number of targets (including selectivity counterscreens) published in the medicinal
chemistry literature has been growing steadily. In recent years screening data on nearly 900 proteins have been published,
with around 500 molecular targets reported with potent chemical matter with binding affinities below 100 nM.2
Chemical tools and leads for approximately 80 to 100 new molecular targets are first disclosed each year (Figure 2). No
doubt this is a conservative estimate as many new compounds and targets are only disclosed in patents, which are not
included in this initial analysis, which was based on published journal data.2 In comparison, the rate of first disclosure for
novel targets with new leads has doubled from an average of 30 new targets with leads being disclosed in the 1980s to an
average of 60 new targets per year in the 1990s. Over the same time period there have also been some significant trends
in the changing character of the industry’s portfolio of targets and target classes (Figure 3), such as the rise of interest across
the industry in protein kinases and the relative decline in the proportion of aminergic GPCRs in the industry’s target
portfolio.
Over the past two decades there has been a steady rise of around 20% in the mean and median molecular weight of reported medicinal chemistry compounds (Figure 4), with the median MW of all reported medicinal chemistry compounds in the literature rising 68 Da, from 354 Da in 1986–1990 to 422 Da in 1999–2003. This growth is also reflected across the board in the increase of the median MW of disclosed ligands for several gene families. Aminergic GPCR compounds increased in molecular weight by 56 Da, from 337 to 393 Da, between the two 5-year periods. In contrast to the changes in the properties of medicinal chemistry compounds over the past two decades, Vieth et al.17 have observed that the distribution of mean molecular properties of approved oral (small-molecule) drugs has changed little in the past 20 years, despite differences in the range of indications and targets. A steady decline in MW through each subsequent stage of clinical development, and an increase in the proportion of compounds that are ‘rule-of-five’ compliant, have also been observed.18,19 The relative difference in molecular properties between the gene families is also reflected in compounds in clinical development; however, even within a gene family, the median MW of compounds surviving subsequent clinical phases exhibits a slight decline (Figure 5).2
In order to reduce the MW of leads and clinical candidates, and improve their chances in clinical development, the metric of ‘ligand efficiency’20,21 is gaining popularity amongst medicinal chemists as a means to assess the potential of a low molecular weight but low-affinity lead to be optimized into a high-affinity clinical candidate. The binding energy of the ligand per atom,22 or ligand efficiency (Δg),20 of a compound can be calculated by converting the Kd into the free energy of binding at 300 K (eqn [1]) and dividing by the number of ‘heavy’ (i.e., non-hydrogen) atoms (eqn [2]).
Table 2 Physicochemical properties of ligands by gene family

Gene taxonomy                MW (Da)                               ClogP
                             Mean    SD     Median  90% limit      Mean   SD    Median  90% limit
Aminergic GPCRs              378     93     376     460            3.8    1.6   3.9     5.6
Ion channels ligand gated    359     91     362     430            3.0    1.8   3.2     4.7
Metalloproteases             428     103    429     530            3.0    1.9   3.1     4.8
Nuclear hormone receptors    398     96     396     495            5.1    1.7   5.0     7.3
Peptide GPCRs                514     202    477     752            4.3    2.3   4.6     6.5
Phosphodiesterases           400     65     397     465            3.7    1.4   3.7     5.2
Protein kinases              407     109    402     505            3.8    1.8   3.9     5.7
Serine proteases             467     145    463     572            2.7    2.1   2.7     4.8

Gene taxonomy                No. of hydrogen bond acceptors        No. of hydrogen bond donors
                             Mean    SD     Median  90% limit      Mean   SD    Median  90% limit
Aminergic GPCRs              4       2      4       6              1      1     1       2
Ion channels ligand gated    4       2      4       6              2      1     2       3
Metalloproteases             6       2      6       8              3      1     2       4
Nuclear hormone receptors    4       2      4       6              1      1     1       2
Peptide GPCRs                5       4      4       10             2      3     1       8
Phosphodiesterases           6       2      6       8              1      1     1       2
Protein kinases              5       2      5       7              2      1     2       4
Serine proteases             5       3      5       8              3      2     2       4

Gene taxonomy                No. of rotatable bonds                Ligand efficiency (kcal mol⁻¹ per non-H atom)
                             Mean    SD     Median  90% limit      Mean   SD    Median
Aminergic GPCRs              6       3      6       8              0.4    0.08  0.4
Ion channels ligand gated    5       3      4       7              0.4    0.1   0.4
Metalloproteases             8       4      8       13             0.4    0.2   0.3
Nuclear hormone receptors    6       3      6       10             0.3    0.06  0.3
Peptide GPCRs                9       7      8       17             0.2    0.07  0.2
Phosphodiesterases           6       3      6       9              0.3    0.03  0.3
Protein kinases              6       3      5       9              0.3    0.07  0.3
Serine proteases             8       5      7       12             0.3    0.09  0.3


Figure 1 Distinct differences in the distribution of molecular properties between sets of compounds active against different gene families by (a) molecular weight, (b) ClogP, (c) number of rotatable bonds, (d) number of hydrogen bond acceptors, (e) number of hydrogen bond donors. Each panel plots the proportion of gene family compounds against the property for aminergic GPCRs, peptide GPCRs, serine proteases, phosphodiesterases, protein kinases, metalloproteases, ligand-gated ion channels, and nuclear hormone receptors.

Free energy of ligand binding:

$$\Delta G = RT \ln K_i = 1.4 \log K_i \qquad [1]$$

where R = gas constant = 1.986 cal mol⁻¹ K⁻¹

Binding energy per atom (ligand efficiency):

$$\Delta g = \Delta G / N_{\text{non-hydrogen atoms}} \qquad [2]$$
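As a worked illustration of eqns [1] and [2], the following sketch computes the free energy of binding and the ligand efficiency from a dissociation constant. It assumes the constant is given in mol/L and the temperature is 300 K, and it reports ligand efficiency as a positive magnitude, which is a presentation choice made here; the function names are illustrative.

```python
import math

R_KCAL = 1.986e-3    # gas constant in kcal mol^-1 K^-1 (value quoted in the text)
TEMPERATURE = 300.0  # K


def binding_free_energy(k_molar: float) -> float:
    """Free energy of binding (kcal/mol) from a dissociation constant in mol/L, eqn [1]."""
    return R_KCAL * TEMPERATURE * math.log(k_molar)


def ligand_efficiency(k_molar: float, heavy_atoms: int) -> float:
    """Binding energy per non-hydrogen atom (eqn [2]), returned as a positive magnitude."""
    return -binding_free_energy(k_molar) / heavy_atoms


# A 10 nM compound with 30 heavy atoms (illustrative numbers).
print(round(binding_free_energy(1e-8), 1))    # about -11.0 kcal/mol
print(round(ligand_efficiency(1e-8, 30), 2))  # about 0.37 kcal/mol per heavy atom
```

Because RT ln 10 is about 1.37 kcal/mol at 300 K, the unrounded result differs slightly from values obtained with the rounded factor of 1.4.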


Figure 2 Number of protein targets with small-molecule leads reported in the medicinal chemistry literature per year. The plot shows the number of targets by year (1980–2003) for three series: all reported targets, reported targets with actives <100 nM, and new targets per year with actives <100 nM.

Figure 3 Changes in the pharmaceutical industry’s portfolio of target classes (as disclosed in medicinal chemistry literature per year). The plot shows the proportion of published targets (0–100%) by year of publication (1980–2003) for the following classes: transferases, oxidoreductases, ligand-gated ion channels, GPCRs Class B and others, proteases, others, other ion channels, GPCRs Class A others, peptide GPCRs, nuclear hormone receptors, hydrolases, other enzymes, PDEs, kinases, GPCRs Class C and others, and aminergic GPCRs.
Figure 4 Steady rise in the median MW of reported medicinal chemistry compounds over time.2 The plot shows MW by year of publication (1986–2004) for aminergic GPCRs, peptide GPCRs, and all literature compounds.

Figure 5 Decline in MW of drugs in development. Median MW (Da) of aminergic GPCR compounds, peptide GPCR compounds, and all compounds through subsequent stages of clinical development (pre-clinical, Phase I, Phase II, Phase III, approved). The number of compounds for each class at each stage is labeled.2

4.17.5 Identifying the Druggable Genome


Knowledge of the proteins against which medicinal chemistry has already developed drugs and leads can be used to infer the subset of the proteins expressed by the human genome that have a high probability of being druggable, i.e., capable of binding druglike small molecules with high affinity. The first systematic estimate of the number of druggable proteins (the ‘druggable genome’) following the publication of the draft human genome23,24 was based on a search for membership of an extensive list of druggable gene families.25 Gene family-based analysis assumes that sequence and functional similarities underlie a conservation of binding site architecture between protein family members. The explicit assumption is that if one member of a gene family is modulated by a drug molecule, other members of the druggable protein domain family are likely also to be able to bind a compound with similar physicochemical properties. Analysis based on druggable protein families or domains is therefore likely to overestimate the number of druggable targets. Following the construction of a drug target sequence database of 399 targets of approved and experimental drugs and leads, 376 sequences could be assigned to 130 drug-binding domains, as captured by their InterPro domain annotation. Of these, 125 InterPro domains have orthologs present in the human proteome. At the time of the initial draft of the human genome,23,24 3051 genes were identified as belonging to the 125 druggable protein domains and thus predicted to encode proteins inferred to bind druglike molecules.
Further refinements of the Hopkins and Groom analysis have been published by Orth et al.26 and Russ and Lampel,27 reflecting how the number of predicted protein-expressing genes in the human genome has been modified since the initial draft. Orth et al.26 estimate that there are 3080 genes belonging to the druggable genome, with over 2950 druggable gene sequences in public databases in 2004, based on an estimate of the InterPro domain assignments of druggable gene families.25 Russ and Lampel27 conducted an analysis of the 120 druggable protein domains using InterPro and Pfam35 on the final assembly of the human genome.28 Overall, the Pfam protein domain annotation predicted fewer false positives than the InterPro classification used. When corrected for the overestimate of olfactory and taste GPCRs, the authors identify, again, 3050 druggable genes from the previously defined set of druggable protein domains,25 but with some significant changes within individual gene families. Using more stringent predictions for enzymes, proteases, and other subfamilies, a conservative estimate of approximately 2200 druggable genes is obtained.27
To extend the homology-based approach to identifying which targets expressed from the human genome are likely to be druggable, the survey must be expanded to cover all the known biological targets of drugs and lead compounds. Al-Lazikani and Overington (Inpharmatica, London) have conducted the most extensive analysis to date of the druggable genome, based on homology to chemically tractable drug targets.29 Using the BLAST sequence alignment algorithm to search each of the target sequences against the human genome, they identified 945 distinct genes that show homology to 170 human protein targets of approved small-molecule drugs,3 at a cutoff of 30% sequence identity and an E-value less than or equal to 10⁻⁵. They then expanded the sequence set to include human proteins with known small-molecule chemical leads, drawing on Inpharmatica’s StARLITe database of medicinal chemistry journal data (i.e., J. Med. Chem. 1980–2004 and Bioorg. Med. Chem. Lett. 1990–2004), which contains 1155 protein targets with at least one drug or lead compound with a binding affinity below 10 μM, 707 of which are human molecular targets. BLAST sequence analysis of this database of medicinal chemistry literature3 identified 2921 protein sequences within the same sequence identity cutoffs, which are predicted to be druggable proteins expressed by the human genome.
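The cutoffs described above (at least 30% sequence identity and an E-value of at most 10⁻⁵) can be applied to BLAST results with a simple filter. The sketch below is an illustration rather than the authors’ pipeline: it assumes the search results are available as BLAST tabular text in the default 12-column layout, with the known drug target as the query and the human sequence as the subject, and the function name and sequence identifiers are hypothetical.

```python
def druggable_homologs(blast_tabular: str,
                       min_identity: float = 30.0,
                       max_evalue: float = 1e-5) -> set[str]:
    """Return subject IDs of hits passing the identity and E-value cutoffs.

    Expects BLAST tabular output with the default column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    hits = set()
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject, identity, evalue = fields[1], float(fields[2]), float(fields[10])
        if identity >= min_identity and evalue <= max_evalue:
            hits.add(subject)
    return hits


# Made-up hits for illustration: only GENE_A satisfies both cutoffs.
example = (
    "T1\tGENE_A\t45.2\t300\t150\t5\t1\t300\t1\t300\t1e-40\t250\n"
    "T1\tGENE_B\t28.0\t200\t140\t4\t1\t200\t1\t200\t1e-8\t60\n"
    "T2\tGENE_C\t35.0\t100\t60\t2\t1\t100\t1\t100\t0.001\t30\n"
)
print(druggable_homologs(example))
```

Collecting subject identifiers in a set gives a nonredundant list of candidate genes even when several known targets hit the same human sequence.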

4.17.6 Molecular Recognition Basis for Druggability


The hypothesis that the druggability of a protein can be assessed a priori derives from the biophysical basis of molecular recognition.30–32 The binding energy (ΔG) of a ligand to a molecular target such as a protein, RNA, DNA, or carbohydrate is defined in eqn [1]. Van der Waals and entropy components, arising from the burying of hydrophobic surfaces, predominantly drive the binding energy. A low-affinity ‘hit’ from a high-throughput screen with Ki = 1 μM equates to −8.4 kcal mol⁻¹. A high-affinity drug molecule binding with Ki = 10 nM requires a binding energy (ΔG) of −11 kcal mol⁻¹. Thus 1.36 kcal mol⁻¹ of binding energy is equivalent to a 10-fold increase in potency. The binding energy potential of a ligand is approximately proportional to the available surface area and its properties, assuming there are no strong covalent or ionic interactions between the ligand and the protein. Analysis of nearly 50 000 biologically active druglike molecules reveals a linear correlation between molecular surface area and molecular weight (Figure 6). The van der Waals attractions between atoms and the hydrophobic effect from the displacement of water contribute approximately 0.03 kcal mol⁻¹ Å⁻². Thus, assuming there are no strong ionic interactions between the protein and the ligand, a ligand with a 10 nM dissociation constant would be required to bury 370 Å² of hydrophobic surface area. The contribution of the hydrophobic surface to binding energy is demonstrated by the medicinal chemistry phenomenon of the ‘magic methyl,’ where a single methyl group placed in the correct position can increase ligand affinity tenfold. The accessible hydrophobic surface area of a methyl group is approximately 46 Å²; if one assumes that all of this surface is encapsulated by the protein binding site and thus forms full contact with the protein, a hydrophobic effect of 0.03 kcal mol⁻¹ Å⁻² yields approximately 1.36 kcal mol⁻¹, equivalent to the observed tenfold affinity increase and approximately the maximal affinity per non-hydrogen atom.22 In addition to the predominantly hydrophobic contribution to the binding of many drugs, ionic interactions, such as those between zinc proteases (e.g., angiotensin-converting enzyme, ACE) and their inhibitors, contribute to the binding energy. The attraction of complementary polar groups contributes up to 0.1 kcal mol⁻¹ Å⁻², with ionic salt bridges contributing approximately three times more, allowing low molecular weight compounds to bind strongly. Unlike hydrophobic interactions, complementary polar interactions are dependent on the correct geometry.

Figure 6 Linear correlation between molecular surface area (Å²) and molecular weight (Da). Analysis of 49 456 biologically active, druglike compounds with IC50 ≤ 100 nM. Molecular weight was calculated from the chemical structures represented as desalted, canonical SMILES strings. The molecular surface area of N, O, P, and S atoms was estimated using the fast Ertl method34 with a 2D approximation. All other atom types (excluding hydrogen atoms) were estimated using an overlapping spheres method.
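The arithmetic in this section can be checked with a short calculation. The sketch below takes the two constants quoted in the text (1.36 kcal mol⁻¹ per tenfold change in affinity and 0.03 kcal mol⁻¹ Å⁻² of buried hydrophobic surface) and assumes, as the text does, that hydrophobic contacts supply all of the binding energy; the function names are illustrative.

```python
import math

KCAL_PER_LOG_UNIT = 1.36   # kcal/mol per 10-fold change in affinity (from the text)
HYDROPHOBIC_GAIN = 0.03    # kcal/mol per square angstrom buried (from the text)


def required_binding_energy(k_molar: float) -> float:
    """Magnitude of the binding free energy (kcal/mol) for a constant in mol/L."""
    return -KCAL_PER_LOG_UNIT * math.log10(k_molar)


def required_buried_area(k_molar: float) -> float:
    """Hydrophobic surface (square angstroms) needed if it supplies all the energy."""
    return required_binding_energy(k_molar) / HYDROPHOBIC_GAIN


def fold_gain_from_area(area: float) -> float:
    """Fold increase in affinity expected from burying an extra `area` square angstroms."""
    return 10 ** (area * HYDROPHOBIC_GAIN / KCAL_PER_LOG_UNIT)


print(round(required_buried_area(1e-8)))  # roughly 360-370 square angstroms for 10 nM
print(round(fold_gain_from_area(46), 1))  # roughly tenfold for a fully buried methyl
```

The small difference from the 370 Å² quoted in the text comes from rounding the 10 nM binding energy to −11 kcal mol⁻¹.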

4.17.7 Analysis of Protein Structures as Drug Targets


The physicochemical and energetic constraints of molecular recognition lead to the conclusion that a drug target needs a ‘pocket,’ whether the pocket is predefined or formed on binding by allosteric mechanisms. Druggable cavities on proteins that are complementary with the high-affinity binding of noncovalent, small-molecule, ‘rule-of-five’ compliant ligands (whose binding energy is predominantly driven by the entropic, hydrophobic, and van der Waals contributions) are predominantly apolar cavities of 400–1000 Å³, where over 65% of the pocket is buried or encapsulated, with an accessible hydrophobic surface area of at least 350 Å².31 Encapsulated cavities maximize the ratio of the surface area to the volume and are thus capable of binding low molecular weight compounds with high affinities.
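The ranges quoted above can be turned into a crude first-pass screen for candidate pockets. The sketch below is only a threshold check on precomputed cavity descriptors; it is not one of the published druggability models, it does not test how apolar the cavity is, and the `Cavity` container and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cavity:
    """Precomputed pocket descriptors (hypothetical container)."""
    volume: float            # cavity volume in cubic angstroms
    buried_fraction: float   # fraction of the pocket that is buried, 0 to 1
    hydrophobic_area: float  # accessible hydrophobic surface area in square angstroms


def meets_pocket_criteria(c: Cavity) -> bool:
    """True if the cavity falls inside the ranges quoted in the text."""
    return (400.0 <= c.volume <= 1000.0
            and c.buried_fraction > 0.65
            and c.hydrophobic_area >= 350.0)


# Illustrative values only.
print(meets_pocket_criteria(Cavity(volume=600, buried_fraction=0.8, hydrophobic_area=420)))
print(meets_pocket_criteria(Cavity(volume=250, buried_fraction=0.8, hydrophobic_area=420)))
```

A hard cutoff of this kind discards borderline sites, so in practice it would serve at most as a pre-filter ahead of a trained model.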
The hypothesis that the physicochemical properties of cavities on protein structures can be analyzed a priori to predict the druggability of a protein has been developed further into automatic algorithms to assess the protein structures in the Protein Data Bank (PDB) and the stream of novel structures determined by the structural genomics initiatives.29,30–33 Empirical druggability predictions have been explored experimentally at Abbott using heteronuclear nuclear magnetic resonance (NMR) to identify and characterize the binding surfaces on proteins by screening ~10 000 low molecular weight molecules (average MW 220, average ClogP 1.5).31,32 In a small sample of 23 proteins, the screening results reveal that about 90% of the ligands bind to sites known to be small-molecule ligand binding sites. Only in three out of the 23 proteins were distinct uncompetitive new binding sites discovered. In the relatively small sample of proteins studied, Hajduk et al. noted a high correlation between experimental NMR hit rates and the ability to find high-affinity ligands. From the experimental screening hit rates, Hajduk et al. constructed a simple model, including physicochemical property descriptors such as cavity dimensions, surface complexity, and polar and apolar surface area, that predicts the experimental screening hit rates with an R² of 0.72 and an adjusted R² of 0.65.
A decision-tree approach to assessing the druggability of protein structures has been developed at Inpharmatica by Al-Lazikani and Overington.29 A range of physicochemical properties of the identified binding sites and cavities were calculated from the protein structures, including volume, depth, curvature, accessibility, hydrophobic surface area, and polar surface area. The algorithm was trained against a set of 400 protein complexes binding small-molecule, ‘rule-of-five’ compliant ligands. From this analysis a decision tree was derived to predict the druggability of a binding site or cavity from calculated physicochemical properties. The decision tree predicts whether a cavity is druggable within the statistical confidence of the tree. A success rate of 91% when predicting druggability on the protein drug targets has been claimed for this approach.29 The method requires either an experimentally derived structure or a high-quality homology model. Ideally, because of the inherent flexibility of many protein–ligand binding sites, a sample of multiple conformations is preferred. The decision-tree method was applied to the entire PDB (December 2004 release). Following a clean-up process, 27 409 files were suitable for analysis, further classified into 76 322 structural domains (using SCOP33), of which 28% (21 522) were found to have at least one site predicted to have some degree of druggability. From this analysis a nonredundant set of 427 human proteins was predicted to contain a druggable binding site, of which 281 had no prior known compounds or drugs developed against them. In a similar analysis Hajduk et al. calculated the druggability of 1000 nonredundant human proteins derived from the PDB, of which 35% contained at least one site predicted to be highly druggable, slightly higher than but comparable with Al-Lazikani’s prediction.

4.17.8 Conclusions
The palette of potential drug targets for modern medicinal chemistry can now be efficiently derived from searching
entire genomes. Knowledge-based methods enable the mapping of chemical space to protein structure and protein
sequences to predict druggable targets. The observed relationships between the physicochemical properties of ligands and the targets they bind identify not only potential druggable targets but also the degree of druggability, a means of assessing their probabilistic likelihood of success. Understanding the degrees of druggability between protein targets can aid the medicinal chemist in selecting a portfolio of drug targets, in designing the screening strategy, in identifying the region of chemical space in which target ligands are likely to reside, and in estimating the probability of success through clinical development relative to disease indication. Advances in molecular biology and structural biology have had great
impact on the practice of modern medicinal chemistry by enabling a detailed understanding of the atomic basis of SARs
on individual protein targets. The next wave of advances in modern medicinal chemistry is likely to benefit from the
effective integration of pharmacological and chemogenomic knowledge gained from the past 100 years in a collective
whole to aid the practice of target discovery and compound design.

References
1. Overington, J.; Al-Lazikani, B.; Hopkins, A. L. Nat. Rev. Drug Disc. 2006, 5, in press.
2. Paolini, G. V.; Shapland, R. H. B.; Van Hoorn, W. P.; Mason, J. S.; Hopkins, A. L. Nat. Biotech. 2006, 24, 805–815.
3. Overington, J.; Al-Lazikani, B. Inpharmatica, StARlite database; Inpharmatica Ltd: London, 2005.
4. Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Adv. Drug Deliv. Rev. 1997, 23, 3–25.
5. Ajay, A.; Walters, W. P.; Murcko, M. A. J. Med. Chem. 1998, 41, 3314–3324.
6. Wang, J.; Ramnarayan, K. J. Combin. Chem. 1999, 1, 524–533.

7. Walters, W. P.; Ajay, A.; Murcko, M. A. Curr. Opin. Chem. Biol. 1999, 3, 384–387.
8. Lipinski, C. A. J. Pharmacol. Toxicol. Methods 2000, 44, 3–25.
9. Podlogar, B. L.; Muegge, I.; Brice, L. J. Curr. Opin. Drug Disc. Dev. 2001, 4, 102–109.
10. Muegge, I.; Heald, S. L.; Brittelli, D. J. Med. Chem. 2001, 44, 1841–1846.
11. Veber, D. F.; Johnson, S. R.; Cheng, H. Y.; Smith, B. R.; Ward, K. W.; Kopple, K. D. J. Med. Chem. 2002, 45, 2615–2623.
12. Proudfoot, J. R. Bioorg. Med. Chem. Lett. 2002, 12, 1647–1650.
13. Walters, W. P.; Murcko, M. A. Adv. Drug Deliv. Rev. 2002, 54, 255–271.
14. Egan, W. J.; Walters, W. P.; Murcko, M. A. Curr. Opin. Drug Disc. Dev. 2002, 5, 540–549.
15. Muegge, I. Med. Res. Rev. 2003, 23, 302–321.
16. Lajiness, M. S.; Vieth, M.; Erickson, J. Curr. Opin. Drug Disc. Dev. 2004, 7, 470–477.
17. Vieth, M.; Siegel, M. G.; Higgs, R. E.; Watson, I. A.; Robertson, D. H.; Savin, K. A.; Durst, G. L.; Hipskind, P. A. J. Med. Chem. 2004, 47, 224–232.
18. Wenlock, M. C.; Austin, R. P.; Barton, P.; Davis, A. M.; Leeson, P. D. J. Med. Chem. 2003, 46, 1250–1256.
19. Blake, J. F. Biotechniques 2003, June, 16–20.
20. Hopkins, A. L.; Groom, C. R.; Alex, A. Drug Disc. Today 2004, 9, 430–431.
21. Abad-Zapatero, C.; Metz, J. T. Drug Disc. Today 2005, 10, 464–469.
22. Kuntz, I. D.; Chen, K.; Sharp, K. A.; Kollman, P. A. Proc. Natl. Acad. Sci. USA 1999, 96, 9997–10002.
23. International Human Genome Sequencing Consortium. Nature 2001, 409, 860–921.
24. Venter, J. C.; Adams, M. D.; Myers, E. W.; Li, P. W.; Mural, R. J.; Sutton, G. G.; Smith, H. O.; Yandell, M.; Evans, C. A.; Holt, R. A. et al. Science 2001, 291, 1304–1351.
25. Hopkins, A. L.; Groom, C. R. Nat. Rev. Drug Disc. 2002, 1, 727–730.
26. Orth, A. P.; Batalov, S.; Perrone, M.; Chanda, S. K. Expert Opin. Ther. Targets 2004, 8, 587–596.
27. Russ, A. P.; Lampel, S. Drug Disc. Today 2005, 10, 1607–1610.
28. Lander, E. S.; Linton, L. M.; Birren, B.; Nusbaum, C.; Zody, M. C.; Baldwin, J.; Devon, K.; Dewar, K.; Doyle, M.; FitzHugh, W. et al. Nature 2004, 409, 860–921.
29. Al-Lazikani, B.; Gaulton, A.; Paolini, G.; Lanfear, J.; Overington, J.; Hopkins, A. In Chemical Biology: From Small Molecules to Systems Biology and Drug Design; Schreiber, S. L., Kapoor, T., Wess, G., Eds.; Wiley: New York, 2007, pp 1–20.
30. Hopkins, A. L.; Groom, C. R. Ernst Schering Res. Found. Workshop 2003, 42, 11–17.
31. Hajduk, P. J.; Huth, J. R.; Tse, C. Drug Disc. Today 2005, 10, 1675–1682.
32. Hajduk, P. J.; Huth, J. R.; Fesik, S. W. J. Med. Chem. 2005, 48, 2518–2525.
33. Murzin, A. G.; Brenner, S. E.; Hubbard, T.; Chothia, C. J. Mol. Biol. 1995, 247, 536–540.
34. Ertl, P.; Rohde, B.; Selzer, P. J. Med. Chem. 2000, 43, 3714–3717.
35. Finn, R. D.; Mistry, J.; Schuster-Böckler, B.; Griffiths-Jones, S.; Hollich, V.; Lassmann, T.; Moxon, S.; Marshall, M.; Khanna, A.; Durbin, R. et al. Nucleic Acids Res. 2006, 34, D247–D251.

Biographies

Andrew L Hopkins is presently Associate Research Fellow and Head of Chemogenomics at the Sandwich site of
Pfizer Global Research and Development. He joined Pfizer in 1998 at Sandwich, Kent, UK. Over the years, he has
established various new functions for Pfizer, including Target Analysis in 1999, Indications Discovery in 2001, and most
recently Knowledge Discovery in 2004. He won a British Steel scholarship to attend the University of Manchester from
where he graduated with first class honours in 1993 with a BSc in Chemistry. Following a brief spell in the steel industry
he won a Wellcome studentship to attend the University of Oxford, working with Prof David I Stuart FRS. He
received his DPhil in Structural Biology from the University of Oxford in 1998. During his doctoral research
Dr Hopkins designed a new class of anti-HIV agents that were developed into drug candidates by Glaxo-Wellcome.
Following his interest in drug discovery he then joined Pfizer directly after graduating from Oxford. At Pfizer,
Dr Hopkins’ research involves combining chemical and biological knowledge to identify new targets or other new
opportunities for medicines. His work has involved the design and construction of major informatics systems, including
a literature-mining system and a large-scale chemogenomics knowledge base. He is the author of over 6 patents and 25
scientific publications, two of which have been cited as Hot Papers by the Thomson ISI citation index. Dr Hopkins lives
in Canterbury, Kent, UK.

Gaia Paolini is Senior Principal Scientist at the Sandwich site of Pfizer Global Research and Development. Gaia joined
Pfizer in 2002 at Sandwich, Kent, UK. Gaia received her degree (laurea) in Physics at the University of Rome, with a
research thesis on statistical mechanics computer simulation of condensed matter systems. In her career, Gaia has held
a number of positions in industry and academia, combining roles of research scientist, software specialist, and business
systems analyst. At Pfizer, Gaia designed and developed a laboratory information management system (LIMS) for structural biology and, in her current role,
led the design, development, and mining of a large chemogenomics knowledge base. For this contribution she won a
Pfizer Achievement Award in 2006. Gaia is the author of 18 peer-reviewed scientific publications, spanning the fields of
applied mathematics, materials science, classical density functional theory, and chemogenomics. Gaia currently lives in
Canterbury, Kent, UK.

© 2007 Elsevier Ltd. All Rights Reserved
No part of this publication may be reproduced, stored in any retrieval system or transmitted in any form by any means electronic, electrostatic, magnetic tape, mechanical, photocopying, recording or otherwise, without permission in writing from the publishers
Comprehensive Medicinal Chemistry II
ISBN (set): 0-08-044513-6
ISBN (Volume 4) 0-08-044517-9; pp. 421–433
