Connectivity and Binding-Site Recognition Applications
Abstract: Here, we describe a family of methods based on residue–residue connectivity for characterizing binding sites and apply variants of the method to various types of protein–ligand complexes, including proteases, allosteric-binding sites, correctly and incorrectly docked poses, and inhibitors of protein–protein interactions. Residues within ligand-binding sites have about 25% more contact neighbors than surface residues in general; high-connectivity residues are found in contact with the ligand in 84% of all complexes studied. In addition, a k-means algorithm was developed that may be useful for identifying potential binding sites with no obvious geometric or connectivity features. The analysis was primarily carried out on 61 protein–ligand structures from the MEROPS protease database, 250 protein–ligand structures from the PDBSelect (25%) list, and 30 protein–protein complexes. Analysis of four proteases with crystal structures for multiple bound ligands has shown that residues with high connectivity tend to have less variable side-chain conformations. The relevance to drug design is discussed in terms of identifying allosteric-binding sites, distinguishing between alternative docked poses, and designing protein interface inhibitors. Taken together, these data indicate that residue–residue connectivity is highly relevant to medicinal chemistry.
© 2010 Wiley Periodicals, Inc. J Comput Chem 00: 000–000, 2010
Key words: connectivity; ligand-binding sites; k-means; docking; allosteric-binding sites; protein–protein interface inhibitors; molecular chaperones; local connectivity
merit in its own right, it may also be used in combination with other methods in consensus approaches.

The key role played by highly connected residues in the transition state for in vitro folding,18–21 as determined, for example, by Φ-value analysis of a series of soluble proteins, raises the possibility that such highly connected residues may also play an important role in stable ground-state protein structures. For example, Vendruscolo et al.20 used Φ-value analysis to show that a cluster of well-connected residues in acylphosphatase made a key contribution to the folding transition state. Within ground-state structures, such highly connected residues are expected to be more stable than average. Their greater number of neighbors is likely to restrict both their natural thermal movement and their movement in response to external stimuli such as ligand binding, substrate transformation, or oligomerization. We therefore test the hypothesis that ligand binding to residues with high local connectivity is favorable. One rationale is that binding to such residues should incur less entropy loss than binding to residues that are less constrained. This should be reflected by ligand-binding sites having higher than average connectivity to surrounding residues. Our approach examines the local properties of a residue. It is therefore distinct from other graph-theoretical approaches, in which the key property being measured is a function of all the residues in the protein.5,19,21

Some additional evidence for a link between binding-site residues and connectivity comes from Brownian dynamics studies, in which the catalytic residues of a number of proteins were found to have high force constants,22,23 implying severe constraints on their movement. Here, this connectivity effect is analyzed using a number of different connectivity metrics for a series of protein–ligand complexes. We show that ligand-binding sites, but not protein interfaces, have a greater number of residue–residue contacts than surface residues in general. The relevance of these findings for molecular recognition, for drug design in general, and for virtual screening in particular is discussed.

tacts distributed throughout the sequence. In general, the number of residue–residue contacts of a residue is referred to as the connectivity of that residue. (Note that the local connectivity defined below is more complex than this simple measure, in that it gives a higher connectivity to residues in high-connectivity regions.)

To study the effect of connectivity on ligand binding, a set of 331 proteins was considered. It comprised 61 proteases taken from MEROPS24 and 270 proteins taken from the PDBSelect 25% list,25,26 each of length at least 100 residues and with some ligand-binding residues. For NMR structures, the first structure in the ensemble was chosen. The ligand-binding residues for the PDBSelect set were identified using the LPC database,27 accessed via the RCSB Protein Data Bank.28 For the other proteins, they were defined as residues within 5.5 Å of the primary ligand. Only surface-accessible residues were considered. These were identified as having at least 5% surface-accessible area, as determined using the program NAccess.29

The calculations were performed to test the hypothesis that residues in ligand-binding sites have greater than average connectivity. For each of the two measures of connectivity described above, the maximum and average connectivity over the ligand-binding site were compared with the maximum and average connectivity over the whole of the protein surface. This was done for both the MEROPS and the PDBSelect sets of proteins. For the MEROPS set, the calculations were repeated with catalytic residues, identified from the Catalytic Site Atlas,30 replacing the ligand-binding-site residues.

To represent the results graphically, the number of residue–residue contacts for the residues of a protein was normalized over the range [0,1]. This produced images of the protein analogous to B-factor plots, with blue representing low connectivity and red high connectivity.

Binding-Site Prediction: Allosteric-Binding Sites, Rhodopsin, and Protein–Protein Interfaces
tacts of these residues was calculated, this measure being referred to as the local connectivity of the atom. To identify high-connectivity residues, the atom within the entire protein with the highest local connectivity was then identified. A residue was defined as having high connectivity if it contained at least one atom with a local connectivity at least 90% of the maximal local connectivity for the atoms of the protein. A particular advantage of this measure is that it explicitly ensures that a high-connectivity residue in a high-connectivity region scores higher than a high-connectivity residue in a low-connectivity region. To identify ligand-binding sites, only high-connectivity residues on the surface of the protein were considered.

The local connectivity method was applied to the proteins from the MEROPS set. For each protein, surface residues with high connectivity were identified and examined for whether or not they were in the ligand-binding site. This method is designed to identify key residues in the binding site rather than the entire binding site. In some cases, none of the surface residues was identified as having high connectivity. In those cases, the percentage cutoff was lowered in increments of 5% until a nonempty set of surface residues was found. For comparison, the same proteins were examined by combining a similar percentage cutoff with the "closeness centrality" method applied by Amitai et al.5 That method takes into account the network properties of the protein as a whole. Binding-site residues were predicted using an implementation similar to that of the local connectivity method (details given in Section 1 of Supporting Information), and the relative performance of the two methods was compared.

The local connectivity method was also applied to a group of systems representing areas of current pharmacological interest. These included rhodopsin, the prototypical G-protein–coupled receptor (GPCR),42 chosen because of its prominence as a representative of what is possibly the most common class of drug target.43–46 Six structures were also studied, taken from five systems in which small molecules have been shown to inhibit the formation of protein–protein interactions47 [PDB codes: 1PY2 (IL-2; protein partner is IL-2R), 1R6N (N-terminal transactivation domain of human papillomavirus E2 protein; protein partner is the E1 helicase), 1RV1 and 1T4E (MDM2; protein partner is p53), 1Y2F (ZipA; protein partner is FtsZ), and 2YXJ (Bcl-xL; protein partner is BAK)]. Ligand binding may induce conformational changes. To account for this, a parallel set of structures in which the protein had been determined in the absence of the ligand was subjected to the same analysis [PDB codes: 1M47, 1R6K, 1Z1M (residues 25–109), 1Z1M (residues 16–111), 1F7W, and 1MAZ].

k-Means Method
In a second method for identifying binding sites, referred to as the k-means method, an adaptation of the k-means algorithm48 was implemented. This method identifies residues in the binding site based on a global measure of connectivity across the whole of the protein surface. It divides the protein surface into clusters, with cluster centers, or seed points, determined by a weighted averaging of the connectivity scores of the residues in each cluster. The k-means method was applied to the proteins in the MEROPS set to find seed points, which correspond to points of high connectivity on the protein surface. The results were compared with those of the local connectivity measure described above. Full details of the method are given in Section 2 of Supporting Information.

Connectivity and Side-Chain Conformation
To test the hypothesis that a high number of residue–residue contacts leads to a loss of residue flexibility, four example sets of X-ray structures were taken from the work of Fairlie et al.49 Each set contains structures of an identical protein with different ligands bound. The examples taken were endothiapepsin with 20 different bound inhibitors, porcine pancreatic elastase with 15, thermolysin with 12, and papain with eight. Dihedral angles in the side chains of each residue were used to divide residues into conformational clusters, based on an extensive analysis of amino acid conformation.50 In each set of structures, the variation in conformation was then described in terms of entropy, given by eq. (1):

$$S = -\sum_{i \in \mathrm{clusters}} p_i \log p_i \qquad (1)$$

where $p_i$ is the observed probability of the side chain of a residue being in cluster i. Full details are given in Section 3 of Supporting Information.

We note that the entropy measured here differs from the sequence entropy, another property that has been linked with residues of functional importance.51 Here, a value of S equal to zero indicates that the conformation of the side chain of a residue is roughly constant across the set of structures. Increasing variance in the conformation of a residue leads to increasing values of S.

Docking
Redocking of the X-ray crystallographic ligand was carried out for nine systems using AutoDock 4.52 A large docking box was used that covered approximately a quarter of the protein. The Spearman rank correlation coefficient was used to determine whether the root mean square displacement (RMSD) of the docked ligand, relative to the crystallographic coordinates, correlated with the connectivity of the residues contacting the ligand.

Protein–Protein Interactions
Thirty protein complexes (listed in Section 4 of Supporting Information) were retrieved from the PQS Protein Quaternary Structure server.53 This source was used to enhance the probability that the dimeric interface is genuine rather than an artifact of nonspecific crystal packing. The connectivity of each protein was determined over the surface residues as above. In addition, the proteins were examined by the evolutionary trace (ET) method using in-house code.8,9 The ET results indicated that 13 of the 60 proteins had only one interface. For the remaining proteins, they indicated the presence of additional interfaces or binding sites besides the crystallographic dimer interface. These 13 structures were studied to determine whether connectivity could be used to
Connectivity and Ligand Binding
A clear difference was observed between the connectivity of residues in the binding site and that of other residues on the protein surface. Figure 1 shows the relative distribution of residue connectivity for residues from the 270 proteins in the PDBSelect dataset, for all surface residues and for residues in the binding site. Residues in the binding site generally have higher connectivity. Comparatively fewer of them have a connectivity between 0 and 7, and comparatively more have a connectivity between 7 and 13. For each protein, the average connectivity of the surface residues and of the surface residues in the binding site was calculated. Table 1 presents the statistics for these values under the different connectivity measures and shows that the connectivity

Table 1.
                   Maximum connectivity          Mean connectivity
Dataset/Method     Surface   Binding site        Surface   Binding site   D (a)
MEROPS1            15.4      13.9                6.6       8.2            3.5
MEROPS2            6.4       5.7                 2.6       3.4            4.0
PDBSelect1         14.8      13.0                6.5       7.6            2.7
PDBSelect2         5.9       5.2                 2.5       3.1            2.9

The numbers 1 and 2 indicate the method of determining connectivity: 1 indicates the inclusive method, and 2 indicates the grouped method. Columns 2 and 3 show the mean value of the maximal residue connectivity. Columns 4 and 5 show the mean value of the average residue connectivity. Both pairs of columns refer to residues on the surface as a whole and in the binding site of the proteins in each set. Column 6 gives the difference between columns 4 and 5.
(a) Difference between the surface and binding-site averages, in standard deviations of the surface figure.
teins, the mean connectivity of the binding site is higher than the mean over the whole of the protein surface (the exceptions being 1cmx and 1d4l). A result similar to that for binding-site residues was obtained for catalytic residues. The MEROPS set contains 58 proteins for which catalytic-residue information was available. In 54 of these, the average connectivity of the catalytic residues was higher than that of the protein as a whole, including surface and buried residues (Supporting Information Table 5). A number of distinctly different proteins in Tables 1, 2, and 5 share common ligands or ligand types. For example, analysis of the PDBSelect set revealed 18 proteins that bound ATP. Comparison of these proteins against the CATH database showed that each contained either a two- or a three-layer α–β sandwich domain. Higher binding-site connectivity was observed in each of the 18 cases, regardless of the fold. Six of these proteins had the Walker A box motif (www.expasy.ch/prosite/PS00017). For each of these six, the average connectivity of the residues in the motif was higher than the average for the surface.

Figure 2 (left-hand panels) shows representative plots of connectivity for four systems:

- 1c24 (methionine aminopeptidase);
- 1s4v (cysteine endopeptidase);
- 1cvr (gingipain R);
- 1cmx (ubiquitin C-terminal hydrolase).

These four examples were chosen to be as diverse as possible in terms of the quality of the results obtained. For the first two systems, the higher connectivity of the ligand-binding site can clearly be seen. For 1cvr (Fig. 2c), the higher connectivity is revealed by the statistics, although it is not so obviously visible. The system 1cmx (Fig. 2d) is one of the two MEROPS proteins (of 61) for which the connectivity in the binding site was lower than that of the surface as a whole. Examination of this system shows two ligands binding in a low-connectivity region. However, one of these ligands is bound covalently, so additional principles are clearly affecting the binding.

Connectivity and Residue Type
Across all residues in all proteins in the MEROPS set, the average connectivity of a surface residue was found to be 6.67. Eleven amino acids had average connectivity scores higher than this (Gln, Val, Arg, Ile, His, Leu, Cys, Met, Phe, Tyr, and Trp,

Table 2. Connectivity on the Protein Surface and at the Ligand-Binding Site for 61 Proteins from the MEROPS Database.
           Maximum connectivity           Mean connectivity
Protein    Whole surface   Binding site   Whole surface   Binding site
1a16       15              15             6.67            9.00
1afq       16              16             6.47            7.90
1ajq       17              15             6.22            11.10
1ao0       19              19             6.41            7.27
1arc       16              13             6.64            8.71
1ayu       15              12             6.77            7.58
1b6a       14              14             6.87            9.55
1bh6       14              13             6.88            7.52
1bil       13              12             6.00            6.29
1bmq       15              13             6.18            7.56
1bqy       15              13             6.45            7.85
1bru       17              12             6.48            9.69
1bxo       15              14             6.66            8.45
1c24       14              14             6.87            10.20
1cgh       18              16             6.58            9.41
1cmx       11              9              4.94            3.25
1cv8       16              12             6.48            8.06
1cvr       15              12             6.83            8.64
1cvz       13              12             6.58            7.61
1czi       18              17             7.90            8.33
1d4l       12              9              4.87            4.80
1dmt       16              15             7.46            9.78
1dy9       13              12             6.42            7.33
1eag       15              13             6.74            7.77
1f2o       15              14             6.82            9.17
1fh0       16              16             7.04            8.19
1fjs       16              16             7.01            8.46
1ft7       16              16             7.16            9.75
1fxy       15              14             6.50            8.32
1ga6       16              13             6.81            8.20
1gec       14              13             6.82            8.41
1gmy       19              14             6.95            8.20
1gvu       14              14             6.57            8.45
1hne       15              14             6.50            8.88
1hpg       16              13             6.70            7.44
1i76       14              13             6.29            8.06
1kug       17              17             6.61            7.50
1lf2       16              13             6.48            8.04
1ls5       14              13             6.74            7.93
1lyb       15              12             5.56            6.47
1m4h       15              14             6.68            8.32
1me4       15              14             6.59            7.81
1mu0       16              16             7.06            12.00
1n1m       16              15             7.09            11.00
1nqc       15              14             6.88            8.33
1nw9       14              14             5.86            6.92
1onx       15              15             6.90            9.92
1pfx       16              14             6.69            8.94
1pwu       17              15             6.84            7.69
1qjj       14              14             6.31            9.10
1qrp       16              14             6.62            8.14
1qs8       16              13             6.63            7.93
1s2k       17              14             6.52            8.61
1s4v       15              12             6.55            8.36
1smr       17              15             6.60            7.33
1uk4       17              14             6.52            6.92
1wht       18              18             6.96            9.58
2rmp       17              14             6.62            8.27
3apr       14              13             6.64            8.09
4aig       14              13             6.40            6.94
5sga       16              14             6.93            7.81

Column 3 is denoted in bold when the maximum connectivity is at the binding site. Column 5 is denoted in bold when the mean connectivity of the binding site is greater than the mean connectivity of the surface as a whole (including the binding site).
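The k-means method described in the Methods divides the surface into clusters whose centers are connectivity-weighted averages. The sketch below shows one way to realize that idea. It assumes Lloyd-style k-means on surface-residue coordinates, with each residue weighted by its connectivity. The following choices are illustrative assumptions, not the details given in Section 2 of Supporting Information:

- the seeding rule (start from the most connected point, then repeatedly add the point farthest from the seeds chosen so far);
- the number of clusters k;
- the convergence test.

```python
import math


def weighted_kmeans(points, weights, k, max_iter=100):
    """Cluster points, moving each center to the connectivity-weighted mean of its members.

    points  -- list of (x, y, z) coordinates, e.g. one representative atom per surface residue
    weights -- non-negative connectivity score for each point (same order as points)
    Returns (centers, labels), where labels[i] is the cluster index of points[i].
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Illustrative seeding: most connected point first, then farthest-point additions.
    chosen = [max(range(len(points)), key=lambda i: weights[i])]
    while len(chosen) < k:
        chosen.append(max(range(len(points)),
                          key=lambda i: min(math.dist(points[i], points[j]) for j in chosen)))
    centers = [tuple(points[i]) for i in chosen]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: connectivity-weighted centroid of each cluster.
        new_centers = []
        for c in range(k):
            members = [i for i, lab in enumerate(new_labels) if lab == c]
            total = sum(weights[i] for i in members)
            if total == 0:
                new_centers.append(centers[c])  # empty or zero-weight cluster: keep old center
                continue
            new_centers.append(tuple(sum(weights[i] * points[i][d] for i in members) / total
                                     for d in range(3)))
        converged = new_labels == labels and new_centers == centers
        labels, centers = new_labels, new_centers
        if converged:
            break
    return centers, labels
```

Weighting pulls each center toward the most highly connected residues of its cluster. The centers can therefore serve as candidate seed points even where no single region of the surface stands out.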
Binding-Site Prediction
Allosteric-Binding Sites
Examination of the allosteric-binding sites showed higher than average connectivity in the majority of cases. Details of the results for these systems are shown in Table 3. In each of the 11 cases, the primary ligand-binding site had a higher connectivity than the surface in general. For the allosteric ligands, the binding-site connectivity was greater than that of the protein surface in 9 of 11 cases. This suggests that allosteric-binding sites are more difficult than the primary binding site to identify using connectivity, although connectivity may be of use in many cases. Three of the enzymes are oligomers:

- fructose-1,6-bisphosphatase (1q9d, 2fhy);
- human mitochondrial NAD(P)+-dependent malic enzyme (1gz3);
- ribonucleotide reductase R1 protein (4r1r).

In these enzymes, the higher connectivity of the allosteric-binding site includes contributions from both monomers.35 Thus, in two of the cases where the allosteric-binding site did not have high connectivity, inclusion of a second protein chain reversed this result. For each of the allosteric-binding sites, similar results were obtained regardless of whether the calculations were performed on the bound or the unbound structures.

The results on rhodopsin (detailed below) are also relevant in this respect. Allosteric ligands are of interest for systems such as the muscarinic subfamily,56,57 where the allosteric-binding site is associated with the extracellular region. For class C GPCRs, the main ligand binds to the N-terminus, and the cavity within the helical bundle, which has high connectivity (see below), becomes the allosteric-binding site.58 In addition, rhodopsin has a high-connectivity region in the cytoplasmic domain that has recently been shown to bind a small molecule (see below).

GPCRs
Connectivity analysis of rhodopsin showed markedly increased connectivity in the ligand-binding site, with an average score of 11.5 compared with an average of 7.4 for the surface. Three of the six highest-connectivity residues are in extracellular loop 2, underlining the importance of this loop for the stability of rhodopsin-like GPCRs. Using the local connectivity method, high-connectivity residues were identified at the 85% and 90% cutoff levels. At the 90% level, all residues found were adjacent to the main ligand-binding site within the helical bundle. At the 85% level, most residues were close to the ligand, but four additional residues were identified:

- Tyr10 (N-terminus);
- Leu72;
- Leu77 (interface between helix 1 and helix 2);
- Thr93 (interface between helix 2 and helix 3).

Leu72 is on the intracellular face of the receptor and may participate in binding G-protein. However, its mutation to cysteine has negligible effects on G-protein activation,59 and the equivalent residue is in contact with an octyl glucoside molecule in the squid rhodopsin structure.60 Leu72 has also been implicated in small-molecule binding, namely the binding of chlorin to Meta-II state rhodopsin61 (Klein-Seetharaman, personal communication). It is possible that the high-connectivity sites on the lipid-facing regions of the GPCR are associated with protein–protein interactions9,16,62 (see below).

Small-Molecule Inhibition of Protein–Protein Interfaces
Results of the application of the local connectivity method are shown in Table 4. In the structures with bound ligands, between

Table 3. Connectivity (conn.) Data for the Surface, Primary Binding Site, and Allosteric-Binding Site.
                     Connectivity                         Connectivity
No.   pdb   Ligand   Protein   Binding site   Ligand   Protein   Binding site

Figure 3. The relationship between connectivity and RMSD for various ligands redocked to their native crystal structure using AutoDock: (a) 4ts1, (b) 1dwb, and (c) 1lgr. A large docking box was used in these docking experiments. Images of docked poses and filtered docked poses for 4TS1 and 1LGR are given in Supporting Information Figure 1.
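The Docking analysis and Figure 3 rest on a Spearman rank correlation. It relates the RMSD of each docked pose to the connectivity of the residues that pose contacts. A self-contained sketch of the coefficient follows, in which tied values receive average ranks. The sketch assumes that each pose is reduced to a single connectivity score, for example the mean connectivity of the contacting residues. In practice, `scipy.stats.spearmanr` computes the same statistic.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they would occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)
```

Suppose rho is strongly negative over a set of redocked poses. That would indicate that poses lying farther from the crystallographic position tend to contact less highly connected residues.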
Table 4. High-Connectivity Point Data for Protein–Protein Interface Systems, from Structures Derived with and Without Ligand Bound.
With ligand bound                                Without ligand bound
PDB    Cutoff (%)   #Points   #Points correct    PDB        Cutoff (%)   #Points   #Points correct
1PY2   85           1         0                  1M47       85           6         1
1R6N   90           3         0                  1R6K       90           7         3
1RV1   90           1         1                  1Z1M (a)   90           4         2
1T4E   90           1         1                  1Z1M (b)   90           7         3
1Y2F   90           2         1                  1F7W       90           4         2
2YXJ   85           3         1                  1MAZ       80           2         1

The cutoff is the largest percentage (in increments of 5%) such that at least one residue on the protein surface contains an atom with at least this percentage of the maximal local connectivity for the protein as a whole. #Points is the number of such residues, and #Points correct is the number of these residues in the ligand-binding site.
(a) Residues 25–109.
(b) Residues 16–111.
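The cutoff column of Table 4 comes from the search rule given in the Methods. The search starts at 90% of the maximal local connectivity and lowers the cutoff in 5% steps until at least one surface residue qualifies. A sketch of that search follows, under these assumptions:

- The per-atom local connectivity is taken as a precomputed input, because its full definition is not reproduced in this section.
- The dictionary layout and the function name are hypothetical.
- The search gives up at a 5% floor.

```python
def high_connectivity_residues(local_conn, atom_to_residue, surface_residues,
                               start_pct=90, step_pct=5, floor_pct=5):
    """Return (cutoff_pct, residues) for the highest cutoff that yields a surface hit.

    local_conn       -- {atom_id: local connectivity} for ALL atoms of the protein
    atom_to_residue  -- {atom_id: residue_id}
    surface_residues -- set of residue ids classed as surface-accessible
    The reference maximum is taken over every atom in the protein, buried or not.
    Returns (None, set()) if no surface residue qualifies even at floor_pct.
    """
    max_lc = max(local_conn.values())
    pct = start_pct
    while pct >= floor_pct:
        threshold = pct * max_lc / 100
        hits = {atom_to_residue[a] for a, lc in local_conn.items()
                if lc >= threshold and atom_to_residue[a] in surface_residues}
        if hits:
            return pct, hits
        pct -= step_pct  # no surface residue qualifies: relax the cutoff by one step
    return None, set()
```

Under this convention, the size of the returned set corresponds to #Points. Intersecting that set with the ligand-binding residues gives #Points correct.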
Protein–Protein Interactions

proteins folding in vitro. In this respect, therefore, ligand binding would seem to have some features in common with protein folding. This work may therefore offer some insight into the mechanism of molecular chaperones.64 It indicates that ligands binding to high-connectivity regions may help to stabilize the fold, if these high-connectivity regions are indeed key to folding.63

Table 5. Normalized Proportions of Residues with Zero and Nonzero Conformational Entropy (i.e., Residues That Do or Do Not Retain the Same Conformation Across a Variety of Structures) Grouped by Connectivity Score for Four Sets of Identical Protein Structures.
                               Connectivity
Lead PDB/# ligands   Entropy   Low   Medium   High
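Table 5 separates residues with zero and nonzero conformational entropy, as defined in eq. (1). A sketch of that calculation follows. It assumes that each residue's side chain has already been assigned to a conformational cluster in every structure of a set. The label format and the use of the natural logarithm are assumptions. The base of the logarithm does not change which residues have S = 0.

```python
import math
from collections import Counter


def conformational_entropy(cluster_labels):
    """S = -sum_i p_i ln p_i over the clusters observed for one residue.

    cluster_labels -- one cluster label per structure in the set (e.g. a rotamer name)
    """
    n = len(cluster_labels)
    counts = Counter(cluster_labels)
    # p * ln(1/p) == -p * ln(p); written this way, S is exactly 0.0 (not -0.0) for one cluster.
    return sum((c / n) * math.log(n / c) for c in counts.values())


def zero_entropy_fraction(labels_by_residue, residues):
    """Fraction of the given residues whose side chain never changes cluster (S == 0)."""
    zero = sum(1 for r in residues if conformational_entropy(labels_by_residue[r]) == 0)
    return zero / len(residues)
```

Grouping residues by connectivity tier and calling `zero_entropy_fraction` on each tier gives proportions of the kind reported in Table 5. The tier boundaries would have to be chosen separately.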
connectivity was useful in assessing whether the ligand had docked to a valid binding site, even if that binding site was not the correct one for the ligand. However, the method was not useful in distinguishing between different ligand poses in the same binding site, presumably because the variations in connectivity were much smaller.

Nevertheless, for the general identification of ligand-binding sites, the connectivity score can be a powerful and effective tool. This may be particularly useful in the search for allosteric-binding sites. Here, we have analyzed 11 systems for which allosteric-binding sites have recently been identified by X-ray crystallography.31–34,58 In all of these cases, residues close to the primary ligand (or functional group) consistently had higher connectivity than the surface in general. Allosteric-binding sites were more difficult to identify, although positive results were obtained in most cases.

The link between connectivity and the conformational entropy of residues suggests that connectivity could be used to enhance virtual screening experiments involving a flexible receptor. Prior knowledge of which receptor residues are likely to retain the same conformation, or a restricted set of conformations, could enable more efficient docking experiments and considerable savings in computational time. As shown here, residues of higher connectivity are significantly more likely to be rigid over a range of receptor structures. They are therefore likely candidates to be held fixed in a flexible docking simulation.

Targeting drugs to inhibit protein–protein interactions offers huge therapeutic potential but is nevertheless extremely difficult, despite recent encouraging successes.47,65–67 Protein interfaces generally possess fewer druggable topological features.47 It may therefore be necessary to probe multiple protein conformations to identify possible binding sites, including hidden ones, that could be exploited to inhibit protein–protein interactions. Such conformations can be generated, for example, by X-ray crystallography (as here) or by molecular dynamics.68 The results obtained here were mixed in quality. Nevertheless, in the six systems studied, it was sufficient to study the protein monomers to identify possible binding sites: in each system, at least one of the high-connectivity points identified was located in the binding site.

Conclusions
We have developed a family of methods based on residue–residue connectivity for characterizing binding sites and have applied them to various problems in computational medicinal chemistry. The basic method allows high-connectivity binding sites to be visualized, as in Figures 2a–2d. The k-means algorithm (Figs. 2e–2h) automatically detects the centers of clusters of high connectivity. It may be particularly useful when there is no obvious region of high connectivity. The algorithm for determining residues with high local connectivity may also be used in automatic mode, as illustrated in Figure 4.

Taken together, the results presented here confirm a role for high connectivity in ligand-binding sites. Ligands may preferentially bind to regions of high connectivity to minimize the entropy loss that would occur on binding if the motion of flexible side chains were restricted. In this respect, ligand binding seems to use principles that are also common to protein folding. These principles do not, however, apply in the same way to the prediction of protein–protein interactions. We have shown that connectivity can be used to indicate the expected degree of side-chain movement on ligand binding. It can also be used to search for allosteric-binding sites and to distinguish poses docked to a valid binding site from those that are not. The low connectivity observed at most protein–protein interfaces may underlie the difficulty in designing protein interface inhibitors. Connectivity can nevertheless be used to indicate possible binding sites for such inhibitors.

Acknowledgment
The authors acknowledge Garrett Morris for a copy of AutoDock and for helpful discussions.

References
1. Laskowski, R. A.; Luscombe, N. M.; Swindells, M. B.; Thornton, J. M. Protein Sci 1996, 5, 2438.
2. Hendlich, M.; Rippmann, F.; Barnickel, G. J Mol Graph Model 1997, 15, 359.
3. Venkatachalam, C. M.; Jiang, X.; Oldfield, T.; Waldman, M. J Mol Graph Model 2003, 21, 289.
4. Wangikar, P. P.; Tendulkar, A. V.; Ramya, S.; Mali, D. N.; Sarawagi, S. J Mol Biol 2003, 326, 955.
5. Amitai, G.; Shemesh, A.; Sitbon, E.; Shklar, M.; Netanely, D.; Venger, I.; Pietrokovski, S. J Mol Biol 2004, 344, 1135.
6. Lichtarge, O.; Bourne, H. R.; Cohen, F. E. Proc Natl Acad Sci USA 1996, 93, 7507.
7. Sowa, M. E.; He, W.; Slep, K. C.; Kercher, M. A.; Lichtarge, O.; Wensel, T. G. Nat Struct Biol 2001, 8, 234.
8. Dean, M. K.; Higgs, C.; Smith, R. E.; Bywater, R. P.; Snell, C. R.; Scott, P. D.; Upton, G. J.; Howe, T. J.; Reynolds, C. A. J Med Chem 2001, 44, 4595.
9. Thummer, R. P.; Campbell, M. P.; Dean, M. K.; Frusher, M. J.; Scott, P. D.; Reynolds, C. A. J Mol Neurosci 2005, 26, 113.
10. Filizola, M.; Olmea, O.; Weinstein, H. Protein Eng 2002, 15, 881.
11. Goodford, P. J. J Med Chem 1985, 28, 849.
12. Laurie, A. T.; Jackson, R. M. Bioinformatics 2005, 21, 1908.
13. Harris, M. R.; Kihlen, M.; Bywater, R. P. J Mol Recognit 1993, 6, 111.
14. Glick, M.; Robinson, D. D.; Grant, G. H.; Richards, W. G. J Am Chem Soc 2002, 124, 2337.
15. Burgoyne, N. J.; Jackson, R. M. Bioinformatics 2006, 12, 1335.
16. Vohra, S.; Chintapalli, S. V.; Illingworth, C. J.; Reeves, P. J.; Mullineaux, P. M.; Clark, H. S.; Dean, M. K.; Upton, G. J.; Reynolds, C. A. Biochem Soc Trans 2007, 35, 749.
17. Gouldson, P. R.; Dean, M. K.; Snell, C. R.; Bywater, R. P.; Gkoutos, G.; Reynolds, C. A. Protein Eng 2001, 14, 759.
18. Paci, E.; Lindorff-Larsen, K.; Dobson, C. M.; Karplus, M.; Vendruscolo, M. J Mol Biol 2005, 352, 495.
19. Vendruscolo, M.; Dokholyan, N. V.; Paci, E.; Karplus, M. Phys Rev E 2002, 65, 061910.
20. Vendruscolo, M.; Paci, E.; Dobson, C. M.; Karplus, M. Nature 2001, 409, 641.
21. Dokholyan, N. V.; Li, L.; Ding, F.; Shakhnovich, E. I. Proc Natl Acad Sci USA 2002, 99, 8637.
22. Sacquin-Mora, S.; Lavery, R. Biophys J 2006, 90, 2706.
23. Sacquin-Mora, S.; Laforet, E.; Lavery, R. Proteins 2007, 67, 350.
24. Rawlings, N. D.; Morton, F. R.; Barrett, A. J. Nucleic Acids Res 2006, 34, D270.
25. Hobohm, U.; Sander, C. Protein Sci 1994, 3, 522.
26. Hobohm, U.; Scharf, M.; Schneider, R.; Sander, C. Protein Sci 1992, 1, 409.
27. Sobolev, V.; Sorokine, A.; Prilusky, J.; Abola, E. E.; Edelman, M. Bioinformatics 1999, 15, 327.
28. Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Res 2000, 28, 235.
29. Hubbard, S. J.; Thornton, J. M. NACCESS Computer Program; University College London: London, 1993.
30. Porter, C. T.; Bartlett, G. J.; Thornton, J. M. Nucleic Acids Res 2004, 32, D129.
31. Choe, J. Y.; Nelson, S. W.; Arienti, K. L.; Axe, F. U.; Collins, T. L.; Jones, T. K.; Kimmich, R. D.; Newman, M. J.; Norvell, K.; Ripka, W. C.; Romano, S. J.; Short, K. M.; Slee, D. H.; Fromm, H. J.; Honzatko, R. B. J Biol Chem 2003, 278, 51176.
32. Shibayama, N.; Miura, S.; Tame, J. R.; Yonetani, T.; Park, S. Y. J Biol Chem 2002, 277, 38791.
33. Yang, Z.; Lanks, C. W.; Tong, L. Structure 2002, 10, 951.
34. Eriksson, M.; Uhlin, U.; Ramaswamy, S.; Ekberg, M.; Regnstrom, K.; Sjoberg, B. M.; Eklund, H. Structure 1997, 5, 1077.
35. von Geldern, T. W.; Lai, C.; Gum, R. J.; Daly, M.; Sun, C.; Fry, E. H.; Abad-Zapatero, C. Bioorg Med Chem Lett 2006, 16, 1811.
36. Rudino-Pinera, E.; Rojas-Trejo, S. P.; Calcagno, M. L.; Horjales, E. (in press); PDB code 2WU1.
37. Hindie, V.; Stroba, A.; Zhang, H.; Lopez-Garcia, L. A.; Idrissova, L.; Zeuzem, S.; Hirschberg, D.; Schaeffer, F.; Jorgensen, T. J. D.; Engel, M.; Alzari, P. M.; Biondi, R. M. Nat Chem Biol 2009, 5, 758.
38. Ptak, C. P.; Ahmed, A. H.; Oswald, R. E. Biochemistry 2009, 48, 8594.
39. Anderka, O.; Loenze, P.; Klabunde, T.; Dreyer, M. K.; Defossa, E.; Wendt, K. U.; Schmoll, D. Biochemistry 2008, 47, 4683.
40. Hardy, J. A.; Lam, J.; Nguyen, J. T.; O'Brien, T.; Wells, J. A. Proc Natl Acad Sci USA 2004, 101, 12461.
41. Vanderpool, D.; Johnson, T. O.; Ping, C.; Bergqvist, S.; Alton, G.; Phonephaly, S.; Rui, E.; Luo, C.; Deng, Y. L.; Grant, S.; Quenzer, T.; Margosiak, S.; Register, J.; Brown, E.; Ermolieff, J. Biochemistry 2009, 48, 9823.
42. Li, J.; Edwards, P.; Burghammer, M.; Villa, C.; Schertler, G. F. X. J Mol Biol 2004, 343, 1409.
43. Bondensgaard, K.; Ankersen, M.; Thogersen, H.; Hansen, B. S.; Wulff, B. S.; Bywater, R. P. J Med Chem 2004, 47, 888.
44. Brauner-Osborne, H.; Wellendorph, P.; Jensen, A. A. Curr Drug Targets 2007, 8, 169.
45. Schlyer, S.; Horuk, R. Drug Discovery Today 2006, 11, 481.
46. Higgs, C.; Reynolds, C. A. In Theoretical Biochemistry: Processes and Properties of Biological Systems, Vol. 9 (Theoretical and Computational Chemistry Series); Eriksson, L., Ed.; Elsevier: Amsterdam, 2001; pp. 341–376.
47. Wells, J. A.; McClendon, C. L. Nature 2007, 450, 1001.
48. MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations; University of California Press: California, 1967; pp. 281–297.
49. Fairlie, D. P.; Tyndall, J. D.; Reid, R. C.; Wong, A. K.; Abbenante, G.; Scanlon, M. J.; March, D. R.; Bergman, D. A.; Chai, C. L.; Burkett, B. A. J Med Chem 2000, 43, 1271.
50. Lovell, S. C.; Word, J. M.; Richardson, J. S.; Richardson, D. C. Proteins 2000, 40, 389.
51. Elcock, A. H.; McCammon, J. A. Proc Natl Acad Sci USA 2001, 98, 2990.
52. Morris, G. M.; Goodsell, D. S.; Huey, R.; Olson, A. J. J Comput-Aided Mol Des 1996, 10, 293.
53. Henrick, K.; Thornton, J. M. Trends Biochem Sci 1998, 23, 358.
54. Bartlett, G. J.; Porter, C. T.; Borkakoti, N.; Thornton, J. M. J Mol Biol 2002, 324, 105.
55. Gill, H. S.; Eisenberg, D. Biochemistry 2001, 40, 1903.
56. Huang, X. P.; Ellis, J. Mol Pharmacol 2007, 71, 759.
57. May, L. T.; Leach, K.; Sexton, P. M.; Christopoulos, A. Annu Rev Pharmacol Toxicol 2007, 47, 1.
58. Malherbe, P.; Kratochwil, N.; Knoflach, F.; Zenner, M. T.; Kew, J. N.; Kratzeisen, C.; Maerki, H. P.; Adam, G.; Mutel, V. J Biol Chem 2003, 278, 8340.
59. Klein-Seetharaman, J.; Hwa, J.; Cai, K.; Altenbach, C.; Hubbell, W. L.; Khorana, H. G. Biochemistry 1999, 38, 7938.
60. Murakami, M.; Kouyama, T. Nature 2008, 453, 363.
61. Balem, F.; Yanamala, N.; Klein-Seetharaman, J. Photochem Photobiol 2009, 85, 471.
62. Simpson, L. M.; Taddese, B.; Wall, I. D.; Reynolds, C. A. Curr Opin Pharmacol 2010, 10, 30.
63. Chintapalli, S. V.; Yew, B. K.; Upton, G. J. G.; Illingworth, C. J. R.; Reeves, P. J.; Parkes, K. E.; Snell, C. R.; Reynolds, C. A. J Comput Chem 2010, DOI: 10.1002/jcc/21562.
64. Bernier, V.; Lagace, M.; Lonergan, M.; Arthus, M. F.; Bichet, D. G.; Bouvier, M. Mol Endocrinol 2004, 18, 2074.
65. Arkin, M. R.; Randal, M.; DeLano, W. L.; Hyde, J.; Luong, T. N.; Oslob, J. D.; Raphael, D. R.; Taylor, L.; Wang, J.; McDowell, R. S.; Wells, J. A.; Braisted, A. C. Proc Natl Acad Sci USA 2003, 100, 1603.
66. Yin, H.; Hamilton, A. D. Angew Chem Int Ed Engl 2005, 44, 4130.
67. Fry, D. C. Biopolymers 2006, 84, 535.
68. Schames, J. R.; Henchman, R. H.; Siegel, J. S.; Sotriffer, C. A.; Ni, H. H.; McCammon, J. A. J Med Chem 2004, 47, 1879.