Connectivity and Binding-Site Recognition Applications

Connectivity and Binding-Site Recognition: Applications Relevant to Drug Design

CHRISTOPHER J. R. ILLINGWORTH,1 PAUL D. SCOTT,2 KEVIN E. B. PARKES,3 CHRISTOPHER R. SNELL,3 MATTHEW P. CAMPBELL,1 CHRISTOPHER A. REYNOLDS1

1 Department of Biological Sciences, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, United Kingdom
2 Department of Computational Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, United Kingdom
3 Medivir UK Ltd., Chesterford Research Park, Little Chesterford, Essex CB10 1XL, United Kingdom

Received 24 June 2009; Revised 18 December 2009; Accepted 16 March 2010


DOI 10.1002/jcc.21561
Published online in Wiley InterScience (www.interscience.wiley.com).

Abstract: Here, we describe a family of methods based on residue–residue connectivity for characterizing binding
sites and apply variants of the method to various types of protein–ligand complexes including proteases, allosteric-
binding sites, correctly and incorrectly docked poses, and inhibitors of protein–protein interactions. Residues within
ligand-binding sites have about 25% more contact neighbors than surface residues in general; high-connectivity resi-
dues are found in contact with the ligand in 84% of all complexes studied. In addition, a k-means algorithm was
developed that may be useful for identifying potential binding sites with no obvious geometric or connectivity fea-
tures. The analysis was primarily carried out on 61 protein–ligand structures from the MEROPS protease database,
270 protein–ligand structures from the PDBSelect (25%), and 30 protein–protein complexes. Analysis of four proteases
with crystal structures for multiple bound ligands has shown that residues with high connectivity tend to have
less variable side-chain conformation. The relevance to drug design is discussed in terms of identifying allosteric-binding
sites, distinguishing between alternative docked poses and designing protein interface inhibitors. Taken
together, these data indicate that residue–residue connectivity is highly relevant to medicinal chemistry.
© 2010 Wiley Periodicals, Inc. J Comput Chem 00: 000–000, 2010

Key words: connectivity; ligand-binding sites; k-means; docking; allosteric-binding sites; protein–protein interface
inhibitors; molecular chaperones; local connectivity

Additional Supporting Information may be found in the online version of this article.
Correspondence to: C. A. Reynolds; e-mail: c.a.reynolds@essex.ac.uk
Contract/grant sponsors: BBSRC, Medivir (UK), Royal Society for Funding (Theo Murphy Blue Skies Award)

Introduction

The centrality of binding sites to medicinal chemistry has led to a multitude of different approaches for their identification in a protein structure and their subsequent exploitation, including methods based on structure,1–5 sequence conservation,6–10 and interaction energies.11–14 Here, we describe a method based on identifying residue–residue contacts and apply it to various types of protein–ligand–related problems including proteases, residue conformational flexibility, correctly and incorrectly docked poses, allosteric-binding sites, and inhibitors of protein–protein interactions. With reference to the protein–protein interface inhibitors, we note that it is generally easier to detect ligand-binding sites than protein binding sites,15 and indeed, although the methods described here do detect the binding site for a number of recently published protein–protein inhibitors, they are poor at predicting protein–protein interfaces. The methods described here are completely complementary to sequence-based methods, which more successfully predict protein–protein interfaces,6–10,16,17 but may have some relationship to other geometry-based methods, because residues at the bottom of a binding site cleft will be likely to form a greater number of residue–residue contacts than residues around the lip of the cleft or residues distant to the cleft. Similarities may also exist with energetic methods based on the use of probes11,13,14 because when the probe is deep within a cleft it will have more close contacts, giving a higher score. However, the approach described here is essentially independent of these methods and so although it has



2 Illingworth et al. • Vol. 00, No. 00 • Journal of Computational Chemistry

merit in its own right, it may also be used in combination with other methods in consensus approaches.

The key role played by highly connected residues in the transition state for in vitro folding,18–21 as determined, for example, by Φ-value analysis of a series of soluble proteins, raises the possibility that such highly connected residues may also play an important role in stable ground-state protein structures. For example, Vendruscolo et al.20 used Φ-value analysis to show that a cluster of well-connected residues in acylphosphatase made a key contribution to the folding transition state. Within ground-state structures, such highly connected residues are expected to possess greater than average stability as their greater number of neighbors are likely to restrict both their natural thermal movement and also their movement in response to external stimuli such as ligand binding, substrate transformation, or oligomerization. Here, we therefore test the hypothesis that ligand binding to residues with a high local connectivity is expected to be favorable; one rationale for this is that there will be comparatively less entropy loss on binding compared with binding to residues that are not so highly constrained. This should be reflected by ligand-binding sites having a higher than average connectivity to surrounding residues. Here, our approach, which examines the local properties of a residue, is distinct from other graph-theoretical approaches in which the key property being measured is a function of all the residues in the protein.5,19,21 Some additional evidence for a link between binding-site residues and connectivity comes from Brownian dynamics studies, where the catalytic residues of a number of proteins were found to have high force constants,22,23 implying severe constraints on their movement. Here, this connectivity effect will be analyzed using a number of different metrics of connectivity for a series of protein–ligand complexes. We show that ligand-binding sites, but not protein interfaces, have a greater number of residue–residue contacts than surface residues in general; the relevance of these findings for molecular recognition, drug design in general, and virtual screening in particular is discussed.

Methods

Connectivity and Selection of Test Systems

To test the basic hypothesis of a link between residue–residue connectivity and ligand-binding sites, the number of residue–residue contacts of a residue was counted according to two different methods. In the first (inclusive) method, the number of residue–residue contacts was defined as the number of neighboring residues at least three residues in the sequence from the initial residue that are in contact within 5.5 Å; this is the method used below unless otherwise stated. In the second (grouped) method, the number of contacts with distinct parts of the protein chain was counted. According to this second method, residues that contact within 5.5 Å were identified as in the first method. Next, pairs of contacting residues within six residues of each other in the chain were grouped and counted as a single contact. This finds the number of disjoint parts of the protein chain that contact any one residue, distinguishing between a residue that has N contacts all lying close together and a residue that has N contacts distributed throughout the sequence. In general, the number of residue–residue contacts of a residue is referred to as the connectivity of that residue. (Here, it is important to note that the local connectivity defined below is more complex than the simple measure given here in that it gives a higher connectivity to residues in high-connectivity regions).

To study the effect of connectivity on ligand binding, a set of 331 proteins were considered including 61 proteases taken from MEROPS24 and 270 proteins taken from the PDBSelect 25% list,25,26 each of length at least 100 residues, and with some ligand-binding residues. For NMR structures, the first structure in the ensemble was chosen. The ligand-binding residues for the PDBSelect set were identified using the LPC database,27 accessed via the RCSB protein databank,28 whereas for the other proteins they were defined as residues within 5.5 Å of the primary ligand. Only surface-accessible residues were considered, and these were identified as having at least 5% surface-accessible area as determined using the program NAccess.29

The calculations were performed to test the hypothesis that residues in ligand-binding sites have greater than average connectivity. For each of the two measures of connectivity described above, the maximum and average connectivity over the ligand-binding site were compared to the maximum and average connectivity over the whole of the protein surface for both the MEROPS and the PDBSelect sets of proteins. For the MEROPS set, the calculations were repeated, with catalytic residues, identified from the Catalytic Site Atlas,30 replacing ligand-binding site residues.

To represent the results in a graphical way, the number of residue–residue contacts for the residues of a protein was normalized over the range [0,1] to produce images of the protein in line with B-factor plots, with blue representing low connectivity and red high connectivity.

Binding-Site Prediction: Allosteric-Binding Sites, Rhodopsin, and Protein–Protein Interfaces

To examine the application of connectivity to identifying allosteric-binding sites, calculations were performed for 11 examples from 10 different systems identified as having allosteric-binding sites.31–41

For each of these systems, the average connectivity was calculated for residues on the protein surface and for residues in the primary and allosteric-binding sites.

Extending the basic approach to residue–residue contacts described above, two extensions of the method for semiautomatic binding-site prediction were derived using residue–residue contacts to predict the location of binding sites; these are the local connectivity method and the k-means approach.

Local Connectivity Method

In the first method, referred to as the local connectivity method, a local measure identifying points of high connectivity was defined. For each of the heavy atoms in a protein, the set of residues with at least one heavy atom within 5.5 Å of that atom was found (this set of residues will include the residue of the original atom), and the average number of residue–residue con-

Journal of Computational Chemistry DOI 10.1002/jcc



tacts of these residues was calculated, this measure being referred to as the local connectivity of the atom. To identify high-connectivity residues, the atom within the entire protein with the highest local connectivity was then identified. A residue was defined as having high connectivity if it contained at least one atom with a local connectivity at least 90% of the maximal local connectivity for the atoms of the protein. A particular advantage of this measure is that it explicitly ensures that a high-connectivity residue in a high-connectivity region scores higher than a high-connectivity residue in a low-connectivity region. To identify ligand-binding sites, only high-connectivity residues on the surface of the protein were considered.

The local connectivity method was applied to the proteins from the MEROPS set. For each protein, surface residues with high connectivity were identified and examined for whether or not they were in the ligand-binding site. This method is designed to identify key residues in the binding site rather than the entire binding site. In some cases, none of the surface residues was identified as being high connectivity. Where this was the case, the percentage cutoff was lowered in increments of 5% until a nonempty set of surface residues was found. For comparison, the same proteins were examined combining a similar percentage cutoff measure with the "closeness centrality" method applied by Amitai et al.,5 which takes into account the network properties of the protein as a whole. In an implementation similar to that of the local connectivity method (details given in Section 1 of Supporting Information), binding-site residues were predicted, and a comparison was made of the relative performance of the methods.

The local connectivity method was applied to a group of systems representing areas of current pharmacological interest. These included rhodopsin, the prototypical G-protein–coupled receptor (GPCR)42 because of its prominence as a representative of possibly the most common drug target.43–46 Six structures taken from five systems in which small molecules have been demonstrated to inhibit the formation of protein–protein interaction47 were also studied [PDB codes: 1PY2 (IL-2, protein partner is IL-2R), 1R6N (N-terminal transactivation domain of human papillomaviruses E2 helicase, protein partner is the E1 helicase), 1RV1 and 1T4E (MDM2, protein partner is p53), 1Y2F (ZipA, protein partner is FtsZ), and 2YXJ (Bcl-xL, protein partner is BAK)]. To account for potential conformational changes induced in ligand binding, a parallel set of structures, in which the structure of the protein had been determined in the absence of the ligand, was subjected to the same analysis (PDB codes: 1M47, 1R6K, 1Z1M (residues 25–109), 1Z1M (residues 16–111), 1F7W, and 1MAZ).

k-Means Method

In a second method for identifying binding sites, referred to as the k-means method, an adaptation of the k-means algorithm48 was implemented. This identifies residues in the binding site based on a global measure of connectivity across the whole of the protein surface. The method divides the protein surface into clusters, with cluster centers, or seed points, determined by a weighted averaging of connectivity scores of residues in each cluster. The k-means method was applied to the proteins in the MEROPS set to find seed points, which correspond to points of high connectivity on the protein surface, and was compared to the local connectivity measure described above. Full details of the method are given in Section 2 of Supporting Information.

Connectivity and Side-Chain Conformation

To test the hypothesis that a high number of residue–residue contacts leads to a loss of residue flexibility, four example sets of X-ray structures with different ligands bound to an identical protein were taken from the work of Fairlie et al.49 The examples taken were endothiapepsin with 20 different bound inhibitors, porcine pancreatic elastase with 15 different bound inhibitors, thermolysin with 12 different bound inhibitors, and papain with eight different bound inhibitors. Dihedral angles in the side chains of each residue were used to divide residues into conformational clusters, based on an extensive analysis of amino acid conformation.50 In each set of structures, the variation in conformation was then described in terms of entropy, given by eq. (1).

$$S = -\sum_{i \in \text{clusters}} p_i \log p_i, \qquad (1)$$

where $p_i$ is the observed probability of the side chain of a residue being in cluster i. Full details are given in Section 3 of Supporting Information.

We note that the entropy measured here differs from the sequence entropy, another property which has been linked with residues of functional importance.51 Here, a value of S equal to zero indicates that the conformation of the side chain of a residue is roughly constant across the set of structures, whereas increasing variance in the conformation of a residue leads to increasing values of S.

Docking

Redocking of the X-ray crystallographic ligand was carried out for nine systems using AutoDock 4.52 A large docking box was used that covered approximately a quarter of the protein. The Spearman rank correlation coefficient was used to determine whether the root mean square displacement (RMSD) of the docked ligand relative to the crystallographic coordinates correlated with connectivity of the residues contacting the ligand.

Protein–Protein Interactions

Thirty protein complexes (listed in Section 4 of Supporting Information) were retrieved from the PQS Protein Quaternary Structure server53 to enhance the probability that the dimeric interface is genuine rather than an artifact of nonspecific crystal packing. The connectivity of the protein was determined over the surface residues as above. In addition, the protein was examined by the evolutionary trace (ET) method using in-house code,8,9 and the results indicated that 13 of 60 had only one interface, whereas the ET results of the remaining proteins indicated the presence of additional interfaces or binding sites besides the crystallographic dimer interface. These 13 structures were studied to determine whether connectivity could be used to


identify the protein-interaction interface. The ET method sorts residues into bins according to residue conservation, and these data were converted into a measure analogous to connectivity, the ET score, defined according to the formula

$$\mathrm{ET}_{\mathrm{score}}(i) = N - \mathrm{bin}(i) + 1, \qquad (2)$$

where N is the total number of bins from the ET method (here, 20 bins were used), and bin(i) is the bin into which residue i is sorted by the ET method. The ET score is large if the residue i is highly conserved, and small if the residue i is poorly conserved.
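As an illustration of eq. (2), the following minimal Python sketch converts an ET bin index into an ET score. It assumes that bins are numbered from 1 (most conserved) to N (least conserved), which is consistent with the statement that highly conserved residues receive large scores but is not spelled out in the text; the function name `et_score` is hypothetical and is not taken from the authors' software.

```python
def et_score(bin_index: int, n_bins: int = 20) -> int:
    """Return the ET score N - bin(i) + 1 of eq. (2).

    Assumption (not stated in the source): bins are numbered 1..n_bins,
    with bin 1 holding the most conserved residues.
    """
    if not 1 <= bin_index <= n_bins:
        raise ValueError("bin_index must lie between 1 and n_bins")
    return n_bins - bin_index + 1


# With the 20 bins used here, the most conserved bin maps to the
# highest score and the least conserved bin to the lowest.
print(et_score(1))   # 20
print(et_score(20))  # 1
```

Under this numbering the score runs from N down to 1, so it can be compared with connectivity on a "larger is more important" basis.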

Connectivity and Residue Type

An investigation was carried out into the dependence of connectivity on residue type. It has been observed that catalytic residues in the active sites of enzymes are more likely to be charged or polar than residues elsewhere in the protein.54 Bulkier amino acids might reasonably be expected to have a higher connectivity score, and if these amino acids were also more likely to be involved in interactions with ligands that would result in a bias toward higher connectivity in ligand-binding sites. For both of the MEROPS and PDBSelect sets of proteins, the number of each of the 20 amino acids occurring in the binding site, on the protein surface, and in the protein as a whole was determined. Mean connectivity scores were calculated for each of the 20 amino acids, considering residues on the surface of proteins across each of the two sets in turn. These mean connectivity scores were used to calculate "expected" connectivity scores for the ligand-binding sites of each protein, calculated by counting the number of occurrences of each amino acid in the binding site, summing the mean connectivity scores for each amino acid, and dividing by the number of residues in the binding site. These "expected" scores were compared to observed connectivity values.

Software

The software is available from the authors at ftp://ftp.essex.ac.uk/pub/oyster/Connectivity/Connectivity.tar.

Results

Connectivity and Ligand Binding

A clear difference was observed between the connectivity of residues in the binding site and other residues on the protein surface. Figure 1 shows the relative distribution of residue connectivity for residues from the 270 proteins in the PDBSelect dataset, for all surface residues, and for residues in the binding site. Residues in the binding site generally have higher connectivity, with comparatively fewer having a connectivity between 0 and 7, and comparatively more having a connectivity between 7 and 13. For each protein, the average connectivity for residues in the surface and for surface residues in the binding site was calculated. Table 1 presents the statistics for these values under the different connectivity measures and shows that the connectivity is on an average about 25% higher in the binding site than across the surface as a whole. This is a somewhat surprising result as the volume occupied by the ligand will ensure that the connectivity of the binding site is inevitably reduced as contacts to the ligand are not included. Here, we note that, because of its grouping of contacts, the second method of measuring connectivity gives significantly lower connectivity values. However, for both sets, the difference between the average surface and average binding-site connectivities is greater when this second (grouped) method is used, indicating a positive contribution of distant contacts to the high connectivity of ligand-binding sites.

Figure 1. The distribution of connectivity (normalized by the total number of neighbors in the protein set) for 270 proteins from PDBSelect, for all surface residues (dashed) and for residues that are in contact with a ligand (solid).

Table 1. Mean Connectivity for the MEROPS and PDBSelect Datasets Containing 61 and 270 Proteins, Respectively.

                  Mean highest connectivity    Mean average connectivity
Dataset/Method    Surface    Binding site      Surface    Binding site      Δ(a)
MEROPS1           15.4       13.9              6.6        8.2               3.5
MEROPS2           6.4        5.7               2.6        3.4               4.0
PDBSelect1        14.8       13.0              6.5        7.6               2.7
PDBSelect2        5.9        5.2               2.5        3.1               2.9

The numbers 1 and 2 indicate the method of determining connectivity. 1 indicates the inclusive method, 2 indicates the grouped method. Columns 2 and 3 show the mean value of the maximal residue connectivity. Columns 4 and 5 show the mean value of the average residue connectivity of residues on the surface as a whole and in the binding site of proteins in each set. Column 6 gives the difference between columns 4 and 5.
(a) Difference between surface and binding-site average in standard deviations of the surface figure.

The individualized results for proteins in the MEROPS dataset are shown in Table 2 and indicate that for 59 of these 61 pro-


teins, the mean connectivity of the binding site is higher than the mean over the whole of the protein surface (the exceptions being 1cmx and 1d4l). A similar result to that for binding-site residues was obtained for catalytic residues. Of 58 proteins in the MEROPS set for which catalytic residue information was available, the average connectivity of the catalytic residues was higher than that for the protein as a whole (including surface and buried residues) in 54 cases (Supporting Information Table 5). A number of distinctly different proteins in Tables 1, 2, and 5 share common ligands or ligand types. For example, analysis of the PDBSelect set revealed 18 proteins that bound ATP. Analysis of the proteins against the CATH database showed that each contained either a two- or three-layer α-β sandwich domain. The observation of higher connectivity applied in each of the 18 cases, regardless of the fold. Six of these proteins had the Walker A box motif (www.expasy.ch/prosite/PS00017); for each of these proteins, the average connectivity for residues in the motif was higher than the average for the surface.

Table 2. Connectivity on the Protein Surface and at the Ligand-Binding Site for 61 Proteins from the MEROPS Database.

           Maximum connectivity           Mean connectivity
Protein    Whole surface  Binding site    Whole surface  Binding site
1a16       15             15              6.67           9.00
1afq       16             16              6.47           7.90
1ajq       17             15              6.22           11.10
1ao0       19             19              6.41           7.27
1arc       16             13              6.64           8.71
1ayu       15             12              6.77           7.58
1b6a       14             14              6.87           9.55
1bh6       14             13              6.88           7.52
1bil       13             12              6.00           6.29
1bmq       15             13              6.18           7.56
1bqy       15             13              6.45           7.85
1bru       17             12              6.48           9.69
1bxo       15             14              6.66           8.45
1c24       14             14              6.87           10.20
1cgh       18             16              6.58           9.41
1cmx       11             9               4.94           3.25
1cv8       16             12              6.48           8.06
1cvr       15             12              6.83           8.64
1cvz       13             12              6.58           7.61
1czi       18             17              7.90           8.33
1d4l       12             9               4.87           4.80
1dmt       16             15              7.46           9.78
1dy9       13             12              6.42           7.33
1eag       15             13              6.74           7.77
1f2o       15             14              6.82           9.17
1fh0       16             16              7.04           8.19
1fjs       16             16              7.01           8.46
1ft7       16             16              7.16           9.75
1fxy       15             14              6.50           8.32
1ga6       16             13              6.81           8.20
1gec       14             13              6.82           8.41
1gmy       19             14              6.95           8.20
1gvu       14             14              6.57           8.45
1hne       15             14              6.50           8.88
1hpg       16             13              6.70           7.44
1i76       14             13              6.29           8.06
1kug       17             17              6.61           7.50
1lf2       16             13              6.48           8.04
1ls5       14             13              6.74           7.93
1lyb       15             12              5.56           6.47
1m4h       15             14              6.68           8.32
1me4       15             14              6.59           7.81
1mu0       16             16              7.06           12.00
1n1m       16             15              7.09           11.00
1nqc       15             14              6.88           8.33
1nw9       14             14              5.86           6.92
1onx       15             15              6.90           9.92
1pfx       16             14              6.69           8.94
1pwu       17             15              6.84           7.69
1qjj       14             14              6.31           9.10
1qrp       16             14              6.62           8.14
1qs8       16             13              6.63           7.93
1s2k       17             14              6.52           8.61
1s4v       15             12              6.55           8.36
1smr       17             15              6.60           7.33
1uk4       17             14              6.52           6.92
1wht       18             18              6.96           9.58
2rmp       17             14              6.62           8.27
3apr       14             13              6.64           8.09
4aig       14             13              6.40           6.94
5sga       16             14              6.93           7.81

Column 3 is denoted in bold when the maximum connectivity is at the binding site, and column 5 is denoted in bold when the mean connectivity of the binding site is greater than mean connectivity of the surface as a whole (including the binding site).

Representative plots of connectivity for the systems with PDB codes 1c24 (methionine aminopeptidase), 1s4v (cysteine endopeptidase), 1cvr (gingipain R), and 1cmx (ubiquitin C-terminal hydrolase) are shown in Figure 2 (lhs); these four examples were chosen to be as diverse as possible in terms of the quality of the results obtained. For the first two of these systems, the higher degree of connectivity of the ligand-binding site can clearly be seen. For the system 1cvr (Fig. 2c), higher connectivity is revealed by the statistics, though it is not so obviously visible. The system 1cmx (Fig. 2d) is one of the two examples from the 61 MEROPS proteins for which the connectivity in the binding site was lower than for the surface as a whole. Examination of this system shows two ligands binding in a low-connectivity region. However, because one of these is bound covalently, there are clearly additional principles affecting the binding.

Connectivity and Residue Type

Summing across all residues in all proteins in the MEROPS set, the average connectivity for a residue on the surface of a protein, averaged across all residues, was found to be 6.67. Eleven amino acids had average connectivity scores higher than this (Gln, Val, Arg, Ile, His, Leu, Cys, Met, Phe, Tyr, and Trp,


24% higher than the expected value, with a higher connectivity score occurring in 56 of 61 cases. Similar results were obtained for proteins in the PDBSelect set, as described in Section 5 of Supporting Information.
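The composition-based "expected" connectivity used in this comparison (count each amino acid in the binding site, sum the per-type mean scores, divide by the number of site residues) can be sketched as follows. This is only an illustrative sketch: the function name and the per-residue mean scores in the example are hypothetical placeholders, not values from the study.

```python
from collections import Counter


def expected_connectivity(site_residues, mean_score_by_type):
    """Composition-based expected connectivity of one binding site.

    site_residues: residue type of every residue in the binding site.
    mean_score_by_type: mean surface connectivity for each residue type
    (computed over a whole dataset in the procedure described above).
    """
    counts = Counter(site_residues)
    total = sum(n * mean_score_by_type[aa] for aa, n in counts.items())
    return total / len(site_residues)


# Hypothetical numbers, for illustration only (not taken from the paper).
mean_scores = {"GLY": 5.0, "TRP": 11.0}
site = ["GLY", "TRP", "TRP"]
expected = expected_connectivity(site, mean_scores)
print(expected)  # 9.0
```

An observed mean site connectivity above this expected value indicates an excess that residue composition alone does not explain, which is the comparison reported above.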

Ligand Docking and Binding-Site Prediction

Binding-Site Prediction

When the local connectivity method for binding-site prediction was applied to the MEROPS set, 190 high-connectivity residues were identified, an average of just over three per protein. Of these, 94 or 49% were in the ligand-binding site, with at least one point in the binding site in 51 of 61 cases. By comparison, an implementation of the "closeness centrality" method5 applied to the same set of structures also identified a residue in the binding site in 51 of 61 cases. The structures for which the respective methods failed were different for the two algorithms, with only two binding sites not identified by either method, suggesting that the two approaches are complementary; the local connectivity method did, however, identify more catalytic residues (Supporting Information Table S1).
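The local connectivity procedure evaluated here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: atoms are given as hypothetical `(residue_index, (x, y, z))` tuples for a single chain, the inclusive contact definition (5.5 Å, sequence separation of at least three) is used, a brute-force distance search replaces any spatial indexing, and the surface-accessibility filter and the stepwise lowering of the cutoff are omitted.

```python
import math
from collections import defaultdict

CUTOFF = 5.5      # contact distance in Å, as in the Methods
MIN_SEQ_SEP = 3   # inclusive method: neighbours at least 3 residues apart


def residue_connectivity(atoms):
    """Number of contacting residues per residue (inclusive method)."""
    contacts = defaultdict(set)
    for i, (res_a, xyz_a) in enumerate(atoms):
        for res_b, xyz_b in atoms[i + 1:]:
            if (abs(res_a - res_b) >= MIN_SEQ_SEP
                    and math.dist(xyz_a, xyz_b) <= CUTOFF):
                contacts[res_a].add(res_b)
                contacts[res_b].add(res_a)
    return {res: len(contacts[res]) for res in {r for r, _ in atoms}}


def local_connectivity(atoms, connectivity):
    """Per atom: mean connectivity of all residues within CUTOFF of it."""
    scores = []
    for _, xyz_a in atoms:
        # The atom's own residue is always included (distance 0).
        nearby = {res_b for res_b, xyz_b in atoms
                  if math.dist(xyz_a, xyz_b) <= CUTOFF}
        scores.append(sum(connectivity[r] for r in nearby) / len(nearby))
    return scores


def high_connectivity_residues(atoms, fraction=0.9):
    """Residues owning an atom scoring >= fraction of the maximum."""
    connectivity = residue_connectivity(atoms)
    scores = local_connectivity(atoms, connectivity)
    threshold = fraction * max(scores)
    return sorted({res for (res, _), s in zip(atoms, scores)
                   if s >= threshold})


# Toy chain with one atom per residue: residues 0, 3 and 6 pack
# together, the others are isolated.
toy = [(0, (0, 0, 0)), (1, (100, 0, 0)), (2, (200, 0, 0)), (3, (1, 0, 0)),
       (4, (300, 0, 0)), (5, (400, 0, 0)), (6, (2, 0, 0)), (7, (500, 0, 0))]
print(high_connectivity_residues(toy))  # [0, 3, 6]
```

In a real application the resulting residues would then be restricted to surface-accessible ones before being compared with the ligand-binding site.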
The k-means algorithm gave an alternative method of binding-
site prediction. Representative plots showing output from the
k-means algorithm for the MEROPS systems with pdb codes
1c24, 1s4v, 1cvr, and 1cmx are shown in Figure 2 (rhs). For 1c24
and 1cvr, Figures 2e and 2g, a high-connectivity seed point is
present in the ligand-binding site. The result for 1cvr is particu-
larly significant, as the ligand-binding site is not obvious from a
visual inspection of the connectivity of the protein surface, but is
identified by the k-means algorithm. In cases where the average
connectivity in the binding site is lower than the average for the
protein surface, such as in 1cmx, Figure 2h, the k-means algorithm
is extremely unlikely to identify the correct ligand-binding site.
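Because the full k-means procedure is described only in the Supporting Information, the following is a generic sketch rather than the authors' algorithm. It shows one plausible reading of "cluster centers determined by a weighted averaging of connectivity scores": ordinary Lloyd iterations in which each centre is the connectivity-weighted mean of its members' coordinates. The function name, the explicit initial centres, and the fixed iteration cap are all assumptions.

```python
import math


def weighted_kmeans(points, weights, centers, n_iter=20):
    """Lloyd-style k-means with connectivity-weighted centres.

    points:  list of (x, y, z) coordinates of surface residues
    weights: connectivity score of each residue
    centers: initial centre coordinates (the choice is left to the caller)
    """
    centers = [tuple(float(v) for v in c) for c in centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(len(centers)),
                      key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: weighted mean of the member coordinates.
        new_centers = []
        for k, old in enumerate(centers):
            members = [(p, w) for p, w, lab in zip(points, weights, labels)
                       if lab == k]
            total = sum(w for _, w in members)
            if total == 0:
                new_centers.append(old)  # empty or zero-weight cluster
                continue
            new_centers.append(tuple(
                sum(w * p[d] for p, w in members) / total for d in range(3)))
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


points = [(0, 0, 0), (2, 0, 0), (10, 0, 0), (12, 0, 0)]
weights = [1, 3, 2, 2]
labels, centers = weighted_kmeans(points, weights, [(0, 0, 0), (12, 0, 0)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(1.5, 0.0, 0.0), (11.0, 0.0, 0.0)]
```

In the toy example the first centre is pulled toward the residue with the higher connectivity, which is the behaviour that lets a seed point settle on a high-connectivity patch of the surface.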
In general, the ligands did not necessarily bind exclusively to a single k-means cluster but rather tended to bind to residues in two or more clusters. This is a reasonable result as the determination of the number of clusters in k-means analyses is somewhat arbitrary and the MEROPS peptide-like ligands tend to be large. Thus, the binding of a ligand to more than one cluster should not be taken to imply multiple experimental binding modes. For 37 of the 61 proteins, a seed point was found to occur in the set of ligand-contacting residues, which is quite high considering that the average size of these clusters was 71 residues (range 17–249). This compared to an expected value of no more than 18 of 61 if seed points were chosen at random (calculation details given in Section 6 of Supporting Information), confirming the association between high-connectivity points and ligand binding. Thus, although the local connectivity method of binding-site identification is superior, the k-means approach may nevertheless be helpful in niche applications: for example, in proteins where there is no obvious region of high connectivity (Figs. 2c and 2g).

Figure 2. Connectivity plotted over the surface of (a) 1c24, (b) 1s4v, (c) 1cvr, and (d) 1cmx; red indicates high connectivity, and blue indicates low connectivity. The ligand is displayed as a black stick diagram. Clusters of residues (varying colors) and high-connectivity points (white) generated by the k-means algorithm for the proteins (e) 1c24, (f) 1s4v, (g) 1cvr, and (h) 1cmx. The ligand is shown in a contrasting color (blue or yellow, stick diagram).

scoring on an average between 6.9 and 11.3), and analysis of the ligand-binding sites showed a prevalence of these residues. In total, these 11 higher connectivity residues accounted for 49% of the residues in the ligand-binding sites, but for only 38% of residues in the surface as a whole. Although appearing to support the hypothesis that higher connectivity in binding sites is the result of differing residue composition, this factor did not account for the higher connectivity values observed in the binding sites. Using the average connectivity scores for each of the amino acids, an expected value was generated for the connectivity of the binding sites of each of the proteins in the MEROPS set, based on their residue composition. Over this set, the mean connectivity score of residues in the binding site was

Ligand Docking

The correlations between the RMSD score for the docked pose and the connectivity for the residues contacting the ligand show


two distinct types of behavior. Where a large docking box is used so that the ligand may bind to
multiple distinct regions of the protein, there is a high correlation, showing that connectivity
may be used as a filter to help determine the correct binding site. However, where the ligand
docks to a single binding site, there is a low correlation, showing that connectivity cannot be
used to determine the correct pose. Supporting Information Table S6 shows the Spearman rank
correlation coefficient calculated for the correlation between the RMSD score and connectivity
for the residues contacting the ligand for each of the nine dockings. In Supporting Information
Table S6, where the different ligand poses are in the same binding site (1ett and 1pph), and in
Supporting Information Table S7, the correlation coefficient values are of low significance, but
of the remaining seven dockings in Supporting Information Table S6, six have a strong correlation
between high connectivity and low RMSD, the exception being 1lgr. Figures 3a and 3b show the
strong correlation for the two ligands with the highest correlation coefficients (0.78 and 0.83
for 1dwb and 4ts1, respectively). The results for AMP binding to glutamine synthetase, PDB code
1lgr, are shown in Figure 3c. Here a region of high connectivity can be seen 13–17 Å from the
location of the ligand in the crystal structure, and further examination of the structure
suggests that this represents an alternative binding site. Glutamine synthetase binds both ATP
and glutamate simultaneously in a single binding channel. In the docking, the AMP ligand was
sometimes incorrectly docked in the glutamate site, about 14 Å from its location in the crystal
structure. A later study of the same enzyme is recorded in the PDB file 1fpy,55 in which
phosphinothricin occupies the glutamate binding site. Thus, in all cases studied, including 1lgr,
high connectivity is associated with a genuine binding site, albeit a binding site for a
different ligand in the case of 1lgr. This observation raises the possibility that connectivity
could also be used to identify allosteric-binding sites.
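
The rank correlation used above can be illustrated with a minimal sketch. The pose data below are hypothetical and are not taken from Supporting Information Table S6; the function names are ours. The sketch correlates the raw values directly, so high connectivity accompanying low RMSD gives a strongly negative coefficient; the sign convention behind the coefficients quoted in the text is not restated here and may differ.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical docked poses: RMSD to the crystal pose (in Å) and the mean
# connectivity of the residues each pose contacts.
pose_rmsd = [0.8, 1.5, 6.0, 12.0, 14.5]
pose_connectivity = [10.2, 9.8, 8.1, 6.5, 7.0]

rho = spearman(pose_rmsd, pose_connectivity)
print(f"Spearman rho = {rho:.2f}")
```

With a large docking box, poses far from the crystal site tend to contact low-connectivity residues, which is what drives a coefficient of large magnitude; when every pose lies in the same site the connectivity values barely vary and the coefficient loses significance.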

Figure 3. The relationship between connectivity and RMSD for various ligands redocked to their
native crystal structure using AutoDock: (a) 4ts1, (b) 1dwb, and (c) 1lgr. A large docking box
was used in these docking experiments. Images of docked poses and filtered docked poses for 4TS1
and 1LGR are given in Supporting Information Figure 1.

Allosteric-Binding Sites

Examination of the allosteric-binding sites showed higher than average connectivity in the
majority of cases. Details of the results for these systems are shown in Table 3. In each of the
11 cases, the primary ligand-binding site had a higher connectivity than did the surface in
general. Results for the allosteric ligands identified a binding-site connectivity greater than
that for the protein surface in 9 of 11 cases. This suggests that allosteric-binding sites are
more difficult to identify using connectivity than the primary binding site, although
connectivity may be of use in many cases. In fructose-1,6-bisphosphatase (1q9d, 2fhy), human
mitochondrial NAD(P)+-dependent malic enzyme (1gz3), and ribonucleotide reductase R1 protein
(4r1r), the enzyme is an oligomer and the higher connectivity of the allosteric-binding site
includes contributions from both monomers.35 Thus, in two of the cases where the
allosteric-binding site did not have high connectivity, inclusion of a second protein chain
reversed this result. For each of the allosteric-binding sites, similar results were obtained
regardless of whether the calculations were performed on the bound cases or the unbound cases.

The results on rhodopsin (detailed below) are also relevant in this respect because allosteric
ligands are of interest for systems such as the muscarinic subfamily,56,57 where the
allosteric-binding site is associated with the extracellular region. For class C GPCRs where the
main ligand binds to the N-terminus, the cavity within the helical bundle, which has high
connectivity (see below), becomes the allosteric-binding site.58 In addition, for rhodopsin there
is a high connectivity region in the cytoplasmic domain that has recently been shown to bind a
small molecule (see below).

GPCRs

Connectivity analysis of rhodopsin showed a markedly increased connectivity in the ligand-binding
site, with an average score of 11.5, compared with an average for the surface of 7.4. Three of


Table 3. Connectivity (conn.) Data for the Surface, Primary Binding Site, and Allosteric-Binding Site.

                    Primary ligand                            Allosteric ligand
                                      Connectivity                             Connectivity
No.    PDB          Ligand            Protein  Binding site   Ligand           Protein  Binding site

1      1q9d^a       Fructose          7.0      9.3            GC252-354        7.0      5.3
       2fhy^b       Fructose          7.6      9.3            GC252-354        7.6      7.7
2      1iwh^c       Heme              7.1      8.6            Benzafibrate     7.1      6.6
3      1gz3         Tartronate        7.4      10.6           Fumarate         7.4      10.6
4      4r1r^a       GDP               7.1      7.6            dTTP             7.1      6.6
       4r1r^b       GDP               7.1      7.6            dTTP             7.1      7.2
5      2wu1         NAG-6-phosphate   6.9      8.2            NAG-6-phosphate  6.9      6.4
6      3hrf         ATP               6.4      7.5            P48              6.4      7.9
7      3ijo         Glutamate         6.6      11.2           Althiazide       6.6      8.8
8      3ddw/3ceh    C27H22ClN3O4      7.3      8.5            AVE5688          7.4      8.3
9(i)   3h1p/1shj    DEVD_CHO          6.6      8.1            DICA             6.5      7.1
9(ii)  3h1p/1shl    DEVD_CHO          6.6      8.1            FICA             6.6      7.8
10     2wmq/3jvr    C9H11N3O2S        6.5      6.9            C16H13Cl2N3O2    6.2      6.6

There are 11 allosteric ligands for the 10 systems listed.
^a Monomer.
^b Dimer.
^c Generally, the k-means seed point was not part of the binding site except for the main binding
site for 1iwh and the allosteric-binding site for 2fhy.
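
The counts quoted in the text can be checked against Table 3 with a short tally. The sketch below is ours rather than part of the original analysis: the values are transcribed from the table, and the choice of which row to count for the two systems listed as both monomer and dimer is an assumption. Counting the dimer rows reproduces the stated 9 of 11 allosteric sites; counting the monomer rows gives 7 of 11, consistent with the two cases reversed by including a second chain.

```python
from collections import namedtuple

# Columns: system number, PDB code, oligomer state (None if only one row),
# then surface and binding-site connectivity for the primary and allosteric ligands.
Row = namedtuple(
    "Row",
    "system pdb state surf_primary site_primary surf_allosteric site_allosteric",
)

TABLE3 = [
    Row("1", "1q9d", "monomer", 7.0, 9.3, 7.0, 5.3),
    Row("1", "2fhy", "dimer", 7.6, 9.3, 7.6, 7.7),
    Row("2", "1iwh", None, 7.1, 8.6, 7.1, 6.6),
    Row("3", "1gz3", None, 7.4, 10.6, 7.4, 10.6),
    Row("4", "4r1r", "monomer", 7.1, 7.6, 7.1, 6.6),
    Row("4", "4r1r", "dimer", 7.1, 7.6, 7.1, 7.2),
    Row("5", "2wu1", None, 6.9, 8.2, 6.9, 6.4),
    Row("6", "3hrf", None, 6.4, 7.5, 6.4, 7.9),
    Row("7", "3ijo", None, 6.6, 11.2, 6.6, 8.8),
    Row("8", "3ddw/3ceh", None, 7.3, 8.5, 7.4, 8.3),
    Row("9(i)", "3h1p/1shj", None, 6.6, 8.1, 6.5, 7.1),
    Row("9(ii)", "3h1p/1shl", None, 6.6, 8.1, 6.6, 7.8),
    Row("10", "2wmq/3jvr", None, 6.5, 6.9, 6.2, 6.6),
]


def pick_rows(rows, prefer):
    """Keep one row per system, taking the preferred oligomer state when two exist."""
    chosen = {}
    for row in rows:
        if row.system not in chosen or row.state == prefer:
            chosen[row.system] = row
    return list(chosen.values())


def tally(rows):
    """Count sites whose connectivity exceeds that of the protein surface."""
    primary = sum(r.site_primary > r.surf_primary for r in rows)
    allosteric = sum(r.site_allosteric > r.surf_allosteric for r in rows)
    return primary, allosteric, len(rows)


for state in ("monomer", "dimer"):
    primary, allosteric, total = tally(pick_rows(TABLE3, state))
    print(f"{state}: primary {primary}/{total}, allosteric {allosteric}/{total}")
```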

the six highest connectivity residues are in extracellular loop 2, underlining the importance of
this loop for the stability of rhodopsin-like GPCRs. Using the local connectivity method,
high-connectivity residues were identified at the 85% and 90% cutoff levels. At the 90% level all
residues found were adjacent to the main ligand-binding site within the helical bundle. At the
85% level, although most residues were close to the ligand, four additional residues were
identified, these being Tyr10 (N-terminus), Leu72, Leu77 (interface between helix 1 and helix 2),
and Thr93 (interface between helix 2 and helix 3). Leu72 is on the intracellular face of the
receptor and may participate in binding G-protein, but its mutation to cysteine has negligible
effects on G-protein activation59; the equivalent residue is in contact with an octyl glucoside
molecule in the squid rhodopsin structure.60 However, Leu72 has been implicated in small molecule
binding, namely chlorin to Meta-II state rhodopsin61 (Klein-Seetharaman, personal communication).
It is possible that the high-connectivity sites on the lipid-facing regions of the GPCR are
associated with protein–protein interactions9,16,62 (see below).

Small-Molecule Inhibition of Protein–Protein Interfaces

Results of the application of the local connectivity method are shown in Table 4. In the
structures with bound ligands, between

Table 4. High-Connectivity Point Data for Protein–Protein Interface Systems, from Structures
Derived with and Without Ligand Bound.

Ligand bound                                     Ligand unbound
PDB    Cutoff (%)  #Points  #Points correct      PDB      Cutoff (%)  #Points  #Points correct

1PY2   85          1        0                    1M47     85          6        1
1R6N   90          3        0                    1R6K     90          7        3
1RV1   90          1        1                    1Z1M^a   90          4        2
1T4E   90          1        1                    1Z1M^b   90          7        3
1Y2F   90          2        1                    1F7W     90          4        2
2YXJ   85          3        1                    1MAZ     80          2        1

The cutoff describes the maximal percentage (in increments of 5%) such that at least one residue
on the protein surface contained an atom with at least this percentage of the maximal local
connectivity for the protein as a whole. #Points describes the number of such residues, and
#Points correct describes the number of these residues in the ligand-binding site.
^a Residues 25–109.
^b Residues 16–111.
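
One reading of the cutoff definition in the note to Table 4 is sketched below. This is an interpretation, not the authors' implementation: the function name, the residue labels, and the connectivity values are hypothetical, and the calculation of per-atom local connectivity itself (defined elsewhere in the paper) is not reproduced.

```python
def high_connectivity_cutoff(surface_atoms, protein_max, step=5):
    """Find the largest percentage cutoff that selects at least one surface residue.

    surface_atoms maps a residue label to the local connectivity values of its
    atoms; protein_max is the maximal local connectivity over the whole protein.
    Returns (cutoff_percent, residues_with_a_qualifying_atom).
    """
    for percent in range(100, 0, -step):
        threshold = percent * protein_max / 100
        points = sorted(
            residue
            for residue, atoms in surface_atoms.items()
            if any(value >= threshold for value in atoms)
        )
        if points:
            return percent, points
    # Nothing qualified even at the lowest step: fall back to every surface residue.
    return 0, sorted(surface_atoms)


# Hypothetical surface residues with per-atom local connectivity values.
surface = {
    "Leu54": [17.2, 12.0],
    "Gly58": [10.0],
    "Ile61": [18.1, 9.5],
}
binding_site = {"Leu54", "Ile61"}  # hypothetical ligand-contacting residues

cutoff, points = high_connectivity_cutoff(surface, protein_max=20.0)
points_correct = sum(residue in binding_site for residue in points)
print(cutoff, points, points_correct)
```

Stepping down from 100% in 5% increments means the reported cutoff adapts to each protein, which is why the table lists values of 80, 85, and 90% rather than a single fixed threshold.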


1 and 3, high-connectivity points were identified in each protein, with a high-connectivity point
in the binding site in four of six cases. A total of 36% of high-connectivity points were in the
binding sites. Repeating the method on the unbound structures gave improved results. At least one
high-connectivity point was found to occur in the binding site in each case, with 40% of
high-connectivity points occurring in the binding sites. Figure 4 shows the locations of
high-connectivity residues for two of the structures that were examined. For the system 1T4E, a
single high-connectivity residue was identified, located in the ligand-binding site. Repeating
the connectivity analysis on the unligated structure gave seven high-connectivity residues, of
which three were in the binding site. Of the remaining high-connectivity residues, three were
adjacent to the high-connectivity residues in the binding site, whereas the remaining residue was
on the other side of the protein. The IL-2 system (1PY2) is an example of a structure where the
binding site is not located by the method: a single high-connectivity residue is identified, away
from the binding site. Such high-connectivity residues could indicate alternative functions, such
as other binding sites or regions important for protein folding.63 Thus, in the unligated IL-2
structure, six high-connectivity residues are identified, of which residue 96 is in the binding
site of the 1PY2 ligand; of the remaining five high-connectivity residues, residue 45 lies in the
binding site for other ligands (PDB codes 1M48, 1M4B, 1QVN, and 1PW6), residue 92 is a lock
residue for a closed loop,63 residue 20 is adjacent to a lock residue (lock residues are
inevitably in high-connectivity regions19,20,63), and residues 120 and 122 are at the interface
with the receptor (PDB code 2B5I).

Figure 4. High-connectivity residues identified in the structures of human MDM2 (a) with bound
benzodiazepinedione and (b) without a bound ligand, and of interleukin-2 (c) with bound small
molecule SP4206 and (d) without a bound ligand. The structure of the protein is shown in cartoon
format. Residues containing high-connectivity points are shown in spacefill mode and colored
green (residues in the binding site) and red (residues not in the binding site). The ligand is
shown in stick format and colored in yellow.

Protein–Protein Interactions

The average connectivity in the protein–protein interface was 5.9 and was thus slightly lower
than the average of 6.4 for the surface as a whole. Of the 60 protein monomers studied, only 14
had a higher than average connectivity in the interface region, and only 16 had a higher than
expected connectivity based on residue composition. This demonstrates a significant difference
between ligand-binding sites and protein–protein interaction sites, namely that although high
connectivity is associated with ligand-binding sites, it is not necessarily associated with
protein–protein interfaces.

In contrast, the average evolutionary trace score (ETscore) value for the interface of a monomer
was 11.2 compared with 10.3 for the surface as a whole. Forty-six of the 60 monomers had a higher
average ETscore in the interface region than for the surface as a whole. When only the 13
monomers with a single high-ETscore patch were considered, the overall results were not
significantly different (Supporting Information Table S2). Three of the 13 had a higher average
connectivity in the interface region than for the surface as a whole, whereas 10 of the 13 had a
higher average ETscore in the interface than for the surface as a whole.

Connectivity and Side-Chain Conformation

The relationship between connectivity and the extent of conformational entropy resulting from
ligand binding is reported in Table 5. In each of the sets, the residues with low connectivity
(i.e., in the bottom third of residues ordered by connectivity score) are likely to have nonzero
conformational entropy over a range of proteases (with probability 70% in the case of the set
1eed). Conversely, residues with high connectivity (i.e., in the top third of residues) are
likely to have zero conformational entropy (with probability 67% in the case of the set 1eed).

Table 5. Normalized Proportions of Residues with Zero and Nonzero Conformational Entropy (i.e.,
Residues That Do or Do Not Retain the Same Conformation Across a Variety of Structures) Grouped
by Connectivity Score for Four Sets of Identical Protein Structures.

                                Connectivity
Lead PDB/# ligands   Entropy    Low     Medium   High

1eed/20              Zero       0.30    0.53     0.67
                     Nonzero    0.70    0.47     0.33
1bma/15              Zero       0.20    0.58     0.72
                     Nonzero    0.80    0.42     0.28
1thl/12              Zero       0.37    0.61     0.55
                     Nonzero    0.63    0.39     0.45
1pad/8               Zero       0.25    0.50     0.72
                     Nonzero    0.75    0.50     0.28

The results show, for example, that if a residue in the set 1eed has low connectivity, there is a
70% chance that it will have nonzero entropy; here, the other 20 ligands bind to proteins
identical to 1eed albeit with different ligands and PDB codes.

Discussion

There are several approaches for determining binding sites in proteins. These include data mining
of sequence information within multiple sequence alignments, as exemplified by the ET method,
sequence entropy and correlated mutations,6–10 searching for suitable pockets based on geometric
criteria,1–3 and methods based on energy criteria,12,13 amongst which GRID is the classic
method.11 Such studies have shown that it is generally easier to detect ligand-binding sites than
protein binding sites.15 The connectivity method has advantages over sequence-based methods in
that it is not reliant on either having sufficient homologous sequences or on the assumption that
all homologous sequences share the same fold at the point of interest. Nevertheless, the method
described here is essentially independent of these other methods and so may also be used in
combination.

Thus, of the 61 proteins from the MEROPS database that were studied, 59 were found to have a
higher connectivity for the ligand-binding site than for the protein surface as a whole. The
exceptions prove that a high connectivity is not an essential property of a ligand-binding site,
but the statistics as a whole, carried out across a wide spectrum of different proteins, confirm
that high local connectivity is a desirable property. Elsewhere, network analysis has indicated
that global connectivity is associated with binding sites,5 but no explanation as to why was
given. Here, we propose that the most likely interpretation of this is the role of a high
connectivity in reducing the entropy change of the enzyme on binding. Because of the interactions
involved in ligand binding, residues next to a bound ligand are likely to have relatively low
conformational entropy. The entropy change in binding is thus more favorable if the residues in
their unbound state also have low (conformational) entropy. We suggest that interactions with
adjacent residues (i.e., those in close physical proximity) provide this ordering in the binding
site. This conjecture is supported by the data on conformational entropy in Table 5 because
residues with high-connectivity scores are more likely to present a consistent side-chain
conformation across a range of structures.

The results obtained bear some similarity to work independently carried out on Brownian
dynamics-derived force constants22,23 that revealed an association between residues with high
force constants and catalytic residues. A correlation was noted between the force constant and
the number of "neighbors of neighbors" (cf. contacts of contacts) of a residue.23 We have shown a
very strong correlation between the number of "neighbors of neighbors" of a residue and the
square of the number of contacts (results not shown).

The results also bear some similarity to work on protein folding, which indicates that high
connectivity plays an important role in the transition structure for folding18–20 for small
two-state proteins folding in vitro. In this respect, therefore, it would seem that ligand
binding has some features in common with protein folding. This work may therefore offer some
insight into the mechanism of molecular chaperones64 because it indicates that ligands binding to
high-connectivity regions may help to stabilize the fold if these high-connectivity regions are
indeed key to folding.63

Protein–Protein Interactions

The analysis of connectivity over a protein surface is potentially problematical as it is not
necessarily known a priori how many interfaces reside on the protein surface. The application of
ET analysis to identify proteins that are likely to have a single interface is therefore a useful
control. Nevertheless, the connectivity results are similar for both the set of 60 proteins with
potentially two or more interfaces and the smaller control set of 13 proteins with only one
predicted interface, namely that connectivity is not particularly useful for identifying
protein–protein interaction sites and that in this regard the ET method is superior (see Section
4 of Supporting Information). However, because sequence-based methods such as entropy51 and ET6
can identify both ligand-binding sites and protein-interaction sites, but cannot necessarily
distinguish between the two, the ET method could be used in combination with connectivity to
assess the nature of the binding site. More significantly, this result indicates a fundamental
difference between ligand-binding sites and protein-interaction sites that is related to the fold
of the protein as the former but not the latter is likely to have high connectivity. This
fundamental difference is probably one contributing factor toward the observation that
small-molecule ligands are more likely than proteins to bind with high affinity. The design of
ligands to inhibit large protein–protein interfaces is notoriously difficult, but the use of
connectivity to identify the small proportion of protein–protein interfaces that have the correct
connectivity features over a smaller area that are conducive to ligand binding may assist in this
process.

Application to Drug Design

Insights into the link between connectivity and ligand binding may prove useful in the
identification of binding sites on a protein surface. Coloring residues according to their
connectivity score provides a visual means by which potential binding sites may be found.
Alternatively, the k-means method gives precise points on the protein surface at which binding
may take place. As can be seen in Figure 2, the algorithm can be effective in identifying binding
sites even when they are not obvious from the protein geometry or from a superficial examination
of the residue connectivity (Fig. 2g). Thus, the system 1cvr is an example of a system where the
ligand-binding site is identified by the k-means method, in which the ligand is not noticeably in
a cleft in the protein surface. Both methods do, however, give a general guide to docking
location, rather than a precise indication.

Attempts to rescore ligands from a docking run according to the connectivity score of the
residues they contact were successful where the docking used a large docking box and results of
the docking run occupied different potential binding sites. Here,


connectivity was useful in assessing whether the ligand had docked to a valid binding site, even
if the binding site is not the correct one for that ligand. However, the method was not useful in
distinguishing between different ligand poses in the same binding site, presumably because the
variations in connectivity were much less.

It remains, however, that for the general identification of ligand-binding sites, the
connectivity score can prove a powerful and effective tool. This may be particularly useful in
the search for allosteric-binding sites. Here, we have analyzed 11 systems for which
allosteric-binding sites have recently been identified by X-ray crystallography.31–34,58 In all
of these cases, although residues close to the primary ligand (or functional group) consistently
had higher connectivity than the surface in general, allosteric-binding sites were more difficult
to identify, although positive results were obtained in most cases.

The link between connectivity and conformational entropy of residues suggests that connectivity
could be used to enhance virtual screening experiments involving a flexible receptor. Prior
knowledge of which residues in a receptor are likely to retain the same or a restricted set of
conformations could result in the design of more efficient docking experiments and in
considerable savings in computational time. As has been shown here, residues of higher
connectivity are significantly more likely to be rigid over a range of receptor structures and
therefore make likely candidates to be fixed in a flexible docking simulation.

Targeting drugs to inhibit protein–protein interactions offers huge therapeutic potential but is
nevertheless extremely difficult, despite recent encouraging successes.47,65–67 Given that
protein interfaces generally possess fewer druggable topological features,47 it may be necessary
to probe multiple protein conformations, e.g., as generated by X-ray crystallography (here) or by
molecular dynamics,68 to identify possible binding sites, including hidden binding sites, that
could be exploited to inhibit protein–protein interactions. Although the results obtained were
mixed in quality, in the six systems studied here, it was sufficient to study the protein
monomers to identify possible binding sites because in each system at least one of the
high-connectivity points identified was located in the binding site.

Conclusions

We have developed a family of methods based on residue–residue connectivity for characterizing
binding sites and have applied them to various problems in computational medicinal chemistry. The
basic method allows high-connectivity binding sites to be visualized, as in Figures 2a–2d. The
k-means algorithm (Figs. 2e–2h) automatically detects centers of clusters of high connectivity
and may be particularly useful when there is no obvious region of high connectivity. The
algorithm for determining residues with high local connectivity may also be used in automatic
mode, as illustrated in Figure 4.

Taken together, the results presented here have confirmed a role for high connectivity in
ligand-binding sites.

It is possible that ligands preferentially bind to regions of high connectivity to minimize the
loss of entropy on binding that would occur if the motion of flexible side chains were to be
restricted. In this respect, ligand binding seems to utilize principles that are also common to
protein folding, though these principles do not apply in the same way to the prediction of
protein–protein interactions. We have shown that connectivity can be used to indicate the
expected degree of side-chain movement on ligand binding; it can also be used to search for
allosteric-binding sites and to distinguish poses that have been docked to a valid binding site
from those that have not. The low connectivity observed at most protein–protein interfaces may
underlie the difficulty in designing protein interface inhibitors, but, nevertheless,
connectivity can be used to indicate possible binding sites for protein–protein interface
inhibitors.

Acknowledgment

The authors acknowledge Garrett Morris for a copy of AutoDock and for helpful discussions.

References

1. Laskowski, R. A.; Luscombe, N. M.; Swindells, M. B.; Thornton, J. M. Protein Sci 1996, 5, 2438.
2. Hendlich, M.; Rippmann, F.; Barnickel, G. J Mol Graph Model 1997, 15, 359.
3. Venkatachalam, C. M.; Jiang, X.; Oldfield, T.; Waldman, M. J Mol Graph Model 2003, 21, 289.
4. Wangikar, P. P.; Tendulkar, A. V.; Ramya, S.; Mali, D. N.; Sarawagi, S. J Mol Biol 2003, 326, 955.
5. Amitai, G.; Shemesh, A.; Sitbon, E.; Shklar, M.; Netanely, D.; Venger, I.; Pietrokovski, S. J Mol Biol 2004, 344, 1135.
6. Lichtarge, O.; Bourne, H. R.; Cohen, F. E. Proc Natl Acad Sci USA 1996, 93, 7507.
7. Sowa, M. E.; He, W.; Slep, K. C.; Kercher, M. A.; Lichtarge, O.; Wensel, T. G. Nat Struct Biol 2001, 8, 234.
8. Dean, M. K.; Higgs, C.; Smith, R. E.; Bywater, R. P.; Snell, C. R.; Scott, P. D.; Upton, G. J.; Howe, T. J.; Reynolds, C. A. J Med Chem 2001, 44, 4595.
9. Thummer, R. P.; Campbell, M. P.; Dean, M. K.; Frusher, M. J.; Scott, P. D.; Reynolds, C. A. J Mol Neurosci 2005, 26, 113.
10. Filizola, M.; Olmea, O.; Weinstein, H. Protein Eng 2002, 15, 881.
11. Goodford, P. J. J Med Chem 1985, 28, 849.
12. Laurie, A. T.; Jackson, R. M. Bioinformatics 2005, 21, 1908.
13. Harris, M. R.; Kihlen, M.; Bywater, R. P. J Mol Recognit 1993, 6, 111.
14. Glick, M.; Robinson, D. D.; Grant, G. H.; Richards, W. G. J Am Chem Soc 2002, 124, 2337.
15. Burgoyne, N. J.; Jackson, R. M. Bioinformatics 2006, 12, 1335.
16. Vohra, S.; Chintapalli, S. V.; Illingworth, C. J.; Reeves, P. J.; Mullineaux, P. M.; Clark, H. S.; Dean, M. K.; Upton, G. J.; Reynolds, C. A. Biochem Soc Trans 2007, 35, 749.
17. Gouldson, P. R.; Dean, M. K.; Snell, C. R.; Bywater, R. P.; Gkoutos, G.; Reynolds, C. A. Protein Eng 2001, 14, 759.
18. Paci, E.; Lindorff-Larsen, K.; Dobson, C. M.; Karplus, M.; Vendruscolo, M. J Mol Biol 2005, 352, 495.
19. Vendruscolo, M.; Dokholyan, N. V.; Paci, E.; Karplus, M. Phys Rev E 2002, 65, 061910.
20. Vendruscolo, M.; Paci, E.; Dobson, C. M.; Karplus, M. Nature 2001, 409, 641.
21. Dokholyan, N. V.; Li, L.; Ding, F.; Shakhnovich, E. I. Proc Natl Acad Sci USA 2002, 99, 8637.
22. Sacquin-Mora, S.; Lavery, R. Biophys J 2006, 90, 2706.


23. Sacquin-Mora, S.; Laforet, E.; Lavery, R. Proteins 2007, 67, 350.
24. Rawlings, N. D.; Morton, F. R.; Barrett, A. J. Nucleic Acids Res 2006, 34, D270.
25. Hobohm, U.; Sander, C. Protein Sci 1994, 3, 522.
26. Hobohm, U.; Scharf, M.; Schneider, R.; Sander, C. Protein Sci 1992, 1, 409.
27. Sobolev, V.; Sorokine, A.; Prilusky, J.; Abola, E. E.; Edelman, M. Bioinformatics 1999, 15, 327.
28. Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Res 2000, 28, 235.
29. Hubbard, S. J.; Thornton, J. M. NACCESS Computer Program; University College London: London, 1993.
30. Porter, C. T.; Bartlett, G. J.; Thornton, J. M. Nucleic Acids Res 2004, 32, D129.
31. Choe, J. Y.; Nelson, S. W.; Arienti, K. L.; Axe, F. U.; Collins, T. L.; Jones, T. K.; Kimmich, R. D.; Newman, M. J.; Norvell, K.; Ripka, W. C.; Romano, S. J.; Short, K. M.; Slee, D. H.; Fromm, H. J.; Honzatko, R. B. J Biol Chem 2003, 278, 51176.
32. Shibayama, N.; Miura, S.; Tame, J. R.; Yonetani, T.; Park, S. Y. J Biol Chem 2002, 277, 38791.
33. Yang, Z.; Lanks, C. W.; Tong, L. Structure 2002, 10, 951.
34. Eriksson, M.; Uhlin, U.; Ramaswamy, S.; Ekberg, M.; Regnstrom, K.; Sjoberg, B. M.; Eklund, H. Structure 1997, 5, 1077.
35. von Geldern, T. W.; Lai, C.; Gum, R. J.; Daly, M.; Sun, C.; Fry, E. H.; Abad-Zapatero, C. Bioorg Med Chem Lett 2006, 16, 1811.
36. Rudino-Pinera, E.; Rojas-Trejo, S. P.; Calcagno, M. L.; Horjales, E. (in press); PDB code 2WU1.
37. Hindie, V.; Stroba, A.; Zhang, H.; Lopez-Garcia, L. A.; Idrissova, L.; Zeuzem, S.; Hirschberg, D.; Schaeffer, F.; Jorgensen, T. J. D.; Engel, M.; Alzari, P. M.; Biondi, R. M. Nat Chem Biol 2009, 5, 758.
38. Ptak, C. P.; Ahmed, A. H.; Oswald, R. E. Biochemistry 2009, 48, 8594.
39. Anderka, O.; Loenze, P.; Klabunde, T.; Dreyer, M. K.; Defossa, E.; Wendt, K. U.; Schmoll, D. Biochemistry 2008, 47, 4683.
40. Hardy, J. A.; Lam, J.; Nguyen, J. T.; O’Brien, T.; Wells, J. A. Proc Natl Acad Sci USA 2004, 101, 12461.
41. Vanderpool, D.; Johnson, T. O.; Ping, C.; Bergqvist, S.; Alton, G.; Phonephaly, S.; Rui, E.; Luo, C.; Deng, Y. L.; Grant, S.; Quenzer, T.; Margosiak, S.; Register, J.; Brown, E.; Ermolieff, J. Biochemistry 2009, 48, 9823.
42. Li, J.; Edwards, P.; Burghammer, M.; Villa, C.; Schertler, G. F. X. J Mol Biol 2004, 343, 1409.
43. Bondensgaard, K.; Ankersen, M.; Thogersen, H.; Hansen, B. S.; Wulff, B. S.; Bywater, R. P. J Med Chem 2004, 47, 888.
44. Brauner-Osborne, H.; Wellendorph, P.; Jensen, A. A. Curr Drug Targets 2007, 8, 169.
45. Schlyer, S.; Horuk, R. Drug Discovery Today 2006, 11, 481.
46. Higgs, C.; Reynolds, C. A. In Theoretical Biochemistry: Processes and Properties of Biological Systems, Vol. 9 (Theoretical and Computational Chemistry Series); Eriksson, L., Ed.; Elsevier: Amsterdam, 2001; pp. 341–376.
47. Wells, J. A.; McClendon, C. L. Nature 2007, 450, 1001.
48. MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations; University of California Press: California, 1967; pp. 281–297.
49. Fairlie, D. P.; Tyndall, J. D.; Reid, R. C.; Wong, A. K.; Abbenante, G.; Scanlon, M. J.; March, D. R.; Bergman, D. A.; Chai, C. L.; Burkett, B. A. J Med Chem 2000, 43, 1271.
50. Lovell, S. C.; Word, J. M.; Richardson, J. S.; Richardson, D. C. Proteins 2000, 40, 389.
51. Elcock, A. H.; McCammon, J. A. Proc Natl Acad Sci USA 2001, 98, 2990.
52. Morris, G. M.; Goodsell, D. S.; Huey, R.; Olson, A. J. J Comput-Aided Mol Des 1996, 10, 293.
53. Henrick, K.; Thornton, J. M. Trends Biochem Sci 1998, 23, 358.
54. Bartlett, G. J.; Porter, C. T.; Borkakoti, N.; Thornton, J. M. J Mol Biol 2002, 324, 105.
55. Gill, H. S.; Eisenberg, D. Biochemistry 2001, 40, 1903.
56. Huang, X. P.; Ellis, J. Mol Pharmacol 2007, 71, 759.
57. May, L. T.; Leach, K.; Sexton, P. M.; Christopoulos, A. Annu Rev Pharmacol Toxicol 2007, 47, 1.
58. Malherbe, P.; Kratochwil, N.; Knoflach, F.; Zenner, M. T.; Kew, J. N.; Kratzeisen, C.; Maerki, H. P.; Adam, G.; Mutel, V. J Biol Chem 2003, 278, 8340.
59. Klein-Seetharaman, J.; Hwa, J.; Cai, K.; Altenbach, C.; Hubbell, W. L.; Khorana, H. G. Biochemistry 1999, 38, 7938.
60. Murakami, M.; Kouyama, T. Nature 2008, 453, 363.
61. Balem, F.; Yanamala, N.; Klein-Seetharaman, J. Photochem Photobiol 2009, 85, 471.
62. Simpson, L. M.; Taddese, B.; Wall, I. D.; Reynolds, C. A. Curr Opin Pharmacol 2010, 10, 30.
63. Chintapalli, S. V.; Yew, B. K.; Upton, G. J. G.; Illingworth, C. J. R.; Reeves, P. J.; Parkes, K. E.; Snell, C. R.; Reynolds, C. A. J Comput Chem 2010, DOI: 10.1002/jcc.21562.
64. Bernier, V.; Lagace, M.; Lonergan, M.; Arthus, M. F.; Bichet, D. G.; Bouvier, M. Mol Endocrinol 2004, 18, 2074.
65. Arkin, M. R.; Randal, M.; DeLano, W. L.; Hyde, J.; Luong, T. N.; Oslob, J. D.; Raphael, D. R.; Taylor, L.; Wang, J.; McDowell, R. S.; Wells, J. A.; Braisted, A. C. Proc Natl Acad Sci USA 2003, 100, 1603.
66. Yin, H.; Hamilton, A. D. Angew Chem Int Ed Engl 2005, 44, 4130.
67. Fry, D. C. Biopolymers 2006, 84, 535.
68. Schames, J. R.; Henchman, R. H.; Siegel, J. S.; Sotriffer, C. A.; Ni, H. H.; McCammon, J. A. J Med Chem 2004, 47, 1879.
