Professional Documents
Culture Documents
Upload 8
Upload 8
Upload 8
1, JANUARY/FEBRUARY 2014 33
Abstract—Protein side chains populate diverse conformational ensembles in crystals. Despite much evidence that there is widespread
conformational polymorphism in protein side chains, most of the X-ray crystallography data are modeled by single conformations in the
Protein Data Bank. The ability to extract or to predict these conformational polymorphisms is of crucial importance, as it facilitates
deeper understanding of protein dynamics and functionality. In this paper, we describe a computational strategy capable of predicting
side-chain polymorphisms. Our approach extends a particular class of algorithms for side-chain prediction by modeling the side-chain
dihedral angles more appropriately as continuous rather than discrete variables. Employing a new inferential technique known as
particle belief propagation, we predict residue-specific distributions that encode information about side-chain polymorphisms. Our
predicted polymorphisms are in relatively close agreement with results from a state-of-the-art approach based on X-ray crystallography
data, which characterizes the conformational polymorphisms of side chains using electron density information, and has successfully
discovered previously unmodeled conformations.
Index Terms—Conformational ensemble, conformational polymorphism, mixture distribution, particle belief propagation, side-chain
prediction, von-Mises distribution
1 INTRODUCTION
problem could be formulated as a MAP estimation problem, applied to the side-chain prediction problem, and the
which could be solved using belief propagation (BP). They results have been comparable to such state-of-the-art soft-
also considered a relaxed version of the IP problem and ware as SCWRL 3.0 [52], [54].
solved the resulting convex problem with BP [53]. Given potential functions as defined by (4), messages are
Besides these optimization approaches, Li et al. [31] computed along the edges of the graph. The sum-product
recently showed that side-chain conformations also can be BP algorithm is used to compute marginal distributions; its
decided from backbone information without optimization. recursive message updating equation is as follows:
" #
ðtÞ
X Y ðt1Þ
3 MAIN METHOD OF INFERENCE mi!j ðrj Þ ¼ ri ðri Þrij ðri ; rj Þ mk!i ðri Þ ; (5)
ri 2Ri k2N ðiÞnj
We have extended the class of side-chain prediction algo-
rithms that are based on BP. To model a protein molecule where Ri is the (discrete) state space of ri , and mi!j repre-
ðtÞ
with a graphical model, the backbone is regarded as being sents the message from node i to j at iteration t. The nota-
fixed and the residues r1 ; r2 ; . . . ; rn are regarded as nodes. tion, N ðiÞ, denotes the set of nodes that are neighbours of i.
The side chain at each node is described by a sequence of For proteins, the discrete state space Ri is simply the set of
dihedral angles, collectively stored as a vector, for example, rotamers for residue ri .
ri ¼ ðxi1 ; xi2 ; . . . ; xi4 Þ. The exact number of dihedral angles The max-product BP algorithm is used for finding MAP
depends on the type of amino acid. The objective is to find estimates; its recursive updating equation is given by
the minimal-energy conformation,
" #
" # ðtÞ
Y ðt1Þ
X n n X
X mi!j ðrj Þ ¼ max ri ðri Þrij ðri ; rj Þ mk!i ðri Þ : (6)
min El ðri Þ þ Ep ðri ; rj Þ ; (1) ri 2Ri
k2N ðiÞnj
r1 ;r2 ;...;rn
i i j>i
where El is the (local) intrinsic energy of a residue, Ep is the 3.2 Particle Belief Propagation
pairwise energy between two residues, and n is the total For many applications, for example, in bioinformatics, com-
number of residues. The optimization algorithm itself is puter vision, and other fields, the state space is continuous
independent of the choice of the energy function. We will rather than discrete, or it can be discrete but very large so
say more about the energy function later in Section 4.3. that enumerating all possible states at each iteration
Consider a graphical model G, with a set of vertices V and becomes very inefficient. PBP has been developed recently
a collection of edges E. If ri denotes the random variable to address precisely such difficulties [22]. Our goal is to
associated with node i, then the joint probability distribu- model the conformation of side chains more appropriately
tion of r ¼ ðr1 ; r2 ; . . . ; rn Þ can be factorized as follows: as continuous rather than discrete random variables. Hence,
Y Y PBP is a crucial piece of technology for our work.
P ðrÞ ¼ ri ðri Þ rij ðri ; rj Þ: (2) The key idea for PBP is the following: At iteration t, if
i2V ði;jÞ2E ðtÞ
we draw ri from a certain trial distribution Wi , then (5)
The functions, ri ðri Þ and rij ðri ; rj Þ, are called node- and can be written as an “importance-sampling corrected
edge-potentials, respectively. For a given energy function expectation” [22]:
Econf , the Boltzmann distribution is given by " 3
ri ðri Þ Y
ðtÞ
mi!j ðrj Þ ¼ Er W ðtÞ rij ðri ; rj Þ
ðt1Þ
mk!i ðri Þ5:
1 Econf ðrÞ ðtÞ
Pconf ðrÞ ¼ exp ; (3) i i Wi ðri Þ k2N ðiÞnj
Z T
(7)
where T is a temperature parameter and Z is a normalizing
constant. Clearly, using the energy function (1), the distribu- It is generally not possible to express the expectation
tion (3) can be expressed in the form of (2), with Er W ðtÞ ðÞ in analytic form, but it can be obtained using
i
i
El ðri Þ Monte Carlo techniques [10], [32]. Thus, for each node ri ,
ri ðri Þ / exp ; ð1Þ ð2Þ ðLÞ
T the idea is to sample a set of L particles fri ; ri ; . . . ; ri g
(4) ðtÞ
Ep ðri ; rj Þ from Wi , typically using a Markov Chain Monte Carlo
rij ðri ; rj Þ / exp :
T (MCMC) technique such as the Metropolis-Hastings algo-
rithm [32], and then approximate (7) by
" ðlÞ
3.1 Belief Propagation ðtÞ 1X L
ri ri ðlÞ Y ðt1Þ ðlÞ
mb i!j ðrj Þ ¼ r r ; rj b k!i ri
m :
Belief propagation is an efficient local message-passing L l¼1 W ðtÞ rðlÞ ij i k2N ðiÞnj
i i
algorithm [43] for making inferences on graphical mod-
els. It performs exact inference if the graph G is a tree, (8)
and approximate inference for general graphs. If the
graph contains cycles, the algorithm is often referred to In the simplest case, the particles’ locations may remain
as “loopy BP” and there is no convergence guarantee, but unchanged [22] but, generally, each particle’s location is
many groups have reported excellent results nonetheless, updated at the end of each BP iteration. This is what allows
for example, [15], [16], [42]. Indeed, loopy BP has been PBP to explore a continuous state space and not be
36 IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 11, NO. 1, JANUARY/FEBRUARY 2014
restricted to a fixed set of choices specified a priori, such as a data (e.g., the dihedral angles for side chains), Mardia et al.
rotamer library; and it is accomplished by re-sampling the [37] introduced the more general, multivariate von-Mises
ðtÞ distribution (11) by extending the bivariate model of Singh et
particles at each iteration from the distribution, Wi , for
example, using the Metropolis-Hastings algorithm. At itera- al. [45]. More recently, Mardia et al. [38] have extended single
ðtÞ
tion t, a natural choice of Wi is the current belief of node i, MVM distributions to mixtures of MVMs. For example, they
fitted a 4D mixture of MVMs to model the two backbone
ðtÞ
Y ðt1Þ
Wi ðri Þ / ri ðri Þ mb k!i ðri Þ: (9) dihedral angles (f and c) and the first two side-chain
k2N ðiÞ dihedral angles (x1 and x2 ) of the amino acid, ILE.
In our work, we also used mixtures of MVMs (see Sec-
The max-product version for PBP was first given by tion 4.4 below). However, our work differs fundamen-
Kothapa et al. [27]: tally from those of Mardia et al. [38]. While they fitted a
Y single mixture to model the conformation of a given
ðtÞ ðlÞ ðlÞ ðt1Þ ðlÞ
b i!j ðrj Þ ¼ max ri ri rij ri ; rj
m b k!i ri
m : amino acid using data from different proteins, we used
l¼1;...;L
k2N ðiÞnj mixtures of MVMs to approximate the node- and edge-
(10) potential functions, and different mixture models were
specified for each ri and rij on a protein-by-protein basis.
Their paper [27] explains in more detail why the factor
ðtÞ
Wi does not appear in the square brackets of (10).
4.3 Energy Functions
We now give more details about the energy function (1). We
4 FAST APPROXIMATION OF ri ; rij used a very simple energy function that essentially acted as
We used mixtures of von-Mises distributions as a fast way a collision detector. This simple energy function was first
to approximate the potential functions ri ðri Þ and rij ðri ; rj Þ. popularized by SCWRL [11] and later adopted by TreePack
It is well-known that mixture densities can be used to [51] as well.
approximate any arbitrary distribution. For example, in Given two atoms, a1 and a2 , SCWRL approximates the
nonparametric belief propagation [46], mixtures of Gaus- van der Waals pairwise potential energy between them by
sians are used to model and/or approximate ri ðri Þ and 8
> 0; if d > R0 ;
rij ðri ; rj Þ. Since we are working in the space of dihedral < d
angles, the von-Mises distribution is more appropriate than Eapprx:vdw ða1 ; a2 Þ ¼ k2 þ k2 ; if k1 R0 d R0 ; (12)
>
: R0
the Gaussian distribution (see Section 4.1 below). Emax ; if d < k1 R0 ;
where d is the distance between a1 and a2 ; R0 is the sum of
4.1 The von-Mises Distribution
their radii; Emax ¼ 10; k1 ¼ 0:8254; and k2 ¼ Emax =ð1 k1 Þ.
The univariate VM distribution is a probability distribution
For Carbon (C), Nitrogen (N), Oxygen (O), and Sulfur (S),
on a circle. The multivariate generalization was introduced
fixed radii of 1.6, 1.3, 1.7, and 1.7 were used, respectively.
by Mardia et al. [37]. In particular, u 2 Rd is said to follow
Treating each residue simply as a set of atoms, the pair-
the multivariate von-Mises distribution, MVMðm; k; LÞ, if its
wise energy function, Ep ðri ; rj Þ in (1), is merely calculated
density function is given by
by summing over all atom-pairs:
X
1 T sT ðuÞLsðuÞ Ep ðri ; rj Þ ¼ Eapprx:vdw ða; bÞ:
fðu; m; k; LÞ ¼ exp k cðuÞ þ ; (11)
Zðk; LÞ 2 a2ri ;b2rj
whereas our state space is a set of dihedral angles that 5.1 PBPMixVM
describe the conformation of each residue. Therefore, a con- To reflect the fact that we used PBP for inference and mix-
version must take place every time the potential functions tures of (multivariate) VM distributions for approximating
are evaluated. This is not difficult in principle, and there is the potential functions, from this point on we shall refer to
existing, standard software for performing such a conver- our algorithm as PBPMixVM. We implemented it in C++,
sion, for example, the Biochemical Algorithms Library using the overall architecture provided by GraphLab
(BALL) [18]. In order to speed up our computation, how- (http://select.cs.cmu.edu/code/graphlab/) [33].
ever, we used a mixture of von-Mises distributions as a
crude approximation to these potential functions.
5.2 The Kolmogorov-Smirnov (KS) Test
For example, for residues described by four dihedral
For each residue, we used a two-sample Kolmogorov-Smir-
angles (e.g., the amino acid LYS), the approximation to the
nov test [7] to compare the results from Ringer with those
node potential function would be:
from PBPMixVM. The KS-test is a widely used nonparamet-
X
b
ri ðri Þ ¼ wt ft ðxi1 ; . . . ; xi4 Þ; (14) ric test for determining whether two distributions are signif-
t icantly different from each other. The significance level for
the KS-test was set to be 0:05.
where ft MVMðmt ; kt ; Lt Þ is a (multivariate) von-Mises Fig. 2 provides some visual illustrations of what the KS-
density function, given by (11), and w Pt denotes the weight test does. Based on the p-values from individual KS-tests,
of the mixture component t such that wt ¼ 1. we selected four residues, whose corresponding p-values
We used simple spherical or radial basis mixtures, that is, from the aforementioned KS-tests were 0:99, 0:77, 0:53, and
we set Lt ¼ 0. This is the same as treating the dihedral 0:25, respectively. Such a selection is meant to illustrate
angles as being locally independent—in the future, we plan varying levels of agreement between the PBPMixVM results
to generalize this by modeling the local correlations among and the Ringer results—larger p-values indicate higher lev-
the x-angles. We chose kt ¼ ð10; 10; . . . ; 10Þ for all t. For els of agreement.
each residue i, the set of discrete rotamers from the (back- The four selected residues are: residue ASN23, 1A2P,
bone-dependent) rotamer library were used as mixture cen- Chain C [39], p-value ¼ 0:99; residue HIS88, 1F41, Chain B
ters, mt . We specified the weight of each component t to be [20], p-value ¼ 0:77; residue GLU134, 1B67 [8], p-value
1 ¼ 0:53; and residue LYS87, 1F4P [44], p-value ¼ 0:25. For
wt ¼ a pðmt jf; cÞ þ ð1 aÞ ; these four residues, the polymorphisms in their respective
#ðmixture componentsÞ
x1 -angles as characterized by Ringer and predicted by
PBPMixVM are displayed next to each other in Fig. 2,
where a was chosen to be 0:1, and pðmt jf; cÞ is the rotamer ordered by p-values from top to bottom.
probability for rotamer mt from the rotamer library.
For the edge potential, our approximation was 5.3 Comparative Results
XX From the individual KS-tests, we computed two sum-
b
rij ðri ; rj Þ ¼ wt;t0 ft;t0 ðxi1 ; . . . ; xidi ; xj1 ; . . . ; xjdj Þ; mary statistics to evaluate the overall agreement between
t t0
our results from PBPMixVM and those given by Ringer:
(15) 1) percent in agreement—the fraction of residues for which
where di and dj are the number of dihedral angles for ri and the KS-test failed to show a statistically significant differ-
rj , respectively; ft;t0 is, again, a spherical MVM density ence; and 2) mean p-value—the average p-value from indi-
function, with kt;t0 ¼ ð10; 10; . . . ; 10Þ and Lt;t0 ¼ 0 for all t; t 0 vidual KS-tests.
as before. If the pairwise edge potential between two Table 1 shows the overall results for the first two dihedral
rotamers—one for residue ri and another for rj —exceeded angles, x1 and x2 , averaged over all residues across all pro-
0:05, their dihedral angles were concatenated together and teins in our data set. Table 2 shows results for x1 only, aver-
used as a mixture center, mt;t0 , with wt;t0 / rðmt ; mt0 Þ. aged over residues of the same amino acid type across all
The expressions (14) and (15) can be viewed as approxi- proteins in our data set. Based on the KS-tests, these
mations of ri and rij using a crudely specified single-layer proteins’ side-chain polymorphisms as predicted by
radial basis function network (RBFnet) in angular space. PBPMixVM and as described by Ringer agreed for well
over 50 percent of all the residues, and the average p-value
from these KS-tests was about 0:20, much higher than the
5 EXPERIMENTS AND RESULTS typical cutoff value of 0:05.
We used a data set containing 362 diverse proteins that were We also can observe from Table 2 that, for some residue
previously analyzed by the Alber group [29] with their types, including not only those having just one x-angle, for
Ringer program (http://ucxray.berkeley.edu/ringer.htm). example, Serine (SER), Threonine (THR), Valine (VAL), but
The protein data files were retrieved from the PDB. We also those having relatively large structures and hence, mul-
used functions and modules from the BALL [18] to read tiple x-angles, for example, Asparagine (ASN), Glutamine
and process the PDB files. The electron density maps for the (GLN), Glutamic Acid (GLU), Methionine (MET), the agree-
proteins, which were required to run the Ringer program, ment between PBPMixVM and Ringer can be noticeably
were downloaded from the Electron Density Server [26]. higher than the overall average (see Table 1)—the percent in
All experiments were performed on the Sharcnet system agreement for some residue types was close to 70 and 80 per-
(http://www.sharcnet.ca/). cent, and the corresponding mean p-value was close to 0:25
38 IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 11, NO. 1, JANUARY/FEBRUARY 2014
TABLE 2
Comparison of PBPMixVM and Ringer Results: x1 Only,
Average Results over All Residues of the Same Amino
Acid Type, across All Proteins in the Data Set
TABLE 3
Comparison of PBPMixVM and Ringer Results: x1 Only,
Average Results over All Residues of the Same Amino
Acid Type and Having the Same Secondary Structure,
across All Proteins in the Data Set
among the dihedral angles. We also believe that combining [22] A. Ihler and D. McAllester, “Particle Belief Propagation,” Proc.
12th Int’l Conf. Artificial Intelligence and Statistics (AISTATS ’09),
the results from PBPMixVM with those from state-of-the-art pp. 256-263, 2007.
side-chain prediction algorithms, such as SCWRL [28] and [23] R. Ishima and D.A. Torchia, “Protein Dynamics from NMR,”
TreePack [51], can further enhance the accuracy and reliabil- Nature Structural Biology, vol. 7, pp. 740-743, 2000.
ity of the predicted side-chain polymorphisms. [24] R.P. Joosten, T.A.H. te Beek, E. Krieger, M.L. Hekkelman, R.W.W.
Hooft, R. Schneider, C. Sander, and G. Vriend, “A Series of PDB
Related Databases for Everyday Needs,” Nucleic Acids Research,
REFERENCES vol. 39 (database issue), pp. D411-D419, 2010, doi: 10.1093/nar/
gkq1105.
[1] T. Akutsu, “NP-Hardness Results for Protein Side-chain Packing,” [25] W. Kabsch and C. Sander, “Dictionary of Protein Secondary Struc-
Genome Informatics, S. Miyano and T. Takagi, ed., vol. 8, pp. 180- ture: Pattern Recognition of Hydrogen-Bonded and Geometrical
186, Universal Academy Press, 1997. Features,” Biopolymers, vol. 22, pp. 2577-2637, 1983.
[2] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. [26] G.J. Kleywegt, M.R. Harris, J.Y. Zou, T.C. Taylor, A. W€ahlby, and
Weissig, I.N. Shindyalov, and P.E. Bourne, “The Protein Data T.A. Jones, “The Uppsala Electron-Density Server,” Acta Crystal-
Bank,” Nucleic Acids Research, vol. 28, pp. 235-242, 2000. lographica, vol. D60, pp. 2240-2249, 2004.
[3] D.D. Boehr, R. Nussinov, and P.E. Wright, “The Role of Dynamic [27] R. Kothapa, J. Pacheco, and E. Sudderth, “Max-Product Particle
Conformational Ensembles in Biomolecular Recognition,” Nature Belief Propagation,” technical report, Dept. of Computer Science-
Chemical Biology, vol. 5, no. 11, pp. 789-796, 2009. Brown Univ., 2011.
[4] M. Bower, F. Cohen, and R. Dunbrack, “Prediction of Protein Side- [28] G.G. Krivov, M.V. Shapovalov, and R. Dunbrack, “Improved Pre-
Chain Rotamers from a Backbone-Dependent Rotamer Library: A diction of Protein Side-Chain Conformations with SCWRL4,” Pro-
New Homology Modeling Tool,” J. Molecular Biology, vol. 267, teins, vol. 77, no. 4, pp. 778-795, 2009.
pp. 1268-1282, 1997. [29] P.T. Lang, H.-L. Ng, J.S. Fraser, J.E. Corn, N. Echols, M. Sales, J.M.
[5] R. Bruschweiler, “New Approaches to the Dynamic Interpretation Holton, and T. Alber, “Automated Electron-Density Sampling
and Prediction of NMR Relaxation Data from Proteins,” Current Reveals Widespread Conformational Polymorphism in Proteins,”
Opinion in Structural Biology, vol. 13, pp. 175-183, 2003. Protein Science, vol. 19, pp. 1420-1431, 2010.
[6] A. Canutescu, A. Shelenkov, and R. Dunbrack, “A Graph-Theory [30] C. Lee and S. Subbiah, “Prediction of Protein Side-Chain Confor-
Algorithm for Rapid Protein Side-Chain Prediction,” Protein Sci- mation by Packing Optimization,” J. Molecular Biology, vol. 213,
ence, vol. 12, no. 9, pp. 2001-2014, 2003. pp. 373-388, 1991.
[7] W. J. Conover, Practical Nonparametric Statistics. John Wiley & [31] S.C. Li, D. Bu, and M. Li, “Residues with Similar Hexagon Neigh-
Sons, 1971. borhoods Share Similar Side-chain Conformations,” IEEE/ACM
[8] K. Decanniere, A.M. Babu, K. Sandman, J.N. Reeve, and U. Trans. Computational Biology and Bioinformatics, vol. 9, no. 1,
Heinemann, “Crystal Structures of Recombinant Histones pp. 240-248, Jan./Feb. 2012.
HMfA and HMfB from the Hyperthermophilic Archaeon [32] J. S. Liu, Monte Carlo Strategies in Scientific Computing. Springer-
Methanothermus Fervidus,” J. Molecular Biology, vol. 303, Verlag, 2001.
no. 1, pp. 35-47, 2000. [33] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J.
[9] J. Desmet, M. De Maeyer, and I. Lasters, “The Dead-End Elimina- Hellerstein, “GraphLab: A New Framework for Parallel
tion Theorem and Its Use in Protein Side-Chain Positioning,” Machine Learning,” Proc. 26th Conf. Uncertainty in Artificial
Nature, vol. 356, pp. 539-542, 1992. Intelligence (UAI ’10), 2010.
[10] A. Doucet, N. de Freitas, and N.J. Gordon, Sequential Monte Carlo [34] B. Ma, S. Kumar, C.-J. Tsai, and R. Nussinov, “Folding Fun-
Methods in Practice. Springer-Verlag, 2001. nels and Binding Mechanisms,” Protein Eng., vol. 12, pp. 713-
[11] R. Dunbrack and M. Kurplus, “Backbone-Dependent Rotamer 720, 1999.
Library for Proteins: Application to Side-Chain Prediction,” [35] B. Ma, S. Kumar, C.-J. Tsai, H. Wolfson, N. Sinha, and R. Nus-
J. Molecular Biology, vol. 230, pp. 543-574, 1993. sinov, “Protein-Ligand Interactions: Induced Fit,” Encyclopedia
[12] G. Elidan, I. McGraw, and D. Koller, “Residual Belief Propagation: of Life Sciences, John Wiley & Sons, doi:10.1038/
Informed Scheduling for Asynchronous Message Passing,” Proc. npg.els.0003140., 2002.
22nd Conf. Uncertainty in Artificial Intelligence (UAI ’06), 2006. [36] K.V. Mardia, C.C. Taylor, and G.K. Subramaniam, “Protein
[13] H. Frauenfelder, G.A. Petsko, and D. Tsernoglou, “Temperature- Bioinformatics and Mixtures of Bivariate von-Mises Distribu-
Dependent X-Ray-Diffraction as a Probe of Protein Structural tions for Angular Data,” Biometrics, vol. 63, pp. 505-512,
Dynamics,” Nature, vol. 280, pp. 558-563, 1979. 2007.
[14] K.K. Frederick, M.S. Marlow, K.G. Valentine, and A.J. Wand, [37] K.V. Mardia, G. Hughes, C.C. Taylor, and H. Singh, “A Multivari-
“Conformational Entropy in Molecular Recognition by Proteins,” ate Von-Mises Distribution with Applications to Bioinformatics,”
Nature, vol. 448, pp. 325-329, 2007. Canadian J. Statistics, vol. 36, no. 1, pp. 99-109, 2008.
[15] W. Freeman and E. Pasztor, “Learning to Estimate Scenes from [38] K.V. Mardia, J.T. Kent, Z. Zhang, C.C. Taylor, and T. Hamelryck,
Images,” Proc. Advances in Neural Information Processing Systems 11 “Mixtures of Concentrated Multivariate Sine Distributions with
(NIPS ’99), 1999. Application to Bioinformatics,” J. Applied Statistics, vol. 39, no. 11,
[16] B. Frey, R. Koetter, and N. Petrovic, “Very Loopy Belief Propaga- pp. 2475-2492, 2012.
tion for Unwrapping Phase Images,” Proc. Advances in Neural [39] C. Martin, V. Richard, M. Salem, R. Hartley, and Y. Mauguen,
Information Processing Systems 14 (NIPS ’04), 2004. “Refinement and Structural Analysis of Barnase at 1.5 A Reso-
[17] R.F. Goldstein, “Efficient Rotamer Elimination Applied to Protein lution,” Acta Crystallographica D. Biological Crystallography, vol. 55,
Side Chains and Related Spin Glasses,” Biophysical J., vol. 66, pp. 386-398, 1999.
pp. 1335-1340, 1994. [40] B.W. Matthews, “Peripatetic Proteins,” Protein Science, vol. 19,
[18] A. Hildebrandt, A.K. Dehof, A. Rurainski, A. Bertsch, M. Schu- pp. 1279-1280, 2010.
mann, N.C. Toussaint, A. Moll, D. St€ ockel, S. Nickels, S.C. [41] A. Mittermaier and L.E. Kay, “New Tools Provide New Insights in
Mueller, H.-P. Lenhof, and O. Kohlbacher, “BALL - Biochemical NMR Studies of Protein Dynamics,” Science, vol. 312, pp. 224-228,
Algorithms Library 1.3,” BMC Bioinformatics, vol. 11, article 531, 2006.
2010. [42] K. Murphy, Y. Weiss, and M. Jordan, “Loopy Belief Propaga-
[19] L. Holm and C. Sander, “Fast and Simple Monte Carlo Algorithm tion for Approximate Inference: An Empirical Study,” Proc.
for Side-Chain Optimization in Proteins: Application to Model,” 15th Conf. Uncertainty in Artificial Intelligence (UAI ’99), pp. 467-
Proteins: Structure, Function, and Genetics, vol. 14, pp. 213-223, 1992. 475, 1999.
[20] A. H€ ornberg, T. Eneqvist, A. Olofsson, E. Lundgren, and A.E. [43] J Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of
Sauer-Eriksson, “A Comparative Analysis of 23 Structures of the Plausible Inference. Morgan Kaufmann, 1988.
Amyloidogenic Protein Transthyretin,” J. Molecular Biology, [44] R.A. Reynolds, W. Watt, and K.D. Watenpaugh, “Structures
vol. 302, no. 3, pp. 649-669, 2000. and Comparison of the Y98H (2.0 A) and Y98W (1.5 A)
[21] J. Hwang and W. Liao, “Side-Chain Prediction by Neural Net- Mutants of Flavodoxin (Desulfovibrio Vulgaris),” Acta Crystal-
works and Simulated Annealing Optimization,” Protein Eng., lographica D. Biological Crystallography, vol. 57, pp. 527-535,
vol. 8, no. 4, pp. 363-370, 1995. 2001.
SOLTAN GHORAIE ET AL.: RESIDUE-SPECIFIC SIDE-CHAIN POLYMORPHISMS VIA PARTICLE BELIEF PROPAGATION 41
[45] H. Singh, V. Hnizdo, and E. Demchuk, “Probabilistic Model For Forbes Burkowski is an associate professor in
Two Dependent Circular Variables,” Biometrika, vol. 89, pp. 719- the David R. Cheriton School of Computer Sci-
723, 2002. ence, University of Waterloo, Ontario, Canada.
[46] E. Sudderth, A. Ihler, W. Freeman, and A. Wilsky, Over the past eight years, he has served as the
“Nonparametric Belief Propagation,” Proc. IEEE CS Conf. Com- director of bioinformatics and participated in the
puter Vision and Pattern Recognition (CVPR ’03), pp. 605-612, 2003. design of the undergraduate program in bioinfor-
[47] H. van den Bedem, A. Dhanik, J.C. Latombe, and A.M. Deacon, matics at the University of Waterloo. He has
“Modeling Discrete Heterogeneity in X-Ray Diffraction Data by been teaching the fourth year structural bioinfor-
Fitting Multi-Conformers,” Acta Crystallographica, vol. D65, matics course since 2003.
pp. 1107-1117, 2009.
[48] M. Vendruscolo, “Determination of Conformationally Heteroge-
neous States of Proteins,” Current Opinion in Structural Biology,
vol. 17, pp. 15-20, 2007. Shuai Cheng Li received the BSc (hons.) and
[49] J. Wang, M. Dauter, R. Alkire, A. Joachimiak, and Z. Dauter, the MSc degrees from the National University of
“Triclinic Lysozyme at 0.65 Angstrom Resolution,” Acta Crystal- Singapore (NUS) in 2001 and 2002, respectively.
lographica D. Biological Crystallography, vol. 63, pp. 1254-1268, 2007. Between 2002 and 2004, he was a full-time
[50] M.A. Wilson and A.T. Brunger, “The 1.0A Crystal Structure of research associate in the database research
Ca2þ -bound Calmodulin: An Analysis of Disorder and Implica- group at the NUS. He was a doctoral student with
tions for Functionally Relevant Plasticity,” J. Molecular Biology, Ming Li at the University of Waterloo, Ontario,
vol. 301, no. 5, pp. 1237-1256, 2000. Canada, between 2004 and 2009. His doctoral
[51] J. Xu and B. Berger, “Fast and Accurate Algorithms for Protein dissertation addressed the protein structure pre-
Side-Chain Packing,” J. ACM, vol. 53, no. 4, pp. 533-557, 2006. diction problem from various aspects. He was a
[52] C. Yanover and Y. Weiss, “Approximate Inference and Protein postdoctoral fellow with Prof. Richard M. Karp at
Folding,” Proc. Advances in Neural Information Processing Systems the University of California, Berkeley between 2009 and 2011. His cur-
(NIPS ’02), 2002. rent research interests include bioinformatics and algorithms, with a
[53] C. Yanover, T. Meltzer, and Y. Weiss, “Linear Programming focus on problems related to protein structures.
Relaxations and Belief Propagation: An Empirical Study,”
J. Machine Learning Research, vol. 7, pp. 1887-1907, 2006.
[54] C. Yanover and Y. Weiss, “Approximate Inference and Side-Chain Mu Zhu received the AB (magna cum laude)
Prediction,” Technical Report, School of Computer Science and degree in applied mathematics from Harvard Uni-
Engineering, Hebrew University of Jerusalem, 2007. versity, Cambridge, Massachussetts, in 1995,
and the PhD degree in statistics from Stanford
Laleh Soltan Ghoraie received the MSc degree University, Stanford, California, in 2001. He is an
in computer science from the University of Wind- associate professor of statistics at the University
sor, Ontario, Canada, in 2009. She is currently of Waterloo, Ontario, Canada. His primary
working toward the PhD degree at the University research interests include machine learning and
of Waterloo, Ontario, Canada. Her research multivariate analysis.
interests include probabilistic graphical models,
protein structure prediction, and conformational
ensembles.
" For more information on this or any other computing topic,
please visit our Digital Library at www.computer.org/publications/dlib.