
John Mitchell; James McDonagh; Neetika Nath

Rob Lowe; Richard Marchese Robinson

RF-Score:
a Machine Learning Scoring Function
for Protein-Ligand Binding Affinities

• Ballester, P.J. & Mitchell, J.B.O. (2010) Bioinformatics 26, 1169-1175
Calculating the affinities of protein-ligand complexes:

For docking
For post-processing docking hits
For virtual screening
For lead optimisation
For 3D QSAR
Within series of related complexes
For any general complex
Absolute (hard!)
Relative
A difficult, unsolved problem.
Three existing approaches …

1. Force fields
2. Empirical Functions
3. Knowledge based
How knowledge-based scoring functions have worked …

P-L complexes from PDB


Assign atoms to types
Find histograms of type-type distances
Convert to an ‘energy’
Add up the energies from all P-L atom pairs
• This conversion of the histogram into an energy function uses a “reverse Boltzmann” methodology.
• Thus it “assumes” that the atoms of protein and ligand are independent particles in equilibrium at temperature T.
• For a variety of reasons, these are poor assumptions …
• Molecular connectivity: atom-atom distances are miles from being independent.
• Excluded volume effects.
• No physical basis for assuming such an equilibrium.
• Changes in structure with T are small and not like those implied by the Boltzmann distribution.
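
The knowledge-based recipe above (histogram, convert to an 'energy', sum over pairs) can be sketched roughly as follows. This is a minimal illustration, not any published scoring function: the function names, the pseudocount, the 1 Å bin width, the choice of reference histogram and the value of kT are all assumptions made for the example.

```python
import math

K_B_T = 0.593  # kcal/mol at roughly 298 K (assumed temperature)


def inverse_boltzmann(observed_counts, reference_counts, kT=K_B_T, pseudocount=1.0):
    """Turn one atom-type pair's distance histogram into per-bin 'energies'.

    Energy of a bin = -kT * ln(p_observed / p_reference).
    The pseudocount (an assumption) avoids log(0) for empty bins.
    """
    obs_total = sum(observed_counts) + pseudocount * len(observed_counts)
    ref_total = sum(reference_counts) + pseudocount * len(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


def score_complex(pair_distances, energy_table, bin_width=1.0):
    """Add up the bin energies over all protein-ligand atom pairs.

    pair_distances: iterable of ((protein_type, ligand_type), distance).
    energy_table: dict mapping a type pair to its list of bin energies.
    Pairs beyond the last bin contribute nothing.
    """
    total = 0.0
    for type_pair, distance in pair_distances:
        bins = energy_table[type_pair]
        index = int(distance // bin_width)
        if index < len(bins):
            total += bins[index]
    return total
```

Bins that are over-populated relative to the reference come out with negative 'energy', which is exactly where the independence and equilibrium assumptions listed above enter.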
We thought about this …

… and wrote a paper saying

“It’s not true, but it sort of works”


Then we had a better idea – could we dispense with the
reverse Boltzmann formalism?
• Instead of assuming a formula that relates the distance distribution to the binding free energy …

… use machine learning to learn the relationship from known structures and binding affinities.
• And persuade someone to pay for it!


Random Forest

[Figure: Random Forest model; output labelled "Predicted binding affinity"]

● Introduced by Breiman and Cutler (2001)
● Development of Decision Trees (Recursive Partitioning):
● Dataset is partitioned into consecutively smaller subsets
● Each partition is based upon the value of one descriptor
● The descriptor used at each split is selected so as to optimise splitting
● Bootstrap sample of N objects chosen from the N available objects with replacement
• The Random Forest is just a forest of randomly generated decision trees …

… whose outputs are averaged to give the final prediction
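
The ingredients listed above (recursive partitioning on one descriptor at a time, bootstrap samples drawn with replacement, averaging over trees) can be sketched in plain Python. This is a toy regression forest for illustration only, not the implementation behind RF-Score; the function names, the squared-error split criterion and the `n_candidates` parameter (how many randomly chosen descriptors are tried at each split) are assumptions.

```python
import random
from statistics import mean


def best_split(rows, targets, descriptor_indices):
    """Return (error, descriptor, threshold) minimising summed squared error."""
    best = None
    for j in descriptor_indices:
        for threshold in sorted({row[j] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[j] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[j] > threshold]
            if not left or not right:
                continue
            m_left, m_right = mean(left), mean(right)
            error = (sum((t - m_left) ** 2 for t in left)
                     + sum((t - m_right) ** 2 for t in right))
            if best is None or error < best[0]:
                best = (error, j, threshold)
    return best


def grow_tree(rows, targets, n_candidates, rng):
    """Recursively partition the data; a leaf stores the mean target."""
    if len(rows) < 2 or len(set(targets)) == 1:
        return mean(targets)
    candidates = rng.sample(range(len(rows[0])), n_candidates)
    split = best_split(rows, targets, candidates)
    if split is None:  # no descriptor separates these rows
        return mean(targets)
    _, j, threshold = split
    left = [(r, t) for r, t in zip(rows, targets) if r[j] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[j] > threshold]
    return (j, threshold,
            grow_tree([r for r, _ in left], [t for _, t in left], n_candidates, rng),
            grow_tree([r for r, _ in right], [t for _, t in right], n_candidates, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf, following one descriptor per split."""
    while isinstance(node, tuple):
        j, threshold, left, right = node
        node = left if row[j] <= threshold else right
    return node


def grow_forest(rows, targets, n_trees=50, n_candidates=1, seed=0):
    """Grow each tree on a bootstrap sample: N draws from N objects, with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([rows[i] for i in idx],
                                [targets[i] for i in idx],
                                n_candidates, rng))
    return forest


def predict_forest(forest, row):
    """Average the individual tree outputs to give the final prediction."""
    return mean(predict_tree(tree, row) for tree in forest)
```

For example, `predict_forest(grow_forest(rows, targets), new_row)` returns the averaged prediction for one new descriptor vector.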


Building RF-Score

PDBbind 2007
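
To train a model like this, each complex first has to be turned into a fixed-length vector of descriptors. A minimal sketch, assuming the descriptors are counts of protein-ligand element pairs within a distance cutoff; the element lists, the 12 Å cutoff and the input format are assumptions for illustration and are not stated in these slides.

```python
import math
from itertools import product

# Assumed element sets and cutoff; adjust to the descriptor definition actually used.
PROTEIN_ELEMENTS = ["C", "N", "O", "S"]
LIGAND_ELEMENTS = ["C", "N", "O", "F", "P", "S", "Cl", "Br", "I"]
CUTOFF = 12.0  # angstroms


def contact_count_features(protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Count protein-ligand element pairs closer than `cutoff`.

    Each atom is (element, (x, y, z)). Elements outside the two lists are ignored.
    Returns a list with one count per (protein element, ligand element) pair,
    always in the same order, so every complex gets a vector of equal length.
    """
    pairs = list(product(PROTEIN_ELEMENTS, LIGAND_ELEMENTS))
    counts = {pair: 0 for pair in pairs}
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            pair = (p_elem, l_elem)
            if pair in counts and math.dist(p_xyz, l_xyz) < cutoff:
                counts[pair] += 1
    return [counts[pair] for pair in pairs]
```

Each such vector, paired with the measured binding affinity of its complex, would form one training example.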
Validation results: PDBbind set

• Following the method of Cheng et al., JCIM 49, 1079 (2009)
• Independent test set PDBbind core 2007, 195 complexes from 65 clusters
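
Validation on an independent test set amounts to comparing predicted and measured affinities for complexes the model never saw. A minimal sketch, assuming Pearson's correlation coefficient as the figure of merit (the slides do not name the metric); the function name is hypothetical.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation between predicted and measured affinities."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)
```

The two lists must be the same length and must not be constant, otherwise the denominator is zero.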
• RF-Score outperforms competitor scoring functions, at least on our test
• RF-Score is available for free from our group website