Docking MD


An introduction to

“Docking”
and
“Molecular Dynamics simulations”

Univ. Ass. Dipl.-Ing. (FH) Dr. scient. med. Bernhard Knapp


Center for Medical Statistics, Informatics and Intelligent Systems
Department for Biosimulation and Bioinformatics
Medical University of Vienna / AKH (General Hospital)
bernhard.knapp@meduniwien.ac.at

23.02.2012 Bernhard Knapp 1

TOC

1. Basic biology knowledge


2. Docking
• Docking in general
• Example AutoDock
3. Molecular Dynamics
• Introduction
• Limitations
• Example Gromacs
4. Tutorial on PDB / jmol



Basic biology knowledge


Amino acids

• Amino acids are the building blocks of proteins (German: “Eiweiß”)

• All amino acids have the same basic structure (the “backbone”, consisting of an amine group, a carboxylic acid group and the C-alpha atom) but differ in their side chain, also called the residue (the side chain defines which AA it is)

• There are 20 different canonical amino acids (AAs), i.e. 20 different side chains

[Figures; source: Wikimedia]


Several amino acids are connected via “peptide bonds”

[Figure; source: Wikimedia]



Then they are called:

peptide: > 1 AA
oligopeptide: < 10 (other sources state 30)
polypeptide: > 10 AAs
protein: > 50 AAs
macropeptide: > 100 AAs

monopeptide: 1 AA
dipeptide: 2 AA
tripeptide: 3 AA
tetrapeptide: 4 AA
pentapeptide: 5 AA
hexapeptide: 6 AA
heptapeptide: 7 AA
octapeptide: 8 AA
nonapeptide: 9 AA
decapeptide: 10 AA
undecapeptide: 11 AAs
...
icosapeptide: 20 AAs
tricontapeptide: 30 AAs
tetracontapeptide: 40 AAs

… however, the exact definitions differ (and you do not need to learn them for the examination of this lecture!)


Structure levels

• Primary structure: the pure sequence of the AAs

• Secondary structure: e.g. beta-sheet, alpha-helix, or turns

• Tertiary structure: 3D arrangement of the secondary structure elements

• Quaternary structure: several protein chains (subunits) assembled together

[Figure; source: Wikimedia]



How we can visualize them
(see also the tutorial at the end)


And what about the size of proteins and AAs?

[Figure, Janeway: objects of ~20 × 20 × 20 nm and ~13 × 6 × 5 nm]

1 nanometer = 10^-9 m = 0.000000001 m

Two more definitions:

• Ligand: also known as (small) peptide, epitope, guest, antigenic determinant
• Receptor: also known as (big) protein, host, macromolecule



Docking in general


What does docking mean?

Trying to find the “best match” between two molecules



Who could fit to me?

Let us try with this one …

(“induced fit”)



[Kitchen et al., 2004]


Why is docking useful?

• Docking (~virtual screening) is of paramount interest for drug discovery
• For one target, millions of different candidate drugs can be tested in silico
• Only the best n matches are then tested in experiments
• This saves time, resources and money



Usually 3 steps

1) Decide how to search through the spatial space

2) Decide how flexible ligand and receptor can be

3) Decide how to score various parameter sets


Where is the difficulty?

1) 6 degrees of freedom in 3D space (3 translational, 3 rotational)
2) 100+ degrees of freedom if we consider full flexibility of all bonds
3) Nearly every atom interacts with every other one
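To see why these degrees of freedom make docking hard, the sketch below counts the poses that a naive brute-force (systematic) search would have to score. The number of grid points per axis, the rotation steps per axis and the 30-degree torsion increments are hypothetical choices for illustration, not settings of any docking program.

```python
def brute_force_poses(grid_points_per_axis: int,
                      rotation_steps_per_axis: int,
                      n_rotatable_bonds: int,
                      torsion_steps: int) -> int:
    """Count the ligand poses a naive systematic (brute-force) search would score.

    All discretization parameters are hypothetical illustration values.
    """
    translations = grid_points_per_axis ** 3             # 3 translational DOF
    rotations = rotation_steps_per_axis ** 3             # 3 rotational DOF
    conformations = torsion_steps ** n_rotatable_bonds   # internal (torsional) DOF
    return translations * rotations * conformations


# Rigid ligand: only the 6 rigid-body degrees of freedom
rigid = brute_force_poses(20, 12, 0, 12)
# Same ligand with 10 rotatable bonds, each sampled in 30-degree steps (12 values)
flexible = brute_force_poses(20, 12, 10, 12)
print(f"rigid ligand:    {rigid:.2e} poses")
print(f"flexible ligand: {flexible:.2e} poses")
```

With these assumed settings the rigid ligand already gives 20³ · 12³ ≈ 1.4 × 10^7 poses, and ten rotatable bonds multiply this by 12^10 ≈ 6 × 10^10, which is why heuristic search methods are needed.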

Ad 1) Search algorithms (for the spatial space)
• Systematic docking
- Brute force
- Fragmentation
- Database
• Heuristic docking
- Monte Carlo
- Genetic algorithms
- Tabu search
• Simulation-based docking
- Molecular dynamics
- Gradient (energy) methods


Ad 2) Deciding about the flexibility

• “Rigid body” docking
- receptor and ligand are considered 100% rigid
- very fast (only 6 degrees of freedom), but inaccurate
• “Induced fit” docking
- movable backbone and/or side chains
• “Flexible ligand”
- only the ligand is considered flexible, the receptor remains rigid
• “Full flexibility”
- computationally very expensive



Ad 3) Scoring functions (1/2)

• Force-field-based scoring functions
- energy of the interaction plus the internal energy of the ligand
- combination of van der Waals (Lennard-Jones) and electrostatic energy, …
- e.g. D-Score, GoldScore, AutoDock, CHARMM, …
• Empirical scoring functions
- try to reproduce experimentally observed binding behavior by means of formulas
- usually the sum of uncorrelated terms
- e.g. LUDI, F-Score, SCORE, X-SCORE, …


Scoring functions (2/2)

• Knowledge-based scoring functions
- try to deduce rules from experimental data (known structures)
- e.g. DrugScore, PMF, …
• Geometrical scoring functions
- based on shape complementarity
- e.g. Connolly surface, Soft Belt Scoring
• Consensus scoring functions
- hybrid versions combining several scoring functions
- see also various review papers, e.g. [Trost, 2005]



Difference between pose score and rank score

“The pose score is often a rough measure of the fit of a ligand into the active site. The rank score is generally more complex and might attempt to estimate binding energies.”

“relatively small chemical modifications can lead to significant changes in binding.”

[Kitchen et al., 2004]


[Figure: Sousa, 2006]

[Figure: Sousa, 2006]

Correct result vs incorrect result

… and what about the correctness and reliability?

• Currently, correct results are more or less restricted to the systems for which the tools have been calibrated
• e.g. for peptide-MHC (pMHC) complexes, the area under the ROC curve (AUC) is between 0.5 and 0.75 for different substitution and scoring tools [Knapp, 2008]

But
• “We have long known that there is nothing in biology which is fundamentally inconsistent or incommensurable with mathematics, chemistry, and physics. Biology long ago rejected vitalism. The only information needed for life is provided by an organism's chemical constituents. It is unlikely in the extreme that living systems cannot be understood in terms of chemistry and physics.” [Wan, 2008]
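The AUC values quoted above can be read as the probability that a randomly chosen true binder is scored better than a randomly chosen non-binder, so 0.5 corresponds to random guessing and 1.0 to a perfect ranking. A minimal sketch of this pairwise (Mann-Whitney) way of computing the AUC; the scores are made-up numbers, and it assumes that a higher score means "more likely to bind" (docking energies, where lower is better, would have to be negated first).

```python
def roc_auc(binder_scores, nonbinder_scores):
    """AUC as the fraction of (binder, non-binder) pairs that are ranked correctly.

    Ties count as half a correct ranking. Assumes higher score = more likely binder.
    """
    correct = 0.0
    for b in binder_scores:
        for n in nonbinder_scores:
            if b > n:
                correct += 1.0
            elif b == n:
                correct += 0.5
    return correct / (len(binder_scores) * len(nonbinder_scores))


# Hypothetical scores (negated docking energies, so higher is better)
binders = [8.1, 7.4, 6.9, 5.2]
nonbinders = [7.0, 6.1, 5.5, 4.8, 4.0]
print(roc_auc(binders, nonbinders))   # 16 of 20 pairs ranked correctly -> 0.8
```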


Example Autodock



What is Autodock

• “AutoDock is a suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure. AutoDock actually consists of two main programs: AutoDock performs the docking of the ligand to a set of grids describing the target protein; AutoGrid pre-calculates these grids. In addition to using them for docking, the atomic affinity grids can be visualised. This can help, for example, to guide organic synthetic chemists design better binders.”
• URL: http://autodock.scripps.edu/



Autodock: sampling of spatial space (1/4)

• Simulated annealing

[Figure: quality of solution vs. different solutions; labels: random start-up position, stuck in a local minimum, global minimum]
Autodock: sampling of spatial space (2/4)
Simulated annealing (German “abkühlen”, to cool down) procedure:

• Idea: local neighborhood search that “sometimes” accepts worse solutions, with a certain probability:
$$p = e^{-\frac{E_j - E_i}{k_B T}}$$
where E_i is the energy of the current solution, E_j that of the new (worse) solution, k_B the Boltzmann constant and T the temperature
• Similar to the annealing of crystals in physics:
1. Melt a solid body in a heating pot
2. The atoms are almost randomly distributed
3. Cool down slowly
4. At each temperature, a thermal equilibrium is reached
5. The atoms arrange in an energetically favorable configuration
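A minimal sketch of simulated annealing with this acceptance rule (the Metropolis criterion), assuming a made-up one-dimensional energy function in place of a real docking score and setting k_B = 1 so that T is measured in energy units. The start temperature, cooling factor and step size are arbitrary illustration values, not AutoDock parameters.

```python
import math
import random


def energy(x: float) -> float:
    """Hypothetical 1-D energy landscape with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


def simulated_annealing(x0: float, t_start: float = 5.0, t_end: float = 1e-3,
                        cooling: float = 0.995, step: float = 0.5, seed: int = 0):
    """Minimize `energy` by simulated annealing; returns (best_x, best_energy)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        # Local neighborhood move around the current solution
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        delta = e_new - e                      # E_j - E_i
        # Metropolis criterion (k_B = 1): always accept improvements,
        # accept a worse solution with probability exp(-(E_j - E_i) / T)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling                           # slow geometric cooling
    return best_x, best_e


best_x, best_e = simulated_annealing(x0=4.0)
print(f"best x = {best_x:.3f}, energy = {best_e:.3f}")
```

At high T almost every move is accepted, so the search can leave local minima; as T drops, worse moves become rare and the search settles into a (hopefully global) minimum.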


Autodock: sampling of spatial space (3/4)

• Genetic algorithms
- A set of values is used to define the ligand, the receptor and their current states
- Doing it like nature:
1. Create a random population of solutions
2. Evaluate their fitness
3. Select the fittest n solutions
4. Crossover, mutation, …
5. Go to step 2 again

Example (parents P1 and P2, child C1; crossover point after gene 4):
P1: 24 46 78 90 | 4 33 99 65
P2: 23 66 84 90 | 92 12 78 44
C1: 24 46 78 90 | 92 12 5 44
(gene 7 of C1 matches neither parent, i.e. it was mutated)
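The five steps can be sketched as a plain genetic algorithm under stated assumptions: the genes are eight hypothetical real numbers in [0, 1) standing in for a ligand's translation, orientation and torsion values, and the fitness function is a made-up stand-in for a docking score (lower is better). AutoDock itself uses a Lamarckian GA that additionally applies local search to individuals (Morris et al., 1998, in the literature list); that refinement is omitted here.

```python
import random


def fitness(genes):
    """Hypothetical stand-in for a docking score (lower is better); optimum at 0.5."""
    return sum((g - 0.5) ** 2 for g in genes)


def genetic_algorithm(n_genes=8, pop_size=30, n_keep=10, generations=200,
                      mutation_rate=0.1, seed=1):
    rng = random.Random(seed)
    # 1. Create a random population of solutions (each gene in [0, 1))
    population = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Evaluate fitness and 3. select the fittest n solutions
        population.sort(key=fitness)
        parents = population[:n_keep]
        # 4. Crossover and mutation produce the children
        children = []
        while len(children) < pop_size - n_keep:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n_genes):
                if rng.random() < mutation_rate:
                    child[i] = rng.random()          # random-reset mutation
            children.append(child)
        # 5. Go to step 2 with the new population (parents are kept: elitism)
        population = parents + children
    return min(population, key=fitness)


best = genetic_algorithm()
print(f"best fitness: {fitness(best):.5f}")
```

Keeping the selected parents in the next generation (elitism) guarantees that the best fitness never gets worse from one generation to the next.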



Autodock: Flexibility (1/1)

• The receptor is held rigid
• The ligand's bonds have full flexibility according to a rotamer library
• The states of the ligand's bonds are represented as genes in the GA


Autodock: Scoring in 1998 (1/1)

Scoring function terms:
• 12-6 Lennard-Jones potential
• Hydrogen bonds, weighted by angle t
• Electrostatic forces
• Torsion angles
• Solvation effects



Autodock2007: in general zero!


Autodock2007: unbound?

Three approaches for the unbound state:

• Extended
• Compact
• Bound



Autodock2007


Autodock2007: the formula

• The weighting factors W have been calibrated on a set of 188 receptor/ligand complexes with known experimental binding affinities
• Coordinates from the Protein Data Bank (www.pdb.org)
• Binding data from the Ligand-Protein Database (http://lpdb.scripps.edu/)
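A common way to calibrate such weights is linear least-squares regression of the per-term energies against the measured affinities. The sketch below assumes that approach (the slide does not specify the fitting procedure) and uses three made-up terms and synthetic, noise-free data; the real AutoDock formula has more terms, and none of the numbers here are AutoDock's weights.

```python
def fit_weights(term_rows, affinities):
    """Least-squares fit of weights W such that sum_k W_k * term_k ~ affinity.

    term_rows: one list of per-term energies per complex.
    Solves the normal equations (X^T X) W = X^T y by Gaussian elimination.
    """
    n = len(term_rows[0])
    xtx = [[sum(r[i] * r[j] for r in term_rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(term_rows, affinities)) for i in range(n)]
    a = [row + [b] for row, b in zip(xtx, xty)]        # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):                        # back substitution
        s = sum(a[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (a[i][n] - s) / a[i][i]
    return w


def predict(weights, terms):
    """Estimated binding free energy as a weighted sum of energy terms."""
    return sum(w * t for w, t in zip(weights, terms))


# Hypothetical per-complex terms: [van der Waals, hydrogen bonds, number of torsions]
rows = [[-10.0, -2.0, 3.0], [-8.0, -1.0, 5.0], [-12.0, -3.0, 2.0],
        [-6.0, -0.5, 7.0], [-9.0, -2.5, 4.0]]
true_w = [0.2, 0.1, 0.3]                                # made-up "true" weights
affinities = [predict(true_w, r) for r in rows]          # noise-free synthetic data
print(fit_weights(rows, affinities))                     # approximately [0.2, 0.1, 0.3]
```

With real, noisy affinities the fitted weights only minimize the squared error, which is why the calibration set has to be large and diverse.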



Autodock2007: AD3 vs AD4


Autodock2007: success rate against experimental data

• 75 cases: found, but another pose scored better
• 67 cases: found and scored best
• 28 cases: not found

=> 84% of all ligands found
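The 84% follows from the three case counts on this slide; the same numbers also show what fraction of ligands were ranked first:

```latex
N_\text{total} = 75 + 67 + 28 = 170, \qquad
\frac{N_\text{found}}{N_\text{total}} = \frac{75 + 67}{170} = \frac{142}{170} \approx 0.835 \approx 84\,\%, \qquad
\frac{N_\text{scored best}}{N_\text{total}} = \frac{67}{170} \approx 0.39
```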



Video Autodock

[published on Autodock Homepage]


Biologists often have concerns about the success of computational techniques. [Jorgensen, 2004] nicely summarizes such a situation:

“’Is there really a case where a drug that’s on the market was designed by a
computer?’ When asked this, I invoke the professorial mantra (’All questions
are good questions.’), while sensing that the desired answer is ’no’. Then,
the inquisitor could go back to the lab with the reassurance that his or her
choice to avoid learning about computational chemistry remains wise.”

So what is the role of computers in drug discovery?

Take home messages for the first part

• Computational methods can be used to identify potential drugs
• They can help to reduce the number of candidates to test, or predict a set of possible candidates. However, they cannot predict the one and only working substance in one step
• The methods are diverse
• There is still much room for improvement of the methods
• “The day is coming when theory and computation will guide biology, as it does physics now.” [Wan, 2008]


Molecular Dynamics (MD)



Introduction

• MD is a type of computer simulation
• Atoms interact under given laws of physics for a specified time
• MD can be seen as an interface between “wet”-lab experiments and theoretical models
• Used to analyze the spatial and energetic dynamics of e.g. biomolecules, materials, …
• Usually very demanding in computational power and memory


Calculate forces between all atoms of the system …

n = 6 (in the figure), usually n > 1000

… but what does “forces” mean?

A combination of bonded and non-bonded interactions …
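Because every atom can interact with every other atom, the number of distinct atom pairs grows quadratically with n, which is one reason why MD is so expensive:

```latex
N_\text{pairs} = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad
n = 6:\; \frac{6 \cdot 5}{2} = 15, \qquad
n = 1000:\; \frac{1000 \cdot 999}{2} = 499\,500
```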


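Bonded interactions

Bond length:
$$V_b = \sum_{n=1}^{N_b} \frac{1}{2} K_{b_n} \left(b_n - b_n^0\right)^2, \qquad b_n = |\mathbf{r}_{ij}| = |\mathbf{r}_i - \mathbf{r}_j|$$

Bond angle:
$$\theta_n = \arccos\left(\frac{\mathbf{r}_{ij} \cdot \mathbf{r}_{kj}}{|\mathbf{r}_{ij}|\,|\mathbf{r}_{kj}|}\right)$$

Torsion:
$$V_\varphi = \frac{1}{2} K_{\varphi_n} \left(1 + \cos(p_n \varphi_n)\right)$$
$$\varphi_n = \arccos\left(\frac{(\mathbf{r}_{ij} \times \mathbf{r}_{kj}) \cdot (\mathbf{r}_{kj} \times \mathbf{r}_{kl})}{|\mathbf{r}_{ij} \times \mathbf{r}_{kj}|\,|\mathbf{r}_{kj} \times \mathbf{r}_{kl}|}\right)$$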

What does the “bond length” term really mean?

[Figure, Shaw et al.: “perfect”, “too far away” and “too close” bond lengths]

What does the “bond angle” term really mean?

[Figure, Shaw et al.: “perfect”, “too big” and “too small” bond angles]


What does the “torsion” term really mean?

[Figure, Shaw et al.: “perfect” and “tilted” torsions]
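The geometric quantities in the bonded terms can be computed directly from atom coordinates. A minimal sketch in plain Python following the definitions above (r_ij = r_i - r_j, unsigned torsion angle from arccos); the force constants and periodicity in the usage lines are hypothetical numbers in arbitrary units, not parameters of a real force field.

```python
import math


def sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def norm(a):
    return math.sqrt(dot(a, a))


def safe_acos(x):
    # Clamp against rounding errors slightly outside [-1, 1]
    return math.acos(max(-1.0, min(1.0, x)))


def bond_length(ri, rj):
    return norm(sub(ri, rj))                               # b = |r_i - r_j|


def bond_angle(ri, rj, rk):
    rij, rkj = sub(ri, rj), sub(rk, rj)
    return safe_acos(dot(rij, rkj) / (norm(rij) * norm(rkj)))


def torsion_angle(ri, rj, rk, rl):
    rij, rkj, rkl = sub(ri, rj), sub(rk, rj), sub(rk, rl)
    m, n = cross(rij, rkj), cross(rkj, rkl)
    return safe_acos(dot(m, n) / (norm(m) * norm(n)))    # unsigned, in [0, pi]


def harmonic_bond_energy(b, b0, k_b):
    return 0.5 * k_b * (b - b0) ** 2                      # 1/2 K_b (b - b0)^2


def torsion_energy(phi, k_phi, p):
    return 0.5 * k_phi * (1.0 + math.cos(p * phi))        # 1/2 K_phi (1 + cos(p phi))


# Hypothetical example: four atoms in a trans (180 degree) arrangement
atoms = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, -1.0, 0.0]]
phi = torsion_angle(*atoms)
print(math.degrees(phi), torsion_energy(phi, k_phi=2.0, p=3))
```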

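Non-bonded interactions

$$V_{NB} = \frac{1}{2} \sum_{i=1}^{N} \sum_{j \neq i}^{N} \left(V_{ij}^{LJ} + V_{ij}^{EL}\right)$$

Coulomb:
$$V_{ij}^{EL} = \frac{q_i q_j}{r_{ij}}$$

Lennard-Jones:
$$V_{ij}^{LJ} = 4\epsilon_{ij} \left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]$$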

What does the “Coulomb” term really mean?

[Figure, Shaw et al.]

What does the “Lennard-Jones” term really mean?

[Figure, Shaw et al.: “perfect”, “too far away” and “too close” atom distances]
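A minimal sketch of the non-bonded energy, assuming a single ε and σ for all atom pairs (real force fields assign per-pair parameters, typically via combination rules) and reduced units in which the Coulomb prefactor is 1, as in the formula. Summing over i < j counts each pair once, which is equivalent to the factor 1/2 in the double sum.

```python
import math


def lennard_jones(r, epsilon, sigma):
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)      # 4 eps [(s/r)^12 - (s/r)^6]


def coulomb(r, q_i, q_j):
    return q_i * q_j / r                          # prefactor taken as 1 (reduced units)


def nonbonded_energy(coords, charges, epsilon, sigma):
    """Sum of LJ and Coulomb energies over all distinct atom pairs."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):                 # i < j: each pair once (= the 1/2 factor)
            r = math.dist(coords[i], coords[j])
            total += lennard_jones(r, epsilon, sigma) + coulomb(r, charges[i], charges[j])
    return total


# The LJ term has its minimum, -epsilon, at r = 2^(1/6) * sigma
print(lennard_jones(2 ** (1 / 6), epsilon=1.0, sigma=1.0))
```

The r^-12 term dominates at short distances ("too close"), the attractive r^-6 term at longer ones, which produces the "perfect" distance at the minimum.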


All of this together is called a “force field”

… and of course the real implementations are much more complicated. There are several software packages available (e.g. GROMACS, AMBER, CHARMM, Schrödinger, …)
What can we do with this force field?

We divide time into discrete time steps of e.g. 1 fs (= 10^-15 s):

t = 0 fs → 10 000 000 fs (= 10 ns)

… and calculate the forces at each time step while updating the positions. Iterate … and iterate … and iterate …

Finally we get something like this: [figure]

In reality, however, more like this: [figure, from Wikipedia]

“The equations are solved simultaneously in small time steps. The system is followed for some time, taking care that the temperature and pressure remain at the required values, and the coordinates are written to an output file at regular intervals. The coordinates as a function of time represent a trajectory of the system.” [Gromacs Manual]


Flow diagram of an MD simulation:

1. Define initial atom positions
2. Calculate forces
3. Move atoms
4. Increment time
5. Stop criterion reached? If not, go back to step 2
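The flow diagram maps onto a simple loop. The sketch below uses the velocity Verlet integrator (one common choice; GROMACS's default integrator is the closely related leap-frog scheme) and, as a hypothetical stand-in for a force field, a single particle on a harmonic spring in reduced units, with a fixed number of steps as the stop criterion.

```python
def force(x, k=1.0):
    """Hypothetical force field: one particle on a harmonic spring, F = -k x."""
    return -k * x


def run_md(x0, v0, mass=1.0, dt=0.01, n_steps=1000):
    """Velocity Verlet integration; returns the trajectory and the final velocity."""
    x, v = x0, v0                          # 1. define initial positions (and velocities)
    a = force(x) / mass                    # 2. calculate forces
    trajectory = [x]
    for _ in range(n_steps):               # 5. stop criterion: fixed number of steps
        x = x + v * dt + 0.5 * a * dt * dt     # 3. move atoms
        a_new = force(x) / mass                # 2. recalculate forces at new positions
        v = v + 0.5 * (a + a_new) * dt         # update velocities
        a = a_new
        trajectory.append(x)               # 4. time advances by dt; store coordinates
    return trajectory, v


trajectory, v_end = run_md(x0=1.0, v0=0.0)
energy_end = 0.5 * v_end ** 2 + 0.5 * trajectory[-1] ** 2   # mass = k = 1
print(f"{len(trajectory)} frames, total energy at the end: {energy_end:.6f} (start: 0.5)")
```

Velocity Verlet is time-reversible and symplectic, so the total energy stays close to its initial value over long runs, which is one reason integrators of this family are used in MD.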



Example for MD simulation using Gromacs
[Hess et al., 2008]

1. Obtain atom coordinates for the system to be simulated (e.g. in PDB format from www.pdb.org) (takes minutes to days, mostly depending on the human)
2. Validate the PDB file (takes seconds)
3. Create a virtual simulation box around the system (takes seconds)
4. Fill the box with artificial water (takes seconds)
5. Minimize the energy of the system (takes minutes to hours)
6. Warm the system up to room temperature (takes hours to days)
7. Start the real MD simulation (takes days to months)
8. Evaluate the results (takes minutes to years(!), depending on the human)


Example for MD simulation

[Figure: snapshots at 0 ns and 20 ns]



Video MD-Simulation shown via VMD


Example for MD simulation



Limitations of MD simulations (1 of 2)
(on the basis of Gromacs)

• Newton’s equations of motion describe classical mechanics, not quantum mechanics (=> sometimes problems with e.g. hydrogen atoms)
• Electrons are in their ground state: they are assumed to adjust instantaneously when the atomic positions change (Born-Oppenheimer approximation)
• Force fields are approximate: they balance computational load against accuracy, and their parameters can be modified by the user
• Force fields are pair-additive: polarization is omitted


Limitations of MD simulations (2 of 2)
(on the basis of Gromacs)

• Long-range interactions are cut off: only one image of each particle under periodic boundary conditions is considered (minimum image convention) => the cutoff cannot exceed half the box size
• Boundary conditions are unnatural: without them, many particles would have vacuum as a neighbor; to avoid this, periodic boundary conditions are used => sometimes the system interacts with its own periodic images
• Computational costs and runtime (3 months for 20 ns!)
• Cumulative errors in numerical integration and limited precision of the floating-point representation
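The cutoff limitation can be illustrated with the minimum image convention: under periodic boundary conditions, each distance is measured to the nearest periodic image of the other particle. A minimal sketch, assuming a cubic box (GROMACS also supports other box shapes):

```python
def minimum_image_distance(a, b, box):
    """Distance from particle a to the nearest periodic image of particle b
    in a cubic box with edge length `box`."""
    d2 = 0.0
    for a_k, b_k in zip(a, b):
        d = a_k - b_k
        d -= box * round(d / box)     # shift the component into [-box/2, box/2]
        d2 += d * d
    return d2 ** 0.5


def cutoff_is_valid(cutoff, box):
    """With only one image per particle, the cutoff must not exceed half the box."""
    return cutoff <= box / 2


# Two particles near opposite faces of a 10 x 10 x 10 box are in fact neighbors
print(minimum_image_distance([0.5, 0.0, 0.0], [9.5, 0.0, 0.0], box=10.0))   # -> 1.0
print(cutoff_is_valid(1.2, 10.0), cutoff_is_valid(6.0, 10.0))               # -> True False
```

If the cutoff were larger than half the box, a particle could "see" two images of the same neighbor at once, which is exactly the self-interaction problem mentioned above.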



Evaluation of MD trajectories

Now we have something like this: [figure]

… a huge set of individual configurations over time. But what does this agglomeration of single structures tell us?


RMSD

• First idea: the difference of the single frames (transparent) from the starting structure (solid). Calculate the root mean square deviation:

$$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left|\mathbf{r}_i^X - \mathbf{r}_i^Y\right|^2}$$

where N is the number of atoms, i is the current atom, r^X is the target structure and r^Y is the reference structure.

Be careful if you compare structures with different positions and rotations in space. You will probably need to superimpose (fit) them first.
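A minimal sketch of the RMSD formula in plain Python, assuming both structures list the same atoms in the same order. The helper `center` only removes the translational difference; removing the rotational difference as well requires an optimal superposition (e.g. the Kabsch algorithm), which is not shown here.

```python
import math


def rmsd(target, reference):
    """RMSD between two structures with identical atom order (no fitting done here)."""
    if len(target) != len(reference):
        raise ValueError("structures must contain the same number of atoms")
    squared = sum(math.dist(x, y) ** 2 for x, y in zip(target, reference))
    return math.sqrt(squared / len(target))


def center(coords):
    """Remove the translational offset by moving the geometric center to the origin."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in coords]


reference = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
shifted = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]       # same shape, translated by 1
print(rmsd(shifted, reference))                     # 1.0: pure translation counts
print(rmsd(center(shifted), center(reference)))     # 0.0 after removing translation
```

For an RMSD-over-time curve, each fitted frame would be compared with the first one, e.g. `[rmsd(frame, frames[0]) for frame in frames]`, where `frames` is a hypothetical list of coordinate sets.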



RMSD cont

• The RMSD over time (in this case r^Y is the first frame)

All frames: [figure]

Frame with the highest RMSD: [figure]



Radius of Gyration

• A similar measure is the radius of gyration. It measures the distance of the parts of a region from its center of gravity, in other words how compact a certain region is.

• E.g.: [figure]

• The radius of gyration is an interesting property because it can be determined experimentally using “static light scattering” as well as “small-angle neutron scattering” or “small-angle X-ray scattering”. This allows theoretical scientists to check their models against reality.
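A minimal sketch, assuming the usual mass-weighted definition R_g = sqrt(Σ m_i |r_i - r_cm|² / Σ m_i), where r_cm is the center of mass; with equal masses it reduces to the root mean square distance of the atoms from their geometric center. The coordinates and masses in the example are hypothetical.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted root mean square distance of the atoms from the center of mass."""
    total_mass = sum(masses)
    com = [sum(m * p[k] for m, p in zip(masses, coords)) / total_mass for k in range(3)]
    msd = sum(m * sum((p[k] - com[k]) ** 2 for k in range(3))
              for m, p in zip(masses, coords)) / total_mass
    return math.sqrt(msd)


# A compact and a stretched arrangement of the same four (hypothetical) unit masses
compact = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
stretched = [[3.0, 0.0, 0.0], [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [-3.0, 0.0, 0.0]]
print(radius_of_gyration(compact, [1.0] * 4))     # 1.0
print(radius_of_gyration(stretched, [1.0] * 4))   # about 2.24 (sqrt(5))
```

A larger value means a less compact region, so tracking R_g along a trajectory shows, for example, whether a protein unfolds or stays packed.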


RMSF

• Next idea: the fluctuation of a particular amino acid over time. Calculate the “root mean square fluctuation”:

$$\mathrm{RMSF}_i = \sqrt{\frac{1}{M} \sum_{k=1}^{M} \left|\mathbf{r}_i(t_k) - \tilde{\mathbf{r}}_i\right|^2}$$

where M is the number of frames taken into account, r_i(t_k) is particle i of complex r at time t_k, and r with tilde is the reference. This reference can, for example, be the average over a given time window.
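A minimal sketch of the RMSF formula, assuming the frames are already fitted to a common reference and store the same particles in the same order; by default the reference r̃ is the particle's average position over all M frames, one of the choices mentioned above. The trajectory in the example is hypothetical.

```python
import math


def rmsf(frames, atom_index, reference=None):
    """Root mean square fluctuation of one particle over M frames.

    frames: list of frames, each a list of (x, y, z) positions, assumed fitted.
    reference: the r-tilde position; defaults to the average over all frames.
    """
    positions = [frame[atom_index] for frame in frames]
    m = len(positions)
    if reference is None:
        reference = [sum(p[k] for p in positions) / m for k in range(3)]
    return math.sqrt(sum(math.dist(p, reference) ** 2 for p in positions) / m)


# Hypothetical 3-frame trajectory of 2 particles: particle 0 is still, particle 1 moves
frames = [
    [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [6.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [7.0, 0.0, 0.0]],
]
print([round(rmsf(frames, i), 3) for i in range(2)])   # [0.0, 0.816]
```

Plotting this value per residue gives the typical RMSF profile, where flexible loops show up as peaks.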



RMSF cont



SASA

• How much of a certain region (e.g. an amino acid) is exposed to the solvent? Calculate the solvent accessible surface area (SASA)

[Figure labels: solvent, protein, (possible) target]


SASA

• Methodology to calculate the SASA: [figure]
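One widely used approach is the Shrake-Rupley algorithm: test points are placed on a sphere of radius r_atom + r_probe around each atom (a probe radius of 1.4 Å is commonly used for water), and the points that are not buried inside any neighbor's enlarged sphere are counted. The sketch below implements that idea; it is not necessarily the method shown in the original figure, the atom radii are hypothetical, and the naive all-pairs neighbor search would be replaced by neighbor lists in real tools.

```python
import math


def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-section spiral)."""
    pts = []
    golden = math.pi * (3.0 - math.sqrt(5.0))
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden * i
        pts.append((r * math.cos(theta), r * math.sin(theta), z))
    return pts


def sasa(coords, radii, probe=1.4, n_points=500):
    """Per-atom solvent accessible surface area (Shrake-Rupley sketch)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, radii)):
        big_r = ri + probe                       # sphere traced by the probe center
        accessible = 0
        for u in unit:
            p = [ci[k] + big_r * u[k] for k in range(3)]
            buried = any(
                j != i and math.dist(p, cj) < rj + probe
                for j, (cj, rj) in enumerate(zip(coords, radii))
            )
            if not buried:
                accessible += 1
        areas.append(4.0 * math.pi * big_r ** 2 * accessible / n_points)
    return areas


# Hypothetical example: a neighbor buries part of an atom's accessible surface
isolated = sasa([[0.0, 0.0, 0.0]], [1.7])
pair = sasa([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]], [1.7, 1.7])
print(isolated, pair)
```

The SASA of a residue or region is then the sum over its atoms.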



Take home messages for the second part

• MD is a computer simulation of “real” atom-atom interactions
• MD is very time- and resource-consuming
• The output trajectories are huge, and there are various ways to analyze them
• There are still certain limitations


Tutorial on PDB / jmol



Introduction to the TCR/pMHC interaction on the whiteboard


www.pdb.org => 1mi5



right click => “Console”
select *
cartoon off
select *:C
wireframe 100


Opinions, comments and suggestions?



Further literature
Docking:

• Sousa SF, Fernandes P, Ramos MJ (2006) Protein-ligand docking: current status and future challenges. Proteins 65: 15-26.
• Huey R, Morris GM, Olson AJ, Goodsell DS (2007) A semiempirical free energy force field with charge-based desolvation. J Comput Chem 28: 1145-1152.
• Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, Olson AJ (1998) Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem 19: 1639-1662.
• Kitchen DB, Decornez H, Furr JR, Bajorath J (2004) Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov 3: 935-949.

Molecular Dynamics simulations:

• Dodson GG, Lane DP, Verma CS (2008) Molecular simulations of protein dynamics: new windows on mechanisms in biology. EMBO Rep 9: 144-150.
• Karplus M, Kuriyan J (2005) Molecular dynamics and protein function. Proc Natl Acad Sci U S A 102: 6679-6685.
• Hess B, Kutzner C, van der Spoel D, Lindahl E (2008) GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. J Chem Theory Comput.
