
Molecular modelling in structure determination
Structure calculation from NMR data

Michael Nilges
Unité de Bio-Informatique Structurale
Institut Pasteur
nilges@pasteur.fr

Overview

1. Introduction: relating data to structure
2. Hybrid energy and treatment of errors
3. Minimisation of hybrid energy
4. Relation to probability theory
5. Problems with the minimisation approach: an alternative
6. Automated structure calculation

Michael Nilges. Structure calculation from NMR data.

(1) Introduction: relating data to structure

NMR structure determination steps

NMR experiment
Resonance assignment
Structural restraints: distances, torsion angles, orientation
NOE assignment
Structure calculation
Structure validation

Structure calculation

Have molecule and some data on conformation...
Objectives:
  find conformation(s) satisfying experimental data
  maintain likely (local) conformation

Structural data from NMR experiments

chemical shifts
short distances
internal angles
orientation angles

Compare conformation to data

Basis of structure calculation: a forward model to approximately calculate data from a structure
  example for data: X-ray map, NMR data
  example for model: Isolated Spin Pair Approximation (ISPA) for NOE
  calculate measurement (NOE) from structure (the interproton distance)

molecular conformation → NOE peak size: $\mathrm{NOE}_{ij} \propto r_{ij}^{-6}$

Distance information from NOE

$\mathrm{NOE}_{ij} \propto r_{ij}(x)^{-6}$
$r_{ij} \approx (C_\mathrm{cal}\,\mathrm{NOE}_{ij})^{-1/6}$

NOE between two protons depends on distance (< 4 Å)
ISPA: approximate; the power law neglects
  internal dynamics
  spin diffusion
calibration factor $C_\mathrm{cal}$ unknown (not measurable)
  estimate from reference distances
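
As a rough illustration of the ISPA relation, the sketch below estimates the calibration factor from one reference proton pair and then converts another peak volume into a distance. The function names and the numbers (a reference distance of 2.5 Å, a peak volume of 100) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def calibrate(noe_ref: float, r_ref: float) -> float:
    """Return C_cal such that r_ref = (C_cal * noe_ref) ** (-1/6)."""
    return r_ref ** -6 / noe_ref


def noe_to_distance(noe: float, c_cal: float) -> float:
    """ISPA estimate of the interproton distance from an NOE volume."""
    return (c_cal * noe) ** (-1.0 / 6.0)


def distance_to_noe(r: float, c_cal: float) -> float:
    """Forward model: NOE volume predicted from a distance."""
    return r ** -6 / c_cal


# Hypothetical reference pair: known distance 2.5 A, measured volume 100.
c_cal = calibrate(noe_ref=100.0, r_ref=2.5)
# A peak 64 times weaker corresponds to twice the distance (64 ** (1/6) == 2).
print(noe_to_distance(100.0 / 64.0, c_cal))
```

Because of the sixth-power dependence, large errors in the peak volume translate into modest errors in the distance, which is one reason the approximation is usable at all.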

Extension for ambiguous NOE

$\bar{r}(x) = \left( \sum_{a=1}^{N} r_a^{-6}(x) \right)^{-1/6}$
$r^0 \approx (C_\mathrm{cal}\,\mathrm{NOE})^{-1/6}$

N: number of possible assignments of peak
Ambiguous NOE can be approximately calculated from a structure with 6th power law

Penalty function

derive a penalty function to measure difference between calculated and measured data: $E_\mathrm{data}$
e.g., harmonic potential for distances (... not a good choice)

$E_\mathrm{NOE} = \sum_{i=1}^{N_\mathrm{NOE}} \left( r_i(x) - r_i^0 \right)^2$
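
A minimal sketch of the two formulas on these slides: the effective distance of an ambiguous peak as an r^-6 sum over its possible assignments, and the simple harmonic penalty. The function names and example distances are illustrative assumptions.

```python
def effective_distance(distances):
    """r^-6-summed distance over all N possible assignments of one peak."""
    return sum(r ** -6 for r in distances) ** (-1.0 / 6.0)


def harmonic_noe_energy(calculated, targets):
    """E_NOE = sum_i (r_i(x) - r_i^0)^2 over all restraints."""
    return sum((r - r0) ** 2 for r, r0 in zip(calculated, targets))


# One short contribution dominates: the 10 A pair barely changes the result.
print(effective_distance([2.0, 10.0]))
# Two equal contributions shorten the effective distance by 2 ** (-1/6).
print(effective_distance([3.0, 3.0]))
```

The effective distance is never longer than the shortest contributing distance, so an ambiguous restraint is satisfied as soon as any one of its candidate proton pairs is close enough.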

Angular information: coupling constants

3-bond coupling depends on torsion angle
General dependency of J on angle: Karplus relationship

$J = A + B\cos(\theta) + C\cos(2\theta)$

The angle $\theta$

$\theta = \phi - 60^\circ$
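
The Karplus relationship is a forward model in the same sense as the ISPA: it predicts a measurement (J) from a structural parameter (the angle). A small sketch in the functional form written above follows; the coefficients used in the example call are arbitrary placeholders, not fitted values.

```python
import math


def karplus(theta_deg: float, a: float, b: float, c: float) -> float:
    """J = A + B cos(theta) + C cos(2 theta), with theta given in degrees."""
    theta = math.radians(theta_deg)
    return a + b * math.cos(theta) + c * math.cos(2.0 * theta)


# Placeholder coefficients, for illustration only.
for angle in (0.0, 90.0, 180.0):
    print(angle, karplus(angle, a=1.0, b=2.0, c=3.0))
```

Since the cosine terms are periodic, several angles can give the same J, so a single coupling constant generally does not determine the angle uniquely.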

Parametrisation of J from structures

The parameters A, B, C need to be parametrised with known (X-ray) structures

Typical values for $^3J_{\mathrm{HNH}\alpha}$:

  A      B      C
  6.51  -1.76   1.60
  6.41  -1.46   1.90
  6.98  -1.38   1.72

Residual dipolar couplings

Proteins: direction of bond vectors (e.g., N-H) can be determined relative to coordinate system attached to molecule
tensor parameters

$D^{ij}_\mathrm{res} \propto \frac{1}{r_{ij}^3} \left[ D_\mathrm{ax} \left( 3\cos^2\theta - 1 \right) + \frac{3}{2} D_\mathrm{rh} \sin^2\theta \cos(2\phi) \right]$

Data summary

Theory (forward model) to calculate data from structure
Forward models contain non-measurable parameters that are necessary for the modelling
  calibration factor
  Karplus parameters for scalar couplings
  tensor parameters
Data contain noise
Forward models contain approximations: no ideal agreement possible

(2) Hybrid energy and treatment of errors


Maintain local conformation

Use force field $E_\mathrm{phys}$
  compilation of ideal values (bonds, angles, ...)
  compilation of strength of interaction
  probability of distortion
Often simplifications and approximations of MD force field

NMR structure calculation: simplified force field

covalent interactions: rigid, uniform force constants
ideal values from Engh & Huber
vdW interaction: quartic potential, no attractive part
no electrostatics

Hybrid energy

combine data and physical model of molecule into one function (target function, hybrid energy function) (Levitt)

$E_\mathrm{hybrid} = E_\mathrm{phys} + w_\mathrm{data} E_\mathrm{data}$

guess $w_\mathrm{data}$
minimise this function

Hybrid energy function

[Figure: the hybrid energy function, with its terms grouped into force field and data]

Minimisation of hybrid energy

make random proposal for structure
repeat until minimum is reached:
  1. calculate data (approximately) from structure
  2. compare with experimental bounds
  3. modify structure to improve agreement

How to use this in structure determination

All data contain errors (experimental noise)
All forward models contain approximations
No ideal agreement between calculated and measured data possible
Pseudopotential including data needs to contain way to include noise
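
The minimisation loop can be sketched on a deliberately tiny toy problem: the "structure" is a single distance x, the force field prefers a hypothetical ideal value of 3.0, and one measured distance of 5.0 acts as the data. Plain gradient descent stands in for whatever minimiser is really used; all names and numbers here are assumptions made for illustration.

```python
X_IDEAL = 3.0  # hypothetical ideal value from the force field
X_OBS = 5.0    # hypothetical measured distance


def hybrid_energy(x: float, w_data: float) -> float:
    """E_hybrid = E_phys + w_data * E_data, both harmonic in this toy."""
    return (x - X_IDEAL) ** 2 + w_data * (x - X_OBS) ** 2


def hybrid_gradient(x: float, w_data: float) -> float:
    """Analytical derivative of hybrid_energy with respect to x."""
    return 2.0 * (x - X_IDEAL) + w_data * 2.0 * (x - X_OBS)


def minimise(x_start: float, w_data: float, step: float = 0.1,
             tol: float = 1e-10, max_iter: int = 10000) -> float:
    """Repeat: compare with data via the gradient, then modify the structure.

    The fixed step must be small compared with the curvature 2 * (1 + w_data).
    """
    x = x_start
    for _ in range(max_iter):
        g = hybrid_gradient(x, w_data)
        if abs(g) < tol:
            break
        x -= step * g
    return x


for w in (0.0, 1.0, 3.0):
    print(w, minimise(x_start=0.0, w_data=w))
```

In this toy the minimum sits at (3 + 5 w) / (1 + w), so the result moves from the force-field value towards the measured value as the data weight grows. This makes concrete why the choice of the weight matters.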


Standard NOE distance restraint potential

$E_\mathrm{data} = \sum_{i}^{N_\mathrm{noe}} \begin{cases} (r_i(X) - L_i)^2 & \text{if } r_i(X) < L_i \\ 0 & \text{if } L_i \le r_i(X) \le U_i \\ (r_i(X) - U_i)^2 & \text{if } r_i(X) > U_i \end{cases}$

sources of error
  measurement
  spin diffusion, internal dynamics
loose upper and lower bounds
FBHW potential
  flat bottom, harmonic walls
  no force between L and U

Other ways to treat noise

potential form
weight in hybrid energy function

$E_\mathrm{hybrid} = E_\mathrm{phys} + w_\mathrm{data} E_\mathrm{data}$
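
The flat-bottom potential with harmonic walls can be written directly from its three cases. The sketch below gives the energy and the corresponding force for one restraint; the function names and the example bounds (1.8 and 5.0) are illustrative choices.

```python
def fbhw_energy(r: float, lower: float, upper: float) -> float:
    """Flat-bottom, harmonic-wall energy for one distance restraint."""
    if r < lower:
        return (r - lower) ** 2
    if r > upper:
        return (r - upper) ** 2
    return 0.0


def fbhw_force(r: float, lower: float, upper: float) -> float:
    """Force -dE/dr: zero between the bounds, restoring outside them."""
    if r < lower:
        return -2.0 * (r - lower)
    if r > upper:
        return -2.0 * (r - upper)
    return 0.0


def data_energy(distances, bounds):
    """E_data summed over all restraints; bounds is a list of (L, U)."""
    return sum(fbhw_energy(r, lo, up) for r, (lo, up) in zip(distances, bounds))


print(fbhw_energy(6.0, 1.8, 5.0), fbhw_force(6.0, 1.8, 5.0))
```

Between the bounds every distance is equally acceptable, which is how the bounds absorb the unknown error: the wider they are, the less information the restraint carries.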
How to determine the weights

[Figure: the hybrid energy function, with its terms grouped into force field and data]
cross-validation
experience
Bayesian analysis

Hybrid energy summary

Data potential needs to include parameter to treat (unknown) noise
  potential shape
  weights
The weight in the hybrid energy needs to be set by (empirical) means


(3) Minimisation of hybrid energy

3D structure calculation

convert data (1D, 2D) + forcefield into 3D model


Minimisation algorithms

Energy minimisation
Simulated annealing
Molecular dynamics
Torsion angle dynamics
(Monte Carlo)
Distance geometry
Genetic algorithm
...

Multiple minimum problem

High energy barriers to fold protein
Standard minimisation only "downhill"

Distance Geometry

Avoids "folding problem": direct conversion of distances to (approximate) coordinates
Mostly historical importance

Minimisation by molecular dynamics

$\frac{d^2 \vec{r}_i}{dt^2} = -\frac{c}{m_i} \nabla_{\vec{r}_i} E_\mathrm{hybrid}$

Molecular dynamics solves Newton's equations of motion
Molecular dynamics can overcome local energy barriers

Newton dynamics

Direction of motion depends on
  force (derived from force field and experimental restraints)
  momentum

Temperature control and variation: "MD-simulated annealing"
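
To illustrate why momentum helps, the sketch below integrates Newton's equation for one particle in a hypothetical double-well energy, E(x) = (x^2 - 1)^2, with the velocity Verlet scheme. Cooling is imitated by multiplying the velocity by a constant factor after every step; this crude rescaling is an assumption made for brevity, not the temperature coupling used in any particular program.

```python
def energy(x: float) -> float:
    """Toy double well with minima at x = -1 and x = +1, barrier of 1 at x = 0."""
    return (x * x - 1.0) ** 2


def force(x: float) -> float:
    """Force = -dE/dx for the toy double well."""
    return -4.0 * x * (x * x - 1.0)


def run_md(x, v, n_steps, dt=0.001, mass=1.0, c=1.0, cooling=1.0):
    """Velocity Verlet for d2x/dt2 = -(c / m) dE/dx, with optional cooling."""
    trajectory = [x]
    a = c * force(x) / mass
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = c * force(x) / mass
        v += 0.5 * (a + a_new) * dt
        v *= cooling  # cooling < 1 removes kinetic energy at every step
        a = a_new
        trajectory.append(x)
    return trajectory, v


# Constant energy: kinetic energy 2 exceeds the barrier, so the well is left.
hot, _ = run_md(x=-1.0, v=2.0, n_steps=5000)
print(max(hot))
# With cooling the particle ends up at rest in one of the two minima.
cold, v_end = run_md(x=-1.0, v=2.0, n_steps=20000, cooling=0.999)
print(cold[-1], v_end)
```

A pure downhill minimiser started at x = -1 would stay there. With enough kinetic energy the trajectory crosses the barrier, and slow cooling then lets it settle into a minimum.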

Energy scaling

more flexible annealing schemes
different variation of different energy terms
equivalence: mass / energy / temperature scaling

$\frac{d^2 \vec{r}_i}{dt^2} = -\frac{c}{m_i} \nabla_{\vec{r}_i} E_\mathrm{hybrid}$
Structure calculation with MD

NMR data: distances
Start: random structure
Difficult search problem: many degrees of freedom

Structure calculation with MD

100 atoms:
  1988: 20000 s per structure on mainframe (DISGEO, Havel)
  now: 20 s per structure on PC


Torsion angle dynamics

dynamics time step dictated by bond stretching: waste of CPU
important motions are around torsions
~ 3 degrees of freedom per AA (cf. $3N_\mathrm{atom}$ for Newton dynamics)
Available in X-PLOR, CYANA, CNS, X-PLOR-NIH

Calculation of structure ensembles

with identical data / restraints: repeat calculation (20-100 times)
random variation of initial conditions (starting structure / velocities)
poor man's probability distribution
obtain information on uniqueness / different fold


Structure ensembles

This has very little to do with dynamics
Distribution depends on
  data
  data representation
  algorithm
  forcefield
  algorithm parameters
  you

(4) Relation to probability theory

Where do potential forms come from
Where do all the parameters come from
  bounds
  weights

Probability and energy

$E_\mathrm{hybrid} = E_\mathrm{phys}(X) + w_\mathrm{data} E_\mathrm{data}(D, X)$

force field $E_\mathrm{phys}$ ↔ probability (Boltzmann)
probability of distortion of molecule
force field: background information I

$P(X|I) \propto \exp\left( -\frac{E_\mathrm{phys}(X)}{kT} \right)$


Probability and energy

$E_\mathrm{hybrid} = E_\mathrm{phys}(X) + w_\mathrm{data} E_\mathrm{data}(D, X)$

similar: $E_\mathrm{data}$ ↔ probability
probability that data is correct, given structure X: likelihood

Likelihood

Example: Gaussian distribution of error for r, standard deviation $\sigma$, probability is

$P(D|X,\sigma) \propto \exp\left( -\frac{(r - r(X))^2}{2\sigma^2} \right)$


Likelihood and restraint potential

Inversely, if we know probability distribution, we can derive potential

$E_\mathrm{data} \propto -\log\left[ P(D|X,\sigma) \right]$

For Gaussian error, harmonic potential (least squares)

$E_\mathrm{data} \propto \frac{1}{2\sigma^2} (r - r(X))^2$

The weight is related to the error in the data

Distance errors are not Gaussian

Log-normal distributions

$\mathrm{LN}(x_0, x, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}\, x_0} \exp\left[ -\frac{1}{2\sigma^2} \left( \log[x_0] - \log[x] \right)^2 \right]$
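
Taking the negative logarithm of an error distribution gives the matching restraint potential. The sketch below does this for the Gaussian and the log-normal case, dropping terms that do not depend on the structure; the function names are illustrative.

```python
import math


def gaussian_energy(r_obs: float, r_calc: float, sigma: float) -> float:
    """-log of the Gaussian likelihood, up to a constant: least squares."""
    return (r_obs - r_calc) ** 2 / (2.0 * sigma ** 2)


def lognormal_density(x0: float, x: float, sigma: float) -> float:
    """LN(x0, x, sigma): density of the observed x0 given the calculated x."""
    exponent = -(math.log(x0) - math.log(x)) ** 2 / (2.0 * sigma ** 2)
    return math.exp(exponent) / (math.sqrt(2.0 * math.pi * sigma ** 2) * x0)


def lognormal_energy(x0: float, x: float, sigma: float) -> float:
    """-log of LN, without the terms that are independent of the structure."""
    return (math.log(x0) - math.log(x)) ** 2 / (2.0 * sigma ** 2)


# Calculated distances of half and twice the observed value cost the same
# under the log-normal model, but not under the Gaussian one.
print(lognormal_energy(4.0, 2.0, 0.5), lognormal_energy(4.0, 8.0, 0.5))
print(gaussian_energy(4.0, 2.0, 0.5), gaussian_energy(4.0, 8.0, 0.5))
```

The log-normal potential depends only on the ratio of observed to calculated value, so it is defined for positive quantities only and needs no separate lower and upper bound.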
Lognormal distribution

Gaussian distribution of logarithms
[Figure: lognormal distribution compared with Gaussian distribution]

...Life is LogNormal

... there are a lot of data with only positive values
examples on http://stat.ethz.ch/~stahel/lognormal/
no theoretical derivation
...why do we still use bounds?

Rieping, Habeck, Nilges, JACS 2005

Joint probability from prior and likelihood

To calculate the joint probability from single probabilities, multiply:

$P(X|D,I) \propto P(X|I)\,P(D|X,\sigma,I)$
  $P(X|D,I)$: probability of a structure (posterior probability)
  $P(X|I)$: prior distribution
  $P(D|X,\sigma,I)$: likelihood

Hybrid energy revisited

$E_\mathrm{hybrid} = E_\mathrm{phys}(X) + w_\mathrm{data} E_\mathrm{data}(D, X)$

The hybrid energy function is the negative logarithm of the joint probability
Minimum energy corresponds to maximum probability
Relative weight should depend on data quality
story is incomplete (what about $w_\mathrm{data}$?) ...

$P(X|D,I) \propto P(X|I)\,P(D|X,\sigma,I)$

Bayesian determination of data weight

$E_\mathrm{hybrid} = E_\mathrm{phys} + w_\mathrm{data} E_\mathrm{data}$

Data weight has influence on structure quality
Bayesian analysis:

$w_\mathrm{data} = \frac{k_B T}{2\,\mathrm{RMS}}$

Update iteratively during structure calculation
weight ↔ overall data quality
only possible for least squares-type potential

Habeck M, Rieping W, Nilges M (2006). PNAS 103:1756

comparison to X-ray for LogNormal

GB1, 0.5
BPTI, 0.62
IL4, 1.14
IL8, 1.44


Summary

Minimizing hybrid energy corresponds to maximizing the probability of a structure, given data and force field
  ...if one knows the data quality, scale factors, ...
Relative weights
  usually set empirically (trial and error, experience, cross validation)
  Bayesian determination of weight possible
Relationship of error distribution and restraint potentials

(5) Problems with the minimisation approach: an alternative


The problems

(1) solution is not unique, inversion not possible:
  same data calculated from distinct conformations
  data are incomplete
(2) data are inconsistent, no solution exists:
  experimental noise
  approximate theory
  need to massage data to make things consistent

...more problems

(3) Many unknown parameters are necessary
  calibration factors
  parameters for functional forms (e.g. Karplus)
  data quality weights; lower and upper bounds
  how many structures to calculate / choose?
  ... local dynamics?


...some more

(4) Figures of merit for structures depend critically on auxiliary parameters:
  RMSDs around average structure
  fit to bounds derived from data
(5) No consistent concepts of data quality evaluation
  cross validation
  independent validation

Example NOE

incompleteness: assignments, NMR-visibility
inconsistency: approximate theory, noise
unknown quantities: calibration, data consistency
basic question: "how well do my data determine the structure" remains unanswered, need of heuristics

Structure by deduction:

inversion of data
Insufficient / conflicting data: inversion is inappropriate to obtain structure from data

Inference instead of deduction

inference: assign a probability to each molecular conformation
use probability theory:
  prior probability from physical model (force field)
  likelihood from forward model

posterior ∝ prior × likelihood

$P(\theta, \gamma, \sigma | D, I) \propto P(\theta, \gamma, \sigma | I)\, P(D | \theta, \gamma, \sigma, I)$

Sampling

Posterior P(X|D) is extremely complex for realistic problem
too many degrees of freedom to do integration
Take representative samples (Markov Chain Monte Carlo)

Program ISD

SH3 (Campbell):
  150 NOEs from perdeuterated domain
  sparse data set; standard structure calculation does not produce unique fold

Rieping, Habeck, Nilges, Science (2005)
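
The idea of drawing representative samples from a posterior can be sketched with a plain random-walk Metropolis sampler on a one-dimensional toy problem. The Gaussian prior (centred at 3) and Gaussian likelihood (centred at 5) are hypothetical stand-ins for the force field and the forward model; this is not the sampling scheme of any specific program.

```python
import math
import random


def log_prior(x: float) -> float:
    """Toy prior, playing the role of -E_phys / kT."""
    return -0.5 * (x - 3.0) ** 2


def log_likelihood(x: float) -> float:
    """Toy likelihood with unit error around a measured value of 5."""
    return -0.5 * (x - 5.0) ** 2


def log_posterior(x: float) -> float:
    """Posterior is prior times likelihood, so the logarithms add."""
    return log_prior(x) + log_likelihood(x)


def metropolis(n_samples, x_start=0.0, step=1.0, burn_in=1000, seed=0):
    """Random-walk Metropolis; returns the samples kept after burn-in."""
    rng = random.Random(seed)
    x, lp = x_start, log_posterior(x_start)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        if i >= burn_in:
            samples.append(x)
    return samples


samples = metropolis(20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

For this toy the exact posterior is Gaussian with mean 4 and variance 0.5, so the sample mean and variance can be checked directly. The spread of the samples, rather than a single minimum, is what answers how well the data determine the structure.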

(6) Automated structure calculation

Automated structure calculation with ARIA

First ARIA version in 1995
First completely automated NOESY assignment in 1997
Basis of HADDOCK (Bonvin et al.)
ARIA annealing protocols used in RECOORD
ARIA water refinement used in RECOORD
Current version 2.2 (1.2 still available but not developed)
ARIA2.0 written by M. Habeck, W. Rieping (Python, XML)
currently developed by Benjamin Bardiaux, Aymeric Bernard



Assignment of ambiguous NOEs

Network anchoring
Ambiguous distance restraints (ADRs)
Iterative assignment rules

Error tolerance: soft potential

finite force for large violations
important for automated removal of noise peaks (ARIA)
same as violation confinement in CYANA
similar effect as restraint combination

Experimental terms

All that is implemented in CNS:
  Fixed distance restraints (e.g., hbonds)
  NOE-derived distance restraints
  Torsion angles
  Scalar coupling constants
  RDCs
  Relaxation anisotropy (1.2)
Support for solid state (SOLARIA)
soon: SAXS,

Final refinement in H2O

Improvement of:
  validation results
  Q-factors
  X-ray molecular replacement


Ubiquitin Q-factors

[Figure: ubiquitin Q-factors, no water vs. with water]

Current release: ARIA2.2



WWW site: http://aria.pasteur.fr
Registration form to download
Restraint violation in ARIA
Validation results from WhatIf in ARIA

Calculation of symmetric multimeric complexes

Symmetry restraint method to calculate dimeric proteins
  without specification of symmetry axis
  successful for simple topologies
Full integration in ARIA2.2
  reports in GUI
