
NMR of proteins (and all things regular)

Now we have more or less all the major techniques used in


the determination of coupling networks (chemical structure)
and distances (3D structure, conformation).

We'll see how these are used in the study of macromolecular
structure and conformational preferences, particularly of
peptides. We will try to cover in two or three classes the main
aspects of something for which several books exist.

There are certain things that I want to bring up before going


into any detail:

1) The data obtained is no better or worse than X-ray data. It
gives a different picture, which can be considered
complementary. However, some of the experimental aspects are
considerably faster than X-ray.

2) One of the reasons it is faster is that we don't need
crystals. This has a two-fold advantage. First, we don't need
to spend time growing them, and second, we can do it even
if the stuff does not crystallize (small flexible peptides,
polysaccharides, etc.).

3) It gives the 3D structure in water, which is the solvent in


which most biological reactions take place (enzymes and
drugs interact in water).

4) It gives information on the dynamics of the molecule. It is



not a static picture.

A brief review of protein structure
Before we go into how we determine the structure of a protein
with NMR, we need to review briefly the chemical and three-
dimensional structure of peptides.

Peptides are built from only ~20 different amino acids. This
makes life a lot simpler...

The chemical structure of the protein is the sequence of


amino acids forming it. We always write it from the NH2 end
to the COOH end:

[Figure: backbone of a tripeptide AA1-AA2-AA3 drawn from the NH2
end to the COOH end, with one residue and one peptide group
labeled.]

This is called the primary structure. We see clearly that


between each AA we have C=O groups. Thus, the 1H spin
system of each AA is isolated from all the others.

For this reason, the 1H spectrum of a protein is basically the
superposition of the spectra of the isolated amino acids.
However, small deviations from this indicate a defined (not
random) structure and allow us to study them by NMR...
A brief review of protein structure (continued)
The way in which the residues in the peptide chain arrange
locally is called the secondary structure. Some of the most
common elements of secondary structure are the α-helix and
the β-sheet (parallel or anti-parallel):

Another important element of secondary structure is the


β-turn, which allows the polypeptide chain to reverse its
direction:


The very basics of NMR of proteins
Finally, the tertiary structure is how the whole thing packs
(or not) in solution, or how all the elements of secondary
structure come together.

The first thing we need to know is where the peaks of an
amino acid residue show up in the 1H spectrum:

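[Figure: 1H chemical shift scale from 10 to 0 ppm showing, from
downfield to upfield, the regions for imines, amides, aromatics,
HCα, and HCβ, γ, δ, ..., with the water signal marked.]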

Since they are all very close, once we go past 3 or 4 amino
acids we need to do 2D spectroscopy to spread out the
signals enough to resolve them.

As we said before, there are no connections between


different AAs: we cannot tell which one is which. One of the
requirements in NMR structure determination is knowledge of
the primary structure of the peptide chain.

Now, in order to determine the structure we need to assign


an amino acid in the chain to signals in the spectrum. This is
the first step in the NMR study.
Spin system assignments.
To do this we rely on the 1D (if the molecule is small enough),
COSY, and TOCSY spectra. We have seen how a whole spin
system is easily identified in a TOCSY.

In peptides, there will be an isolated line for each amino acid


starting from the NH that will go all the way down to the
side chain protons.

The only exceptions are Phe, Tyr, Trp, and His (and some
others I don't remember), in which part of the side chain is
separated by a quaternary or carbonyl carbon.

We can either assign all the spin systems to a particular


amino acid (good), or do only part of them due to spectral
overlap (bad). If this happens, we may have to go to higher
dimensions or fully labeled protein (next class).

In any case, once all possible spin systems are identified,
we have to tie them together and identify the relative position
of the signals in the primary structure.

There are two ways of doing this. One is the sequential


assignment approach, and the other one the main-chain
directed approach.

Both rely on the fact that there will be characteristic NOE


cross-peaks for protons of residue i to (i + 1) and (i - 1).

Characteristic NOE patterns.
The easiest to identify are intraresidue and sequential NOE
cross-peaks, which are NOEs among protons of the same
residue and from a residue to protons of the (i + 1) and (i - 1)
residues:
[Figure: tripeptide backbone AA1-AA2-AA3 with the intraresidue
NOEs (dNα, dNβ, ...) and the sequential NOEs (dαN, dNN, ...)
marked.]

Apart from those, regular secondary structure will have
regular NOE patterns. For α-helices and β-sheets we have:

[Figure: characteristic NOE patterns. α-helix (residues i - 1 to
i + 4): dαβ(i, i+3), dαN(i, i+3), dNN(i, i+3), dαN(i, i+4).
β-sheet (strands i and j): dα(i)N(j), dα(i)α(j), dN(i)N(j).]

Sequential assignment
In the sequential assignment approach, we try to tie spin
systems by using sequential NOE connectivities (those from
a residue to residues i + 1 or i - 1).

The idea is to pick an amino acid whose signals are well


resolved in the TOCSY, and then look in the NOESY for
sequential NOE correlations from its protons to protons in
other spin systems.

These are usually the dNN, dαN, and dβN correlations. At
this point we also look for side-chain NOEs (such as dβδ) to
establish the identity of aromatic amino acids, Asn, Arg, Gln,
etc.

Once we have found those, we go back to the TOCSY to identify
which amino acid those correlations belong to. These protons
will be in either the i + 1 or i - 1 residues.

We do it until we run out of amino acids (when we get to the


end of the peptide chain) or until we bump into a lot of
overlapping signals.

Since we may have different starting points (and directions),
the method has a built-in way of cross-checking itself.

Yes, hundreds of folks have some sort of computerized
algorithm that should do this. Their reliability varies, and there
is a lot of user intervention involved...

Sequential assignment (continued)
We can see this with a simple diagram (sorry, could not find
much good data among my stuff).

Say we are looking at four lines in a TOCSY spectrum that
correspond to Ala, Asn, Gly, and Leu. We also know that we
have Ala-Leu-Gly in the peptide, but no other combination:

[Figure: schematic TOCSY and NOESY spectra of the HCα and NH
regions, with lines for Gly, Asn, Ala, and Leu.]

In the TOCSY we see all the spins. The NOESY will have
both intraresidue correlations and interresidue correlations
(drawn with different symbols in the figure), which allow us to
find which residue is next to which in the peptide chain.
Main-chain directed approach
This method was introduced by Wüthrich (the grand-daddy
of protein NMR and winner of the 2002 Nobel Prize for his
work in this area). We've seen already that regular secondary
structure has regular NOE patterns.

What if instead of doing all the sequential assignments,
which may belong in great part to regions which have no
structure, we focus on finding these regular NOE patterns?

This is exactly what we do. We actually look for cyclic NOE


patterns, which are normally found in regular secondary
structure.

After we have found these patterns, we try to match them with
chunks of the primary structure of our peptide.

This method is not really easy to do by hand, but it is ideal
to implement as a computer search algorithm:

- First the program looks for α-helices (it looks for dαβ(i, i+3),
dαN(i, i+3), dNN(i, i+3), dαN(i, i+4), etc.).

- It eliminates all peaks used up by helical patterns and looks
for β-sheets (stretches of connectivities from things that
cannot be close in the sequence).

- After eliminating these, it goes for loops and undefined
regions.
Locating secondary and tertiary structure
Although the main-chain directed approach already looks for
secondary structure, all this was done mainly to identify the
amino acids in the spectrum (assign spin systems). Now we
really need to look for secondary/tertiary structure.

If we used the main-chain directed approach, we have most
of the work done (some people say 90 %), because all the
regions of defined secondary structure (α-helices, β-sheets)
have already been identified.

If we've done the assignments sequentially, we will have
most of the i to i + 1 and i - 1, or short-range, NOEs. We only
need to look for medium-range (> i + 2) and long-range
(> i + 5) NOE cross-peaks.

The amount and type of medium and long-range NOEs will


obviously depend on the secondary and tertiary structure.

We group the NOEs in tables, and assign them values
according to their intensity (cross-peak volume). As we
saw before, we take an internal reference (a CH2 in a Phe).
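A minimal sketch of how a cross-peak volume could be turned into a distance estimate with an internal reference. It assumes the usual isolated-spin-pair r^-6 scaling, and it assumes a reference distance of 1.78 Å for the geminal CH2 protons. That number, the function names, and the "none" label are illustrative choices, not part of these notes. The strong/medium/weak upper bounds are the ones used later in these slides.

```python
def noe_distance(volume, ref_volume, ref_distance=1.78):
    """Estimate a distance (Å) from a cross-peak volume.

    Isolated-spin-pair approximation: r = r_ref * (V_ref / V)**(1/6).
    ref_distance = 1.78 Å is an assumed geminal CH2 distance.
    """
    if volume <= 0 or ref_volume <= 0:
        raise ValueError("cross-peak volumes must be positive")
    return ref_distance * (ref_volume / volume) ** (1.0 / 6.0)


def noe_class(distance):
    """Bin a distance estimate using the 2.7 / 3.3 / 5.0 Å upper bounds."""
    for label, upper in (("strong", 2.7), ("medium", 3.3), ("weak", 5.0)):
        if distance <= upper:
            return label
    return "none"


# A peak 64 times weaker than the reference sits at twice the distance.
estimate = noe_distance(100.0 / 64.0, 100.0)
print(round(estimate, 2), noe_class(estimate))
```

Because the volume enters as a sixth root, even a large error in the volume gives a small error in the distance, which is one reason binning into ranges works in practice.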

Since in large molecules we can have many competing
relaxation processes, we don't give NOEs single values, but
ranges. These are usually three, for strong, medium, and
weak. Sometimes you'll also see a very weak range.

We'll see how these are converted to distances
later on...
What the NOEs do and don't mean
So now we have everything: All spin systems identified, all
their sequential, medium, and long range NOEs assigned,
and their intensities measured.

At this point (and very likely before this point also), we will
have several conflicting cases in which we see a particular
NOE but we don't see others we think should be there.

The reason is that the NOE depends not only on the
distance between two protons, but also on the dynamics
between them (that is, how much one moves relative to
the other). This is particularly important in peptides, because
we have lots of side chain and backbone mobility.

The most important lesson from all this is that not seeing an
NOE cross-peak does not mean that the protons are at a
distance larger than 5 Å.

Also, an NOE can arise from an average of populations of the
peptide. We see something as medium (1.8 to 3.3 Å) when
it is actually a mix of strong (1.8 - 2.7 Å) and no NOE:

[Figure: two real populations, one with dij < 3 Å and one with
dij > 6 Å, that together give an apparent dij ~ 3 Å.]
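To put a number on this, here is a small sketch that assumes the NOE averages as <r^-6> over the populations, which is a common simplification and not the whole story for a flexible peptide. The function name and the 50:50 populations are made up for illustration.

```python
def apparent_distance(distances, populations):
    """Apparent distance (Å) from <r^-6> averaging over conformers."""
    total = sum(populations)
    mean_inv6 = sum(p * r ** -6 for r, p in zip(distances, populations)) / total
    return mean_inv6 ** (-1.0 / 6.0)


# Half the molecules at 2.5 Å, half at 6.0 Å.
print(round(apparent_distance([2.5, 6.0], [0.5, 0.5]), 2))
```

With these inputs the apparent distance comes out at roughly 2.8 Å. The close conformer dominates the average, so the result looks like a medium NOE even though no molecule actually sits at that distance.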

Couplings and dihedral angles
The previous slides showed us how to use NMR to obtain
some of the structural parameters required to determine 3D
structures of macromolecules in solution.

NOEs let us find out approximate distances between
protons. They can tell us a lot when we find ones that report
on things that are far away in the sequence being close in
space.

However, we cannot say anything about torsions around
rotatable bonds from NOEs alone. What we can use in these
cases are the 3J coupling constants present in the peptide
spin system (also true for sugars, DNA, RNA). We can use
homonuclear or heteronuclear Js, but we'll concentrate on
the former (homonuclear 3J).

These are 3JNα, which reports on the conformation of the
peptide backbone, and 3Jαβ, which is related to the side chain
conformation:

[Figure: amino acid residue with the 3JNα (HN to Hα) and 3Jαβ
(Hα to Hβ) couplings marked.]
Couplings and dihedral angles (continued)
The 3J coupling constants are related to the dihedral angles
by the Karplus equation, which is an empirical relationship
obtained from rigid molecules for which the crystal structure
is known (derived originally for small organic molecules).

The equation is a sum of cosines, and depending on the type


of topology (H-N-C-H or H-C-C-H) we have different
parameters:

$$ ^3J_{N\alpha} = 9.4\cos^2(\phi - 60^\circ) - 1.1\cos(\phi - 60^\circ) + 0.4 $$

$$ ^3J_{\alpha\beta} = 9.5\cos^2(\chi_1 - 60^\circ) - 1.6\cos(\chi_1 - 60^\circ) + 1.8 $$
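The two relationships above are easy to evaluate in a few lines. This is just a sketch using the coefficients quoted on this slide. The function names and the name chi for the side-chain dihedral are my own labels.

```python
import math


def karplus(angle_deg, a, b, c):
    """Generic Karplus form: a*cos^2(angle - 60) + b*cos(angle - 60) + c."""
    x = math.cos(math.radians(angle_deg - 60.0))
    return a * x * x + b * x + c


def j_n_alpha(phi_deg):
    """3J(HN, Halpha) in Hz, with the 9.4 / -1.1 / 0.4 coefficients."""
    return karplus(phi_deg, 9.4, -1.1, 0.4)


def j_alpha_beta(chi_deg):
    """3J(Halpha, Hbeta) in Hz, with the 9.5 / -1.6 / 1.8 coefficients."""
    return karplus(chi_deg, 9.5, -1.6, 1.8)


# Tabulate the backbone coupling every 30 degrees.
for phi in range(-180, 181, 30):
    print(phi, round(j_n_alpha(phi), 1))
```

Running the loop shows the same coupling value turning up at more than one angle, which is the ambiguity discussed next.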

Graphically:

[Figure: Karplus curves of 3JNα and 3Jαβ versus dihedral angle.]


Couplings and dihedral angles (...)
How do we measure the 3J values? When there are few
amino acids, directly from the 1D. We can also measure them
from HOMO2DJ spectra (remember what it did?), and from
COSY-type spectra with high resolution (MQF-COSY and
E-COSY).

The biggest problem of the Karplus equation is that it is
ambiguous. If we are dealing with a 3JNα coupling smaller
than 4 Hz, and we look it up in the graph, we can have at
least 4 possible angles:

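[Figure: Karplus curve of 3JNα (0 to 9.4 Hz) versus φ. A coupling
of 4 to 5 Hz crosses the curve at four angles, marked at -60°,
~0°, ~110°, and ~170°.]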
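The ambiguity can be made explicit by solving the Karplus relationship backwards. The sketch below treats it as a quadratic in cos(φ - 60) and lists every φ between -180° and 180° that gives the observed coupling. The default coefficients are the 3JNα ones quoted earlier, and the function name is hypothetical.

```python
import math


def phi_candidates(j_obs, a=9.4, b=-1.1, c=0.4):
    """Return all phi (degrees) for which the Karplus curve gives j_obs."""
    disc = b * b - 4.0 * a * (c - j_obs)
    if disc < 0:
        return []  # coupling lies below the minimum of the curve
    angles = set()
    for sign in (1.0, -1.0):
        x = (-b + sign * math.sqrt(disc)) / (2.0 * a)  # x = cos(phi - 60)
        if -1.0 <= x <= 1.0:
            theta = math.degrees(math.acos(x))
            for t in (theta, -theta):
                # Shift back by 60 degrees and wrap into [-180, 180).
                phi = (t + 60.0 + 180.0) % 360.0 - 180.0
                angles.add(round(phi, 1))
    return sorted(angles)


print(phi_candidates(4.0))
```

For a 4 Hz coupling this gives four solutions, at roughly -176°, -64°, 13°, and 107°, in line with the four crossings read off the graph.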

In these cases there are two things we can do. One is just to
try figuring out the structure from NOE correlations alone and
then use the couplings to confirm what we get from NOEs.
This is fine, but we are sort of dumping information in the can.
Couplings and dihedral angles (...)
Another thing commonly done in proteins is to use only those
angles that are more common in X-ray structures. In the
case of φ, these are the negative values (in this case the
-60° and 170° solutions, the latter sitting next to -180°). Also,
we use ranges of angles:

3JNα < 5 Hz    -80° < φ < -40°
3JNα > 8 Hz    -160° < φ < -80°

For side chains we have the same situation, but in this case
we have to select among three possible conformations (like
in ethane). Since we usually have two 3Jαβ values (there
are 2 β protons), we can select the appropriate conformer:
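[Figure: Newman projections of the three staggered side chain
conformers around the Cα-Cβ bond, showing Hα, Hβ1, and Hβ2.]

3Jαβ1 ~ 3Jαβ2 < 5 Hz: both β protons are gauche to Hα.
3Jαβ1 < 5 Hz and 3Jαβ2 > 8 Hz (or vice versa): one β proton is
gauche and the other anti to Hα.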
Brief introduction to molecular modeling
Now we have all (almost all) the information pertaining to
structure that we could milk from our sample: NOE tables
with all the different intensities and angle ranges from 3J
coupling constants.

We will try to see how these parameters are employed to


obtain the picture of the molecule in solution.

As opposed to X-ray, in which we actually see the electron


density from atoms in the molecule and can be considered as
a direct method, with NMR we only get indirect information
on some atoms of the molecule (mainly 1Hs).

Therefore, we will have to rely on some form of theoretical


model to represent the structure of the peptide. Usually this
means a computer-generated molecular model.

A molecular model can have different degrees of complexity:

ab initio - We actually look at the atomic/molecular
orbitals and try to solve the Schrödinger equation. No
parameters. Hugely computer intensive (10 - 50 atoms).

Semiempirical - We use some parameters to describe


the molecular orbitals (50 - 500 atoms).

Molecular mechanics - We use a simple parametrized


mass-and-spring type model (everything else).

Introduction to molecular modeling (continued)
We are dealing with peptides here (thousands of atoms), so
we obviously use a molecular mechanics (MM) approach.

The center of MM is the force field, or equations that


describe the energy of the system as a function of <xyz>
coordinates. In general, it is a sum of different energy terms:

$$ E_{total} = E_{vdW} + E_{bs} + E_{ab} + E_{torsion} + E_{electrostatics} + \dots $$

Each term depends in one way or another on the geometry of
the system. For example, Ebs, the bond stretching energy
of the system, is:

$$ E_{bs} = \sum_i K_{bs,i} \, (r_i - r_{o,i})^2 $$
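As a sketch of what the program does with this term, the sum can be written directly. The bond lengths and force constants in the example are invented numbers, not parameters from any real force field.

```python
def bond_stretch_energy(lengths, ref_lengths, force_constants):
    """E_bs = sum over bonds of K_bs * (r - r_o)**2."""
    return sum(
        k * (r - r0) ** 2
        for r, r0, k in zip(lengths, ref_lengths, force_constants)
    )


# Two bonds: one at its reference length, one compressed by 0.1 Å.
print(bond_stretch_energy([1.5, 1.0], [1.5, 1.1], [300.0, 400.0]))
```

Only the distorted bond contributes, and the penalty grows with the square of the distortion. Every other term in the force field follows the same recipe: geometry in, energy out.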

The different constants (Kbs, ro, etc., etc.) are called the
parameters of the force field, and are obtained either from
experimental data (X-ray, microwave data) or higher level
computations (ab initio or semiempirical).

Depending on the problem we will need different parameter


sets that include (or not) certain interactions and are therefore

more or less accurate.

Inclusion of NMR data
The really good thing about MM force fields is that if we have
a function that relates our experimental data with the <xyz>
coordinates, we can basically lump it at the end of the energy
function.

This is exactly what we do with NMR data. For NOEs, we had
said before that we cannot use accurate distances. We use
ranges, and we don't constrain the lower bound (it stays at
1.8 Å in every range), because a weak NOE may be a long
distance or just fast relaxation:

Strong NOE    1.8 - 2.7 Å
Medium NOE    1.8 - 3.3 Å
Weak NOE      1.8 - 5.0 Å

Now, the potential energy function related to these ranges will


look like this:

$$ E_{NOE} = K_{NOE} \, (r_{calc} - r_{max})^2 \quad \text{if } r_{calc} > r_{max} $$

$$ E_{NOE} = 0 \quad \text{if } r_{max} > r_{calc} > r_{min} $$

$$ E_{NOE} = K_{NOE} \, (r_{min} - r_{calc})^2 \quad \text{if } r_{calc} < r_{min} $$

It is a flat-bottomed quadratic function. The further away the


distance calculated by the computer (rcalc) is from the range,

the higher the penalty. We call them NOE constraints.
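The flat-bottomed function translates almost word for word into code. This is a minimal sketch. The function name and the default force constant of 1.0 are placeholders.

```python
def noe_penalty(r_calc, r_min, r_max, k_noe=1.0):
    """Flat-bottomed quadratic penalty for one NOE distance constraint."""
    if r_calc > r_max:
        return k_noe * (r_calc - r_max) ** 2
    if r_calc < r_min:
        return k_noe * (r_min - r_calc) ** 2
    return 0.0  # inside the allowed range: no penalty


# A strong NOE (1.8 - 2.7 Å) checked at three calculated distances.
for r in (1.3, 2.0, 3.7):
    print(r, noe_penalty(r, 1.8, 2.7, k_noe=10.0))
```

Anything inside the range costs nothing, so the constraint only pushes on the structure when the model violates it.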
Inclusion of NMR data (continued)
Similarly, we can include torsions as a range constraint:

$$ E_J = K_J \, (\phi_{calc} - \phi_{max})^2 \quad \text{if } \phi_{calc} > \phi_{max} $$

$$ E_J = 0 \quad \text{if } \phi_{max} > \phi_{calc} > \phi_{min} $$

$$ E_J = K_J \, (\phi_{min} - \phi_{calc})^2 \quad \text{if } \phi_{calc} < \phi_{min} $$

Graphically, these penalty functions look like this:

[Figure: flat-bottomed penalty versus rcalc or φcalc, zero between
rmin (φmin) and rmax (φmax) and rising quadratically outside.]

Structure optimization
Now we have all the functions in the potential energy
expression for the molecule, those that represent bonded
interactions (bonds, angles, and torsions), and non-bonded
interactions (vdW, electrostatic, NMR constraints).

In order to obtain a decent model of a peptide we must be


able to minimize the energy of the system, which means to
find a low energy (or the lowest energy) conformer or group
of conformers.

In a function with so many variables this is nearly impossible,
because we are looking at an n-variable surface (one variable
for each thing we try to optimize). For the two torsions in a
disaccharide:

[Figure: energy surface E (kcal/mol) as a function of the two
torsions.]

We have energy peaks (maxima) and valleys (minima).
Structure optimization (continued)
Minimizing the function means going down the energy
(hyper)surface of the molecule. To do so we need to
compute the derivatives with respect to <xyz> (the variables)
for all atoms:

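$$ \frac{\partial E_{total}}{\partial xyz} > 0 : \; E_{total} \text{ increases along } xyz $$

$$ \frac{\partial E_{total}}{\partial xyz} < 0 : \; E_{total} \text{ decreases along } xyz $$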

This allows us to figure out which way is down for each


variable so we can go that way.
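Here is a toy version of "figure out which way is down and go that way". It is plain steepest descent with numerical derivatives on a made-up one-variable energy function, not what a real MM package does. The tilted double well, the step size, and all names are assumptions for illustration.

```python
def numerical_gradient(energy, coords, step=1e-6):
    """Central-difference derivative of the energy for each variable."""
    grad = []
    for i in range(len(coords)):
        up = list(coords)
        down = list(coords)
        up[i] += step
        down[i] -= step
        grad.append((energy(up) - energy(down)) / (2.0 * step))
    return grad


def steepest_descent(energy, coords, rate=0.01, n_steps=2000):
    """Repeatedly move each variable against its derivative (downhill)."""
    coords = list(coords)
    for _ in range(n_steps):
        grad = numerical_gradient(energy, coords)
        coords = [x - rate * g for x, g in zip(coords, grad)]
    return coords


def double_well(coords):
    """Toy surface: two valleys, the one at negative x being deeper."""
    x = coords[0]
    return (x * x - 1.0) ** 2 + 0.3 * x


for start in (1.5, -1.5):
    end = steepest_descent(double_well, [start])
    print(start, round(end[0], 3), round(double_well(end), 3))
```

Starting on the right, the minimizer settles in the shallower valley and never finds the deeper one on the left. The answer depends entirely on where we start.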

Now, minimization only goes downhill. We may have many
local minima on the energy surface, and if we only minimize,
the structure can get trapped in one of these. This is bound to
happen in a protein, which has hundreds of degrees of freedom
(the number of rotatable bonds).

In these cases we have to use some other method to get to
the lowest minimum. A common way of doing this is molecular
dynamics (MD).

Since we have the energy function, we can give energy to
the system (usually we raise the temperature) and see how it
evolves with time. Temperature translates into kinetic
energy, which allows the peptide to surmount energy barriers.

Molecular dynamics and simulated annealing
In MD we usually heat the system to a physically reasonable
temperature around 300 K. The amount of energy per mol at
this temperature is ~ RT (kBT per molecule, where kB is the
Boltzmann constant). If you do the math (1.987 cal/(mol K)
x 300 K), this is ~ 0.6 kcal/mol.

This may be enough for certain barriers, but not for others,
and we are bound to have these other barriers. In these cases
we need to use a more drastic searching method, called
simulated annealing (called that because it simulates
the annealing of glass or metals).

We heat the system to an obscene temperature (1000 K),


and then we allow it to cool slowly. This will hopefully let the
system fall into preferred conformations:

[Figure: temperature versus time (usually ps) during simulated
annealing, going from hot conformers to cool conformers.]
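The heat-then-cool idea can be sketched on a toy surface. Note that this uses Metropolis Monte Carlo moves as a stand-in for MD, so it is not the method structure programs actually run. The energy function, the temperatures (in the same arbitrary units as the energy), and all names are invented for illustration.

```python
import math
import random


def tilted_double_well(x):
    """Toy surface with a shallow valley near x = 1, a deeper one near -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def simulated_annealing(energy, x0, t_hot=2.0, t_cold=0.01,
                        n_steps=5000, max_move=0.3, seed=0):
    """Cool slowly from t_hot to t_cold, keeping the best point seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for step in range(n_steps):
        # Exponential cooling schedule from t_hot down to t_cold.
        temp = t_hot * (t_cold / t_hot) ** (step / (n_steps - 1))
        trial = x + rng.uniform(-max_move, max_move)
        e_trial = energy(trial)
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temp):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


# Start in the shallow valley and see where the search ends up.
print(simulated_annealing(tilted_double_well, 0.96))
```

While the system is hot, uphill moves are accepted often enough to cross the barrier between valleys. As it cools, it gets progressively harder to leave whichever valley it is in, which is why slow cooling matters.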
Distance geometry
Another method commonly used and completely different from
MD and SA is distance geometry (DG). We'll try to describe
what we get, not so much how it works in detail.

Basically, we randomize the <xyz> coordinates of the atoms
in the peptide, setting lower and upper bounds beyond which
the interatomic distances cannot go. These include normal
bonds and NMR constraints.

This is called embedding the structure in the bounds matrix.
Then we optimize this matrix by smoothing it with triangle
inequalities. We get really shuffled and lousy looking
molecules. Usually they have to be refined, either by MD
followed by minimization or by straight minimization.
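To give a feel for the triangle inequality part, here is a sketch of smoothing the upper bounds only: the distance i-j can never exceed the distance i-k plus k-j, so loose bounds get tightened by the known ones. Real DG programs also smooth the lower bounds and then embed. The three-atom example and the function name are invented.

```python
def smooth_upper_bounds(upper):
    """Tighten upper distance bounds with u[i][j] <= u[i][k] + u[k][j]."""
    n = len(upper)
    u = [row[:] for row in upper]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if u[i][j] > u[i][k] + u[k][j]:
                    u[i][j] = u[i][k] + u[k][j]
    return u


# Atoms 0-1 and 1-2 are bonded (1.5 Å); 0-2 starts with a useless bound.
bounds = [
    [0.0, 1.5, 99.0],
    [1.5, 0.0, 1.5],
    [99.0, 1.5, 0.0],
]
print(smooth_upper_bounds(bounds)[0][2])
```

The 0-2 bound drops from 99 Å to 3.0 Å without any direct measurement between those two atoms. A handful of NOEs can restrict many more distances than were actually observed in the same way.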

What the different methods do in the energy surface can be


represented graphically:

[Figure: schematic energy surface showing what EM, MD, SA, and
DG each do.]


Presentation of results
The idea behind all this was to sample the conformational
space available to the protein/peptide under the effects of the
NOE constraints.

The several low energy structures we obtain by these


methods which have no big violations of these constraints are
said to be in agreement with the NMR data.

Since there is no way we can discard any of these structures,
we normally draw a low energy set of them superimposed
along the most fixed parts of the molecule:

[Figure: superimposed low energy structures, with the N-termini
and C-termini labeled.]

In this one we are just showing the peptide backbone atoms.
Although this is not something we seek, the floppiness of
certain regions is an indication of the lack of NOE constraints,
which usually reflects the real flexibility of the molecule
in solution.
Other types of structural data
In the previous slides we saw how we get structural
information from basically two sources, 3J couplings and
NOE enhancements (correlations).

NOEs gave us approximate distance information, and 3J
couplings could be transformed into dihedral constraints.

NMR spectra have a lot more information than that, which we
usually dump. Also, some of the information was originally
fudged to make it work better with the MM programs of
those days (couplings into dihedrals).

Today we'll see how we can employ some of the NMR data
in a better fashion, as well as use other information obtained
from NMR. As we said before, as long as we can get a
relationship between the NMR derived parameter (S) and the
geometry of the atoms involved, we can use it in MM:

$$ S_{calc} = f(xyz) \qquad E_S = K_S \cdot f[(S_{calc} - S_{obs})] $$

A physicist can tell you that ALL NMR observations depend


entirely on the geometry of the molecular system, so there is
an equation for everyone. The problem is to find them and
parametrize them.

Direct use of coupling constants
Couplings are perhaps the easiest ones to start with. They
were not originally included as they were because the MM
programs handled dihedral constraints more easily.

As we saw last time, this had the disadvantage of creating


ambiguities on the number of possible dihedrals for a certain
coupling constant.

The assumption that only certain angles are allowed is fine in


globular proteins (for which the X-ray trends were found), but
it is a big no-no if we are dealing with small flexible peptides
or peptides containing unnatural amino acids.

The best thing to do would be to include the coupling
constant directly as part of an energy term of our MM force
field. This is what we do, and it works like a charm:

$$ E_J = K_J \, (J_{calc} - J_{obs})^2 $$

$$ J_{calc} = A\cos^2(\theta) + B\cos(\theta) + C, \quad \theta = \phi - 60^\circ $$

The computer back-calculates the 3J coupling using the
current dihedral angle and compares it to the observed value.
Since we don't choose any particular angle, we can use a
single value instead of a range (simple quadratic function).
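A short sketch of this back-calculation, assuming the 3JNα Karplus coefficients quoted earlier in these notes (9.4, -1.1, 0.4) and the φ - 60° offset. The function names and the force constant are placeholders.

```python
import math


def j_calc(phi_deg, a=9.4, b=-1.1, c=0.4):
    """Back-calculate 3J(HN, Halpha) in Hz from the current phi angle."""
    x = math.cos(math.radians(phi_deg - 60.0))
    return a * x * x + b * x + c


def j_penalty(phi_deg, j_obs, k_j=1.0):
    """Simple quadratic penalty on the coupling itself, no angle range."""
    return k_j * (j_calc(phi_deg) - j_obs) ** 2


# Scan phi for an observed coupling of 4 Hz and look for low penalties.
for phi in range(-180, 181, 30):
    print(phi, round(j_penalty(phi, 4.0), 2))
```

The penalty has several wells, one per angle compatible with the coupling. No angle is thrown away beforehand. The rest of the force field and the NOEs decide which well the structure ends up in.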
Use of chemical shifts
What about chemical shifts? After all, we have chemical shifts
because we have different conformations for different amino
acids in the peptide.

However, nobody really cared about them until recently. The


main problem is that, as opposed to couplings, rules or
parameters for chemical shifts can only be used in regular
structures.

Since nobody looked at proteins by NMR until the mid 80s,


there were no good parametrizations or good reference data.

The idea is that we can assign a random coil chemical shift


value to all the protons in an amino acid. Any deviation from
it, or secondary shift, arises from different effects:

a) Peptide group anisotropy. The local magnetic field of the


peptide group (CO-NH) will make protons lying above or to
the side be shifted up- or down-field.

[Figure: peptide group (CO-NH) and a nearby proton H at distance
r and angle θ.]

$$ \delta_{pga} = C_{CO} \cdot r^{-3} \cdot [1 - 3\cos^2(\theta)] $$

Use of chemical shifts (continued)
b) Ring current effects. The local magnetic field created by the
electron current of aromatic rings will cause protons lying above
or to the side of the ring to be shifted up- or down-field. This
example is archetypal and you'll find it in every organic
chemistry book.

[Figure: aromatic ring and a nearby proton H at distance r and
angle θ.]

$$ \delta_{rc} = C_{ring} \cdot r^{-3} \cdot [1 - 3\cos^2(\theta)] $$
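The peptide group anisotropy and the ring current terms share the same geometric factor, so one small function covers both. This is only a sketch. The constant is set to 1.0 because these notes give no values for the peptide group or ring constants, so the numbers printed are relative, not ppm.

```python
import math


def dipolar_shift(r, theta_deg, constant=1.0):
    """constant * r**-3 * (1 - 3*cos(theta)**2)."""
    c = math.cos(math.radians(theta_deg))
    return constant * (1.0 - 3.0 * c * c) / r ** 3


# Same distance, different positions relative to the group.
for theta in (0.0, 54.7, 90.0):
    print(theta, round(dipolar_shift(3.0, theta), 4))
```

The factor changes sign between "above" (θ near 0°) and "to the side" (θ near 90°), and it vanishes close to 54.7°. That sign change is what makes the shift go up- or down-field depending on where the proton sits.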

c) Polarization of C-H bonds by polar/charged groups. The
electron cloud of the bond moves back or forth along the C-H
bond depending on the presence of groups of different polarity
aligned with it:
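[Figure: a charge qi at distance r from a C-H bond. The panel
with a negative charge is labeled upfield shift and the panel
with a positive charge is labeled downfield shift.]

$$ \delta_{elec} = C \cdot r^{-2} \cdot q_i \cdot \cos(\theta) $$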

Use of chemical shifts (...)
So, since we have equations for each effect, we can calculate
them to a certain degree of accuracy in the computer. If we know
both the random coil and the experimental value, we can tell
the MM program to make the calculated match the observed
values or else pay an energy penalty:

$$ E_{\delta} = K_{\delta} \cdot [(\delta_{obs} - \delta_{random}) - (\delta_{pga} + \delta_{rc} + \delta_{elec})]^2 $$

where $\delta_{obs} - \delta_{random}$ is the secondary shift.
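As a sketch, the chemical shift penalty is one more quadratic term. The three calculated contributions are passed in as plain numbers here. In a real program they would be recomputed from the coordinates at every step. All names and the example shifts are invented.

```python
def shift_penalty(obs, random_coil, pga=0.0, ring=0.0, elec=0.0, k_shift=1.0):
    """Penalty on the mismatch between observed and calculated secondary shift."""
    secondary = obs - random_coil   # observed secondary shift (ppm)
    calculated = pga + ring + elec  # sum of the calculated effects (ppm)
    return k_shift * (secondary - calculated) ** 2


# An Halpha observed 0.5 ppm away from its random coil value.
print(shift_penalty(4.85, 4.35, pga=0.2, ring=0.3))
print(shift_penalty(4.85, 4.35, pga=0.2, ring=0.0))
```

When the calculated effects add up to the observed secondary shift the penalty vanishes. Otherwise the structure is pushed toward geometries that reproduce it.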

This works great in some cases. The following case had no


NOEs, but a lot of secondary shifts...

[Figure: structures calculated without constraints and with
constraints.]
