Professional Documents
Culture Documents
Buchner J., Kiefhaber T. (Eds.) - Protein Folding Handbook, 5-Volume Set - Wiley (2005) PDF
Buchner J., Kiefhaber T. (Eds.) - Protein Folding Handbook, 5-Volume Set - Wiley (2005) PDF
Handbook
Edited by J. Buchner and
T. Kiefhaber
Protein Degradation
ISBN 3-527-30837-7 (Vol. 1)
ISBN 3-527-31130-0 (Vol. 2)
ISBN-13 978-3-527-30784-5
ISBN-10 3-527-30784-2
V
Contents
Part I, Volume 1
Preface LVIII
Contributors of Part I LX
Acknowledgments 190
References 190
Part I, Volume 2
Acknowledgments 626
References 627
1.7 Perspectives 24
Acknowledgements 26
References 26
Acknowledgements 337
References 337
16 Protein Folding in the Endoplasmic Reticulum Via the Hsp70 Family 563
Ying Shen, Kyung Tae Chung, and Linda M. Hendershot
16.1 Introduction 563
16.2 BiP Interactions with Unfolded Proteins 564
16.3 ER-localized DnaJ Homologues 567
16.4 ER-localized Nucleotide-exchange/releasing Factors 571
16.5 Organization and Relative Levels of Chaperones in the ER 572
16.6 Regulation of ER Chaperone Levels 573
16.7 Disposal of BiP-associated Proteins That Fail to Fold or
Assemble 575
16.8 Other Roles of BiP in the ER 576
16.9 Concluding Comments 576
16.10 Experimental Protocols 577
16.10.1 Production of Recombinant ER Proteins 577
16.10.1.1 General Concerns 577
16.10.1.2 Bacterial Expression 578
16.10.1.3 Yeast Expression 580
16.10.1.4 Baculovirus 581
16.10.1.5 Mammalian Cells 583
16.10.2 Yeast Two-hybrid Screen for Identifying Interacting Partners of ER
Proteins 586
16.10.3 Methods for Determining Subcellular Localization, Topology, and
Orientation of Proteins 588
16.10.3.1 Sequence Predictions 588
16.10.3.2 Immunofluorescence Staining 589
16.10.3.3 Subcellular Fractionation 589
16.10.3.4 Determination of Topology 590
16.10.3.5 N-linked Glycosylation 592
16.10.4 Nucleotide Binding, Hydrolysis, and Exchange Assays 594
16.10.4.1 Nucleotide-binding Assays 594
16.10.4.2 ATP Hydrolysis Assays 596
16.10.4.3 Nucleotide Exchange Assays 597
16.10.5 Assays for Protein–Protein Interactions in Vitro/in Vivo 599
16.10.5.1 In Vitro GST Pull-down Assay 599
16.10.5.2 Co-immunoprecipitation 600
16.10.5.3 Chemical Cross-linking 600
16.10.5.4 Yeast Two-hybrid System 601
16.10.6 In Vivo Folding, Assembly, and Chaperone-binding Assays 601
16.10.6.1 Monitoring Oxidation of Intrachain Disulfide Bonds 601
16.10.6.2 Detection of Chaperone Binding 602
XLIV Contents
Acknowledgements 603
References 603
24 Small Heat Shock Proteins: Dynamic Players in the Folding Game 830
Franz Narberhaus and Martin Haslbeck
24.1 Introduction 830
24.2 a-Crystallins and the Small Heat Shock Protein Family: Diverse Yet
Similar 830
24.3 Cellular Functions of a-Hsps 831
24.3.1 Chaperone Activity in Vitro 831
24.3.2 Chaperone Function in Vivo 835
24.3.3 Other Functions 836
24.4 The Oligomeric Structure of a-Hsps 837
24.5 Dynamic Structures as Key to Chaperone Activity 839
24.6 Experimental Protocols 840
24.6.1 Purification of sHsps 840
24.6.2 Chaperone Assays 843
24.6.3 Monitoring Dynamics of sHsps 846
Acknowledgements 847
References 848
27 SecB 919
Arnold J. M. Driessen, Janny de Wit, and Nico Nouwen
27.1 Introduction 919
27.2 Selective Binding of Preproteins by SecB 920
27.3 SecA-SecB Interaction 925
27.4 Preprotein Transfer from SecB to SecA 928
27.5 Concluding Remarks 929
27.6 Experimental Protocols 930
27.6.1 How to Analyze SecB-Preprotein Interactions 930
L Contents
Index 1334
LVIII
Preface
During protein folding, the linear information encoded in the amino acid sequence
of a polypeptide chain is transformed into a defined three-dimensional structure.
Although it was established decades ago that protein folding is a spontaneous reac-
tion, a complete understanding of this fundamental process is still lacking. Since
its humble beginnings in the fields of biophysical chemistry and biochemistry, the
analysis of protein folding and stability has spread into a wide range of disciplines
including physics, cell biology, medicine and biotechnology. The recent discovery
of the sophisticated cellular machinery of molecular chaperones and folding cata-
lysts which assists protein structure formation in vivo and the analysis of pro-
tein folding diseases have added additional twists to the study of protein folding.
Consequently, the field has undergone a significant transformation over the last
decade.
The increasing interest in protein folding considerably improved our knowledge
of the physical principles underlying protein stability and folding and stimulated
the development of powerful new techniques for their analysis both in vitro and in
vivo. Aspects that had been studied previously in splendid isolation such as basic
principles and protein folding diseases are now growing together, experimentally
and conceptually.
Over the past years, a number of excellent books on protein folding were pub-
lished, starting with the famous ‘red bible of protein folding’ (Jaenicke, 1980). The
Protein Folding Handbook takes a different approach as it combines in-depth re-
views with detailed experimental protocols. Thus, a comprehensive, multi-faceted
view of the entire field of protein folding from basic physical principles to mole-
cular chaperones, protein folding diseases and the biotechnology of protein folding
– is presented. The Protein Folding Handbook intends to provide those who become
interested in a specific aspect of protein folding with both the scientific back-
ground and the experimental approaches. Ideally, the protocols of the methods
sections will allow and stimulate even the novice to boldly enter new experimental
territory.
We are grateful to the authors who embarked with us on this endeavour. It was a
pleasure to see how, chapter by chapter, the current state of the art in this field
emerged and the common themes became apparent.
Preface LIX
Finally, we would like to thank Dr. Frank Weinreich from Wiley-VCH for his
never-waning efforts to transform our ideas into a book, Prof. Erich Gohl for the
cover artwork, Brigitte Heiz-Wyss and Susanne Hilber for excellent assistance and
Annett Bachmann and Stefan Walter for help with creating the index.
Reference
Contributors of Part I
Part I
1 Principles of Protein Stability and Design
3
1
Early Days of Studying the Mechanism
of Protein Folding
Robert L. Baldwin
1.1
Introduction
Modern work on the mechanism of protein folding began with Chris Anfinsen. He
recognized the folding problem, and he asked: How does the amino acid sequence
of a protein determine its three-dimensional structure? From this basic question
came various research problems, including (1) What is the mechanism of the fold-
ing process? (2) How can the three-dimensional structure be predicted from the
amino acid sequence? and (3) What is the relation between the folding process in
vivo and in vitro? Only the early history of the first problem will be considered
here.
The basic facts needed to state the folding problem were already in place before
Anfinsen’s work. He knew this, well before 1973 when he received the Nobel Prize,
and he was somewhat embarrassed about it. In his Nobel address [1], he says in
the opening paragraph ‘‘Many others, including Anson and Mirsky in the ’30s
and Lumry and Eyring in the ’50s, had observed and discussed the reversibility of
the denaturation of proteins.’’ Anfinsen’s statement of the folding problem may be
dated to 1961 [2], when his laboratory found that the amino acid sequence of ribo-
nuclease A (RNase A) contains the information needed to make the correct four
disulfide bonds of the native protein. There are eight aSH groups in the unfolded
RNase A chain which could make 105 different SaS bonds. Although in 1961
the reversibility of protein denaturation was recognized by protein chemists, the
knowledge that protein denaturation equals protein unfolding had been gained
only a few years earlier, in a series of papers from Walter Kauzmann’s laboratory,
beginning with Ref. [3]. The first protein structure, that of sperm whale myoglobin
(2 Å resolution), determined by John Kendrew and his coworkers [4], became avail-
able only in 1960. The myoglobin structure confirmed the proposal that proteins
possess 3D structures held together by weak, noncovalent bonds, and consequently
they might unfold in denaturing conditions. Wu suggested this explanation of pro-
tein denaturation as early as 1931 [5].
The other thread in Anfinsen’s proposal was, of course, the recognition that pro-
tein folding is part of the coding problem. The basic dogma of molecular biology,
‘‘DNA makes RNA makes protein,’’ was already in place in 1958, and Anfinsen
knew that the newly synthesized product of RNA translation is an inactive, un-
folded polypeptide chain. How does it fold up to become active? In the 1960s and
1970s there was speculation about a code for folding, and some workers even pro-
posed a three-letter code (i.e., three amino acid residues) [6]. Anfinsen proposed a
thermodynamic hypothesis [1, 7]: the newly synthesized polypeptide chain folds
up under the driving force of a free energy gradient and the protein reaches its
thermodynamically most stable conformation. For protein chemists familiar with
reversible denaturation, this appeared obvious: what else should drive a reversible
chemical reaction besides the free energy difference between reactant and product?
But for molecular biologists interested in knowing how an unfolded, newly synthe-
sized protein is able to fold, Anfinsen’s hypothesis represented a considerable leap
of faith. In fact, biology does introduce subtle complexities, and more is said below
about the thermodynamic hypothesis (see Section 1.3). Michael Levitt and Arieh
Warshel made a farseeing proposal in 1975: they argued that an unfolded protein
folds under the influence of a molecular force field, and someday the folding pro-
cess will be simulated with the use of a force field [8].
Starting from Anfinsen’s insights, this review examines what happened in the
1960s, 1970s, and 1980s to lay the groundwork for the modern study of how pro-
teins fold in vitro. My review ends when the literature balloons out at the end of
the 1980s with the study of new problems, such as: (1) experimental study of the
transition state, (2) whether hydrophobic collapse precedes secondary structure, (3)
the nature of the conformational reactions that allow hydrogen exchange in pro-
teins, (4) using molecular dynamics to simulate the folding process, (5) the speed
limit for folding, (6) helix and b-strand propensities, and (7) the mechanism of
forming amyloid fibrils. This review is not comprehensive. The aim is to follow
the threads that led to the prevalent view at the end of the 1980s. A few references
are added after the 1980s in order to complete the picture for topics studied earlier.
1.2
Two-state Folding
and they explain the shape of the unfolding curve. In 1965 John Brandts [10] rec-
ognized that the correct explanation for the peculiar shape of the unfolding curve
lies in the unusual thermodynamics of protein folding. Brandts argued that the
thermodynamics of unfolding is dominated by hydrophobic free energy, as pro-
posed by Walter Kauzmann [11] in 1959. Then there should be a large positive
value of DC p for unfolding, which explains the curvature of the van’t Hoff plot
[10]. DC p is the difference between the heat capacities of the native and denatured
forms of a protein, and a large value for DC p causes a strong dependence of DH on
temperature.
Brandts’ proposal that thermal denaturation of RNase A is a two-state reaction
without intermediates [10] was strongly supported, also in 1965, by Ginsburg and
Carroll [12], who introduced the superposition test for intermediates and found no
populated intermediates in the unfolding reaction of RNase A. In the superposi-
tion test, two or more normalized unfolding curves are superimposed and tested
for coincidence. They are monitored by at least two probes that report on funda-
mentally different molecular properties, such as specific viscosity that reports on
molecular volume and optical rotation that reports on secondary structure.
In 1966 Lumry, Biltonen and Brandts [13] introduced the calorimetric ratio test
for intermediates. In this test, the ratio of the calorimetric and van’t Hoff values of
DH should equal 1 if there are no populated intermediates. In the 1970s, after the
development of differential scanning calorimetry by Peter Privalov [14], the calori-
metric ratio test became widely used as a criterion for two-state folding.
Charles Tanford and his laboratory undertook a wide-ranging study of whether
the denaturation reactions of small proteins are truly two-state reactions. In 1968
Tanford summarized the results in a long and widely quoted review [15]. He recog-
nized that conditions must be found in which denatured proteins are completely
unfolded, without residual structure, before one can confidently determine if dena-
turation is a two-state reaction. His laboratory found that 6 M GdmCl (guanidi-
nium chloride) is a denaturant that eliminates residual structure in water-soluble
proteins, whereas thermally denatured proteins retain significant residual struc-
ture [15]. (Whether or not the residual structure is related to the structure of the
native protein was left for future study.) A key finding was that the reversible dena-
turation reactions of several small proteins are indeed two-state reactions when
6 M GdmCl is the denaturant [15]. This work opened the way to later study of
two basic questions: (1) Are there intermediates in protein folding reactions and,
if so, how can they be detected? and (2) How can the energetics of protein folding
be measured experimentally? In discussing these problems, I use the terms ‘‘un-
folded’’ and ‘‘denatured’’ interchangeably, and a ‘‘denatured’’ protein typically has
some residual structure that depends on solvent conditions and temperature.
1.3
Levinthal’s Paradox
from their amino acid sequences. He observed that, if protein folding is truly a
two-state reaction without intermediates, then the time needed to fold can be esti-
mated from the time needed to search randomly all possible backbone conforma-
tions. Levinthal estimated that the time needed for folding by a random search is
far longer than the life of the universe. A plausible conclusion from his calculation
is that there must be folding intermediates and pathways [16, 17]. When Levin-
thal’s calculation was repeated in 1992, with the addition of a small free energy
bias as the driving force for folding, the time needed to search all conformations
by a random search process was reduced to a few seconds [18]. Note, however,
that a free energy bias in favor of the native structure is likely to produce inter-
mediates in the folding process, although not necessarily ones that are populated.
In 1969 Levinthal pointed out [17] that, if it has a choice, a protein folds to
the structure dictated by the fastest folding pathway and not to the most stable
structure, in contrast to Anfinsen’s thermodynamic hypothesis. His proposal was
confirmed experimentally in 1996 for a protein from the serpin family, whose
members form two different stable structures. The folding pathway of the serpin,
plasminogen activator inhibitor 1, was found to be under kinetic control [19]. In
1998 Agard and coworkers [20] found that the stability of the folded structure of
a-lytic protease appears to be under kinetic control; i.e., the denatured protein is
not only kinetically unable to refold but also thermodynamically more stable than
the native form [20]. This surprising deduction does not contradict Anfinsen’s ther-
modynamic hypothesis. The enzymatically active form of this protein is formed
after complete folding of a much longer polypeptide whose long Pro sequence is
cleaved off after folding is complete.
1.4
The Domain as a Unit of Folding
Knowledge that polypeptide chains are synthesized starting from the N-terminus
led to speculation that folding begins from the N-terminus of the chain. In 1970,
Taniuchi [21] observed that the correct four SaS bonds of RNase A are not formed
and the protein remains unfolded if the four C-terminal residues are deleted from
the 124-residue polypeptide chain. Consequently, almost the entire polypeptide
chain (maybe all of it) is required for stable folding (but see, however, Section
1.10). In 1969 Goldberg [22] found that intracistronic complementation occurs
in b-galactosidase via the presence of at least two independently folding units
(‘‘globules’’) in each of the four identical polypeptide chains. In 1973 Wetlaufer
[23] found contiguous folded regions in the X-ray structures of several proteins, in
some cases apparently connected by flexible linkers. These three observations
taken together gave rise to the concept that the domain (@100 amino acids) is the
unit of stable folding.
Wetlaufer [23] pointed out that a contiguous folded region of the polypeptide
chain is likely to arise from a structural folding nucleus. He noted that a structural
nucleus might also serve as a kinetic nucleus for the folding process, with the con-
1.5 Detection of Folding Intermediates and Initial Work on the Kinetic Mechanism of Folding 7
sequence that successive folding events occur rapidly after the nucleus is formed,
so that folding intermediates are never populated. His suggestion was often used
in the early 1970s to explain why folding intermediates could not be found. In
1981 Lesk and Rose [24] pointed out that each protein domain can typically be di-
vided into two subdomains, each of which is also folded from a contiguous seg-
ment of polypeptide chain – although the subdomains are not separated by flexible
linkers. Their observation favors a hierarchic mechanism of folding [24].
In 1974 Goldberg and coworkers [25] proposed domain swapping as the explana-
tion for the concentration-dependent formation of large aggregates of refolding
tryptophanase, formed at a critical urea concentration. Their work has been taken
as a model for understanding the formation of inclusion bodies and the need for
chaperones to improve the yield in many folding reactions. Also in the 1970s
Jaenicke and coworkers [26] began a systematic investigation of folding coupled
to subunit association in the concentration-dependent folding reactions of oligo-
meric proteins. These subjects are discussed elsewhere in this book.
1.5
Detection of Folding Intermediates and Initial Work on the Kinetic Mechanism of
Folding
method, capable of observing kinetic changes only in the time range of seconds
and longer, to study the kinetics of protein folding. He reported apparent two-state
kinetics for the unfolding/refolding reactions of chymotrypsin [34], trypsin [35],
chymotrypsinogen-A [36], and RNase A [36]. He found that the entire kinetic prog-
ress curve for unfolding or refolding could be represented by a single exponential
time course and the kinetic amplitude agreed with the value expected from the
equilibrium unfolding curve. He interpreted his results as showing that folding is
highly cooperative, which strengthened the general view that folding intermediates
would be very difficult to detect, perhaps impossible. Ikai and Tanford undertook
their stopped-flow study of the unfolding/refolding kinetics of cyt c, despite Pohl’s
evidence that the kinetic folding reactions of small proteins are two-state, because
Wayne Fish in Tanford’s laboratory had evidence suggesting an equilibrium fold-
ing intermediate (A. Ikai, personal communication, 2003).
Tsong, Baldwin, and Elson [28, 37] undertook their fast T-jump study (dead time,
10 ms) of the unfolding kinetics of RNase A because they believed that fast kinetics
might reveal intermediates in apparent two-state unfolding reactions. Pörschke and
Eigen had observed [38] that short RNA helices show unfolding intermediates in
the msec time range even though both the major folding and unfolding reactions
follow a single exponential time course in the seconds time range. The unfolding
kinetics of RNase A could be fitted to a similar type of nucleation-dependent fold-
ing model [28, 37, 40]. The model predicts kinetics in which the relative amplitude
of the fast unfolding phase increases rapidly with increasing temperature, in agree-
ment with experiment [28, 37]. A fast T-jump study of the unfolding kinetics of
chymotrypsinogen-A [40] gave results like those of RNase A, suggesting that the
nucleation model might be generally applicable. For chymotrypsinogen, as for
RNase A, there is a fast (milliseconds) phase in unfolding, in addition to the slow
unfolding reaction (seconds) studied earlier by Pohl [36]. The fast phases in the
unfolding of RNase A [28, 37] and of chymotrypsinogen [40] were missed by Pohl
[36] because they account for only a few percent of the total kinetic amplitude in
his conditions, and because he measured only the slow unfolding reactions. The
relative amplitude of the fast unfolding phase approaches 100% at temperatures
near the upper end of the thermal unfolding transition of RNase A (see Section
1.6), but of course the total amplitude becomes small.
The T-jump and stopped-flow unfolding studies of RNase A were monitored by
tyrosine absorbance [28, 37] and RNase A has six tyrosine residues. Consequently,
the fast and slow unfolding reactions of RNase A might be detecting unfolding re-
actions in different parts of the molecule that occur in different time ranges. This
hypothesis was tested and ruled out by studying a chemically reacted derivative of
RNase A containing a single, partly buried, dinitrophenyl (DNP) group [41]. The
unfolding kinetics monitored by the single DNP group are biphasic, exactly like
the kinetics observed by the six tyrosyl groups. Later work by Paul Hagerman (see
below) showed that the fast phase in the unfolding of RNase A arises from an un-
folding intermediate that is a minor species in the ensemble of unfolded forms.
Thus, in 1973 the stage was set for kinetic studies of the mechanism of protein
folding/unfolding. However, three basic questions required answers before folding
1.6 Two Unfolded Forms of RNase A and Explanation by Proline Isomerization 9
mechanisms could be analyzed. (1) Are the observed folding intermediates on-
pathway? (2) Are the intermediates partly folded forms or are they completely
unfolded, while the complex unfolding/refolding kinetics result from the intercon-
version of different denatured forms? (3) How can the structures be determined
of folding intermediates whose lifetimes are as short as milliseconds?
1.6
Two Unfolded Forms of RNase A and Explanation by Proline Isomerization
In 1973 Garel and Baldwin found that unfolded RNase A contains two different
major denatured forms, a fast-folding species (UF , @20%) and a slow-folding
species (US , @80%) [42] that refolds 50 times more slowly than UF in some con-
ditions. The two different denatured forms were discovered when refolding was
monitored with a probe specific for the enzymatically active protein, namely bind-
ing of the specific inhibitor 2 0 -CMP. Refolding was studied initially after a pH
jump (pH 2.0 ! pH 5.8) at high temperatures (to obtain complete unfolding at
pH 2.0) [42] and later after dilution from 6 M GdmCl [43]. When the unfolding
transition is complete in the initial conditions, either at low pH and high tempera-
ture [42] or in 6 M GdmCl [43], both the fast and slow kinetic phases of refolding
yield native RNase A as product. In 6 M GdmCl, the denatured protein should be
completely unfolded and therefore the fast-folding and slow-folding forms corre-
spond to two different denatured species.
In 1976 Hagerman and Baldwin [44] studied the kinetic mechanism of RNase
A unfolding by using a stopped-flow apparatus and pH jumps to analyze the
unfolding/refolding kinetics as a function of temperature throughout the thermal
unfolding zone at pH 3.0. Their analysis is based on a four-species mechanism,
US $ UF $ I $ N, in which US and UF are the slow-folding and fast-folding
forms of the denatured protein discussed above and I is a new unfolding interme-
diate observed above Tm . Because I is completely unfolded, as judged either by
tyrosine absorbance or by enthalpy content [44], I may be labeled instead as U3
and the unfolding/refolding mechanism written as US $ UF $ U3 $ N. The
proportions of US :UF :U3 in denatured RNase A were predicted in 1976 to be
0.78:0.20:0.02 [44]. Although U3 was studied initially only as an unfolding interme-
diate [44], U3 should be the immediate precursor of N in refolding experiments
according to the sequential unfolding/refolding mechanism. This prediction was
confirmed next with experiments using a sequential mixing apparatus [45]. U3
and UF are populated transiently by unfolding N with a first mixing step and then
U3 and UF are allowed to refold after a second mixing step. U3 forms N much
more rapidly than UF does [45], and in 1994 U3 was detected in the equilibrium
population of denatured RNase A species [46].
The 1976 analysis predicts correctly the equilibrium curve for thermal denatura-
tion from the kinetic data [44], and also shows that refolding of UF and US to N
must occur by the sequential mechanism US $ UF $ N and not by the split
mechanism US $ N $ UF . The latter conclusion follows from the behavior of
10 1 Early Days of Studying the Mechanism of Protein Folding
the kinetic amplitudes when the relative rates of the fast and slow kinetic refolding
reactions are varied by changing the temperature [44]. The issue of a sequential
versus a split mechanism was tested in a different manner by Brandts and co-
workers [47], who were aware of Hagerman’s work (see their discussion). They in-
troduced the interrupted unfolding (or ‘‘double-jump’’) experiment in which the
species present at each time of unfolding are assayed by refolding measurements.
Brandts and coworkers [47] proposed a proline isomerization model as an expla-
nation for the two different forms of RNase A. Their model includes two separate
hypotheses. The first is that the fast-folding and slow-folding forms of a denatured
protein are produced by slow cis–trans isomerization of proline peptide bonds
after unfolding occurs. The second is that the fast-folding and slow-folding dena-
tured species account entirely for the complex unfolding/refolding kinetics and
no structural folding intermediates are populated. Thus, their unfolding/refolding
mechanism for a protein with only one proline residue is N $ UF $ US . The fast-
folding species UF has the same prolyl isomer (cis or trans) as N and the slow-
folding species US contains the other prolyl isomer.
Nuclear magnetic resonance (NMR) studies of proline-containing peptides show
that the cis:trans ratio of a prolyl peptide bond commonly lies between 30:70 and
10:90, depending on neighboring residues. Because the proline ring sometimes
clashes sterically with neighboring side chains, proline peptide bonds (X-Pro) are
quite different from ordinary peptide bonds, for which the % cis is only @0.1–1%.
RNase A contains two cis proline residues, as well as as two trans proline residues,
and therefore denatured RNase A should have an unusually high fraction of [US ]
(as observed), because the two cis residues isomerize to trans after unfolding and
produce US species. A later NMR study of the trans ! cis isomerization rate of
Gly-Pro in water [48] places it in the same time range as the US $ UF reaction of
RNase A.
The proline isomerization model of Brandts and coworkers was very persuasive
but it proved difficult to test, particularly because one of its two hypotheses turned
out to be wrong, namely that no structural intermediates are populated during
the kinetics of unfolding or refolding. In 1978 Schmid and Baldwin [49] found
that the UF $ US reaction in unfolded RNase A is acid-catalyzed, although very
strong acid, >5 M HClO4 , is required for catalysis. Very strong acid is needed for
the cis ! trans isomerization of both prolyl peptide bonds [50] and ordinary pep-
tide bonds, and the slow rate of the UF $ US reaction of RNase A implies that the
critical bonds are prolyl peptide bonds rather than ordinary peptide bonds. The
high activation enthalpy (@85 kJ mol1 ) expected for isomerization of prolyl pep-
tide bonds was found for the UF $ US reaction of denatured RNase A, both in
3.3 M HClO4 and in 5 M GdmCl [49]. The acid catalysis results were widely ac-
cepted as evidence that the UF $ US reaction of RNase A is proline isomerization,
and the later discovery of prolyl isomerases (see below) ended any doubts. In fur-
ther work, the role of proline isomerization in protein folding kinetics has been
thoroughly analyzed, especially for RNase T1 [51, 52], by combining mutagenesis
of specific proline residues with sequential mixing experiments and with accurate
measurement and analysis of kinetic amplitudes and relaxation times.
1.7 Covalent Intermediates in the Coupled Processes of Disulfide Bond Formation and Folding 11
1.7
Covalent Intermediates in the Coupled Processes of Disulfide Bond Formation and
Folding
Disulfide bonds stabilize the folded structures of proteins that contain SaS bonds
and, when they are reduced, the protein typically unfolds. This observation was the
starting point of Anfinsen’s work [2] when he showed that the folding process
directs the formation of the four unique SaS bonds of native RNase A. Tom
Creighton had the basic insight that SaS intermediates can be covalently trapped,
purified by chromatography, and structurally characterized. Because formation of
SaS bonds is linked to the folding process, these SaS intermediates should also
be folding intermediates. Beginning in 1974, Creighton reported the isolation and
general properties of both the one-disulfide [56] and two-disulfide [57] intermedi-
ates of the small protein BPTI (bovine pancreatic trypsin inhibitor, 58 residues,
three SaS bonds). He later measured the equilibrium constants for forming each
of the three SaS bonds in BPTI [58, 59]. For example, he found that the effective
concentration of the two aSH groups that form the SaS bond between cysteine res-
idues 5 and 55 is @10 7 M [58]. This value is three orders of magnitude higher than
the effective concentration of the two adjacent aSH groups in dithiothreitol [58],
and it illustrates the rigid alignment of these two aSH groups by the folded struc-
ture of BPTI. Later, after the development of two-dimensional NMR made it possi-
ble to determine the structures of protein species that are difficult to crystallize,
Creighton and coworkers determined the structures of various SaS intermediates
of BPTI (see, for example, Ref. [60]). Stabilization of the BPTI structure by a given
12 1 Early Days of Studying the Mechanism of Protein Folding
SaS bond depends on the effective concentration of the two aSH groups before the
bond is formed, and the increase in effective concentration as successive SaS
bonds are formed illustrates strikingly how the cooperativity of protein folding op-
erates [58, 59]. The same principle has been used at an early stage in the folding of
BPTI to examine the interplay between SaS bond and reverse turn formation [61].
The pathway of disulfide bond formation in BPTI is complex and has been the sub-
ject of considerable discussion [62, 63].
Early work [2] from Anfinsen’s laboratory showed that nonnative SaS bonds are
often formed during the kinetic process in which unfolded RNase A folds and
eventually makes the correct SaS bonds. Because the SaS bond is covalent, some
mechanism is needed to break nonnative SaS bonds during folding and allow for-
mation of new SaS bonds. Anfinsen and coworkers reported in 1963 that enzymes
such as protein disulfide isomerase [7] have a major role in ensuring that correct
SaS bonds are formed. The role of disulfide isomerases in folding in vivo is dis-
cussed elsewhere in this book (see Chapter 26).
1.8
Early Stages of Folding Detected by Antibodies and by Hydrogen Exchange
Initial studies of the folding of peptide fragments, and also of proteins made to
unfold by removing a stabilizing linkage or cofactor, gave the following general-
ization: the tertiary structures of proteins are easily unfolded and little residual
structure remains afterwards. In 1956 Harrington and Schellman [9] found that
breaking the four disulfide bonds of RNase A causes general unfolding of the ter-
tiary structure and also destroys the helical structure, which should be local struc-
ture that could in principle survive loss of the tertiary structure. In 1968 Epand
and Scheraga [64] tested by circular dichroism (CD) whether peptides from helix-
containing segments of myoglobin still form helices in aqueous solution. They
studied two long peptides and found they have very low helix contents; they did
not pursue the problem. In 1969 Taniuchi and Anfinsen [65] made a similar experi-
ment with SNase. They cleaved the polypeptide chain between residues 126 and
127, which causes the protein to unfold. Both fragments 1–126 and 127–149 were
found to lack detectable native-like structure by various physical methods, includ-
ing circular dichroism. For SNase as for RNase A, circular dichroism indicates that
the native protein has some helical structure and at that time (1968) the X-ray
structure of myoglobin was known [4], which gives the detailed structures of its
eight helices. Thus, the overall conclusion from these experiments was that the
helical secondary structures of the three proteins are stable only when the tertiary
structures are present. In 1971 Lewis, Momany and Scheraga [66] proposed a hier-
archic mechanism of folding in which b-turns play a directing role at early stages
of folding by increasing the effective concentrations of locally formed structures,
such as helices, that later interact in the native structure.
Anfinsen considered it likely that proteins fold by a hierarchic mechanism [1],
and he developed an antibody method for sensitively detecting any native-like
1.8 Early Stages of Folding Detected by Antibodies and by Hydrogen Exchange 13
structure still present in a denatured protein [67]. His method is simple in princi-
ple. He and his coworkers took fragments 1–126 and 99–149 of SNase, which were
devoid of detectable structure by physical criteria, and bound them covalently to
individual sepharose columns. Antisera were prepared against both native SNase
and the two polypeptide fragments, and polyclonal antibodies were purified by
immunoabsorption against each homologous antigen. The antibodies developed
against native SNase were tested for their ability to cross-react with the two dena-
tured fragments. The results indicate that a weak cross-reaction occurs and a de-
natured fragment reacts with antibodies made against native SNase as if the dena-
tured fragment exists in the ‘‘native format’’ a small fraction (@0.02%) of the time.
These experiments are the forerunner of modern ones in which monoclonal anti-
bodies are made against short peptides, and some of the monoclonal antibodies
cross-react significantly with the native protein from which the peptide is derived.
In 1975 Anfinsen and coworkers made the converse experiment [68]. They found
that antibodies directed against denatured fragments of SNase are able to cross-
react with native SNase. They conclude that antibodies made against denatured
fragments detect unfolded SNase in equilibrium with native SNase, even though
only a tiny fraction of the native protein, less than 0.01%, is found to be unfolded.
In 1979 Schmid and Baldwin [69] developed a competition method for detecting
H-bonded secondary structure formed at an early stage in the refolding of dena-
tured RNase A. Native proteins were known at that time, from Linderstrøm-Lang’s
development of the hydrogen exchange method [70], to contain large numbers of
highly protected peptide NH protons. Shortly afterwards, in 1982, NMR hydrogen
exchange experiments by Wagner and Wüthrich [71] demonstrated that, as ex-
pected, the highly protected peptide NH protons of BPTI are ones involved in H-
bonded secondary structure. In this period the exchange rates of freely exchanging,
unprotected peptide NH protons were already known from earlier studies of dipep-
tides by Englander and coworkers [72]. The peptide NH exchange rates are base-
catalyzed and the rates become faster above pH 7 than the measured folding rates
of the two major US species of denatured RNase A.
Thus, the principle of the competition experiment is straightforward [69].
Refolding of denatured, 3 H-labeled RNase A is performed at pH values where the
rate of exchange-out of the 3 H label from denatured RNase A is either faster or
slower, depending on pH, than the observed rate of refolding to form native RNase
A. (The exchange rates of peptide NH protons in denatured RNase A can be com-
puted from the peptide data [72].) When exchange-out is slower than folding, the
folding process traps many 3 H-labeled protons. When exchange-out is faster than
the formation of native RNase A, the observed folding rate can be used to predict
the number of 3 H-labeled protons that should be trapped by folding. However, the
observed number of 3 H-labeled protons trapped by folding is always much larger
than the number predicted in this way. Control experiments, made at the same pH
values but in the presence of modest GdmCl concentrations added to destabilize
folding intermediates, show no trapped 3 H label. The first conclusion is that one
or more folding intermediates are formed rapidly and they give protection against
exchange-out of 3 H label. The second conclusion is that some form of early struc-
14 1 Early Days of Studying the Mechanism of Protein Folding
ture, probably H-bonded secondary structure, is stable before the tertiary structure
is formed. In 1980, a more convenient and informative pulse-labeling version of
the competition experiment was tested [73] and found superior to the competi-
tion method. Methods of resolving and assigning the proton spectra of native pro-
teins were being developed rapidly in the early 1980s and it was evident that the
secondary structures of early folding intermediates would be determined by this
approach, probably within a few years. A stopped-flow apparatus could be used
to trap protected peptide NH protons by means of 2 Ha 1 H exchange, and 2D 1 H-
NMR could then be used to determine the structural locations of the protected
N 1 H protons.
1.9
Molten Globule Folding Intermediates
In 1981, Oleg Ptitsyn and coworkers released a bombshell [74] that was compara-
ble in its impact to Levinthal’s paradox. They proposed that the folding intermedi-
ates everyone had been searching for were sitting under our noses in plain sight,
in the form of partly folded structures formed when certain native proteins are ex-
posed to mildly destabilizing conditions. A few proteins were known to form these
curious, partly folded, structures, particularly at acid pH. Ptitsyn and coworkers
proposed that the partly folded forms, or ‘‘acid forms,’’ were structurally related to
authentic folding intermediates. The acid forms were supposed to differ from true
folding intermediates essentially only by protonation reactions resulting from pH
titration to acid pH. The acid forms were found to have surprising properties
which suggested that their secondary structures are stable and native-like and their
conformations are compact, even though the acid forms lack fixed tertiary struc-
tures (for reviews, see Refs [75, 76]). Until then, most workers had taken it for
granted that folding intermediates should simply be ‘‘partial replicas’’ of native
proteins. They should contain some unfolded segments plus some other folded
segments whose tertiary and secondary structures are native-like. Later work has
verified essential features of Ptitsyn’s proposal, although argument about the de-
tails continues.
Ptitsyn’s background was in polymer physics, and he was accustomed to ana-
lyzing problems involving the conformations of polymers. In 1973, he gave his
forecast of a plausible model for the kinetic process of protein folding [77], which
resembles the 1971 hierarchic mechanism of Scheraga and coworkers [66] but in-
cludes also later stages in the folding process. Ptitsyn later used the term ‘‘frame-
work model,’’ coined in a 1982 review by Kim and Baldwin [78], to describe his
model. In making his 1981 proposal [74], Ptitsyn was impressed by the resem-
blance between the physical properties of acid forms and the properties he had hy-
pothesized for early folding intermediates. He must also have been impressed by
Kuwajima’s results for the acid form of a-LA, which revealed some striking proper-
ties of acid forms (see below).
The name ‘‘molten globule’’ was given to these acid forms by Ohgushi and Wada
[79] in 1983: ‘‘globule’’ meaning compact, ‘‘molten’’ meaning no fixed tertiary
1.10 Structures of Peptide Models for Folding Intermediates 15
structure. Ohgushi and Wada were studying two acid forms of cyt c, which are con-
verted from one form to the other by varying the salt concentration [79, 80].
In 1981 the best studied of these partly folded acid forms was bovine a-
lactalbumin (a-LA), which was chosen by Ptitsyn and coworkers [74] for their ini-
tial study of an acid form. In 1976 and earlier, Kuwajima and coworkers had ana-
lyzed the pH-dependent interconversion between the acid form (or ‘‘A-state’’) and
native a-LA [81]. They found a very unusual equilibrium folding intermediate in
GdmCl-induced denaturation at neutral pH [81], which by continuity – as the pH
is varied – is the same species as the acid form of a-LA. The explanation for this
unusual folding intermediate is that native a-LA is a calcium metalloprotein, a
property discovered only in 1980 [82]. The 1976 [81] and earlier studies of a-LA
were made with the apoprotein in the absence of Ca 2þ , and the apoprotein is
much less stable than the holoprotein. When GdmCl-induced unfolding of the
more stable holoprotein is studied in the presence of Ca 2þ , no stable folding in-
termediate is observed [76]. The near-UV and far-UV CD spectra of the 1976 fold-
ing intermediate [81] reveal some basic properties of molten globule intermediates.
The a-LA folding intermediate has no fixed tertiary structure, as judged by its near-
UV CD spectrum, but its secondary structure resembles that of native a-LA, as
judged by its far-UV CD spectrum. In 1990, when 2D 1 H-NMR was used together
with 2 Ha 1 H exchange to determine the locations and stability of the helices in a
few acid forms, the helices were found at the same locations as in the native struc-
tures. Particularly clear results were found for the helices in the acid forms of cyt c
[83] and apomyoglobin [84].
1.10
Structures of Peptide Models for Folding Intermediates
By 1979 a paradox was evident concerning the stability of helices in folding inter-
mediates. The experiments of Epand and Scheraga and of Taniuchi and Anfinsen
indicated that the helices of myoglobin [64] and SNase [65] were unstable when
the intact polypeptide chains of these proteins were cut into smaller fragments.
On the other hand, the 3 H-labeling experiments of Schmid and Baldwin [69] indi-
cated that an early folding intermediate – probably a H-bonded intermediate – of
RNase A is stable before the tertiary structure is formed. In 1971 Brown and Klee
[85] had found partial helix formation by CD at 0 C in the ‘‘C-peptide’’ of RNase
A. C-peptide is formed by cyanogen bromide cleavage at Met13 and contains resi-
dues 1–13, while residues 3–12 form a helix in native RNase A.
In 1978, Blum, Smallcombe and Baldwin [86] used the four His residues of
RNase A as probes for structure in an NMR study in real time of the kinetics of
RNase A folding at pH 2, 10 C. These are conditions in which the folding rate is
sufficiently slow to take 1D NMR spectra during folding. The carbon-bound pro-
tons of the imidazole side chains could be resolved by 1D NMR in that period, pro-
vided the peptide N 1 H protons are first exchanged for 2 H. By chemical shift, His12
appears to be part of a rapidly formed, folded structure at 10 C, although it is un-
folded at 45 C, pH 2 [86]. This result suggests that the N-terminal helix of RNase
16 1 Early Days of Studying the Mechanism of Protein Folding
Acknowledgments
I thank Atsushi Ikai for information and Thomas Kiefhaber for discussion.
References
ribonuclease, during oxidation of the 13 Lumry, R., Biltonen, R., & Brandts,
reduced polypeptide chain. Proc. NatL J. F. (1966). Validity of the ‘‘two-state’’
Acad. Sci. USA 47, 1309–1314. hypothesis for conformational
3 Simpson, R. B. & Kauzmann, W. transitions of proteins. Biopolymers 4,
(1953). The kinetics of protein 917–944.
denaturation. I. The behavior of the 14 Privalov, P. L. (1979). Stability of
optical rotation of ovalbumin in urea proteins. Small globular proteins.
solutions. J. Am. Chem. Soc. 75, 5139– Adv. Protein Chem. 33, 167–241.
5152. 15 Tanford, C. (1968). Protein
4 Kendrew, J. C., Dickerson, R. E., denaturation. A. Characterization of
Strandberg, B. E. et al. (1960). the denatured state. B. The transition
Structure of myoglobin. A three- from native to denatured state.
dimensional Fourier synthesis at 2 Å Adv. Protein Chem. 23, 121–282.
resolution. Nature (London) 185, 422– 16 Levinthal, C. (1968). Are there
427. pathways for protein folding? J. Chim.
5 Wu, H. (1931). Studies on denatura- Phys. 65, 44–45.
tion of proteins. XIII. A theory of 17 Levinthal, C. (1969). How to fold
denaturation. Chin. J. Physiol. 5, graciously. In Mössbauer Spectroscopy
321–344. in Biological Systems (Frauenfelder,
6 Kabat, E. A. & Wu, T. T. (1972). H., Gunsalus, I. C., Tsibris, J. C. M.,
Construction of a three-dimensional Debrunner, P. G., & Münck, E.,
model of the polypeptide backbone eds), pp. 22–24, University of Illinois
of the variable region of kappa Press, Urbana.
immunoglobulin light chains. Proc. 18 Zwanzig, R., Szabo, A., & Bagchi, B.
Natl Acad. Sci. USA 69, 960–964. (1992). Levinthal’s paradox. Proc. Natl
7 Epstein, C. J., Goldberger, R. F., & Acad. Sci. USA 89, 20–22.
Anfinsen, C. B. (1963). The genetic 19 Wang, Z., Mottonen, J., &
control of tertiary protein structure: Goldsmith, E. J. (1996). Kinetically
studies with model systems. Cold controlled folding of the serpin
Spring Harbor Symp. Quant. Biol. 28, plasminogen activator inhibitor 1.
439–444. Biochemistry 35, 16443–16448.
8 Levitt, M. & Warshel, A. (1975). 20 Sohl, J. L., Jaswal, S. S., & Agard, D.
Computer simulation of protein A. (1998). Unfolded conformations of
folding. Nature (London) 253, 694– a-lytic protease are more stable than
698. its native state. Nature (London) 395,
9 Harrington, W. F. & Schellman, J. 817–819.
A. (1956). Evidence for the instability 21 Taniuchi, H. (1970). Formation of
of hydrogen-bonded peptide structures randomly paired disulfide bonds in
in water based on studies of ribo- des-(121–124)-ribonuclease after
nuclease and oxidized ribonuclease. reduction and reoxidation. J. Biol.
C.R. Trav. Lab. Carlsberg, Sér. Chim. 30, Chem. 245, 5459–5468.
21–43. 22 Goldberg, M. E. (1969). Tertiary
10 Brandts, J. F. (1965). The nature of structure of Escherichia coli b-D-
the complexities in the ribonuclease galactosidase. J. Mol. Biol. 46, 441–
conformational transition and the 446.
implications regarding clathrating. 23 Wetlaufer, D. B. (1973). Nucleation,
J. Am. Chem. Soc. 87, 2759–2760. rapid folding and globular intrachain
11 Kauzmann, W. (1959). Factors in regions in proteins. Proc. Natl Acad.
interpretation of protein denaturation. Sci. USA 70, 697–701.
Adv. Protein Chem. 14, 1–63. 24 Lesk, A. M. & Rose, G. D. (1981).
12 Ginsburg, A. & Carroll, W. R. Folding units in globular proteins.
(1965). Some specific ion effects on Proc. Natl Acad. Sci. USA 78, 4304–
the conformation and thermal stability 4308.
of ribonuclease. Biochemistry 4, 2159– 25 London, J., Skrzynia, C., &
2174. Goldberg, M. E. (1974). Renaturation
18 1 Early Days of Studying the Mechanism of Protein Folding
2
Spectroscopic Techniques to Study
Protein Folding and Stability
Franz Schmid
2.1
Introduction
Optical spectroscopy provides the standard techniques for measuring the confor-
mational stabilities of proteins and for following the kinetics of unfolding and re-
folding reactions. Practically all unfolding and refolding reactions are accompanied
by changes in absorbance, fluorescence, or optical activity (usually measured as the
circular dichroism). These optical properties scale linearly with protein concentra-
tion. Spectra and spectral changes can be measured with very high accuracy in di-
lute protein solutions, and spectrophotometers are standard laboratory equipment.
Therefore the methods of optical spectroscopy will remain the basic techniques for
studying protein folding. Light absorbance occurs in about 1015 s and the excited
states show lifetimes of around 108 s. Optical spectroscopy is thus well suited for
measuring the kinetics of ultrafast processes in folding (e.g., in temperature-jump
or pressure-jump experiments).
Proteins absorb and emit light in the UV range of the spectrum. The absorbance
originates from the peptide groups, from the aromatic amino acid side chains and
from disulfide bonds. The fluorescence emission originates from the aromatic
amino acids. Some proteins that carry covalently linked cofactors, such as the
heme proteins, show absorbance in the visible range.
In absorption, light energy is used to promote electrons from the ground state to
an excited state, and when the excited electrons revert back to the ground state they
can lose their energy in the form of emitted light, which is called fluorescence.
When a chromophore is part of an asymmetric structure, or when it is immobi-
lized in an asymmetric environment, left-handed and right-handed circularly polar-
ized light are absorbed to different extents, and we call this phenomenon circular
dichroism (CD).
The energies of the ground and the activated state depend on the molecular
environment and the mobility of the chromophores, and therefore the optical prop-
erties of folded and unfolded proteins are usually different. Spectroscopy is a
nondestructive method, and, if necessary, the samples can be recovered after the
experiments.
This chapter provides an introduction into the spectral properties of amino acids
and proteins as well as practical advice for the planning and the interpretation of
spectroscopic measurements of protein folding and stability.
2.2
Absorbance
Light absorption can occur when an incoming photon provides the energy that is
necessary to promote an electron from the ground state to an excited state. The
energy (E) of light depends on its frequency n (E ¼ hn) and is thus inversely corre-
lated with its wavelength l (E ¼ hc=l). Light in the 200-nm region has sufficient
energy for exciting electrons that participate in small electronic systems such as
double bonds. Aromatic molecules have lower lying excited states, and therefore
they also absorb light in the region between 250 and 290 nm, which is called the
‘‘near UV’’ or ‘‘aromatic’’ region of a spectrum. The amino acids phenylalanine,
tyrosine, and tryptophan absorb here, as well as the bases in nucleic acids. The
aromatic region is used extensively to follow the conformational transitions of
proteins and nucleic acids. Light in the visible region of the spectrum can only
excite electrons that participate in extended delocalized p systems. Such systems
do not occur in normal proteins, but can be introduced by site-specific labeling.
The basic aspects of protein absorption are very well covered in Refs [1–6].
The absorbance (A) is related to the intensity of the light before (I o ) and after (I)
passage through the protein solution:
A¼ecl ð2Þ
where l is the pathlength in cm, and e is the molar absorption coefficient, which
can be determined experimentally or calculated for proteins by adding up the con-
tributions of the constituent aromatic amino acids of a protein [7–10].
2.2.1
Absorbance of Proteins
100
4
80
ε x 10-4 (M-1cm-1)
3
60 2
1
40
0
sorb strongly between 180 and 230 nm. Figure 2.1 shows the descending slope of
this absorption. The shoulder near 220 nm also contains significant contributions
from the aromatic residues of lysozyme. The inset of Figure 2.1 provides an ex-
panded view of the near-UV region of the spectrum. The absorption here is caused
by the aromatic side chains of tyrosine, tryptophan, and phenylalanine. Disulfide
bonds display a very weak absorbance band around 250 nm. Figure 2.1 emphasizes
that proteins absorb much more strongly in the far-UV region than in the near-UV
(the ‘‘aromatic’’) region, which is commonly used to measure protein concentra-
tions and to follow protein unfolding.
Spectra of the individual aromatic amino acids are shown in Figure 2.2; their
molar absorptions at the respective maxima in the near UV are compared in Table
2.1. The contributions of phenylalanine, tyrosine, and tryptophan to the absorb-
ance of a protein are strongly different. In the aromatic region the molar absorp-
tion of phenylalanine (l max ¼ 258 nm, e258 ¼ 200 M1 cm1 ) is smaller by an
order of magnitude compared with those of tyrosine (l max ¼ 275 nm, e275 ¼ 1400
M1 cm1 ) and tryptophan (l max ¼ 280 nm, e280 ¼ 5600 M1 cm1 ), and it is vir-
tually zero above 270 nm. The near-UV spectrum of a protein is therefore domi-
nated by the contributions of tyrosine and tryptophan. The shape of the absorbance
spectrum of a protein reflects the relative contents of the three aromatic residues in
the molecule.
The absorbance maximum near 280 nm is used for determining the concentra-
tions of protein solutions and the 280–295 nm range is used to follow protein un-
folding reactions. In this wavelength range only tyrosine and tryptophan contribute
to the observed absorbance and to the changes caused by conformational changes.
With the distinct fine structure of their absorbance (Figure 2.2) the phenylalanine
residues contribute ‘‘wiggles’’ in the 250–260 nm region, and they influence the
2.2 Absorbance 25
Fig. 2.2. Ultraviolet absorption spectra of the (a) 1 mM phenylalanine, (b) 0.1 mM tyrosine,
aromatic amino acids in a 1-cm cell in 0.01 M (c) 0.1 mM tryptophan. The figure is taken
potassium phosphate buffer, pH 7.0 at 25 C. from Ref. [6].
ratio A260 =A280 , which is often used to identify contaminating nucleic acids in pro-
tein preparations.
The aromatic amino acids absorb only below 310 nm, and therefore pure protein
solutions should not show absorbance at wavelengths greater than 310 nm. Pro-
teins that are devoid of tryptophan should not absorb above 300 nm (cf. Figure
2.2). A sloping baseline in the 310–400 nm region usually originates from light
scattering when large particles, such as aggregates are present in the protein solu-
tion. A plot of log10 A as a function of log10 wavelength (above 310 nm) reveals the
contribution of light scattering, and the linear extrapolation to shorter wavelength
can be used to correct the measured absorbance for the contribution of scattering
[5].
Figure 2.3 reflects the strong influence of tryptophan residues on absorbance
spectra. It compares the absorption spectra of the wild-type form (Figure 2.3a)
and the Trp59Tyr variant (Figure 2.3b) of ribonuclease T1. The wild-type protein
Tab. 2.1. Absorbance and fluorescence properties of the aromatic amino acidsa.
20000 a c
Molar Absorbance
15000
(M-1cm-1) 10000
5000
0
240 260 280 300 320 340 260 280 300 320 340
Wavelength (nm) Wavelength (nm)
Absorbance Difference
4000 b d
3000
(M-1cm-1)
2000
1000
0
-1000
240 260 280 300 320 340 260 280 300 320
Wavelength (nm) Wavelength (nm)
Fig. 2.3. Ultraviolet absorption spectra of a) unfolded forms are shown in b) and d).
the wild-type form, and c) the Trp59Tyr variant Spectra of 15 mM protein were measured
of ribonuclease T1. The spectra of the native at 25 C in 1-cm cuvettes in a double-beam
proteins (in 0.1 M sodium acetate, pH 5.0) are instrument with a bandwidth of 1 nm at 25 C.
shown by the continuous lines. The spectra of The spectra of the native and unfolded
the unfolded proteins (in 6.0 M GdmCl in the proteins were recorded successively, stored,
same buffer) are shown by broken lines. The and subtracted. The figure is taken from Ref.
difference spectra between the native and [6].
contains one tryptophan and nine tyrosines, the Trp59Tyr variant contains 10 tryo-
sines and no tryptophan. The Trp59Tyr replacement reduces the absorbance in the
280–290 nm region significantly (Figure 2.3c), and the absorbance is essentially
zero above 300 nm.
In folded proteins, most of the aromatic residues are buried in the hydrophobic
core of the molecule, and upon unfolding they become exposed to the aqueous sol-
vent. The spectral changes that accompany unfolding are dominated by a shift of
the absorption spectra of the aromatic amino acids towards lower wavelengths (a
blue shift). In addition, small changes in intensity and peak widths are observed
as well. The blue shift upon unfolding is typically in the range of 3 nm. It leads to
maxima in the difference spectra between native and unfolded proteins in the de-
scending slope of the original spectrum, that is in the 285–288 nm region for tyro-
sine and around 291–294 nm for tryptophan [11].
Figure 2.3 shows also the spectra for the unfolded forms of the two variants of
ribonuclease T1. The difference spectra in Figure 2.3b,d show that the maximal dif-
2.2 Absorbance 27
ferences in absorbance are indeed observed in the descending slopes of the original
spectra in the 285–295 nm region. The difference spectrum of the Trp59Tyr variant
(Figure 2.3d) shows a prominent maximum at 287 nm. It is typical for proteins
that contain tyrosine residues only. In the wild-type protein with its single trypto-
phan (Figure 2.3b) additional signals are observed between 290 and 300 nm. They
originate from the absorbance bands of tryptophan, which are visible as shoulders
in the protein spectrum (Figure 2.3a). Difference spectra between the native and
the unfolded form, such as those for the variants of ribonuclease T1 in Figure
2.3b,d are observed for most proteins. Generally, these differences are small, but
they can be determined with very high accuracy. The contributions of phenylala-
nine residues to difference spectra are very small. In some cases they are apparent
as ripples in the spectral region around 260 nm (cf. Figure 2.3b,d). They are easily
mistaken as instrumental noise.
Proteins are usually unfolded by increasing the temperature or by adding desta-
bilizing agents such as urea or guanidinium chloride (GdmCl) (see Chapter 3).
These changes not only unfold the protein, but they also influence the spectral
properties of the protein solution. The refractive index of the solution decreases
slightly with increasing temperature, the protein concentration decreases because
the solution expands upon heating, and the ionization state of dissociable groups
can change. All these effect are very small, and in combination they often lead to
an almost temperature independent protein absorbance in the absence of unfolding.
Chemical denaturants have a significant influence on the absorbances of tyro-
sine and tryptophan at 287 nm and at 291 nm, respectively, where protein unfold-
ing is usually monitored. The refractive index of the solvent increases as a function
of the concentration of GdmCl or urea, which leads to a slight shift to higher wave-
lengths (a red shift) of the spectra and thus to an increase in absorbance at 287 and
291 nm. A291 of tryptophan increases by about 50% when 7 M GdmCl is added.
Similar increases are also observed for tyrosine at 287 nm and when urea is used
as the denaturant. The GdmCl and urea dependences of tyrosine and tryptophan
absorbance are found in Ref. [6]. The spectrum of the unfolded form of a protein
and its denaturant dependence can be modeled by mixing tyrosine and tryptophan
in a ratio as in the protein under investigation and by measuring the absorbance of
this mixture as a function of the denaturant concentration. Such data provide ade-
quate base lines for denaturant-induced unfolding transitions.
The absorbance changes upon unfolding are measured in the steeply descending
slopes of the absorbance spectra (cf. Figure 2.3) and consequently they depend
on the instrumental settings (in particular on the bandwidth). The reference data
must therefore be determined under the same experimental settings as the actual
unfolding transition.
2.2.2
Practical Considerations for the Measurement of Protein Absorbance
Tab. 2.2. Absorbance of various salt and buffer solutions in the far-UV regiona.
NaClO4 170 0 0 0 0
NaF, KF 170 0 0 0 0
Boric acid 180 0 0 0 0
NaCl 205 0 0.02 >0.5 >0.5
Na2 HPO4 210 0 0.05 0.3 >0.5
NaH2 PO4 195 0 0 0.01 0.15
Na-acetate 220 0.03 0.17 >0.5 >0.5
Glycine 220 0.03 0.1 >0.5 >0.5
Diethylamine 240 0.4 >0.5 >0.5 >0.5
NaOH pH 12 230 b0.5 >2 >2 >2
Boric acid, NaOH pH 9.1 200 0 0 0.09 0.3
Tricine pH 8.5 230 0.22 0.44 >0.5 >0.5
Tris pH 8.0 220 0.02 0.13 0.24 >0.5
Hepes pH 7.5 230 0.37 0.5 >0.5 >0.5
Pipes pH 7.0 230 0.20 0.49 0.29 >0.5
Mops pH 7.0 230 0.10 0.34 0.28 >0.5
Mes pH 6.0 230 0.07 0.29 0.29 >0.5
Cacodylate pH 6.0 210 0.01 0.20 0.22 >0.5
a Buffers were titrated with 1 M NaOH or 0.5 M H2 SO4 to the
indicated pH values. Data are from Ref. [6].
2.2.3
Data Interpretation
2.3
Fluorescence
A molecule emits fluorescence when, after excitation, an electron returns from the
first excited state (S1 ) back to the ground state (S0 ). At about 108 s, the lifetime of
the excited state is very long compared with the time for excitation, which takes
only about 1015 s (on the human time scale this would be equivalent to the times
of 1 year and 1 second). An excited molecule thus has ample time to relax into
the lowest accessible vibrational and rotational states of S1 before it returns to the
ground state. As a consequence, the energy of the emitted light is always smaller
30 2 Spectroscopic Techniques to Study Protein Folding and Stability
than that of the absorbed light. In other words, the fluorescence emission of a
chromophore always occurs at a wavelength that is higher than the wavelength of
its absorption.
Electrons can revert to the ground state S0 also by nonradiative processes (such
as vibrational transitions), and in some instances transitions to the excited triplet
state T1 occur. The S1 to T1 transition is facilitated by magnetic perturbations,
such as the collision of an activated molecule with paramagnetic molecules or
with molecules with polarizable electrons. This leads to the phenomenon of fluo-
rescence quenching. Good collisional quenchers are iodide ions, acrylamide, or
oxygen. Quenching can also occur intramolecularly, in proteins by the side chains
of Cys and His and by amides and carboxyl groups. All conformational transitions
that change the exposure to quenchers in the solvent or within the protein will
thus lead to fluorescence changes.
Another factor that leads to a decrease in emission is fluorescence resonance
energy transfer (FRET), also called Förster transfer. The efficiency of this energy
transfer between two chromophores depends, among other factors, on the extent
of overlap between the fluorescence spectrum of the donor and the absorption
spectrum of the acceptor, and, in particular, on the distance between donor and
acceptor.
In combination, these radiationless routes of deactivation compete with light
emission, and therefore the quantum yields (f F ) of most chromophores are much
smaller than 1. f F is equal to the ratio of the emitted to the absorbed photons. Its
maximal value is thus 1. Good introductions into the principles of fluorescence of
biological samples are found in Refs [2, 12, 13].
2.3.1
The Fluorescence of Proteins
The fluorescence of proteins originates from the phenylalanine, tyrosine, and tryp-
tophan residues. Emission spectra for the three aromatic amino acids are shown in
Figure 2.4, and their absorption and emission properties are summarized in Table
2.1. The excitation spectra correspond to the respective absorption spectra (Figure
2.2).
The fluorescence intensity of a particular chromophore depends on both its ab-
sorption at the excitation wavelength and its quantum yield at the wavelength of
the emission. The product e max f F is thus a good parameter to characterize the
fluorescence ‘‘sensitivity’’ of this chromophore. In proteins that contain all three
aromatic amino acids, fluorescence is usually dominated by the contribution of
the tryptophan residues because its sensitivity parameter is 730, which is much
higher than the corresponding values which are 200 for tyrosine and only 4 for
phenylalanine (Table 2.1). In proteins, phenylalanine fluorescence is practically
not observable, because the other two aromatic residues absorb strongly around
280 nm, where phenylalanine emits, and because its emission is almost completely
transferred to tyrosine and tryptophan by FRET.
The fluorescence of tyrosines is often undetectable in proteins that also contain
2.3 Fluorescence 31
Fig. 2.4. Fluorescence spectra of the aromatic 1 mM tryptophan were used. Excitation was at
amino acids in 0.01 M potassium phosphate 257 nm (Phe), 274 nm (Tyr), and 278 nm
buffer, pH 7.0 at 25 C. For the measurements, (Trp). The figure is taken from Ref. [6].
100 mM phenylalanine, 6 mM tyrosine, and
tryptophan, (i) because tryptophan shows a much stronger emission than tyrosine
(Table 2.1), (ii) because in folded proteins tryptophan emission is often shifted to
lower wavelengths and thus masks the contribution of tyrosine at 303 nm, and
(iii) because FRET can occur from tyrosine to tryptophan in the compact native
state of a protein [13, 14].
2.3.2
Energy Transfer and Fluorescence Quenching in a Protein: Barnase
(a)
(b)
Fig. 2.5. a) The structure of barnase showing buffer, pH 5.5 at the same protein concen-
the locations of the three Trp residues and of tration of 4 mM. Under these conditions His18
His18. b) Fluorescence emission spectra of is protonated. The maximal emission of the
barnase and the variants with single mutations wild-type protein (W.T.) is set as 100. The
of His18 and of its three Trp residues. All figure is taken from Ref. [15].
spectra were recorded in 10 mM Bis/Tris
2.3 Fluorescence 33
2.3.3
Protein Unfolding Monitored by Fluorescence
The extended lifetime of the excited state is a key to understanding the changes
that occur during a protein folding transition. As described above, a broad range
of interactions or perturbations can influence the fluorophore while it is in the
activated state. In particular, singulet-to-triplet inter system crossing can occur, or
energy transfer to an acceptor. Fluorescence emission is consequently much more
sensitive to changes in the environment of the chromophore than absorption.
The changes in fluorescence during the unfolding of a protein are often very
large. In proteins that contain tryptophan both shifts in wavelength and changes
in intensity are usually observed. Depending on the environment in the folded pro-
tein the tryptophan emission of a native protein can be greater or smaller than the
emission of free tryptophan in aqueous solution, and fluorescence can increase or
decrease upon unfolding, depending on the protein under investigation. The emis-
sion maximum of solvent-exposed tryptophan is near 350 nm, but in a hydropho-
bic environment, such as in the interior of a folded protein, tryptophan emission
occurs at lower wavelengths (indole shows an emission maximum of 320 nm in
hexane [16]). As an example the emission spectra of native and of unfolded ribo-
nuclease T1 are shown in Figure 2.6. As mentioned, ribonuclease T1 contains
nine tyrosine residues and only one tryptophan (Trp59), which is inaccessible to
solvent in the native protein [17].
The fluorescence of the tryptophan residues can be investigated selectively by ex-
citation at wavelengths higher than 295 nm. Tyrosine does not absorb above 295
nm because its absorbance spectrum is blue shifted (as compared with tryptophan,
Figure 2.2) and is thus not excited at 295 nm. A comparison of the emission spec-
34 2 Spectroscopic Techniques to Study Protein Folding and Stability
100
a b
Relative Fluorescence
Relative fluorescence
80 30
60
20
40
10
20
0 0
300 320 340 360 380 400 300 320 340 360 380 400
Wavelength (nm) Wavelength (nm)
Fig. 2.6. Fluorescence emission spectra of at a) 278 nm and b) 295 nm. The bandwidths
native (——) and of unfolded (-------) ribo- were 3 nm for excitation and 5 nm for
nuclease T1. Native ribonuclease T1 (1.4 mM) emission. Spectra were recorded at 25 C in
was in 0.1 M sodium acetate pH 5.0, the 1 1 cm cells in a Hitachi F-4010 fluorimeter.
sample of unfolded protein contained 6.0 M The figure is taken from Ref. [6].
GdmCl in addition. Fluorescence was excited
tra observed after excitation at 280 nm and at 295 nm gives information about the
contributions of the tryptophan and the tyrosine residues to the fluorescence of a
protein. The data for ribonuclease T1 in Figure 2.6 show that the shapes of the flu-
orescence spectra observed after excitation at 278 nm and 295 nm are virtually
identical. The measured emission originates almost completely from the single
Trp59, which is inaccessible to solvent and hence displays a strongly blue-shifted
emission maximum near 320 nm. Tyrosine emission is barely detectable in the
spectrum of the native protein, because energy transfer to tryptophan occurs. Un-
folding by GdmCl results in a strong decrease in tryptophan fluorescence and a
concomitant red shift of the maximum to about 350 nm. The distances between
the tyrosine residues and Trp59 increase, and energy transfer becomes less effi-
cient. As a consequence, the tyrosine fluorescence near 303 nm becomes visible
in the spectrum of the unfolded protein when excited at 278 nm (Figure 2.6a), but
not when excited at 295 nm (Figure 2.6b). The examples in Figure 2.6 indicate that,
unlike the absorbance changes, the fluorescence changes upon folding can be very
large and the contribution of the tryptophan residues can be studied selectively by
changing the excitation wavelength. These features, together with its very high sen-
sitivity, make fluorescence measurements extremely useful for following conforma-
tional changes in proteins.
Multiple emission bands do not necessarily originate from different tryptophan
residues of a folded protein. Figure 2.6 shows that even single tryptophan residues,
such as Trp59 of ribonuclease T1 can give rise to several emission bands.
In contrast to ribonuclease T1, where we found a fivefold decrease of tryptophan
fluorescence, the fluorescence of the FK 506 binding protein FKBP12 increases
about 12-fold upon unfolding [18]. Figure 2.7 shows fluorescence spectra recorded
2.3 Fluorescence 35
(a)
(b)
between 0 M and 7.1 M urea. Incidentally both ribonuclease T1 and FKBP12 have
a single tryptophan at position 59. The emission of Trp59 of FKBP12 is barely de-
tectable in the folded protein (Figure 2.7), but as soon as the protein enters the
transition zone (around 3 M urea, cf. inset a in Figure 2.7) the fluorescence in-
creases strongly and its maximum shifts from 330 nm to 350 nm.
In the experiments shown in Figure 2.7 the fluorescence of the tyrosine residues
of FKBP12 was suppressed by exciting Trp59 selectively by light of 294 nm. At
this wavelength tyrosine absorption is almost zero (cf. Figure 2.2). In addition, at
294 nm the absorbance of the samples is very low, the inner filter effect (cf. Sec-
tion 2.3.5) becomes negligible, and the fluorescence emission remains linear over
a wide range of protein concentrations (cf. inset b of Figure 2.7).
The fluorescence maximum of tyrosine remains near 303 nm, irrespective of its
molecular environment. Therefore the unfolding of proteins that contain only tyro-
sine is usually accompanied by changes in the intensity, but not in the wavelength
of emission. As an example, the fluorescence emission spectra of the Trp59Tyr
36 2 Spectroscopic Techniques to Study Protein Folding and Stability
20
Relative Fluorescence
15
10
0
300 310 320 330 340
Wavelength (nm)
Fig. 2.8. Fluorescence emission spectra of pH 5.0; the unfolded sample contained
1.5 mM of the native (——) and of the unfolded 6.0 M GdmCl in addition. Fluorescence was
(-------) Trp59Tyr variant of ribonuclease T1. excited at 278 nm, the spectra were recorded
This protein contains 10 Tyr residues. The as in Figure 2.6. The figure is taken from Ref.
native protein was in 0.1 M sodium acetate, [6].
variant of ribonuclease T1 in the native and unfolded states are shown in Figure
2.8. A decreased tyrosine fluorescence in the native state (as in Figure 2.8) is fre-
quently observed. It is thought to originate from hydrogen bonding of the tyrosyl
hydroxyl group and/or the proximity of quenchers, such as disulfide bonds in the
folded state [19].
2.3.4
Environmental Effects on Tyrosine and Tryptophan Emission
The fluorescence of the exposed aromatic amino acids of a protein depends on the
solvent conditions. Solvent additives that promote the S1 ! T1 transition (such as
acrylamide or iodide) quench the fluorescence, and this effect can be used to mea-
sure the exposure of tryptophan side chains to the solvent [20].
The fluorescence generally decreases with increasing temperature. This decrease
is substantial. To a first approximation, tyrosine and tryptophan emission decrease
b1% per degree increase in temperature [6]. Fluorescence is therefore generally
not suited for following the heat-induced unfolding of a protein.
Denaturants such as GdmCl and urea also influence the fluorescence of tyrosine
and tryptophan. The dependences on the concentration of GdmCl and urea of tryp-
tophan and tyrosine emission are given in Ref. [6]. As in absorbance experiments
(cf. Section 2.2.1) the emission of the unfolded protein can be represented well by
an appropriate mixture of tyrosine and tryptophan. The dependence of the emis-
sion of this mixture on denaturant concentration is often very useful to determine
the baseline equivalent to the unfolded protein for the quantitative analysis of un-
folding transition curves.
2.3 Fluorescence 37
2.3.5
Practical Considerations
length is smaller than 0.1 absorbance units. Additional practical advice on how to
measure protein fluorescence is found in Ref. [6].
2.4
Circular Dichroism
Circular dichroism (CD) is a measure for the optical activity of asymmetric mole-
cules in solution and reflects the unequal absorption of left-handed and right-
handed circularly polarized light. The CD signals of a particular chromophore are
therefore observed in the same spectral region as its absorption bands. A good in-
troduction to the basic principles of CD is provided by Refs [2] and [21].
Proteins show CD bands in two spectral regions. The CD in the far UV (170–
250 nm) originates largely from the dichroic absorbance of the amide bonds, and
therefore this region is usually termed the amide region. The aromatic side chains
also absorb in the far UV and accordingly they also contribute to the CD in this re-
gion. These contributions may dominate the spectra of proteins that show a weak
amide CD and/or a high content of aromatic residues. The CD bands in the near
UV (250–300 nm) originate from the aromatic amino acids and from small contri-
butions of disulfide bonds. Figure 2.9 shows the CD spectra of pancreatic ribonu-
clease for the native and the denaturant-unfolded forms.
The two spectral regions give different kinds of information about protein struc-
ture. The CD in the amide region reports on the backbone (i.e., the secondary)
structure of a protein and is used to characterize the secondary structure and
changes therein. In particular the a-helix displays a strong and characteristic CD
spectrum in the far-UV region. The contributions of the other elements of second-
ary structure are less well defined. The content of the different secondary struc-
tures in a protein can be calculated from its CD spectrum in the amide regions.
Methods for these calculations are found in Ref. [22].
The aromatic residues have planar chromophores and are intrinsically symmet-
ric. When they are mobile, such as in short peptides or in unfolded proteins, their
CD is almost zero. In the presence of ordered structure, such as in a folded pro-
tein, the environment of the aromatic side chains becomes asymmetric, and there-
fore they show CD bands in the near UV. The signs and the magnitudes of the
aromatic CD bands cannot be calculated; they depend on the immediate struc-
tural and electronic environment of the immobilized chromophores. Therefore the
individual peaks in the complex near-UV CD spectrum of proteins usually cannot
be assigned to specific amino acid side chains. However, the near-UV CD spectrum
represents a highly sensitive criterion for the native state of a protein. It can thus
be used as a fingerprint of the correctly folded conformation.
2.4.1
CD Spectra of Native and Unfolded Proteins
Figure 2.10 shows representative spectra in the far-UV region for all-helical (a), for
all-b (b), and for unstructured proteins (c). Helical proteins show rather strong CD
2.4 Circular Dichroism 39
Fig. 2.9. CD spectra of pancreatic was in the same buffer plus 6.0 M GdmCl.
ribonuclease (from red deer) at 10 C in the a) CD in the aromatic region. The protein
native (——) and in the unfolded state (-------). concentration was 78 mM in a 1-cm cell.
The native protein was in 0.1 M sodium b) CD in the peptide region. 28 mM protein in
cacodylate/HCl, pH 6.0, the unfolded protein a 0.1-cm cell. The figure is taken from Ref. [6].
bands with minima near 222 and 208 nm and a maximum near 195 nm. b pro-
teins show weaker and more dissimilar spectra. The shape of the CD spectrum of
a b protein depends, among other factors, on the length and orientation of the
strands and on the twist of the sheet. Most spectra show, however, a positive ellip-
ticity between 190 and 200 nm. In proteins with helices and sheets usually the
helical contributions dominate the CD spectrum. Unordered proteins and peptides
show weak CD signals above 210 nm, but a pronounced minimum between 195
and 200 nm.
40 2 Spectroscopic Techniques to Study Protein Folding and Stability
The strong difference near 195 nm between the CD of folded and unfolded pro-
teins should, in principle, provide an ideal probe for monitoring protein unfolding
transitions. In practice, however, most unfolding reactions are monitored near
220 nm. Wavelengths below 200 nm are almost never used, because most buffers
and virtually all denaturants absorb strongly below 200 nm (Table 2.2), and there-
fore the advantage of having a strong change in signal is more than offset by the
very high background absorbance of the solvent.
The aromatic amino acid side chains absorb not only in the near-UV, but also in
the far-UV region [23]. Accordingly, in native proteins they also contribute to the
CD in the 180–230 nm region, which has traditionally been ascribed to only the
amide bonds in the protein backbone. The aromatic contributions to the far-UV
CD are often modest, and for helical proteins it is usually adequate to neglect
them in the analyses, because the CD of helices is very strong (cf. Figure 2.10,
top). For b proteins with their small and diverse CD signals (cf. Figure 2.10,
middle), the aromatic bands can lead to problems and therefore the contents of b
structure as calculated from the CD spectra can be seriously wrong. Carlsson and
coworkers employed a mutational approach to analyze the contributions of the
seven tryptophan residues to the amide CD spectrum of carbonic anhydrase [24].
They found strong contributions of these tryptophan residues to the far-UV spec-
trum of this protein.
2.4.2
Measurement of Circular Dichroism
The difference between the absorbances of right- and left-handed circularly polar-
ized light of a protein sample is extremely small. In the far-UV region it lies in the
range of 104 –106 absorbance units in samples with a total absorbance of about
1.0. This requires that less than 0.1% of the absorbance signal be measured accu-
rately and reproducibly. Therefore highly sensitive instruments are necessary and
careful sample preparation is important.
CD instruments use a high-frequency photoelectric modulator to generate alter-
natively the two circularly polarized light components. This leads to an alternating
current contribution to the photomultiplier signal that is proportional to the CD of
the sample. A detailed description of the principles of CD measurements is found
in Refs [6] and [25].
Since the measured CD is so small, an extremely high stability of the CD signal
is required. In general the instrument should be allowed to warm up for 30–60
min, and the stability of the baseline with and without the cuvette should be
checked regularly.
When CD is measured in the far-UV region particular attention should be paid
to a careful selection of solvents and buffers. The contributions of buffers and salts
to the total absorbance of the sample should be as small as possible. This poses
severe restrictions on the choice of solvents for CD measurements because many
commonly used salts and buffers absorb strongly in the far-UV region. Absorbance
values for several of them are given in Table 2.2. Common denaturants, such as
urea and GdmCl also absorb strongly in the far UV. Therefore they cannot be
42 2 Spectroscopic Techniques to Study Protein Folding and Stability
used for CD measurements below 210 nm. Oxygen also absorbs in the far UV, but
precautions such as purging the cell compartment with nitrogen is necessary only
when measurements are performed below 190 nm.
Protein concentration, pathlength and solvent must be carefully selected to ob-
tain good CD spectra and to avoid artifacts. A good signal-to-noise ratio is achieved
when the total absorbance is around 1.0. The magnitude of the CD depends on the
protein concentration and therefore its contribution to the total optical density of
the sample should be as high as possible, and the absorbance of the solvent should
be as small as possible. Therefore, for measurements in the far-UV region, short
path lengths (0.2–0.01 cm) and increased protein concentrations should be used.
Cells of 0.1 cm are usually sufficient for measurements down to about 200 nm.
When preparing samples for CD measurements it should be remembered that pro-
tein absorbance is about tenfold higher in the far UV when compared with the
aromatic region. Artifacts can arise easily from an excessive absorbance of the sam-
ple or from light scattering caused by protein aggregates in the sample. Whenever
the optical density of the sample becomes too high, the instrument cannot discrim-
inate properly between the right and the left circularly polarized light components,
and the CD signal decreases in size. This phenomenon, called absorption flatten-
ing, is a serious problem below 200 nm. It is good practice to ascertain that the
measured CD is linearly related with path length and protein concentration.
2.4.3
Evaluation of CD Data
It should be noted that the concentration standards are different for ½Y and De. De
is the differential absorbance of a 1 mol L1 solution in a 1-cm cell, whereas ½Y is
the rotation in degrees of a 1 dmol cm3 solution and a path length of 1 cm.
CD of proteins in the amide region is usually given as mean residue ellipticity,
½YMRW , which is based on the concentration of the sum of the amino acids in the
protein solution under investigation. The concentration of residues is obtained
by multiplying the molar protein concentration with the number of amino acids.
For data in the aromatic region, different units are used in the literature. Fre-
quently, aromatic CD is given as De, but ½Y, based on the protein concentration,
and ½YMRW , based on the residue concentration are also found.
The molar ellipticity, ½Y, or the residue ellipticity, ½YMRW , are calculated from
the measured Y (in degrees) by using Eqs (4) or (5):
References 43
Y 100 Mr
½Y ¼ ð4Þ
cl
Y 100 Mr
½YMRW ¼ ð5Þ
c l NA
References
3
Denaturation of Proteins by Urea and
Guanidine Hydrochloride
C. Nick Pace, Gerald R. Grimsley, and J. Martin Scholtz
3.1
Historical Perspective
The ability of urea to denature proteins has been known since 1900 [1]. Urea is
also effective in denaturing more complex biological systems: ‘‘A dead frog placed
in saturated urea solution becomes translucent and falls to pieces in a few hours’’
[2]. (It was not reported if frog denaturation is reversible!) The even greater effec-
tiveness of guanidine hydrochloride (guanidinium chloride (GdmCl)) as a protein
denaturant was first reported in 1938 by Greenstein [3]. In this chapter we discuss
why urea and GdmCl have proven to be so useful to protein chemists and summa-
rize what we have learned over the past 100 years.
Tanford [4] was the first to study quantitatively the unfolding of proteins by urea.
The diagram in Figure 3.1 is from Tanford’s paper; it shows that when a protein
unfolds many nonpolar side chains and peptide groups that were buried in the
folded protein are exposed to solvent in the unfolded state. Figure 3.1 also gives
the measured values of the free energy of transfer, DGtr , of a leucine side chain
and a peptide group from water to the solvents shown. Note that DGtr of both are
negative in the presence of urea and GdmCl. This is true of almost all the constit-
uent groups of a protein, and explains why urea and GdmCl denature proteins,
and why GdmCl is a stronger denaturant than urea. It is also clear why both folded
and unfolded proteins are generally more soluble in urea and GdmCl solutions
than they are in water. Most of this chapter will focus on urea denaturation, how-
ever GdmCl denaturation will be discussed where appropriate.
3.2
How Urea Denatures Proteins
Fig. 3.1. Schematic from Tanford [4] to solvent and the open symbols represent
illustrating the changes in accessibility of groups that are not accessible to solvent. See
peptide groups (rectangles) and side chains Wang and Bolen [88] for osmolyte, and Pace
(circles) on protein unfolding. The closed et al. [56] for denaturant, DGtr values.
symbols represent groups that are accessible
into water through hydrogen bonding and there is great similarity in the spatial
distribution of water around water, urea around water, urea around urea, and water
around urea. They estimate that each urea forms A5.7 hydrogen bonds. In addi-
tion, urea hydrogen bonds with itself, forming chains and clusters in water. (Using
a similar approach, Mason et al. [12] show that the Gdm ion has no recognizable
hydration shell and is one of the most weakly hydrated cations yet characterized.
The Gdm ion can form only three undistorted hydrogen bonds with water.) Urea
is nearly three times larger than water, and can form more hydrogen bonds. This
may account for the favorable DGtr for the transfer of peptide groups from water to
urea solutions. This question is considered experimentally and theoretically in Zou
et al. [7].
Why DGtr values are favorable for the transfer of nonpolar side chains from
water to urea solutions was more difficult to explain. Muller [13] suggested that
urea molecules replace some of the water molecules in the hydration shell formed
around nonpolar solutes, and this changes the hydrogen bonding and van der
Waals interactions in such a way as to increase the solubility of the nonpolar sol-
3.2 How Urea Denatures Proteins 47
ute. This idea is supported by the experimental results of Soper et al. [5], and by
the theoretical calculations of Laidig and Daggett [14].
In the absence of denaturant, Timasheff states [15]: ‘‘The fact is that there is no
rigid shell of water around a protein molecule, but rather there is a fluctuating
cloud of water molecules that are thermodynamically affected more or less strongly
by the protein molecule.’’ Crystallographers are able to see some water molecules
that are bound tightly enough to be observed, but the fact that no water molecules
are observed near many polar groups shows that they do not see them all [16].
When crystal structures are determined in the presence of urea and GdmCl, some
denaturant molecules are seen bound to the protein, mainly by hydrogen bonds
[17, 18]. These denaturant molecules sometimes link adjacent polar groups on
the protein and reduce the conformational flexibility, as shown by a decrease in
the temperature factors.
There is also experimental evidence that denaturants (and water) reduce the flex-
ibility of the denatured state by transiently linking parts of the protein molecule
through hydrogen bonds to polar groups in the protein [19]. Halle’s group has
used magnetic relaxation dispersion techniques (MRD) to gain a better under-
standing of protein hydration at the molecular level (see Chapter 8). For example,
Modig et al. [6] summarize:
. . . the urea 2 H and water 17 O MRD data support a picture of the dena-
tured state where much of the polypeptide chain participates in clusters
that are more compact and more ordered than a random coil but neverthe-
less are penetrated by large numbers of water and urea molecules. These
solvent-penetrated clusters must be sufficiently compact to allow side-
chains from different polypeptide segments to come into hydrophobic
contact, while, at the same time, permitting solvent molecules to interact
favorably with peptide groups and with charged and polar side chains.
The exceptional hydrogen-bonding capacity and small size of water and
urea molecules are likely essential attributes in this regard. In such clusters,
many water and urea molecules will simultaneously interact with more
than one polypeptide segment and their rotational motions will therefore
be more strongly retarded than at the surface of the native protein.
Schellman [22] has extended his previous studies to show clearly how osmolytes
such as trimethylamine-N-oxide (TMAO) or denaturants such as urea act to stabi-
lize or destabilize proteins. The two key factors are the excluded volumes of the co-
solvents and their contact interactions with the protein. Because the molecules of
interest are all larger than water, the excluded volume effect will always favor the
native state. In contrast, contact interactions will generally favor the denatured
state. For urea, the contact interaction term is greater than the excluded volume ef-
fect, so urea acts as a protein denaturant. For osmolytes, the reverse is true and
osmolytes stabilize proteins. Schellman [24] was the first to show that because the
direct interaction of urea molecules with proteins is so weak, denaturation should
be represented by a solvent exchange mechanism rather than solely by the binding
of the denaturant to the protein molecule. His K av 1 term is a measure of the
contact interaction for the average site on a protein. For RNase T1, this term is
0.12 for TMAO, 0.22 for urea, and 0.40 for guanidinium ion. The larger value of
the contact term reflects the greater preferential binding of the Gdm ion compared
to urea, and this is what makes GdmCl a better denaturant than urea [22].
In summary, it is now clear that urea denatures proteins because there is a pref-
erential interaction with urea compared with water, and the denatured state has
more sites for interaction. Consequently, proteins unfold as the urea concentration
is increased.
3.3
Linear Extrapolation Method
A typical urea denaturation curve is shown in Figure 3.2A. When a protein unfolds
by a two-state mechanism, the equilibrium constant, K, can be calculated from the
experimental data using:
where ð yÞ is the observed value of the parameter used to follow unfolding, and
ð yÞN and ð yÞD are the values ð yÞ would have for the native state and the denatured
state under the same conditions where ð yÞ was measured. In the original analyses
of urea denaturation curves [25, 26], log K was found to vary linearly as a function
of log [urea] and the slope of the plot was denoted by n, and the midpoint of the
curve by (urea)1=2 (where log K ¼ 0). These parameters could then be used to cal-
culate the standard free energy of denaturation
This equation was used by Alexander and Pace [25] to estimate the differences in
3.3 Linear Extrapolation Method 49
Fig. 3.2. A) Urea denaturation curve for calculated from the data in the transition
RNase Sa determined at 25 C, pH 5.0, 30 mM region as described in the text. The DGðH2 OÞ
sodium acetate buffer as described by Pace et and m-values can be determined by fitting the
al. [78]. The solid line is based on an analysis data in B to Eq. (4), or by analyzing the data in
of the data using Eq. (5). B) DG as a function A with Eq. (5).
of urea molarity. The DG values were
stability among three genetic variants of a protein for the first time, and was the
forerunner of the linear extrapolation method.
The linear extrapolation method (LEM) was first used to analyze urea and
GdnHCl denaturation curves by Greene and Pace [27]. When DG is calculated as
50 3 Denaturation of Proteins by Urea and Guanidine Hydrochloride
a function of urea concentration using data such as those shown in Figure 3.2A,
DG is found to vary linearly with urea concentration as shown in Figure 3.2B.
Based on results such as these for several proteins, this equation was proposed
for analyzing the data:
where y F and y U are the intercepts and mF and mU the slopes of the pre- and post-
transition baselines, and DGðH2 OÞ and m are defined by Eq. (4).
The linear extrapolation method is now widely used for estimating the con-
formational stability of proteins and for measuring the difference in stability be-
tween proteins differing slightly in structure. (The original how-to-do-it paper on
solvent denaturation [30] has been cited over 1200 times.) In addition, the m-value
has taken on a life of its own [31]. These topics will be discussed further below.
3.4
DG(H2 O)
Fig. 3.3. A) Thermal denaturation curve for DG values were calculated from data in the
RNase Sa determined at pH 5.0 in 30 mM transition region as described in the text, and
sodium acetate buffer as described [78]. The the data were analyzed to determine Tm and
solid line is based on a nonlinear least squares DHm as described [78]. The DC p value is from
analysis of the data similar to that described Pace et al. [78]. The DG(25 C) value (filled
for Eq. (5), but for thermal denaturation curves circle) was calculated using Eq. (6) which is
[78]. B) DG as a function of temperature. The also shown in B.
52 3 Denaturation of Proteins by Urea and Guanidine Hydrochloride
tain the midpoint of the thermal denaturation curve, Tm , and the enthalpy change
at Tm ; DHm . These two parameters plus the heat capacity change for folding, DC p ,
can then be used in the Gibbs-Helmholtz equation [32]:
Tab. 3.1. Comparison of DGðTÞ values from DSC with DGðH2 OÞ values from UDC.
1.00
0.80
Fraction Native
0.60
0.40
0.20
0.00
0 20 40 60 80
Temperature (°C)
Fig. 3.4. Thermal unfolding curves for ecHPr smaller than the size of the symbol. The urea
in various urea concentrations at pH 7.0 concentrations are 0 M (filled circle), 1 M
(10 mM K/Pi ) as monitored by the change in (open circle), 2 M (filled triangle), 3 M (open
ellipticity at 222 nm and converted to fraction triangle), 3.5 M (filled square), and 4 M (open
folded, f F . The error on any single point is square). From Nicholson and Scholtz [41].
Finally this study as well as a number of others indicates that urea has a
number of advantages over guanidinium chloride as far as thermodynamic
interactions are concerned. The free energy and enthalpy functions of both
proteins and model compounds are more linear; the solutions more ideal;
extrapolations to zero concentration are more certain; and least-squares
analysis of the data is more stable.
One problem is that the ionic strength cannot be controlled with GdmCl, and this
has been shown to affect the results in several studies (see, for example, Ibarra-
Molero et al. [52], Monera et al. [53], Santoro and Bolen [54]).
In summary, the agreement between DGðTÞ values from DSC and thermal dena-
turation studies and DGðH2 OÞ values from urea denaturation studies analyzed by
the LEM suggests that either method can be used to reliably measure the confor-
mational stability of a protein. In addition, we now have a good understanding of
why the LEM works so well.
3.5
m-Values
The thermodynamic cycle in Figure 3.5 shows that the dependence of DG on dena-
turant concentration is determined by the groups in a protein that are exposed to
solvent in the denatured state, D, but not in the native state, N 4 . In support of this,
Myers et al. [55] showed that there is a good correlation between m-values and the
change in accessible surface area on unfolding. Since Dg tr; i values are available for
Fig. 3.5. A thermodynamic cycle connecting exposed in D but not N, and Dg tr; i is the free
protein folding in water and urea. N and D energy of transfer of groups of type i from
denote the native and denatured states, DG water to a given concentration of urea. Da is
and DGðH2 OÞ are defined by Eq. (4), ni is the the average change in exposure of all of the
total number of groups of type i in the protein, peptide groups and uncharged side chains in
ai is fraction of groups of type i that are the protein [56].
56 3 Denaturation of Proteins by Urea and Guanidine Hydrochloride
most of the constituent groups in a protein, the equation at the bottom of Figure
3.5 can be used to calculate the dependence of DG on denaturant concentration.
The m-value is an experimental measure of the dependence of DG on denaturant
concentration. Consequently, for proteins that unfold by a two-state mechanism,
Da can be estimated by comparing the calculated and measured m-values. Thus,
Da is the fraction of buried groups that must become exposed to solvent on unfold-
ing to account for the measured m-value [56]. In the original LEM paper [27], this
approach was used to estimate Da for four proteins. It was clear that differences
among the Da values for individual proteins revealed differences in the accessibility
of the denatured state and that these differences depended to some extent on the
number and location of the disulfide bonds in the protein.
Interest in m-values was stimulated by a series of papers from the Shortle labo-
ratory [57, 58]. (A review of these studies was titled: ‘‘Staphylococcal nuclease:
a showcase of m-value effects.’’ [31].) Their studies of mutants of staphylococcal
nuclease (SNase) showed a wide range of m-values. Those with m-values 5% or
more greater than wild type were designated mþ mutants (A 25%) and those with
m-values 5% or more less than wild type were designated m mutants (A 50%).
Their interpretation was that mþ mutants unfolded to a greater extent than wild
type and vice versa for m mutants. These papers were extremely important be-
cause they focused attention on the denatured state which had largely been ignored
in discussions of protein stability. (Another review from the Shortle lab [58] was
titled: ‘‘The denatured state (the other half of the folding equation) and its role in
protein stability.’’) However, this interpretation is only straightforward if all of the
mutants unfold by a two-state mechanism. Shortle and Meeker [57] chose to as-
sume a two-state mechanism to analyze their results even though the m mutants
did not appear to unfold by a two-state mechanism. It is now clear that some of the
m mutants unfold by a three-state mechanism [59–62]. The presence of an inter-
mediate state will generally lower the m-value [63]. One important consequence of
this is that the estimates of DGðH2 OÞ will be too low when a two-state mechanism
is assumed and the mechanism is, in fact, three-state. In the case of the m
mutant V66L þ G88V þ G79S, a two-state analysis indicates that this mutant is
3.0 kcal mol1 LESS stable than wild type [57], but a three-state analysis indicates
that that this mutant is 0.8 kcal mol1 MORE stable than wild type [60]. This is
also true for other m mutants [64]. This means that the DðDGÞ values for the m
mutants are not correct. Since 50% of the SNase mutants are m , this calls into
question the interpretation of many of the DðDGÞ values determined for SNase.
In Table 3.2, we have compiled m and Da values for a selection of proteins. In all
cases, the folding of the proteins is thought to closely approach a two-state mecha-
nism. The value of Da expected for unfolding a protein depends on the model as-
sumed for the denatured state. If the accessibility of the denatured state is modeled
as a tripeptide using the Lee and Richards [65] approach, a value of Da A 0:7 would
be expected. However, if a compact denatured state based on fragments with
native-state conformations is assumed, a Da A 0:53 would be expected [66]. Note
in Table 3.2 that most of the Da values are considerably less than 0.53. This sug-
gests that the urea-denatured states of proteins are less completely unfolded than
in the hypothetical state that Creamer et al. [66] thought might be a lower limit for
3.5 m-Values 57
Tab. 3.2.Change in accessibility ðDaÞ for urea denaturation calculated from the measured
m-values using Tanford’s modela.
3.6
Concluding Remarks
Protein denaturants are used for a variety of purposes by protein chemists when
they need to solubilize or denature proteins. Urea and GdmCl are the denaurants
3.7 Experimental Protocols 59
most often used. The LEM has proven to be a reliable method for measuring
the conformational stability of a protein or the difference in stability between two
proteins that differ slightly in structure. In addition, studies of the m-values of pro-
teins, especially SNase, have focused attention on the denatured state and we are
slowly gaining a better understanding of the denatured state ensemble.
3.7
Experimental Protocols
3.7.1
How to Choose the Best Denaturant for your Study
Both urea and GdmCl will increase the solubility of the folded and unfolded states
of a protein. If the denaturant is just going to be used to increase the solubility or
unfold your protein of interest, GdmCl is preferred since it is a more potent dena-
turant and is chemically stable. Guanidinium thiocyanate is an even more potent
denaturant than GdmCl and can be used for the same purposes. (The reason for
the greater potency is discussed in Mason et al. [12].)
For quantitative studies of protein stability, we previously preferred results from
urea and GdmCl unfolding over thermal unfolding studies for several reasons: the
product of unfolding seemed to be better characterized, unfolding was more likely
to closely approach a two-state mechanism, and unfolding was more likely to be
completely reversible. However, in the cases that have been carefully investigated,
the two techniques seem to give estimates of DGðH2 OÞ that are in good agreement
[36, 70]. After getting the equipment set up and learning the procedures, either
type of unfolding curve can be determined and analyzed in a single day. Conse-
quently, the safest course is to determine both types of unfolding curves whenever
possible. Also, we generally prefer urea over GdmCl because salt effects can be in-
vestigated, and sometimes reveal interesting information [46, 51, 71, 72].
3.7.2
How to Prepare Denaturant Solutions
Both urea and GdmCl can be purchased commercially in highly purified forms.
However, some lots of these denaturants may contain fluorescent or metallic im-
purities. Methods are available for checking the purity of GdmCl and for recrystal-
lization if necessary [73]. A procedure for purifying urea has also been described
[74]. GdmCl solutions are stable for months, but urea solutions slowly decompose
to form cyanate and ammonium ions in a process accelerated at high pH [75]. The
cyanate ions can react with amino groups on proteins [76]. Consequently, a fresh
urea stock solution should be prepared for each unfolding curve and used within
the day.
Table 3.3 summarizes useful information for preparing urea and GdmCl stock
solutions. We prepare urea stock solutions by weight, and then check the concen-
tration by refractive index measurements using the equation given in Table 3.3.
The preparation of a typical urea stock solution is outlined below. Since GdmCl is
60 3 Denaturation of Proteins by Urea and Guanidine Hydrochloride
Tab. 3.3. Information for preparing urea and GdmCl stock solutions.
solution and water (or buffer) at the sodium D line. The equation for
urea solutions is based on data from Warren and Gordon [86], and the
equation for GdmCl solutions is from Nozaki [73].
1. Add A60 g of urea to a tared beaker and weigh (59.91 g). Now add 0.69 g of
MOPS buffer (sodium salt), 1.8 mL of 1 M HCl and A52 mL of distilled water
and weigh the solution again (114.65 g).
2. Allow the urea to dissolve and check the pH. If necessary, add a weighed
amount of 1 M HCl to adjust to pH 7.0.
3. Prepare 30 mM MOPS buffer, pH 7.0.
4. Determine the refractive index of the urea stock solution (1.4173) and of the
buffer (1.3343). Therefore, DN ¼ 1:4173 1:3343 ¼ 0:0830.
5. Calculate the urea molarity from DN using the equation given in Table 3.3:
M ¼ 10:08.
6. Calculate the urea molarity based on the recorded weights. The density is calcu-
lated with the equation given in Table 3.1: weight fraction area ðWÞ ¼ 59:91=
114:65 ¼ 0:5226; therefore d=d 0 ¼ 1:148. Therefore volume ¼ 114:65=1:148 ¼
99:88 mL. Therefore urea molarity ¼ 59:91=60:056=:09988 ¼ 9:99 M.
3.7.3
How to Determine Solvent Denaturation Curves
Figure 3.2A shows a typical urea unfolding curve for RNase Sa using CD to follow
unfolding. We will refer to the physical parameter used to follow unfolding as y
3.7 Experimental Protocols 61
for the following discussion. The curves can be conveniently divided into three
regions: (i) The pretransition region, which shows how y for the folded protein,
y F , depends upon the denaturant. (ii) The transition region, which shows how y
varies as unfolding occurs. (iii) The posttransition region, which shows how y for
the unfolded protein, y U , varies with denaturant. All of these regions are impor-
tant for analyzing unfolding curves. As a minimum, we recommend determining
four points in the pre- and posttransition regions, and five points in the transition
region. Of course, the more points determined the better defined the curve.
With thermodynamic measurements, it is essential that equilibrium is reached
before measurements are made, and that the unfolding reaction is reversible. The
time required to reach equilibrium can vary from seconds to days, depending upon
the protein and the conditions. For example, the unfolding of RNase T1 requires
hours to reach equilibrium at 25 C, as compared to only minutes for RNase Sa.
For any given protein, the time to reach equilibrium is longest at the midpoint of
the transition and decreases in both the pre- and posttransition regions. To ensure
that equilibrium is reached, y is measured as a function of time to establish the
time required to reach equilibrium.
To test the reversibility of unfolding, allow a solution to reach equilibrium in the
posttransition region and then by dilution return the solution to the pretransition
region and measure y. If the reaction is reversible, the value of y measured after
complete unfolding should be within G5% of that determined directly. For many
proteins, unfolding by urea and GdmCl is completely reversible. We have left
RNase T1 in 6 M GdmCl for over 3 months and found that the protein will refold
completely on dilution.
Proteins containing free sulphydryl groups present special problems. If the pro-
tein contains only free aSH groups and no disulfide bonds, then a reducing agent
such as 10 mM dithiothreitol (DTT) can be added to ensure that no disulfide bonds
form during the experiments. For proteins containing both free aSH groups and
disulfide bonds, disulfide interchange can occur and this may lead to irreversibility.
Disulfide interchange can be minimized by working at low pH [77].
If the unfolding of the protein of interest requires considerable time to reach
equilibrium, then each point shown in Figure 3.2A will have to be determined
from a separate solution, prepared volumetrically using the best available pipettes.
A fixed volume of protein stock solution is mixed with the appropriate volumes of
buffer, and urea or GdmCl stock solutions. The protein and buffer solutions are
prepared by standard procedures. After the solutions for measurement have been
prepared (see the protocol below), they are incubated until equilibrium is reached
at the temperature chosen for determining the unfolding curve. After the measure-
ments have been completed, it is good practise to measure the pH of the solutions
in the transition region. If the protein equilibrates quickly, as is the case with
RNase Sa, titration methods can be used to determine the denaturation curve.
This was the method used to obtain the curve in Figure 3.2A. This protocol is de-
scribed in detail in Pace [78] and Scholtz [40]. If the amount of protein is limited,
instruments have been developed to follow the unfolding of a protein simultane-
ously with several spectral techniques using a single protein solution [79].
62 3 Denaturation of Proteins by Urea and Guanidine Hydrochloride
Urea b (mL) Buffer c (mL) Protein d (mL) Urea e (M) y 234 (mdeg) f
buffer)).
f Circular dichroism measurements at 234 nm using an Aviv 62DS
spectropolarimeter.
5. Plot these results to determine if any additional points are needed. If so, prepare
the appropriate solutions and make the measurements as described for the orig-
inal solutions.
6. Measure the pH of the solutions in the transition region.
7. Analyze the results as described below.
f U ¼ ð y F yÞ=ð y F y U Þ ðA1Þ
The equilibrium constant, K, and the free energy change, DG, can be calculated
using
and
where R is the gas constant (1.987 cal mol1 K1 or 8.3144 J K1 mol1 ) and T is
the absolute temperature. Values of y F and y U in the transition region are obtained
by extrapolating from the pre- and posttransition regions in Figure 3.2A. Usually
y F and y U are linear functions of denaturant concentration, and a least-squares
analysis is used to determine the linear expressions for y F and y U .
The calculation of f U ; K, and DG from the data in Figure 3.2A is illustrated in
Table 3.5. Values of K can be measured most accurately near the midpoints of
the curves, and the error becomes substantial for values outside the range 0.1–10.
Consequently, we generally only use DG values within the range G1.5 kcal mol1 .
See Eftink and Ionescu [33] and Allan and Pielak [80] for discussions of the prob-
lems most frequently encountered in analyzing solvent and thermal denaturation
curves.
As discussed above, the simplest method of estimating the conformational sta-
bility in the absence of urea, DGðH2 OÞ, is to use the LEM, which typically gives
results such as those shown in Figure 3.2B. We strongly recommend that values
of DGðH2 OÞ, m, and [D]1=2 be given in any study of the unfolding of a protein by
64 3 Denaturation of Proteins by Urea and Guanidine Hydrochloride
urea or GdmCl. Remember also that if the LEM is used, one can apply Eq. (5) and
use nonlinear least squares [29] to conveniently obtain the unfolding parameters.
3.7.4
Determining Differences in Stability
The midpoints of urea, (urea)1=2 , and thermal, Tm , unfolding curves can be de-
termined quite accurately and do not depend to a great extent on the unfolding
mechanism [30]. In contrast, measures of the steepness of urea unfolding curves,
m, cannot be determined as accurately, and deviations from a two-state folding
mechanism will generally change these values. Consequently, differences in stabil-
ity determined by comparing the DGðH2 OÞ values can have large errors. However,
when comparing completely different proteins or forms of a protein that differ
markedly in stability, no other choice is available. This approach is illustrated by
the first column of DðDGÞ values in Table 3.6. There is also danger in drawing con-
clusions about the conformational stabilities of unrelated proteins based solely on
the midpoints of their unfolding curves. For example, lysozyme and myoglobin
have similar DGðH2 OÞ values at 25 C, but a much higher concentration of GdmCl
is needed to denature lysozyme because the m-value is much smaller. Likewise,
lysozyme and cytochrome c are unfolded at about the same temperature, even
though lysozyme has a much larger value of DGðH2 OÞ at 25 C [81].
A second approach that can be used to determine differences in stability of sin-
gle-site mutants is to take the difference between [D]1=2 values and multiply this by
the average of the m-values. This is illustrated by the DðDGÞ values in the last col-
umn of Table 3.6. The rationale here is that the error in measuring the m-values
should generally be greater than any differences resulting from the effect of small
changes in structure on the m-value. However, substantial differences in m-values
between proteins differing by only one amino acid in sequence have been observed
with some proteins, i.e., SNase, so one must use caution when applying this
method.
Acknowledgments
This work was supported by grants GM-37039 and GM-52483 from the National
Institutes of Health (USA), and grants BE-1060 and BE-1281 from the Robert A.
Welch Foundation. We thank many colleagues and coworkers over the years that
have made important contributions to the field of protein folding.
References
6 Modig, K., Kurian, E., Prendergast, molecular structure. Curr Opin Struct
G., and Halle, B. (2003). Water and Biol 4, 770–776.
urea interactions with the native and 17 Pike, A. C. W. and Acharya, K. R.
unfolded forms of a b-barrel protein. (1994). A structural basis for the
Protein Sci 12, 2768–2781. interaction of urea with lysozyme.
7 Zou, Q., Bennion, B. J., Daggett, V., Protein Sci 3, 706–710.
and Murphy, K. P. (2002). The 18 Dunbar, J., Yennawar, H. P.,
molecular mechanism of stabilization Banerjee, S., Luo, J., and Farber, G.
of proteins by TMAO and its ability to K. (1997). The effect of denaturants on
counteract the effects of urea. J Am protein structure. Protein Sci 6, 1727–
Chem Soc 124, 1192–1202. 1733.
8 Tsai, J., Gerstein, M., and Levitt, M. 19 Makhatadze, G. I. and Privalov,
(1996). Keeping the shape but P. L. (1992). Protein interactions with
changing the charges: A simulation urea and guanidinium chloride. A
study of urea and its iso-steric analogs. calorimetric study. J Mol Biol 226,
J Chem Phys 104, 9417–9430. 491–505.
9 Tirado-Rives, J., Orozco, M., and 20 Timasheff, S. N. and Xie, G. (2003).
Jorgensen, W. L. (1997). Molecular Preferential interactions of urea with
dynamics simulations of the unfolding lysozyme and their linkage to protein
of barnase in water and 8 M aqueous denaturation. Biophys Chem 105, 421–
urea. Biochemistry 36, 7313–7329. 448.
10 Sokolic, F. I. A. (2002). Concentrated 21 Courtenay, E. S., Capp, M. W.,
aqueous urea solutions: A molecular Saecker, R. M., and Record, M. T.,
dynamics study of different models. J Jr. (2000). Thermodynamic analysis of
Chem Phys 116, 1636–1646. interactions between denaturants and
11 Bennion, B. J. and Daggett, V. protein surface exposed on unfolding:
(2003). The molecular basis for the interpretation of urea and
chemical denaturation of proteins by guanidinium chloride m-values and
urea. Proc Natl Acad Sci USA 100, their correlation with changes in
5142–5147. accessible surface area (ASA) using
12 Mason, P. E., Neilson, G. W., preferential interaction coefficients
Dempsey, C. E., Barnes, A. C., and and the local-bulk domain model.
Cruickshank, J. M. (2003). The Proteins Suppl 4, 72–85.
hydration structure of guanidinium 22 Schellman, J. A. (2003). Protein
and thiocyanate ions: implications for stability in mixed solvents: a balance
protein stability in aqueous solution. of contact interaction and excluded
Proc Natl Acad Sci USA 100, 4557– volume. Biophys J 85, 108–125.
4561. 23 Goldenberg, D. P. (2003).
13 Muller, N. (1990). A model for the Computational simulation of the
partial reversal of hydrophobic statistical properties of unfolded
hydration by addition of a urea-like proteins. J Mol Biol 326, 1615–1633.
cosolvent. J Phys Chem 94, 3856–3859. 24 Schellman, J. A. (1987). Selective
14 Laidig, K. E. and Daggett, V. (1996). binding and colvent denaturation.
Testing the modified hydration-shell Biopolymers 26, 549–559.
hydrogen-bond model of hydrophobic 25 Alexander, S. S. and Pace, C. N.
effects using molecular dynamics (1971). A comparison of the denatura-
simulation. J Phys Chem 100, 5616– tion of bovine-lactoglobulins A and B
5619. and goat-lactoglobulin. Biochemistry
15 Timasheff, S. N. (2002). Protein 10, 2738–2743.
hydration, thermodynamic binding, 26 Pace, C. N. and Tanford, C. (1968).
and preferential hydration. Thermodynamics of the unfolding of
Biochemistry 41, 13473–13482. b-lactoglobulin A in aqueous urea
16 Karplus, P. A. and Faerman, C. solutions between 5 and 55.
(1996). Ordered water in macro- Biochemistry 7, 198–208.
References 67
4
Thermal Unfolding of Proteins Studied by
Calorimetry
George I. Makhatadze
4.1
Introduction
Fig. 4.1. Schematic design of the cell (H2) are activated by a feedback loop to
assembly of a contemporary DSC instrument. equilibrate the difference in temperatures
The matched sample cell (C1) and reference between cells (i.e., to maintain DT1 ¼ 0 C).
cell (C2) of the total fill type with inlet capillary Additional heaters (H3) on the inner shield
tubes for filling and cleaning are surrounded by (IS) maintain the temperature difference
inner (IS) and outer (OS) shields. Two main between the cell and the shield DT2 ¼ 0 C.
heating elements (H1) on the cells are The feedback loops are controlled by a PC
supplying electrical power for scanning upward board.
in temperature. The auxiliary heating elements
4.2
Two-state Unfolding
First we consider in details the simplest case – a two-state model of protein unfold-
ing. Some of the more complex cases will be discussed in Section 4.6.
72 4 Thermal Unfolding of Proteins Studied by Calorimetry
½U
KðTÞ ¼ ð2Þ
½N
while the fraction of the molecules in the native, FN , and unfolded, FU , states is
defined as:
½N ½U
FN ðTÞ ¼ and FU ðTÞ ¼ ð3Þ
½N þ ½U ½N þ ½U
Combining Eqs (2) and (3), the populations of the native and unfolded states are
defined by the equilibrium constant as:
1 KðTÞ
FN ðTÞ ¼ and FU ðTÞ ¼ ð4Þ
1 þ KðTÞ 1 þ KðTÞ
The equilibrium constant of the unfolding reaction, KðTÞ, on the other hand, is re-
lated to the Gibbs energy change upon unfolding as:
DGðTÞ
KðTÞ ¼ exp ð5Þ
RT
where DHðTÞ is the enthalpy change and DSðTÞ is the entropy changes upon pro-
tein unfolding. The temperature dependence of both parameters is defined by the
heat capacity change upon unfolding, DCp :
dDHðTÞ dDSðTÞ
DCp ¼ ¼T ð7Þ
dT dT
where DHðT0 Þ and DSðT0 Þ are the enthalpy and entropy changes at a reference
temperature T0 . For a two-state process it is convenient to set the reference temper-
ature at which fractions for the native, FN , and unfolded, FU , protein are equal.
Correspondingly, at this temperature, which is usually called the transition temper-
ature, Tm , the equilibrium constant is equal to 1 and thus the Gibbs energy is
equal to zero:
DHðTm Þ
DSðTm Þ ¼ ð11Þ
Tm
DHðTm Þ T
DSðTÞ ¼ þ DCp ln ð12Þ
Tm Tm
Thus the temperature dependence of the Gibbs energy can be written as:
Tm T Tm
DGðTÞ ¼ DHðTm Þ þ DCp ðT Tm Þ þ T DCp ln ð13Þ
Tm T
The importance of Eq. (13) is that if just three parameters, Tm ; DHðTm Þ, and DCp ,
are known, all thermodynamic parameters DH; DS, and DG for the two-state pro-
cess can be calculated for any temperature.
Figure 4.2 shows the temperature dependence of the partial heat capacity,
Cp; pr ðTÞ of a hypothetical protein. It consists of two terms:
where Cp prg ðTÞ is the so-called progress heat capacity (sometimes also called the
‘‘chemical baseline’’) which is defined by the intrinsic heat capacities of protein in
different states, and hCp ðTÞiexc is the excess heat capacity which is caused by the
heat absorbed upon protein unfolding. The progress heat capacity for a two-state
unfolding can be defined as:
where Cp; N ðTÞ and Cp; U ðTÞ are the temperature dependencies of the heat capaci-
ties of the native and unfolded states, respectively. It is important to emphasize
that the heat capacity of the unfolded state is higher than that of the native state,
Cp; U > Cp; N . Extensive studies of the heat capacities of the native and unfolded
74 4 Thermal Unfolding of Proteins Studied by Calorimetry
40 A
35
Cp,pr(T)
20
Tm
B
15
∆Hcal(Tm)
<Cp(T)>exc
10
0
20 30 40 50 60 70 80 90 100
o
Temperature ( C)
Fig. 4.2. A) Temperature dependence of the is the progress heat capacity (sometimes
partial heat capacity, Cp; pr ðTÞ, of a hypothetical also called the ‘‘chemical baseline’’).
protein that has the following thermodynamic B) The excess heat capacity, hCp ðTÞiexc , is
parameters of unfolding: Tm ¼ 65 C, DH ¼ obtained as a difference between Cp; pr ðTÞ and
250 kJ mol1 , DCp ¼ 5 kJ mol1 K1 . Cp; N ðTÞ Cp prg ðTÞ. The area under hCp ðTÞiexc is the
and Cp; U ðTÞ are the temperature depen- enthalpy of unfolding, and temperature at
dencies of the heat capacities of the native which hCp ðTÞiexc is maximum defines the
and unfolded states, respectively, and Cp prg ðTÞ transition temperature, Tm .
states of small globular proteins have established some general properties [7]. The
heat capacity function for the native state appears to be a linear function of tem-
perature with a rather similar slope 0.005–0.008 J K2 g1 when calculated per
gram of protein. The absolute values for Cp; N vary between 1.2 and 1.80 J K1 g1
at 25 C. For the unfolded state, the absolute value at 25 C is always higher than
that for the native state ðCp; U > Cp; N Þ and ranges between 1.85 and 2.2 J K1 g1 .
The heat capacity of the unfolded state has nonlinear dependence on temperature.
It increases with increase of temperature and reaches a plateau between 60 and
75 C. Above this temperature Cp; U Cp; U remains constant with the typical values
ranging between 2.1 and 2.5 J K1 g1 .
The excess heat capacity function can thus be calculated as a difference between
experimental and progress heat capacities. The excess heat capacity function is very
important as it provides the temperature dependence of the fraction of protein in
the native and unfolded states:
ÐT exc
Q i ðTÞ 0 hCp ðTÞi dT
FN ðTÞ ¼ 1 FU ðTÞ ¼ ¼ Ðy exc ð16Þ
Q tot 0 hCp ðTÞi dT
4.2 Two-state Unfolding 75
The temperature dependence of the equilibrium constant defines the van’t Hoff
enthalpy of a two-state process as:
d ln KðTÞ
DHvH ¼ R ð18Þ
dð1=TÞ
The area under the hCp ðTÞiexc is the calorimetric (model independent) enthalpy of
unfolding:
ðy
DHcal ¼ hCp ðTÞiexc dT ð19Þ
0
Comparison of the two enthalpies – van’t Hoff enthalpy calculated for a two-state
transition and model-free calorimetric enthalpy of unfolding – is the most rigorous
test of whether protein unfolding is a two-state process [8–11]. If calorimetric en-
thalpy is equal to van’t Hoff enthalpy, DHcal ðTm Þ ¼ DHvH ðTm Þ, the unfolding pro-
cess is a two-state process. If calorimetric enthalpy is larger than the van’t Hoff
enthalpy, DHcal ðTm Þ > DHvH ðTm Þ, the unfolding process is more complex, such
as being not monomolecular or involving intermediate states (see Section 4.6).
Instead of the van’t Hoff enthalpy, it is more practical to use the fitted enthalpy
of two-state unfolding. This can be done using the following approach. The excess
enthalpy function (relative to the native state) is defined as [2, 8]:
where DHðTÞ is the enthalpy difference between native and unfolded states. By
definition, the excess heat capacity function is a temperature derivative of the ex-
cess enthalpy at constant pressure:
Note that the way hCp ðTÞiexc is defined by Eq. (14), the temperature dependence of
DH is in the Cp prg ðTÞ function. Combining Eqs (4), (8), and (21) it is easy to show
that the excess heat capacity hCp ðTÞiexc is defined as:
DHfit 2 K
hCp ðTÞiexc ¼ ð22Þ
R T 2 ð1 þ KÞ 2
The experimental partial molar heat capacity function, Cp; pr ðTÞ, is then fitted to
the following expression:
Cp; pr ðTÞ ¼ FN ðTÞ Cp; N ðTÞ þ hCp ðTÞiexc þ FU ðTÞ Cp; U ðTÞ ð24Þ
The heat capacity functions for the native and unfolded states can be represented
by the linear functions of temperature as:
4.3
Cold Denaturation
qDGðTÞ
¼ DSðTÞ
qT
Cold denaturation was predicted and later its existence confirmed based on the
extrapolation of thermodynamic functions obtained from the study of heat denatu-
ration process [8, 16–19]. Experimental validation of the cold denaturation phe-
nomenon once again demonstrated the power of statistical thermodynamics in
the analysis of the conformational properties of proteins.
4.4
Mechanisms of Thermostabilization
From the point of view of applied properties of proteins, two parameters are
important – the thermodynamic stability as measured by the Gibbs energy and
the thermostability as measured by the transition temperature.
Depending on the relationship between the Ts ; Tm and absolute values of DG,
three extreme models [20–24] account for the increase in DG and/or Tm and the
relationship between these two parameters (Figure 4.4).
mol-1)
mol-1)
mol-1)
Tab. 4.1. Thermodynamic parameters used in the simulation of different models for thermostabilization shown
in Figure 4.4.
comes less steep. This can be achieved by lowering DCp , but both enthalpy and
entropy will change as well.
8 Model III Increase in thermostability is accompanied by the increase in Ts but no
changes in stability at Ts , or in the temperature dependence of DG function. The
increase in stability is due to the decrease in both enthalpic and entropic factors,
but decrease in T DS is larger than in DH, while DCp remains unchanged.
Of course these three models represent ‘‘extreme’’ cases and the real systems
can use combination of all three simultaneously. Nevertheless, understanding the
contributions of different factors to the thermodynamics of proteins in terms
of DCp ; DH, and DS allows strategies for rational modulation of protein stability
and/or thermostability to be designed. These strategies are discussed in the next
section.
4.5
Thermodynamic Dissection of Forces Contributing to Protein Stability
The ultimate goal of the field is to be able to understand the determinants of pro-
tein stability in such details that will allow the prediction of the stability profile
(i.e., DG) of any protein under any conditions such as temperature, pH, ionic
strength, etc.
For example, the slope of the DG dependence on temperature is DS, which in
turn is dependent on temperature and this dependence is defined by DCp. Thus
one needs to know not only a value for DG at one particular temperature, but also
DS; DH, and DCp to calculate the DGðTÞ function.
Another potential complexity arises from the fact that DG being the difference
between the enthalpic ðDHÞ and entropic ðT DSÞ terms, is subject to the so-called
enthalpy–entropy compensation phenomenon [25–27]. It has been observed in a
number of systems that changes in the enthalpy of a system are often compen-
sated by the changes in the entropy in such a way that the resulting changes in
DG are very small. One particularly striking example of such compensation is the
effect of heavy (D2 O) and ordinary light (H2 O) water on protein stability [28].
Heavy water as a solvent provides probably the smallest possible perturbation to
the properties H2 O. The stability ðDGÞ of proteins in H2 O is very similar to the
stability of proteins in D2 O (Figure 4.5). Such similarity in the DG values might
lead to the conclusion that D2 O did not perturb the properties of the system rela-
tive to H2 O. However, if one looks at the corresponding changes in enthalpy (Fig-
ure 4.5) and entropy (not shown), it is clear that D2 O has a profound effect on the
system, and decreases the enthalpy of unfolding by @20%. This decrease in en-
thalpy is accompanied by a decrease in entropy in such a way that the resulting
DG is largely unchanged.
This result underlines the importance of the analysis of the protein stability in
terms of enthalpy, entropy, and heat capacity, so that underlying physicochemical
processes are not masked by the enthalpy–entropy compensations. The effect of
80 4 Thermal Unfolding of Proteins Studied by Calorimetry
mol-1)
mol-1)
Fig. 4.5. Comparison of the unfolding on the transition temperature, Tm . The slopes
thermodynamics for three different proteins which represent the heat capacity change upon
ribonucleases A (RNaseA), hen-egg white unfolding (DCp , kJ K1 mol1 ) are 4.8 (d-RNase),
lysozyme (Lyz) and cytochrome c (Cyt), in 5.2 (h-RNase), 6.4 (d-Lyz), 6.7 (h-Lyz), 4.7
ordinary (H2 O, squares) and heavy (D2 O, (d-Cyt c) and 5.5 (h-Cyt c). C) Dependence
circles) water. A) Dependence of the of the Gibbs energy of unfolding at DG (25 C)
transition temperature, Tm , on the pH/pD. B) on the pH/pD. Reproduced with permission
Dependence of the enthalpy of unfolding, DH, from Ref. [28].
D2 O on protein stability also underlines the importance of the solvent (or interac-
tions of protein groups with the solvent) for protein stability. Even such small per-
turbation in the properties of solvent such as D2 O versus H2 O produces dramatic
changes in the enthalpies and entropies of unfolding. Thus in analysis of the
factors important for proteins stability one needs to consider not only the interac-
tions between protein groups but also interactions between protein groups and the
solvent. The next three sections will discuss the contribution of different types
4.5 Thermodynamic Dissection of Forces Contributing to Protein Stability 81
of interactions to the heat capacity, enthalpy and entropy changes upon protein
unfolding.
4.5.1
Heat Capacity Changes, DC p
It is currently believed that the heat capacity changes upon unfolding are entirely
defined by the interactions with the solvent of the proteins groups that were buried
in the native state but become exposed upon unfolding. Both polar and nonpolar
groups seem to contribute although with different sign and magnitude. It appears
that nonpolar groups have a large positive contribution to the heat capacity of un-
folding, while the polar groups have smaller (for a group of similar size) and neg-
ative contribution to DCp . These estimates of DCp for polar and nonpolar groups
were obtained from the study of thermodynamics of transfer for model com-
pounds [7, 29–35]. To link the data on model compounds to the results obtained
on proteins, the heat capacity changes for model compounds were computed per
unit of solvent accessible surface area, and then used as coefficients to calculated
the hydration heat capacity for proteins. It has been shown that the DCp values cal-
culated are in a very good agreement with the experimentally determined heat ca-
pacity changes for proteins. From this correlation it was concluded that the DCp of
protein unfolding is caused exclusively by the hydration and that the disruption of
internal interactions in the native state do not contribute to the DCp . This conclu-
sion was in disagreement with the estimates done by Sturtevant [36], who showed
that the changes in vibrational entropy of protein upon unfolding should contrib-
ute 20–25% to the experimentally observed changes in DCp .
The common belief that DCp is entirely defined by the hydration was questioned
by experiments of Loladze et al. [37]. These authors measured experimentally the
effects of single amino acid substitutions at fully buried positions of ubiquitin on
the DCp of this protein. It was shown that substitution of the fully buried nonpolar
residues with the polar ones leads to a decrease in DCp , while the substitutions of
fully buried polar residues with nonpolar ones leads to an increase in DCp . These
results provided direct experimental support for the model compound data (that
indicated difference in the sign for the DCp of hydration for polar and nonpolar
groups) to be qualitatively applicable to proteins. Quantitatively, however, the re-
sults of substitutions on DCp were not possible to predict based on the model com-
pound data and surface area proportionality. This led to the conclusion that the dy-
namic properties of the native state also contribute to DCp [37, 38]. At this point
quantitative correlation between native state dynamics and DCp awaits experimen-
tal validation.
4.5.2
Enthalpy of Unfolding, DH
The enthalpy of protein unfolding is positive, which means that enthalpically the
native state is favored over the unfolded state. Two major types of interactions can
82 4 Thermal Unfolding of Proteins Studied by Calorimetry
lizing, while exposure to the solvent upon unfolding of the buried in the native
state groups is enthalpically destabilizing.
4.5.3
Entropy of Unfolding, DS
buried in the native state as a major source for the stability. Indeed, interactions in
the core of the native protein and interactions of these residues with the solvent in
the unfolded state provide the dominant effects for protein stability. This does not
mean, however, that the surface residues do not contribute to stability [60–63].
They are in most cases not dominant factors defining protein energetics, yet are
significantly large relative to the overall protein stability ðDGÞ and thus they are im-
portant for modulating protein stability and, in the case of charge–charge interac-
tions, enhancing protein solubility.
4.6
Multistate Transitions
In 1978 Biltonen and Freire [8] proposed a universal mechanism for equilibrium
unfolding of proteins based on standard statistical thermodynamics approach.
They considered a general reaction in which a initial native state, N, undergoes
conformational transition to the final unfolded state, U, via a series of intermediate
states, Ii :
N $ I1 $ I2 $ In $ U
and showed that the excess enthalpy of the systems (enthalpies of all states in the
system relative to the initial state) can be written as:
X
n
hDH exc i ¼ Fi DHi ð27Þ
i¼0
where Fi is the fractional population of i-th state and DHi is the enthalpy of the i-th
state relative to the initial state. Keeping in mind that since
ðT
hDH exc i ¼ DCpexc dT ð28Þ
T0
knowledge of the excess heat capacity function provides a unique description of the
thermodynamic properties of the system. One practical approximation for the mul-
tistate transitions is that when the transitions involve many states, it is convenient
to assume that the thermodynamic domains behave in an independent manner,
with each individual transition approximated by a two-state transition. In such
case the excess heat capacity function can be simply written as:
X ðDHi Þ 2 Ki
hDCp exc i ¼ ð29Þ
i
RT 2 ð1 þ Ki Þ 2
The following sections discuss several of the commonest cases for the unfolding
process other than the simple two-state case.
4.6.1
Two-state Dimeric Model
In this model only the native dimer, N2 , and unfolded monomer, U, are signifi-
cantly populated.
K
N2 $ 2 U
½U 2
K¼ ð30Þ
½N2
where [N2 ] and [U] are the concentrations of native dimers and monomers, respec-
tively. The fraction of protein in these two states can be presented as:
where FN2 and FU are the fractions of native dimer and unfolded monomers, re-
spectively, and PT , the total protein concentration expressed per mole of monomers
is:
Combining Eqs (30) and (32) allows the equilibrium constant to be expressed as a
function of total protein concentration and a fraction of unfolded protein as:
2 PT FU 2
K¼ ð33Þ
1 FU
The excess enthalpy of the system is related to the enthalpy of unfolding as:
which allows calculation of the excess heat capacity function by simple analytical
differentiation:
86 4 Thermal Unfolding of Proteins Studied by Calorimetry
Thus the experimental partial molar heat capacity profile can be expressed as:
Cp; pr ðTÞ ¼ FN ðTÞ Cp; N ðTÞ þ Cpexc ðTÞ þ FU ðTÞ Cp; U ðTÞ ð37Þ
where the heat capacity functions for the native and unfolded states are expressed
according to Eqs (25–26).
Calculation of the standard thermodynamic quantities such as Gibbs energy,
can be done by defining the reference temperature Tm as a temperature at which
FN2 ¼ 0:5, and thus according to the Eq. (33) the equilibrium constant at this tem-
perature is equal to PT . The standard Gibbs energy can then be expressed as:
Tm T
DG0 ¼ DHðTm Þ þ DCp ðT Tm Þ
Tm
Tm
þ T DCp ln R T ln KðTm Þ ð38Þ
T
It must be emphasized that the DG0 function in this case is refers to a 1 mol L1
concentration of dimers.
4.6.2
Two-state Multimeric Model
For the reaction of unfolding of homo-oligomeric system that occurs without sig-
nificant population of intermediate states, i.e.
K
Nn $ n U
it is easy to show that the excess heat capacity function can be expressed simply as
DHðTÞ 2 FU ð1 FU Þ
hDCpexc i ¼ ð39Þ
R T 2 n FU ðn 1Þ
where n is the order of the reaction. Obviously, for n ¼ 2, Eq. (39) is reduced to Eq.
(36).
4.6.3
Three-state Dimeric Model
In this model, not only the native dimer, N2 , and the unfolded monomer, U, but
also a native monomer, N, are significantly populated.
4.6 Multistate Transitions 87
Kd KU
N2 $ 2 N $ 2 U
½N 2 ½U
Kd ¼ and KU ¼ ð40Þ
½N2 ½N
where ½N2 ; ½N, and [U] are the concentrations of the native dimers, the native
monomers and the unfolded monomers, respectively. The fraction of protein in
these three states can be presented as:
where FN2 ; FN , and FU are the fractions of native dimer, native monomer, and un-
folded monomer, respectively, and PT , the total protein concentration expressed per
mole of monomers, is:
2 FN 2 PT
Kd ¼ ð45Þ
FN2
FU
KU ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ð46Þ
K d FN 2
2 PT
Keeping in mind that FN2 þ FN þ FU ¼ 1 we can combine Eqs (45) and (46) to
get:
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
K d FN2 K d FN2
FN2 ¼1 KU ð47Þ
2 PT 2 PT
which allows calculation of the excess heat capacity function as a temperature de-
rivative of the excess enthalpy. The corresponding Gibbs energies can be expressed
as:
Td T Td
DG0d ¼ DHðTd Þ þ DCp ðT Td Þ þ T DCp ln R T ln KðTd Þ
Td T
ð52Þ
Tm T Tm
DGU ¼ DHðTm Þ þ DCp ðT Tm Þ þ T DCp ln ð53Þ
Tm T
where DG0d is the standard Gibbs energy of dissociation of native dimer into
native monomers, and DGU is the monomolecular Gibbs energy of unfolding
of monomers. For convenience the reference temperatures are defined as fol-
lows: Td is defined as a temperature at which FN2 ¼ FN2 , and thus according to the
Eq. (45) the equilibrium constant is equal to 2PT ; Tm is the temperature at which
FN ¼ FU and thus the equilibrium constant is equal to 1.
4.6.4
Two-state Model with Ligand Binding
In this model, the native state, N, but not the unfolded state, U, can bind a ligand,
L. The ligand does not undergo any conformational transition upon binding.
Kd KU
NL $ N þ L $ U þ L
where ½NL; ½N; ½U, and [L] are the concentrations of ligated, and unligated native
4.6 Multistate Transitions 89
states, unfolded state, and free ligand concentration, respectively. The fraction of
protein in these three states can be presented as:
½N ¼ FN PT ð55Þ
½U ¼ FU PT ð56Þ
½NL ¼ FNL PT ð57Þ
FN ðLT FNL PT Þ
Kd ¼ ð60Þ
FNL
FU ðLT FNL PT Þ
KU ¼ ð61Þ
K d FNL
Keeping in mind that FNL þ FN þ FU ¼ 1 we can combine Eqs (60)–(61) and write:
K d FNL K d KU FNL
FNL þ þ ¼1 ð62Þ
ðLT FNL PT Þ ðLT FNL PT Þ
Equation (62) is a simple quadratic equation and can be solved for FNL to give:
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
q 2 4 PT LT q
FNL ¼ ð63Þ
2 PT
where q ¼ PT þ LT þ K d þ KU K d
The fractions of the unligated native and of the unfolded states can be expressed
as:
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Kd ðq 2 4 PT LT qÞ
FN ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ð64Þ
2 LT PT PT ð q 2 4 PT LT qÞ
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
KU K d ð q 2 4 PT LT qÞ
FU ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ð65Þ
2 LT PT PT ð q 2 4 PT LT qÞ
90 4 Thermal Unfolding of Proteins Studied by Calorimetry
which allows calculation of the excess heat capacity function as a temperature de-
rivative of the excess enthalpy.
The corresponding Gibbs energies can be expressed as:
where DG0d is the standard Gibbs energy of dissociation, DGU is the monomolecu-
lar Gibbs energy of unfolding, T0 is the reference temperature for the dissociation
reaction, and Tm is the temperature at which FN ¼ FU and thus the equilibrium
constant is equal to 1. In practice, the DGU function is determined from the DSC
experiments performed in the absence of the ligand, and then is used to fit the
DSC profiles obtained in the presence of ligand.
4.6.5
Four-state (Two-domain Protein) Model
In this model, the protein consists of two interacting domains A and B, each of
which unfolds as a two state.
K1
ANxBN ! AUxBN
? ?
K2 ?
y
?K3
y
AN BU ! AU BU
K4
The equilibrium constants for the four reactions can be expressed through the con-
centrations of four states, native with both A and B domains folded (AN BN ), do-
main A folded domain B unfolded (AN BU ), domain A unfolded domain B folded
(AU BN ), and both domains unfolded (AU BU ), as
½AU BN
K1 ¼ ð69Þ
½AN BN
½AN BU
K2 ¼ ð70Þ
½AN BN
4.6 Multistate Transitions 91
½AU BU
K3 ¼ ð71Þ
½AU BN
½AU BU
K4 ¼ ð72Þ
½AN BU
It is important to emphasize that the four equilibrium constants are not fully inde-
pendent since the following relationship is valid K1 K3 ¼ K2 K 4 .
The fraction of each of the four states can be written as:
½AN BN
FAN BN ¼ ð73Þ
½AN BN þ ½AU BN þ ½AN BU þ ½AU BU
½AU BN
FAU BN ¼ ð74Þ
½AN BN þ ½AU BN þ ½AN BU þ ½AU BU
½AN BU
FAN BU ¼ ð75Þ
½AN BN þ ½AU BN þ ½AN BU þ ½AU BU
½AU BU
FAU BU ¼ ð76Þ
½AN BN þ ½AU BN þ ½AN BU þ ½AU BU
Combining Eqs (69)–(72) and (73)–(76) we can relate the fraction of individual
states to the corresponding equilibrium constants:
1
FAN BN ¼ ð77Þ
1 þ K1 þ K 2 þ K 3
K1
FAU BN ¼ ð78Þ
1 þ K1 þ K2 þ K1 K3
K2
FAN BU ¼ ð79Þ
1 þ K1 þ K2 þ K1 K3
K1 K3
FAU BU ¼ ð80Þ
1 þ K1 þ K2 þ K1 K3
K4 DG4 DGint
K1 ¼ ¼ exp ð81Þ
Kint RT
T
DG4 ¼ DHðTA Þ 1 þ þ DCp; A ðT TA T lnðT=TA ÞÞ ð82Þ
TA
K3 DG3 DGint
K2 ¼ ¼ exp ð83Þ
Kint RT
92 4 Thermal Unfolding of Proteins Studied by Calorimetry
T
DG3 ¼ DHðTB Þ 1 þ þ DCp; B ðT TB T lnðT=TB ÞÞ ð84Þ
TB
DGint
Kint ¼ exp ð85Þ
RT
DGint ¼ DHint T DSint ð86Þ
4.7
Experimental Protocols
4.7.1
How to Prepare for DSC Experiments
The DSC instrument should be turned on prior to the experiment and several (at
least three) baseline scans with both cells filled with the buffer should be recorded.
The last two profiles must be overlapping. This establishes the thermal history and
verifies that the instrument is working within specifications.
The pre-experiment baseline scans should be done at the scan rate that will be
used for the protein scan. The heating rate is one of the most important parame-
ters. Several considerations have to be taken into account. First, the higher the scan
rate, the higher the sensitivity of the instrument. The increase in sensitivity is lin-
ear with respect to the heating rate (i.e., the sensitivity at a heating rate of 2 min1
is twice as high as that at 1 min1 ), but the increase in sensitivity does not mean
an increase in the signal-to-noise ratio. Second, if the expected transition is very
sharp (e.g., within a few degrees) a high heating rate will affect the shape of the
heat absorption profile and lead to an error in the determination of all thermody-
namic parameters for this transition. This error will be particularly large for the
4.7 Experimental Protocols 93
transition temperature. Third, the higher the heating rate the less time the system
has to equilibrate. So for slow unfolding/refolding processes it is advisable to use
low heating rates.
In practice, for small globular proteins exhibiting reversible temperature induced
unfolding, a maximal (on most DSC instruments) heating rate of 2 min1 is ac-
ceptable. For large proteins lower heating rates (0.5–1 min1 ) are more appropri-
ate. In the case of fibrillar proteins, which usually exhibit very narrow transitions, a
heating rate of 0.1–0.25 min1 is most suitable.
The purity of the protein sample is very important. Contamination with other
proteins, nucleic acids, or lipids can affect both the shape of the calorimetric pro-
files and the estimation of the thermodynamic parameters of unfolding through
the errors in determination of the sample concentration. PAAG electrophoresis,
UV/VIS spectroscopy and mass spectroscopy should be used to validate the purity
of the protein sample.
Since calorimetric experiments are performed over a broad temperature range,
the pH of the buffer should have a small temperature dependence. For this reason
for example, very popular ‘‘physiological’’ Tris buffer is not the best choice. Gly-
cine, sodium/potassium acetate, sodium/potassium phosphate, sodium cacodilate,
and sodium citrate are the buffers of choice.
Equilibration of protein with buffer is a vital part of the sample preparation for
calorimetric experiments. One should keep in mind that the DSC instrument mea-
sures the heat capacity difference between buffer and protein solutions. Thus any
small inequality in the buffer composition will result in an incorrect absolute value
and an incorrect slope of the heat capacity profile. The best way to achieve equilib-
rium between buffer and protein solution is dialysis. The protein sample should be
dialyzed against at least two changes of buffer. The buffer solutions should be fil-
tered prior to dialysis using a 0.45 mm filter to eliminate all insoluble material.
Following dialysis, all insoluble particles should be carefully removed from the
protein solution. This can be achieved by centrifugation at @12 000 g in a micro-
centrifuge for @15–20 min.
Knowledge of the exact protein concentration after dialysis is very important.
Among all available methods for determination of protein concentration, spectro-
photometric methods seem to be most accurate and rapid. This method requires
knowledge of the extinction coefficient of the protein at a given wavelength. The
extinction coefficient can be calculated from the number of aromatic residues and
disulfide bonds in a protein using an empirical equation [64]:
0:1%; 1 cm
e280 nm ¼ ð5690 NTrp þ 1280 NTyr þ 120 NSS Þ=MW ð89Þ
where MW is the molecular weight of the protein in daltons. A more elaborate pro-
cedure is described by Pace et al. [65] and improves the accuracy of the calcula-
tions. For proteins that do not contain any aromatic residues, the method that is
based on absorption of peptide backbone gives fast and accurate results. Light scat-
tering in spectrophotometric measurements of protein concentration can be a sig-
nificant problem, but can be taken into account using a procedure suggested by
Winder and Gent [66].
94 4 Thermal Unfolding of Proteins Studied by Calorimetry
4.7.2
How to Choose Appropriate Conditions
4.7.3
Critical Factors in Running DSC Experiments
References
5
Pressure–Temperature Phase Diagrams of
Proteins
Wolfgang Doster and Josef Friedrich
5.1
Introduction
One of the most peculiar features of proteins is their marginal stability within a
narrow range of thermodynamic conditions. Biologically active structures can be
disrupted by increasing or decreasing the temperature, the pressure, the pH or by
adding denaturants. By disrupting a structure one can study its architecture and
energetics. In this article we focus on the thermodynamic aspects of temperature
and pressure denaturation. A variation of temperature inevitably leads to a simul-
taneous change of entropy and volume through thermal expansion. The advantage
of using pressure as a thermodynamic variable is that volume-dependent effects
can be isolated from temperature-dependent effects. Just as the increase in temper-
ature drives the system in equilibrium towards states of higher entropy, a pressure
increase will bias the ensemble of accessible states towards those with a smaller
volume. This is the meaning of le Chatelier’s principle. Thus, if the protein has
the lower volume in its denatured form, the native structure will become unstable
above a critical pressure. Similarly the opposite effect, stabilization of the native
state by pressure, is sometimes associated with heat denaturation.
Packing defects in the water-excluding native state, reorganization of the solvent
near exposed nonpolar side chains, and electrostriction by newly formed charges
act as volume reservoirs which can lower the volume in the unfolded state. While
dissociation of oligomeric proteins into subunits occurs at low pressures, i.e., be-
low 200 MPa [1], pressures above 300 MPa are required to denature monomeric
proteins. A number of useful review articles on this subject have been published
recently (see, for example, Ref. [2]).
The isothermal compressibility of water (0.56/GPa) is quite small. At 11 000 m
below sea level and pressures near 100 MPa, the density of water is only 5% higher
than at ambient pressure. On the other hand, the density at the freezing transition
changes by 9%. Moreover, proteins are 5–10 times less compressible than water.
As a result, pressure-induced volume changes in proteins are quite small, typically
0.5% of the total volume at the unfolding transition. For monomeric proteins
the difference corresponds approximately to the volume of five water molecules
5.2
Basic Aspects of Phase Diagrams of Proteins and Early Experiments
The most striking thermodynamic properties of proteins are related to the hydro-
phobic effect. These are
8 the large and positive heat capacity increment upon unfolding [8–10] and
8 the phenomenon of cold denaturation.
The latter was first predicted by Brandts [11, 12] based on studies of heat denatura-
tion on ribonuclease. The transition temperatures for cold denaturation are usually
below the freezing point of most aqueous solutions at ambient pressure. How-
ever, under high pressure, water remains liquid down to much lower temperatures
(18 C at 200 MPa, Figure 5.1).
Thus cold denaturation can be studied conveniently at elevated pressures under
mild denaturing conditions, without the need to add denaturants.
The phase diagram of water (Figure 5.1) illustrates the relevance of the
pressure–temperature plane for locating states of structural stability and for classi-
fying their thermodynamic properties.
The slope of the phase boundary of liquid water and ice I is negative: Increase in
pressure extends the stability range of the liquid phase. The slope of the phase
boundary is given by the ratio of the negative entropy change and the increase in
specific volume at the freezing transition (Eq. (3)). The expansion (þ9%) causes
substantial damage when biological material is frozen. In contrast, the slope of
the liquid to ice III boundary is large and positive, since the corresponding volume
change is small and negative. The application of high pressure to the liquid and
then cooling into ice III minimizes the damage by volume changes (3%) at the
freezing transition.
5.2 Basic Aspects of Phase Diagrams of Proteins and Early Experiments 101
Fig. 5.1. The phase diagram of water: For the boundary from
the liquid to hexagonal ice (I) one has dP=dT ¼ DS=DV < 0,
unlike for the technically relevant modifications, II and III where
the volume discontinuity is small.
∆G < 0
0.4 0.8
0.2
PRESSURE / GPa
0.3
xD = 0.5
∆G > 0
0.2
0.1
0 10 20 30 40 50
TEMPERATURE / °C
Fig. 5.2.Contours of constant chemical potential Dm ¼
mD mN between the denatured (D) and the native (N) state
of chymotrypsinogen (after Hawley [14]).
system. The equilibrium constant KDN ðT; PÞ follows from the pressure analog of
the van t’Hoff equation with qT ! qP and DS ! DV:
The transition width, determined by the magnitude of the transition volume DV, is
finite for a mesoscopic protein molecule, in contrast to the zero width of a macro-
scopic first-order transition.
Hawley, in 1971, was the first to interpret the curve on which the free energy dif-
ference between native and unfolded state vanishes as an elliptical phase boundary.
This provided a natural explanation for the so-called re-entrant phase behavior of
heat and cold denaturation [14]. Figure 5.2 shows the contours of the Gibbs free
energy DGðP; TÞ versus pressure and temperature for chymotrypsinogen. Note
that at moderate pressure levels (< 150 MPa), the temperature of heat denatura-
tion increases slightly with pressure. Hence, applying pressure to the thermally
denatured protein may actually drive the protein back into the native state. Upon
further increasing the pressure, however, unfolding may occur again. In other
words, the denatured phase is re-entered, irrespective of the fact that the pressure
is monotonously changed. Phase diagrams with this property are called ‘‘re-
entrant.’’ A similar behavior is found if the temperature is changed. For instance,
if one starts out at sufficiently high pressure levels (200 MPa) in the denatured
5.3 Thermodynamics of Pressure–Temperature Phase Diagrams 103
state and lowers the temperature, one enters the native regime which is, however,
left again at even lower temperatures to re-enter the denatured regime again. This
latter transition is called ‘‘cold denaturation’’.
Re-entrant phase diagrams of perfect elliptical shape have been observed for liq-
uid crystals by Cladis et al. [15] and by Klug and Whalley [16]. The elliptical shape
was analyzed in detail by Clark [16].
In 1973, Zipp and Kauzmann, in a seminal paper, investigated the pressure–
temperature–pH phase diagram of myoglobin [18]. They also observed re-entrant
behavior, but did not attempt to fit the phase boundaries using ellipses. Depending
on pH, the boundaries vary strongly in shape and show nonelliptical distortions, as
shown in Figure 5.3.
In recent years, a significant number of studies have suggested diagrams of
elliptical shape for protein denaturation or for even more complex units, such as
bacteria [19].
The stability phase diagram of proteins and related phenomena have been dis-
cussed in several review articles [20, 21, 22].
Our goal is to eludicate the physical basis and thermodynamics of re-entrant pro-
tein phase diagrams, the conditions for elliptical shapes, and the limitations of this
approach.
5.3
Thermodynamics of Pressure–Temperature Phase Diagrams
The phase equilibrium between the native and the denatured state N Ð D is con-
trolled by the difference in the chemical potentials Dm ¼ mD mN . The ratio of con-
104 5 Pressure–Temperature Phase Diagrams of Proteins
centrations of the denatured to the native form assuming a dilute solution is given
by:
The chemical potential is the driving force that induces transport and transfor-
mations of a substance. It is analogous to the electrical potential, which can only
induce currents because of charge conservation. Since the concentration of compo-
nents, water, and protein is fixed (only pressure and temperature are varied), we
restrict ourselves to one-component systems. The change dmi with temperature
and pressure obeys the Gibbs–Duhem relation for each phase D and N:
where SN ; SD and VN ; VD are the partial molar entropy and volume of the protein
in solution in the native and denatured phase, respectively. Since entropy and vol-
ume are generally positive quantities, the chemical potential always decreases with
increasing temperature, while it increases with increasing pressure, as shown in
Figures 5.4 and 5.6. The potentials of two phases, differing in entropy, will thus
cross at a particular temperature where a transition to the phase with the lower po-
tential occurs. A similar change in phase will take place with pressure when the
volumes of the two phases differ. Thus the more compact phase will be stable at
high pressure.
SD < S N
chemical potential (kJ mol-1)
1
∆µ
0
D
-1
N SD > S N
-2
-3
Tc Th
-4
260 280 300 320 340 360
temperature / °K
Fig. 5.4.The chemical potentials of the DSC ¼ 14 J K1 mol1 , DCp ¼ 75 J K1 mol1 .
denatured (D) and the native (N) state of TH ; TC , and TM are the temperatures for heat
myoglobin and the resulting Dm per mole denaturation, for cold denaturation and for
of amino acid residue. Parameters [25]: maximum stability.
Dm0N ¼ 0:2 kJ mol1 , Dm0D ¼ 0:2 kJ mol1 ,
5.3 Thermodynamics of Pressure–Temperature Phase Diagrams 105
(a) (b)
∆µ p
P D
N
0
T T
DS ¼ DSC þ DSH
ð4Þ
DV ¼ DVC þ DVH
We argue that the closed phase boundaries displayed in Figures 5.2 and 5.3 are
the consequence of the variation of DSH and DVH with temperature and pressure
due to the hydrophobic effect. Exposure of nonpolar groups upon unfolding tends
to immobilize water molecules, thereby decreasing their entropy. This mechanism
is strongly temperature dependent. Its experimental signature is the positive heat
capacity increment at constant pressure, DCp > 0, of the denatured relative to the
native form. In simple terms, to increase the temperature of the unfolded phase
requires additional entropy to continuously melt the immobilized water [8–10, 23,
24]. However, above a certain temperature, T0 , approximately 140 C, the ordering
effect has vanished.
Below T0, the hydration entropy change, DSH ðT; T0 ), is rapidly decreasing with
decreasing temperature. Assuming DCp to be approximately independent of tem-
perature, one obtains for the change in ‘‘hydration entropy’’ [25]:
ðT
DCP T0
DSH ðTÞ ¼ dT ¼ DCP ln ð5Þ
T0 T T
The hydration term z DCp is always negative below T0, stabilizing the denatured
form. This is known as the ‘‘wedging effect’’ of water, which softens the native
structure. The combined entropy difference DS vanishes at the maximum of Dm,
at the temperature of maximum stability TM (Figure 5.4):
At TM the negative hydration entropy DSH just compensates for the positive config-
urational term DSC :
The special DS ¼ 0 line, separating regions of positive and negative transition en-
tropies, is fixed by the condition ðdP=dTÞDm¼0 ¼ 0 as shown in Figure 5.11. Since
the slope in Figure 5.11 is positive and since both DCp and DV are positive, it fol-
lows that DaP > 0. Integrating Eq. (10) yields the effect of pressure on the temper-
ature of maximum stability:
For reasonably low pressures one has for the DS ¼ 0 line: TM ATM0
ð1 þ P Dap =DCp Þ. Expanding the entropic part of the chemical potential (Eq. (6))
about TM A TM0 yields an approximate parabola in T with negative curvature (Fig-
ure 5.5a):
For DV < 0, increasing pressure diminishes the stability range of the native state.
If DV is constant, one obtains the parabolic pressure–temperature phase diagram,
PðTÞDm¼0 , shown in Figure 5.5b.
Experimental data, like those shown in Figure 5.2, 5.3, and 5.11 suggest a re-
entrant phase behavior not only as a function of temperature but also as a function
of pressure. This leads to a closed or ‘‘ellipsoidal’’ phase boundary with DV de-
pending on pressure and temperature. The volume of the denatured form, i.e. the
slope of mD ðPÞ, in Figure 5.6 increases with decreasing pressure. Thus a second
crossing of the DaN potentials may occur at low or even negative pressure denoted
by PL. Thus the unfolded protein is more easily stretched than the compact native
state. Moreover, near P ¼ 0, a pressure increase will stabilize the native state. In
analogy to the temperature (Eq. (7)) we can define a pressure of maximum stabil-
ity, PM , according to:
ðqDm=qPÞP¼PM ¼ DV ¼ 0 ð13Þ
Combining Eqs (4) and (13) shows that the changes in configurational and hydra-
tion volume cancel at P ¼ PM . The small observed unfolding volumes may result
from such compensation effects. Equation (13) allows determination of the
DVðP; TÞ ¼ 0 line, separating regions of positive and negative volume changes:
108 5 Pressure–Temperature Phase Diagrams of Proteins
300
250
VD < VN
µ [kJ/mole] 200
150 D
100 N
50 VD > VN ∆µ
0
PL PM PH
-50
-0.2 0.0 0.2 0.4 0.6 0.8 1.0
Pressure [GPa]
Fig. 5.6. Effect of pressure on the chemical protein, which leads to a crossing at high
potential mð4Þ for the parameters of cytoc- pressure PH . A second crossing occurs at
hrome c at 280 K (Table 5.1 and Eq. (12, 17)). negative pressure PL . PM is the pressure of
The volume of the denatured state decreases maximum stability.
with pressure to below the value of the native
thus
The DV ¼ 0 line follows from the additional condition: ðdP=dTÞDm¼0 ¼ z. For the
pressure of maximum stability one obtains:
range where pressure is negative. Liquids under tension are unstable, although
metastable states can exist, since the creation of a new liquid–gas interface involves
the crossing of an energy barrier. The first experiments were performed by Berthe-
lot in 1850 using a spinning capillary [28]. Stretched states of water at pressures as
low as –280 bars have been observed using the same method [29]. Unfolding ex-
periments with protein solutions at negative pressures still need to be performed.
The folding of unstable proteins such as nuclease conA remains incomplete at pos-
itive pressure, while negative pressure may further stabilize the native state [30]. At
sufficiently high temperature, low-pressure denaturation may occur in the positive
range (Figure 5.11) and in this case it is easily accessible experimentally.
Pressure- and temperature-dependent transition volumes originate from finite
increments in the partial compressibility Db T and the volume expansion coefficient
DaP . Expanding the latent volume DV about a reference point ðTr ; Pr Þ yields:
The major part of the change in the compressibility upon global transformations of
proteins is due to hydration processes [31]. The greater the hydration, the smaller
the partial compressibility of protein and hydration shell. This is the rule for pro-
tein solutions at normal temperature and pressure. For native proteins the intrin-
sic compressibility is as low as that of organic solids [32–34]. The outer surface
contribution to the measured partial compressibility is quite negative. Thus com-
plete unfolding leads to a considerable decrease in the partial compressibility due
to the loss of intramolecular voids and the expansion of the surface area contacting
the bulk water [35]. In this low pressure/high temperature regime Db T is negative,
while DaP is mostly positive. This results in a positive unfolding volume (Eq. (17))
as indicated in Figure 5.6. Consequently moderate pressures stabilize the native
state. A number of studies suggest that high-pressure denaturation leads to incom-
plete unfolding and structures resembling those of molten globule states (MG) [5–
7, 36]. The N ! MG transition is generally accompanied by an increase in com-
pressibility. The tendency towards a positive Db T at higher pressures due to partial
unfolding may lead to the observed negative unfolding volumes. Combining Eqs
(12) and (17), the phase boundary DmðT; PÞ ¼ 0 with respect to a reference point
ðTr ; Pr Þ assumes the form of a second-order hyperface:
For approximately constant increments DCp ; Db T ; DaP , the phase boundary as-
sumes parabolic ðDb T ¼ 0Þ, hyperbolic or closed elliptical shapes. The basic condi-
tion to obtain an elliptical phase diagram is given by:
Since experiments show that DCp > 0, Eq. (19) requires the compressibility change
to be positive: Db T > 0. A hyperbolic shape would imply a negative left-hand side
in Eq. (19). Thus far only elliptical diagrams, in some cases with distortions (e.g.,
Figure 5.3) have been discussed. A complete solution would require data in the
range where pressure is negative: the data shown in Figure 5.2 are also consistent
with a parabolic shape.
5.4
Measuring Phase Stability Boundaries with Optical Techniques
5.4.1
Fluorescence Experiments with Cytochrome c
Protein denaturing processes can be investigated with many techniques. The most
convenient ones concerning the determination of the whole stability diagram are
spectroscopic techniques. Almost all methods have been used: IR, Raman, NMR,
absorption and fluorescence spectroscopy. However, data covering complete phase
diagrams are quite rare. Some of them were reviewed above. The various techni-
ques have their specific advantages as well as disadvantages. For instance, with vi-
brational spectroscopy one obtains information on structural changes such as the
weakening of the amide I band [20, 21, 37, 38], with NMR one obtains information
on hydrogen exchange and the associated protection factors [6] from which conclu-
sions on structural details of the denatured state can be drawn. However, as a rule
no information on the structure disrupting mechanism under pressure is ob-
tained. The situation with absorption and fluorescence spectroscopy is different.
In optical spectroscopy it is the spectral position and the linewidth of an electronic
transition that is measured. From these quantities it is, as a rule, quite difficult to
extract any structural information. On the other hand, one obtains detailed infor-
mation on the interaction of the electronic states with the respective environment.
Since these interactions are known in detail, it is possible to extract from their be-
havior under pressure and temperature changes information on the mechanism of
phase crossing.
Compared with other spectroscopic techniques, fluorescence spectroscopy is also
the most sensitive. Hence, it is possible to work at very low concentration levels so
that aggregation processes can easily be avoided. In most experiments tryptophan
is used as a fluorescent probe molecule. With chromoproteins this is, generally
speaking, impossible since fast energy transfer to the chromophore quenches the
UV fluorescence to a high degree. In the following we describe fluorescence experi-
ments on a cytochrome c-type protein in a glycerol/water matrix [39]. The native
heme iron was substituted by Zn in order to make the protein strongly fluorescent
in the visible range. As a short notation for this protein we use the abbreviation
Zn-Cc.
The set-up for a fluorescence experiment is simple and is shown in Figure 5.7.
The sample is in a temperature- and pressure-controlled diamond anvil cell.
5.4 Measuring Phase Stability Boundaries with Optical Techniques 111
Fig. 5.7. Sketch of a fluorecence experiment for measuring the phase diagram of proteins.
Pressure can be varied up to about 2 GPa. Its magnitude is determined from the
pressure shift of the ruby fluorescence, for which reference values can be taken
from the literature [40]. The temperature is controlled by a flow thermostat be-
tween 20 C and 100 C. Excitation is carried out into the Soret band at 420 nm
with light from a pulsed dye laser pumped by an excimer laser. The fluorescence is
collected in a collinear arrangement, dispersed in a spectrometer and detected via a
CCD camera. The quantities of interest for measuring the protein stability phase
diagram are the spectral position and the width of the S1 ! S0 00-transition (587
nm, Figure 5.8). Generally speaking the spectral changes for Zn-Cc are rather
small. Nevertheless the measured changes of the spectral shifts and widths are ac-
curate within about 3%. Figure 5.8 shows the fluorescence spectrum of Zn-Cc be-
tween 550 and 700 nm as it changes with pressure. The sharp line around 695 nm
is the fluorescence from ruby.
fluorescence / a.u.
wavelength / nm
Fig. 5.8. Fluorescence spectrum of Zn- changes of the first and second moment of the
cytochrome c in glycerol/water at ambient 00-band at 584 nm. The fluorescence from
temperature as it changes with pressure. The ruby is shown as well. It serves for gauging the
stability diagram was determined from the pressure.
112 5 Pressure–Temperature Phase Diagrams of Proteins
5.4.2
Results
Figures 5.9 and 5.10 show typical results for the changes in the first and second
moment (position of the maximum and width) of the 00-band under pressure and
temperature variation.
If pressure is increased the band responds with a red shift, although a very small
one (Figure 5.9). Around 0.75 GPa the red shifting regime changes rather abruptly
into a blue shifting regime which levels off beyond 1 GPa. A qualitatively similar
behavior is observed for the band shift under a temperature increase: a red shifting
phase (in this case more pronounced) is followed by a blue shifting phase (Figure
5.10). Quite interesting is the behavior of the bandwidth: An increase of pressure
leads to a rather strong increase in the width (Figure 5.9). This is the usual behav-
ior and just reflects the fact that an increase in density results in an increase in the
molecular interactions due to their strong dependence on distance. However, if the
pressure is increased beyond about 0.9 GPa, the band all of a sudden narrows. This
narrowing levels off beyond about 1 GPa. As can be seen from the figure, the max-
imum in the bandwidth coincides with the midpoint of the blue shifting regime.
The thermal behavior of the bandwidth is different (Figure 5.10): There is no nar-
rowing phase, but there is kind of a kink in the increase of the bandwidth with
temperature around 65 C. Again, this change in the thermal behavior of the width
5.4 Measuring Phase Stability Boundaries with Optical Techniques 113
coincides rather well with the midpoint in the thermally induced blue shifting
phase.
For measuring the complete phase diagram we performed a series of pressure
scans from ambient pressure up to about 1.5 GPa at various temperatures and a
series of temperature scans from a few centigrades up to about 90 C at various
pressure levels. The respective pattern of the changes of the fluorescence always
showed the sigmoid-like behavior as discussed above. We associate the midpoint
of the blue shifting regime with the stability boundary between the native and the
denatured state of the protein due to reasons discussed below.
The complete stability diagram is shown in Figure 5.11. The solid line represents
an elliptic least square fit to the experimental points. The two solid lines DS ¼ 0
and DV ¼ 0 cut the stability diagram at points where the volume and the entropy
difference change sign (see also Eqs (11) and (16)). For instance, for all data points
to the left of the DS ¼ 0 line the entropy in the denatured state SD is smaller than
SN , the entropy in the native state (see below).
Figures 5.12 and 5.13 show specific features of the fluorescence behavior which
we want to discuss separately. Figure 5.12 shows the behavior of the first and sec-
ond moment for cold denaturation at ambient pressure. The point which we want
to stress is that the band shift no longer reflects the qualitative change in its behav-
ior. Instead, we observe a blue shift which increases in a nonlinear fashion with
decreasing temperature. From such a behavior it is hard to determine a transition
114 5 Pressure–Temperature Phase Diagrams of Proteins
point. The bandwidth, on the other hand, shows a quite unusual behavior: it nar-
rows with decreasing temperature, but around 3 C it runs through a minimum
and starts broadening upon further decreasing the temperature. The unusual be-
havior concerns the broadening with decreasing temperature. This broadening ob-
viously originates from an additional state which becomes populated below 3 C.
Naturally, this state must be the denatured state of the protein. Hence, we associate
this minimum in the bandwidth with the stability boundary.
The temperature dependence of the moments in Figure 5.13 follows a rather
complex pattern. This pattern comes from the re-entrant character of the phase di-
agram which has been stressed in Section 5.2. At a pressure level of 0.7 GPa, the
protein is denatured to a high degree if the temperature is below around 10 C (see
Figure 5.11). Hence, upon increasing the temperature, the protein enters the stable
regime. This is most clearly reflected in a narrowing of the bandwidth due to the
fact that the denatured state becomes less and less populated At the same time the
band maximum undergoes a red shift in line with the observation that the dena-
tured state is blue shifted from the native one. However, around 25 C the red
shifting phase turns into a blue shifting phase, signaling that the protein re-enters
the denatured regime. The bandwidth data support this conclusion. Interestingly,
the data convey the impression that there are more than just two transitions in-
volved; their nature, however, is not clear.
116 5 Pressure–Temperature Phase Diagrams of Proteins
5.5
What Do We Learn from the Stability Diagram?
5.5.1
Thermodynamics
The eye-catching feature in Figure 5.11 is the nearly perfect elliptic shape of the
stability diagram in agreement with the simple model developed in Section 5.3.
The elliptic shape implies that the characteristic thermodynamic parameters of
the protein, namely the increments of the specific heat capacity, the compressibility
and the thermal expansion are rather well-defined material parameters, i.e., do
not significantly depend on temperature and pressure. In addition, since the model
is based on just two states, we conclude from the elliptic shape that the protein
investigated, namely Zn-Cc, can be described as a two-state folder. Applying the
Clausius–Clapeyron equation (Eq. (3)) to the elliptic phase boundary, we see that
there are distinct points characterized by (dP=dTÞDm¼0 ¼ y and by (dP=dTÞDm¼0 ¼
0. The former condition separates the regime with positive DV from the regime
with negative DV. The latter one separates the regime with positive DS from the
regime with negative DS (Eqs (7) and (13)).
Let us consider thermal denaturation at ambient pressure. DS is definitely posi-
tive because the entropy of the chain increases upon denaturation and the entropy
of the solvent increases as well. Since the slope of the phase boundary is positive,
DV must be positive, too. However, moving along the phase boundary, we even-
tually cross the DV ¼ 0 line. For all points above that line, DV is negative. In other
words: application of pressure destabilizes the native state and favors the dena-
tured state. Below the DV ¼ 0-line, DV is positive. Accordingly, application of pres-
sure favors the native state.
In a similar way, for all points to the right of the DS ¼ 0 line, DS is positive. A
positive entropy change upon denaturation is what one would straightforwardly ex-
pect. However, to the left of the DS ¼ 0 line, the entropy change is negative, mean-
ing that the entropy in the denatured state is smaller than in the native state. As
was discussed above, the negative entropy change is due to an upcoming ordering
of the hydration water as the temperature is decreased: The exposure of hydropho-
bic groups causes the water molecules in the hydration shell to become more and
more immobilized due to the formation of a stronger hydrogen network. Right at
the point where the DS ¼ 0 line cuts the phase diagram, the chain entropy and the
entropy of the hydration shell exactly cancel, as we have stressed above.
There is another interesting outcome from the elliptic shape of the phase dia-
gram: Eq. (19) tells us that the change in the compressibility upon denaturation
has to have the same sign as the change in the specific heat. The change in the
specific heat is positive meaning that the heat capacity is larger in the unfolded
state. As outlined above, this is due to the fact that an increase of the temperature
in the denatured state requires the melting of the ordered immobilized hydration
shell. Accordingly, Db has to be positive, meaning that the compressibility in the
denatured state is larger than in the native state. As to the absolute values of
5.5 What Do We Learn from the Stability Diagram? 117
the thermodynamic parameters which determine the phase diagram, they can be
determined only if one of the parameters is known. In the following section we
will show how the equilibrium constant can be determined as a function of tem-
perature and pressure and how this can be exploited to extract the thermodynamic
parameters on an absolute scale.
5.5.2
Determination of the Equilibrium Constant of Denaturation
The body of information that can be extracted from the thermodynamics of protein
denaturation reveals surprising details. However, it should be stressed again that
all the modeling is based on two important assumptions, namely that folding and
denaturing is described within the frame of just two states, N and D, and that these
two states are in thermal equilibrium. Neither of these assumptions is straightfor-
ward. It is well known that the number of structural states, even of small proteins
is, is extremely large [41–47] and the communication between the various states
can be very slow so that the establishment of thermal equilibrium is not always
ensured. We understand the two-state approximation on the basis of a concept
which we call ‘‘state lumping.’’ In simple terms this means that the structural
phase space can be partitioned into two areas comprising, on the one hand, all
the states in which the protein is functioning and, on the other hand, the area in
which the protein is dead. We associate the native state N and the denatured state
D with these two areas. In order for the ‘‘state lumping’’ concept to work, the im-
mediate consequence is that all states within the two and between the two areas
are in thermal equilibrium. Whether this is true or not can only be proven by the
outcome of the experiments.
For Zn-Cc this concept seems to hold sufficiently well. In our experiments, typi-
cal waiting times for establishing equilibrium were of the order of 20 minutes. In
order to make sure that this time span is reasonable, we measured for some points
on the phase boundary also the respective kinetics. However, we also stress that at
high pressures and low temperatures (left side of the phase diagram, Figure 5.11)
we could not get reasonable data and we attributed this to the fact that equilibrium
could not be reached within the experimental time window.
Assuming that equilibrium is established and the two-state approximation holds
with sufficient accuracy, we can determine the equilibrium constant as a function
of pressure and temperature from the fluorescence experiments and from the fact
that the phase diagram has an elliptic shape. We proceed in the following way: The
fluorescence intensity FðlÞ in the transformation range is a superposition from the
two states N and D with their respective fluorescence maximum at lN and lD :
where pN and pD are the population factors of the two states N and D, aN and aD
are the respective oscillator strengths. Changes of the temperature or the pressure
of the system cause a change in the population factors pN and pD which, in turn,
118 5 Pressure–Temperature Phase Diagrams of Proteins
leads to a change in the first moment of FðlÞ, that is in the spectral position of
the band maximum, lmax . As to Zn-Cc, we can make use of some of its specific
properties: First, the chromophore itself is quite rigid, hence, it is not likely af-
fected by pressurizing or heating the sample. Accordingly, it is safe to assume
that the oscillator strengths in the native and in the denatured state are not signifi-
cantly different from each other. Second, the spectral changes induced by varying
the temperature or the pressure are rather small compared to the total width of
the long wavelength band (Figure 5.8). In addition, the wavelengths lN and lD are
rather close and well within the inhomogeneous band. As a consequence, the ex-
ponentials in Eq. (20) are roughly of the same magnitude irrespective of the value
of l. Along these lines of reasoning, lmax is readily determined from the condition
dF=dl ¼ 0:
The last term on the right hand side of Eq. (21) holds because we restrict our eval-
uation to an effective two-state system. According to the above equation, the band
maximum in the transition region is determined by a population weighted average
of lN and lD . Since lN and lD depend on pressure and temperature themselves, we
have to determine the respective edge values in the transition region. How this is
done is shown in Figure 5.14 for thermal denaturation at ambient pressure. Simi-
lar figures are obtained for any parameter variation. Having lmax as a function of
pressure or temperature from the experiment and knowing the respective edge
Fig. 5.16. Pressure dependence of the difference in the Gibbs free energy DG.
5.5.3
Microscopic Aspects
two types of interactions, namely the dispersive and the higher order electrostatic
interaction (dipole–induced dipole). Both types of interaction are very short ranged.
They fall off with distance R as R6 . From the pioneering papers by Bayliss, McRae
and Liptay [48–50], it is well known that the dispersive interaction is always red
shifting because the polarizability in the excited state is higher than in the ground
state. However, electrostatic interactions can cause shifts in both directions, to the
red as well as to the blue, depending on how the dipole moment of the chromo-
phore changes in the excited state compared with the ground state Accordingly, in
the blue shifting regime, the electrostatic interaction of the probe with its environ-
ment obviously increases compared with the dispersive interaction. As a conse-
quence, we conclude that polar groups with a sufficiently large dipole moment
have to come close to the chromophore to induce this shift.
As to thermal denaturation, such an interpretation seems to fit quite well into
the scenario. Baldwin [23], for instance, could show that thermal denaturation of
a protein is remarkably well described in analogy to the solvation of liquid hydro-
carbons in water. This ‘‘oil droplet model’’ accounts for the temperature depen-
dence of the hydrophobic interaction: The entropic driving force for folding, which
comprises the major part of the hydrophobic interaction, decreases as the temper-
ature increases. This driving force is associated with the formation of an ordered
structure of the water molecules surrounding the hydrophobic amino acids. At suf-
ficiently high temperature this ordered structure melts away so that the change of
the unfolding entropy of the polypeptide chain takes over and the protein attains a
random coil-like shape [5]. Since a random coil is an open structure, the chromo-
phore may now be exposed to water molecules. Water molecules have a rather high
permanent dipole moment, hence, may become polarized through the electrostatic
interaction with the chromophore. Along these lines of reasoning, it seems
straightforward that the observed blue shifting regime of the fluorescence in the
thermal denaturation of Zn-Cc comes from the water molecules of the solvent.
However, as was pointed out by Kauzmann [51], the ‘‘oil droplet model’’ has severe
shortcomings, despite its success in explaining the specific features of thermal de-
naturation. It almost completely fails in explaining the specific features of pressure
induced denaturation. Pressure denaturation is governed by the volume change
DV (Section 5.3). In this respect hydrophobic molecules behave quite differently
from proteins: DV for protein unfolding is negative above a few hundred MPa
(see, for instance, Figure 5.11 and discussions in Sections 5.3 and 5.5.1), whereas
it is positive in the same pressure range for transfering hydrophobic molecules
into water. There are two possible consequences: Either the ‘‘oil droplet model’’ is
completely wrong, or the pressure denatured state is different from the thermally
denatured state.
Indeed, Hummer and coworkers could show that the latter possibility is true [52,
53]. On the basis of their calculations they suggested that, as pressure is increased,
the tretrahedral network of H-bonds in the solvent becomes more and more frus-
trated so that the pressure-induced inclusion of water molecules within the hydro-
phobic core becomes energetically more and more favorable. As a consequence, the
contact configuration between hydrophobic molecules is destabilized by pressure
122 5 Pressure–Temperature Phase Diagrams of Proteins
5.5.4
Structural Features of the Pressure-denatured State
From the discussion above it is evident that pressure denaturation leads to a struc-
tural state which is different from the respective one obtained by thermal dena-
turation. It seems that many structural features of the native protein remain
preserved under pressure denaturation. Experimental evidence comes from NMR
experiments [6], but also from IR experiments [20, 21, 37, 38].
Figure 5.17 shows another experiment [56], namely a neutron-scattering experi-
ment with myoglobin as a target from which the conservation of native structural
elements in the denatured state is directly seen. Myoglobin denatures at pH 7 in
the range between 0.3 and 0.4 GPa as judged from the optical absorption spectrum
of the heme group. The corresponding changes in the protein structure and at the
protein–water interface can be deduced from the neutron structure factor HðQÞ of
a concentrated solution of myoglobin in D2 O (Figure 5.17). Q denotes the length of
the scattering vector. The large maximum around Q ¼ 2 Å1 arises from OaO cor-
5.6 Conclusions and Outlook 123
1 bar
0,4 n-structure factor 3.5 kbar
myoglobin/D2O 4.5 kbar
0,2
H(Q)
protein helix
0,0
water
-0,2
0,5 0,7 1 2 5 8 10
-1
Q (A )
Fig. 5.17. Comparative neutron scattering myoglobin. The important feature is the partial
experiments at ambient pressure and at high persistence of the protein helical structure
pressure. Plotted is the neutron structure (peak at Q ¼ 0:7 Å1 ) in the pressure-
factor HðQÞ as a function of Q. Data are for denatured state.
relations of water near the protein interface. This maximum increases with density.
The smaller maximum near Q ¼ 0:7 Å1 reflects helix-helix correlations. The con-
trast is provided by the negative scattering length density of the protein hydrogen
atoms versus the positive scattering length density of C, N, and O. The most
remarkable feature is the persistence of this maximum above the transition at
0.45 GPa, indicating residual secondary structure in the pressure-unfolded form.
5.6
Conclusions and Outlook
Folding and denaturing processes of proteins are extremely complicated due to the
complex nature of the protein molecules with their huge manifold of structural
states. Hence, it is surprising that some of the characteristic features of equilib-
rium thermodynamics associated with the folded and denatured state are already
revealed on the basis of simple models. Most important in this context is the re-
duction of the folding and denaturing processes to just two essential states, namely
the native state and the denatured state. To reconcile this crude assumption with
the large structural phase space, we introduced the concept of ‘‘state lumping.’’
Another important approximation concerns the vanishing higher (higher than
two) derivatives of the chemical potential, rendering pressure- and temperature-
independent state parameters (specific heat, compressibility, thermal expansion) to
the protein. The result of these simplifying assumptions is an elliptically shaped
124 5 Pressure–Temperature Phase Diagrams of Proteins
stability phase diagram, provided that the sign of the change in the specific heat
capacity and in the compressibility is the same.
Our experiments on the stability of a modified cytochrome c in a glycerol/water
solvent showed an almost perfect ellipse from which the regimes with positive and
negative volume changes as well as positive and negative entropy changes could be
deduced. We tried to shed some light on the microscopic aspects responsible for
these regimes. A dominant force is the hydrophobic interaction which depends
not only on temperature but also on pressure [57]. The characteristic structural fea-
tures of the pressure-denatured state are different from the respective ones of the
thermally denatured state. In both cases, however, water molecules come very close
to the chromophore.
As to the open questions in context with the stability phase diagram, we refer to
the elliptic shape. So far only closed diagrams have been observed, and in most
cases an elliptic shape was an appropriate description. The question comes up
how general this observation is and what the microscopic implications are. Ad-
mittedly, up to now few experiments have measured the full diagram.
Another problem concerns negative pressure. From an experimental point of
view negative pressures as low as 200 bar seem to be feasible. Denaturation un-
der negative pressure, that is under conditions where the relevant interactions are
weakened, would add valuable information on pressure-induced denaturing forces.
Finally the role of the solvent has to be addressed in more detail. The solvent
may have a dramatic influence on the hydrophobic interaction, the size of the sol-
vent molecules may play an important role in all processes involving pressure, and,
last but not least, understanding the influence of the solvent may also shed light
on the behavior of membrane proteins.
Acknowledgment
The authors acknowledge financial support from the DFG (FOR 358, A1/2), from
the Fonds der Chemischen Industrie (J.F.) and from the Bundesministerium für
Bildung und Forschung 03DOE2M1.
Scientific input to this work came from many of our friends. In particular we
want to thank J. M. Vanderkooi, H. Lesch, and G. Job and F. Herrmann for the
idea to base thermodynamics on entropy and chemical potential.
References
6
Weak Interactions in Protein Folding:
Hydrophobic Free Energy, van der Waals
Interactions, Peptide Hydrogen Bonds,
and Peptide Solvation
Robert L. Baldwin
6.1
Introduction
Hydrophobic free energy has been widely accepted as a major force driving protein
folding [1, 2], although a dispute over its proper definition earlier made this issue
controversial. When a hydrocarbon solute is transferred from water to a nonaqu-
eous solvent, or a nonpolar side chain of a protein is buried in its hydrophobic
core through folding, the transfer free energy is referred to as hydrophobic free en-
ergy. The earlier dispute concerns whether the transfer free energy can be legiti-
mately separated into two parts and the free energy of hydrophobic hydration
treated separately from the overall free energy change [3–5]. If the hydrophobic
free energy is defined as the entire transfer free energy [5], then there is general
agreement that transfer of the nonpolar solute (or side chain) out of water and
into a nonaqueous environment drives folding in a major way. A related concern
has come forward, however, and scientists increasingly question whether the ener-
getics of forming the hydrophobic core of a protein should be attributed chiefly to
packing interactions (van der Waals interactions, or dispersion forces) rather than
to burial of nonpolar surface area. This question is closely related to the issue of
whether the hydrophobic free energy in protein folding should be modeled by
liquid–liquid transfer experiments or by gas–liquid transfer experiments.
The energetic role of peptide hydrogen bonds (H-bonds) was studied as long ago
as 1955 [6] but the subject has made slow progress since then, chiefly because of
difficulty in determining how water interacts with the peptide group both in the
unfolded and folded forms of a protein. Peptide H-bonds are likely to make a sig-
nificant contribution to the energetics of folding because there are so many of
them: about two-thirds of the residues in folded proteins make peptide H-bonds
[7]. Peptide backbone solvation can be predicted from electrostatic algorithms but
experimental measurements of peptide solvation are limited to amides as models
for the peptide group.
This chapter gives a brief historical introduction to the ‘‘weak interactions in pro-
tein folding’’ and then discusses current issues. It is not a comprehensive review
and only selected references are given. The term ‘‘weak interaction’’ is somewhat
misleading because these interactions are chiefly responsible for the folded struc-
tures of proteins. The problem of evaluating them quantitatively lies at the heart of
the structure prediction problem. Although there are methods such as homology
modeling for predicting protein structures that bypass evaluation of the weak inter-
actions, de novo methods of structure prediction generally rely entirely on evaluat-
ing them. Thus, the problem of analyzing the weak interactions will continue to be
a central focus of protein folding research until it is fully solved.
6.2
Hydrophobic Free Energy, Burial of Nonpolar Surface and van der Waals Interactions
6.2.1
History
The prediction in 1959 by Walter Kauzmann [1] that hydrophobic free energy
would prove to be a main factor in protein folding was both a major advance and
a remarkable prophecy. No protein structure had been determined in 1959 and
the role of hydrophobic free energy in structure formation could not be deduced
by examining protein structures. The first protein structure, that of sperm whale
myoglobin, was solved at 2 Å resolution only in 1960 [8]. On the other hand, the
predicted structure of the a-helix [9] given by Pauling and coworkers in 1951,
which was widely accepted, suggested that peptide H-bonds would prove to be the
central interaction governing protein folding. Peptide H-bonds satisfied the intu-
itive belief of protein scientists that the interactions governing protein folding
should be bonds with defined bond lengths and angles. This is not a property of
hydrophobic free energy.
Kauzmann [1] used the ambitious term ‘‘hydrophobic bonds,’’ probably aiming
to coax protein scientists into crediting their importance, while Tanford [10] intro-
duced the cautious term ‘‘the hydrophobic effect.’’ ‘‘Hydrophobic interaction’’ has
often been used because a factor that drives the folding process should be an inter-
action. However, hydrophobic interaction is also used with a different meaning
than removal of nonpolar surface from contact with water, namely the direct inter-
action of nonpolar side chains with each other. The latter topic is discussed under
the heading ‘‘van der Waals interactions.’’ The term ‘‘hydrophobic free energy’’ is
used here to signify that nonpolar groups help to drive the folding process. Tanford
[10] points out that a hydrophobic molecule has both poor solubility in water and
good solubility in nonpolar solvents. Thus, mercury is not hydrophobic because it
is insoluble in both solvents. Early work leading to the modern view of hydropho-
bic free energy is summarized by Tanford [11] and a recent discussion by Southall
et al. [12] provides a valuable perspective.
6.2.2
Liquid–Liquid Transfer Model
Kauzmann [1] proposed the liquid–liquid transfer model for quantitating hydro-
phobic free energy. His proposal was straightforward. Hydrophobic molecules
6.2 Hydrophobic Free Energy, Burial of Nonpolar Surface and van der Waals Interactions 129
prefer to be in a nonpolar environment rather than an aqueous one and the free
energy difference corresponding to this preference should be measurable by parti-
tioning hydrocarbons between water and a nonaqueous solvent. Nozaki and Tan-
ford [13] undertook a major program of using the liquid–liquid transfer model to
measure the contributions of nonpolar and partially nonpolar side chains to the
energetics of folding. They measured the solubilities of amino acids with free a-
COO and a-NH3 þ groups, while Fauchère and Pliska [14] later studied amino
acids with blocked end groups, because ionized end groups interfere with the
validity of assuming additive free energies of various groups. They measured parti-
tioning of solutes between water and n-octanol (saturated with water), which is less
polar than the two semi-polar solvents, ethanol and dioxane, used by Nozaki and
Tanford [13]. Wimley et al. [15] used a pentapeptide host to redetermine the parti-
tion coefficients of the amino acid side chains between water and water-saturated
n-octanol and they obtained significantly different results from those of Fauchère
and Pliska. They emphasize the effect of neighboring side chains (‘‘occlusion’’) in
reducing the exposure of a given side chain to water. Radzicka and Wolfenden [16]
studied a completely nonpolar solvent, cyclohexane, and observed that the transfer
free energies of hydrocarbons are quite different when cyclohexane is the non-
aqueous solvent as compared to n-octanol.
In Figure 6.1 the transfer free energies of model compounds for nonpolar amino
14
L
12 V
I
∆ G (Chx-Oct) (k J mol-1)
10
6
A
M
4 G
F
F
o
-2
-5 0 5 10 15 20 25
o
∆ G (Chx-water) (k J mol-1)
Fig. 6.1. Transfer free energies from cyclo- shown on the plot. Note that the transfer free
hexane to water compared with ones from energies between cyclohexane and n-octanol
cyclohexane to water-saturated n-octanol (data are more than half as large as those from
from Ref. [16]). The model solutes undergoing cyclohexane to water.
transfer represent the amino acid side chains
130 6 Weak Interactions in Protein Folding
acid side chains are compared using either cyclohexane or n-octanol as the nonaqu-
eous solvent [16]. If different nonaqueous solvents may be used equally well to
model the hydrophobic core of a protein, then the transfer free energies of hydro-
carbons from cyclohexane to n-octanol should be small compared with the transfer
free energies from cyclohexane to water. Figure 6.1 shows this is not the case: the
transfer free energies measured between cyclohexane and n-octanol are more than
half as large as the ones between cyclohexane and water. Thus, these results pose
the first serious question about the use of the liquid–liquid transfer model: which
nonaqueous solvent should be used to model the hydrophobic core and how valid
are the results if no single solvent is a reliable model?
6.2.3
Relation between Hydrophobic Free Energy and Molecular Surface Area
A second important step in quantifying hydrophobic free energy was taken when
several authors independently observed that the transfer free energy of a nonpolar
solute is nearly proportional to the solute’s surface area for a homologous series of
solutes [17–19]. This observation agrees with the intuitive notion that the transfer
free energy of a solute between two immiscible solvents should be proportional to
the number of contacts made between solute and solvent (however, see Section
6.2.6). Lee and Richards [20] in 1971 developed an automated algorithm for mea-
suring the water-accessible surface area (ASA) of a solute by rolling a spherical
probe, with a radius equivalent to that of a water molecule (1.4 Å) (10 Å ¼ 1 nm),
over its surface. Their work showed how to make use of the surface area of a solute
to analyze its hydrophobicity. Proportionality between transfer free energy and
ASA does not apply to model compounds containing polar side chains because po-
lar groups interact strongly and specifically with water.
A plot of transfer free energy versus ASA is shown in Figure 6.2 for linear
alkanes. The slope of the line for linear (including branched) hydrocarbons is 31
cal mol1 Å2 (1.30 J mol1 nm2 ) when partition coefficients on the mole fraction
scale [21] are used. Earlier data for the solubilities of liquid hydrocarbons in water
are used to provide the transfer free energies. In Figure 6.2 the transfer free energy
is nearly proportional to the ASA of the solute. The line does not pass through
(0,0), but deviation from strict proportionality is not surprising for small solutes
[22].
Hermann [22] points out that linear hydrocarbons exist in a broad range of con-
figurations in solution and each configuration has a different accessible surface
area. He also points out [17] that the transfer free energy arises from a modest dif-
ference between the unfavorable work of making a cavity in a liquid and the favor-
able van der Waals interaction between solute and solvent. Consequently, a moder-
ate change in the van der Waals interaction can cause a large change in the transfer
free energy. Tanford [10] analyzes the plot of transfer free energy versus the num-
ber of carbon atoms for hydrocarbons of various types, and discusses data for the
different slopes of these plots.
6.2 Hydrophobic Free Energy, Burial of Nonpolar Surface and van der Waals Interactions 131
16
∆ G (kcal mol-1)
12
8
o
0
0 100 200 300 400
ASA
Fig. 6.2. Transfer free energies of linear molarity scale and are corrected for the ratio of
alkanes (from 1 to 10 carbon atoms) from molecular volumes, solute/solvent, according
liquid alkane to water, measured from to Sharp et al. [21]. The data here and in Figure
hydrocarbon solubility in water. They are 6.3 are given in kcal mol1 to conform with the
plotted against water-accessible surface area literature on this subject. Note that the plots
(ASA in Å 2 ). Data are from Ref. [21]. The do not pass through 0,0 and note the larger
uncorrected transfer free energies (filled slope (47 cal mol1 Å2 ) of the corrected plot
circles) refer to the mole fraction scale while versus the uncorrected plot (31 cal mol1 Å2 ).
the corrected values (open circles) refer to the
6.2.4
Quasi-experimental Estimates of the Work of Making a Cavity in Water or
in Liquid Alkane
In modern solution chemistry, the solvation free energy of a solute is defined as its
transfer free energy from the gas phase into the solvent, when the appropriate
standard state concentration (1 M) is used in each phase, as specified by Ben-
Naim and Marcus [23]. Gas–liquid transfer free energies are used to analyze the
nature of liquid–liquid transfer free energies. The reason for adopting the 1 M
standard state concentration in both the gas and liquid phases is to ensure that
the density of the solute is the same in both phases at the standard state concentra-
tion. Then the transfer free energy gives the free energy change for transferring
the solute from a fixed position in the gas phase to a fixed position in the liquid
phase [23]. The solute in the gas phase can be treated as an ideal gas [24, 25]
132 6 Weak Interactions in Protein Folding
and the nonideal behavior of real gases at 1 M concentration can be omitted from
consideration.
Modern theories of solvation indicate that the gas–liquid transfer process can be
formally divided into two steps of an insertion model of solvation: see discussion
by Lee [24, 25] and basic theory by Widom [26]. In step 1 thermal fluctuations cre-
ate a cavity in the liquid with a size and shape appropriate for containing the sol-
ute. The structure of the liquid undergoes reorganization to make the cavity [24].
In step 2 the solute is inserted into the cavity, van der Waals interactions occur
between the solute and the solvent, and the solvent structure undergoes further re-
organization at the surface of the cavity. Lee [24] determines quasi-experimental
values for the entropy and enthalpy changes in the two steps of the insertion
model: (1) making a cavity in the liquid, and (2) inserting the solute into the cavity.
Experimental transfer data are used for each of five alkanes undergoing transfer
from the gas phase to the liquid phase, either to water or to neat liquid alkane.
The transfer thermodynamics then are combined with literature estimates for the
van der Waals interaction energies, obtained by Jorgensen and coworkers [27] from
Monte-Carlo simulations. The results give quasi-experimental estimates of the en-
thalpy and entropy changes in each step of the insertion model. Table 6.1 gives
these values for the free energy cost of making a cavity to contain the alkane solute
both in water and in liquid alkane. The 1977 theory of hydrophobic solvation by
Pratt and Chandler [28] divides the process of solvation into the two steps of cavity
formation and solute insertion, and the authors consider the rules for separating
the solvation process into these two steps. The two steps of dissolving an alkane
in water have also been simulated by molecular dynamics and the results analyzed
by the free energy perturbation method [29].
The following conclusions can be drawn from Lee’s data [24, 25]. (1) The work
of making a cavity in water is much larger than in liquid alkane and this difference
Tab. 6.1.Quasi-experimental estimates of the free energy of cavity formation and simulation-
based results for van der Waals interaction energies between solvent and solutea.
is the major factor determining the size of the hydrophobic free energy in liquid–
liquid transfer. For example, it costs 46.5 kJ mol1 to make a cavity for neopentane
in water but only 27.6 kJ mol1 in liquid neopentane. To understand hydrophobic
free energy, it is necessary first of all to understand the free energies of cavity for-
mation in water and in nonpolar liquids. The work of making a cavity in water is
large because it depends on the ratio of cavity size to solvent size [30, 31] and water
is a small molecule (see Section 6.2.7). It is more difficult to make a cavity of given
size by thermal fluctuations if the solvent molecule is small. (2) The work of mak-
ing a cavity in a liquid is chiefly entropic [24], while the van der Waals interactions
between solute and solvent are enthalpic. (3) The van der Waals interaction energy
between an alkane solute and water is nearly the same as between the alkane sol-
ute and liquid alkane (see Table 6.1). For example, the interaction energies between
neopentane and water versus neopentane and liquid neopentane are 36.0
kJ mol1 and 39.6 kJ mol1 [24]. Earlier, Tanford [32] used interfacial tensions
to show that the attractive force between water and hydrocarbon is approximately
equal, per unit area, to that between hydrocarbon and hydrocarbon. Scaled particle
theory predicts well the work of making a cavity either in water or in liquid alkane
[24], but it predicts only semi-quantitatively the enthalpy of solvent reorganization
for these cavities.
6.2.5
Molecular Dynamics Simulations of the Work of Making Cavities in Water
6.2.6
Dependence of Transfer Free Energy on the Volume of the Solute
Evidence is discussed in Section 6.2.3 that the transfer free energy is correlated
with the surface area (ASA) of the solute. Because it is straightforward to compute
ASA [20] from the structure of a peptide or protein, this correlation provides a very
useful means of computing the change in hydrophobic free energy that accom-
panies a particular change in conformation. In recent years, evidence has grown,
however, that the transfer free energy of a nonpolar solute depends on its size and
shape for reasons that are independent of hydrophobic free energy. In 1990
DeYoung and Dill [35] brought the problem forcibly to the attention of protein
chemists by demonstrating that the transfer free energy of benzene from liquid hy-
drocarbon to water depends on the size and shape of the liquid hydrocarbon mole-
cules. Section 6.2.2 reviews evidence that liquid–liquid transfer free energies de-
pend on the polarity (and perhaps on water content) of the nonaqueous solvent.
But in the study by DeYoung and Dill [35] the size and shape of the nonaqueous
solvent molecules affect the apparent hydrophobic free energy. A large literature
has developed on this subject and recently Chan and Dill [36] have provided a com-
prehensive review.
Chandler [37] briefly discusses the reason why solvation free energy in water de-
pends on the volume of a sufficiently small nonpolar solute. This dependence can
be found in both the information-theory model [34] and the Lum-Chandler-Weeks
theory [38] of hydrophobic solvation. Effects of the size and shape of the solute are
taken into account in the Pratt and Chandler theory [28].
Stimulated by the results of DeYoung and Dill [35], Sharp and coworkers [21]
used a thermodynamic cycle and an ideal gas model to relate the ratio of sizes, sol-
ute to solvent, to the transfer free energy for gas–liquid transfer. They conclude
that the transfer free energy depends on the ratio of solute/solvent molecular vol-
umes. Their paper has generated much discussion and controversy. In 1994 Lee
[39] gave a more general derivation for the transfer free energy, based on statistical
mechanics, and considered possible assumptions that will yield the result of Sharp
et al. [21].
The Lum-Chandler-Weeks theory of hydrophobic solvation [38] predicts a cross-
over occurring between the solvation properties of macroscopic and microscopic
cavities when the cavity radius is 10 Å. Huang and Chandler [40] point out that
the ratio of the work of making a cavity in water to its surface area reaches a pla-
teau value for radii above 10 Å, and this value agrees with the known surface ten-
sion of water at various temperatures. On the other hand, the hydrophobic free
energy found from hydrocarbon transfer experiments increases slightly with tem-
perature (see Section 6.2.9), implying that the work of making a sufficiently small
cavity in water increases with temperature. Chandler [37] explains that these
two different outcomes, which depend on solute size, arise naturally from the H-
bonding properties of water, because the sheath of water molecules that surrounds
a nonpolar solute remains fully H-bonded when the solute is sufficiently small but
6.2 Hydrophobic Free Energy, Burial of Nonpolar Surface and van der Waals Interactions 135
not when the solute radius exceeds a critical value. Southall and Dill [41] find that
a highly simplified model of the water molecule (the ‘‘Mercedes-Benz’’ model),
which reproduces several remarkable properties of water, also predicts such a tran-
sition from microscopic to macroscopic solvation behavior.
The question of interest to protein chemists is: should a transfer free energy be
corrected for the ratio of solute/solvent volumes or not? Figures 6.2 and 6.3 com-
pare the uncorrected with the volume-corrected plots of transfer free energy versus
ASA, for both liquid–liquid transfer (Figure 6.2) and gas–liquid transfer (Figure
6.3). Both correlations show good linearity. However, the hydrophobic free energy
corresponding to a given ASA value is substantially larger if the volume-corrected
transfer free energy is used (see Ref. [21]). Whether the volume correction should
be made remains controversial. Scaled particle theory emphasizes the role of sur-
face area in determining the free energy of cavity formation while the information-
theory model [34] and the Lum-Chandler-Weeks theory [38] both emphasize the
10
8
∆ G (kcal mol-1)
4
o
0
100 150 200 250 300 350 400
ASA
Fig. 6.3. Transfer free energy from the gas while the corrected values (filled squares) refer
phase to liquid water for linear alkanes (from 1 to the molarity scale and are volume-corrected
to 10 carbon atoms) plotted against water- according to Ref. [21]. The slopes of the lines
accessible surface area (in Å 2 ). Data are from are 5.5 cal mol1 Å2 (uncorrected) and 24
Ref. [21]. The uncorrected transfer free energies cal mol1 Å2 (corrected).
(filled circles) refer to the mole fraction scale
136 6 Weak Interactions in Protein Folding
6.2.7
Molecular Nature of Hydrophobic Free Energy
The molecular nature of hydrophobic free energy has been controversial for a long
time [3, 4, 11, 12]. A long-standing proposal, supported by liquid–liquid transfer
data at 25 C [1] and by simulation results [27], is that the arrangement of water
molecules in the solvation shell around a dissolved hydrocarbon is entropically un-
favorable. Consequently the unfavorable entropy change for dissolving a hydrocar-
bon in water should provide the driving force for expelling the solute from water.
This proposal required modification when it was learned that the hydrophobic free
energy found from liquid–liquid transfer is purely entropic only at an isolated
temperature near room temperature [43]. Hydrophobic free energy becomes in-
creasingly enthalpy-driven as the temperature increases and it becomes entirely
enthalpic upon reaching its maximum value at a temperature above 100 C (see
Section 6.2.9). The characteristic property of hydrophobic free energy that domi-
nates its temperature-dependent behavior is the large positive value of DCp , the
difference between the heat capacities of the hydrocarbon in water and in the non-
aqueous solvent.
In contradiction to the thesis developed by Kauzmann [1], Privalov and Gill [3, 4]
proposed that the hydration shell surrounding a dissolved hydrocarbon tends to
stabilize the hydrocarbon in water while the van der Waals interactions between
the hydrocarbon solute and the nonaqueous solvent account for the hydrophobicity
of the solute. They assume that the van der Waals interaction energy between sol-
ute and solvent is large in the nonaqueous solvent compared to water. However,
Lee’s data (see Table 6.1) show that the large work of making a cavity in water is
responsible for the hydrophobic free energy while a hydrocarbon solute makes
nearly equal van der Waals interactions with water and with a liquid hydrocarbon
[24]. Privalov and Gill coupled two proposals: (1) the van der Waals interactions
between nonpolar side chains drive the formation of the hydrophobic core of a pro-
tein, and (2) the hydration shell surrounding a dissolved hydrocarbon tends to sta-
bilize it in water. The latter proposal is now widely believed to be incorrect but
there is increasing interest in the first proposal.
There are restrictions both on the possible orientations of water molecules in the
solvation shell around a hydrocarbon solute and on the hydrogen bonds they form
[12]. Models suggesting how these restrictions can explain the large positive values
of DCp found for nonpolar molecules in water have been discussed as far back as
1985, in Gill’s pioneering study of the problem [44]. Water-containing clathrates of
nonpolar molecules surrounded by a single shell of water molecules have been
6.2 Hydrophobic Free Energy, Burial of Nonpolar Surface and van der Waals Interactions 137
crystallized and their X-ray structures determined [45]. The water molecules form
interconnected 5- and 6-membered rings. Water molecules in ice I are oriented tet-
rahedrally in a lattice and the oxygen atoms form six-membered rings [45].
In 1985 Lee [30] used scaled particle theory to argue that the low solubilities of
nonpolar solutes in water, and the magnitude of hydrophobic free energy, depend
strongly on the solute/solvent size ratio, and he reviews prior literature on this sub-
ject. Rank and Baker [31] confirmed his conclusion with Monte-Carlo simulations
of the potential of mean force between two methane molecules in water. The
remarkable temperature-dependent properties of hydrophobic free energy (see Sec-
tion 6.2.9) are determined, however, chiefly by DCp, which depends on the hydro-
gen bonding properties of water according to most authors (see Ref. [46]).
6.2.8
Simulation of Hydrophobic Clusters
Formation of the hydrophobic core of a protein during folding must proceed by di-
rect interaction between nonpolar side chains. Yet direct interaction between two
hydrocarbon molecules in water is known to be extremely weak (compare Ref.
[28]). Benzene dimers or complexes between benzene and phenol in water are
barely detectable [47]. Raschke, Tsai and Levitt used molecular dynamics to simu-
late the formation of hydrophobic clusters, starting from a collection of isolated hy-
drocarbon molecules in water [48]. Their results give an interesting picture of the
thermodynamics of the process. The gain in negative free energy from adding a
hydrocarbon molecule to a hydrocarbon cluster in water increases with the size of
the cluster until limiting behavior is reached for large clusters. The simulations
of cluster formation yield a proportionality between transfer free energy and burial
of nonpolar surface area that is similar to the one found from liquid–liquid trans-
fer experiments [48]. The simulation results make the important point that a stan-
dard molecular force field is able to simulate the thermodynamics of hydrophobic
free energy when a hydrophobic cluster is formed in water [48]. Rank and Baker
[49] found that solvent-separated hydrocarbon clusters precede the desolvated
clusters found in the interior of large hydrocarbon clusters. Thus, a hydrocarbon
desolvation barrier may be important in the kinetics of protein folding [49]. In
both simulation studies [48, 49], the authors find that the molecular surface area
(defined by Richards [50]) is more useful than water-accessible surface area in ana-
lyzing cluster formation.
6.2.9
DC p and the Temperature-dependent Thermodynamics of Hydrophobic Free Energy
Although the gas–liquid transfer model is now often used instead of the liquid–
liquid transfer model to analyze hydrophobic free energy, nevertheless thermo-
dynamic data for the transfer of liquid hydrocarbons to water are remarkably
successful in capturing basic thermodynamic properties of hydrophobic free en-
ergy in protein folding. Figure 6.4A shows DH versus temperature for the process
138 6 Weak Interactions in Protein Folding
A 6
4
∆ H (kJ mol-1)
-2
-4
10 15 20 25 30 35 40
o
T ( C)
Fig. 6.4. (A) Enthalpy of transfer plotted liquid hydrocarbon to water. Data are from Ref.
against temperature for three liquid [51]. Note that the plots are straight lines in
hydrocarbons (benzene (filled circles), this temperature range, where DCp ¼ constant,
ethylbenzene (open circles) and cyclohexane and DH passes through 0 near room tem-
(filled squares)) undergoing transfer from perature.
Figure 6.4B compares DH; TDS and DG as functions of temperature for the
transfer of benzene from liquid benzene to water. (The behavior of a liquid alkane,
6.2 Hydrophobic Free Energy, Burial of Nonpolar Surface and van der Waals Interactions 139
B 25
20
15
kJ mol-1
10
-5
10 15 20 25 30 35 40
o
T ( C)
Fig. 6.4. (B) Transfer free energy and the contribution to DG (open squares) from DH
contributions to free energy from DH (filled increases with temperature while the
circles) and TDS (filled squares), plotted contribution of TDS decreases, and DG
against temperature, for the transfer of increases only slightly with temperature. The
benzene from liquid benzene to water. Data plots illustrate how entropy–enthalpy
are from Ref. [43]. Note that the relative compensation affects hydrophobic free energy.
pentane, is fairly similar to that of the aromatic hydrocarbon benzene, see figure 12
of Ref. [3].) Substituting Eqs (1) and (2) into the standard relation
gives
DG ðT2 Þ ¼ DG ðT1 Þ þ DCp ðT2 T1 Þ T2 DCp lnðT2 =T1 Þ ð4Þ
Equation (4) shows that the DCp -induced changes in DH and DS with temperature
tend to compensate each other to produce only a small net increase in DG as the
temperature increases. This property of enthalpy–entropy compensation is one of
the most characteristic features of hydrophobic free energy. (DG and DS depend
strongly on the solute concentration and the superscript emphasizes that DG
and DS refer to the standard state concentration.)
140 6 Weak Interactions in Protein Folding
10
∆ S (J/deg/mol) -10
-20
-30
-40
-50
-60
-70
-0.05 0 0.05 0.1 0.15 0.2 0.25 0.3
ln(Ts /T)
Fig. 6.5.Decrease of DS towards 0 as decrease in DS with temperature [43], which
temperature (K) approaches Ts (386 K), the depends on using data from the temperature
temperature at which DS ¼ 0. The transfer of range (15–50 C) where DCp ¼ constant (see
benzene from liquid benzene to water is figure 12 of Ref. [3]). Ts ¼ 386 K is an average
shown. Data are from Ref. [43]. Note the linear value for several hydrocarbon solutes [43].
Figures 6.4A and 6.4B illustrate some basic thermodynamic properties of hy-
drophobic free energy as modeled by liquid–liquid transfer. The enthalpy change
DH is zero near room temperature (the exact temperature depends on the hydro-
carbon) and the transfer process is entropy-driven around room temperature. How-
ever, as the temperature increases above 25 C, the transfer process gradually
becomes enthalpy-driven. A surprising property is that of entropy convergence:
if DCp ¼ constant, then different hydrocarbons share a convergence temperature
at which DS ¼ 0 (386 K or 113 C) [43]. Data taken between 15 and 35 C are
extrapolated linearly versus ln T in Figure 6.5, according to Eqn (2). When the
gradual decrease in DCp at temperatures above 50 C [3] is taken into account by
using a curved extrapolation, hydrocarbon data for DS still approach a common
value near Ts, the temperature at which the entropy change is 0, which is approxi-
mately 140 C [3, 4]. The property of entropy convergence is predicted both by the
information-theory model [53] and scaled particle theory [54].
Privalov [55] discovered in 1979 that values for the specific entropy change on
protein unfolding converge near 113 C when results for some different proteins
6.2 Hydrophobic Free Energy, Burial of Nonpolar Surface and van der Waals Interactions 141
are extrapolated linearly versus temperature. His observation, taken together with
the entropy convergence shown by hydrocarbon solutes, suggests that the hydro-
phobic entropy change of a protein unfolding reaction might be removed by extrap-
olating the total entropy change to Ts for hydrocarbon transfer [43]. However,
when Robertson and Murphy [56] analyzed unfolding data from several labora-
tories, they found more scatter in the data and they did not find a common inter-
cept for the specific entropy change on protein unfolding. The extrapolation offers
a possible route to determining the change in conformational entropy on unfold-
ing, one of the basic unsolved problems in protein folding. The hydrophobic con-
tribution to the entropy of unfolding is opposite in sign and, at room temperature,
comparable in size to the change in conformational entropy [43].
Schellman [57] points out that there are advantages to using DG=T as an index
of stability instead of DG itself when considering the temperature dependence of
hydrophobic free energy. The low solubilities of hydrocarbons in water have mini-
mum values near room temperature, corresponding to the maximum values of
DG=T and to DH ¼ 0, whereas DG itself increases steadily but slowly with temper-
ature until it reaches a maximum value above 100 C, at a temperature where
DS ¼ 0.
A standard test of whether the hydrophobic free energy found from liquid–liquid
transfer experiments applies to protein folding experiments is to compare the
unfolding free energy change caused by a ‘‘large to small’’ mutation with the
free energy change predicted from the change in ASA on unfolding. Pace [58] sur-
veyed the literature on this subject (see also Ref. [21]). He concludes there is good
agreement provided the proportionality coefficient between DDG and DASA is the
volume-corrected value 47 cal mol1 Å2 [21]. However, this type of comparison
between liquid–liquid transfer and protein folding thermodynamics is complicated
by the presence of cavities produced when large-to-small mutations are made [59].
A direct test of the correspondence between liquid–liquid transfer thermody-
namics and protein folding is to compare the value of DCp for a protein unfolding
experiment with the value predicted from the change in ASA on unfolding. No
other factor besides the burial of nonpolar surface area is known to make a signif-
icant positive contribution to DCp . This comparison was studied in 1991 by Record
and coworkers [60], who found that values of DCp measured in unfolding experi-
ments do agree with the ones predicted from the change in nonpolar ASA on un-
folding. The issue became somewhat complicated, however, by contemporary work
showing that polar groups (and especially the peptide group) contribute negatively
to DCp for unfolding. Makhatadze and Privalov [61] gave extensive model com-
pound data and used them to estimate contributions to the enthalpy and heat ca-
pacity changes on unfolding of all groups present in proteins. Freire and coworkers
[62] used empirical calibration of protein unfolding data to give the contributions
to DCp expected from the changes in polar and nonpolar ASA on unfolding. The
issue is seriously complicated, however, by evidence from the study of model com-
pounds that the enthalpies of interaction with water are not related in any simple
way to polar ASA: see Table 6.2 and Section 6.3.2. To make progress in analyzing
this problem, it is important to get data for the DCp values accompanying unfold-
142 6 Weak Interactions in Protein Folding
ing of peptide helices and progress has been reported recently by Richardson and
Makhatadze [63].
A long-standing puzzle in comparing the thermodynamics of protein unfolding
with those of liquid–liquid transfer has been the unfolding results produced by
high pressures [64]. The information-theory model [65] provides new insight into
the problem by predicting that water penetrates the hydrophobic cores of proteins
during pressure-induced unfolding, as observed experimentally [66], but does not
penetrate the hydrophobic cores during thermal unfolding.
6.2.10
Modeling Formation of the Hydrophobic Core from Solvation Free Energy and
van der Waals Interactions between Nonpolar Residues
Scheme 6.1
6.2 Hydrophobic Free Energy, Burial of Nonpolar Surface and van der Waals Interactions 143
change for folding in aqueous solution. Process A is burial (removal from water) of
nonpolar side chains (or moieties) in steps 1 and 3, and process B is formation of
van der Waals interactions between nonpolar side chains (or moieties) in step 2.
Let DG bur and DGvdw be the contributions to DG UN from processes A and B in
Eq. (5).
Let DASA be the net change in nonpolar solvent-accessible surface area in steps
1 þ 3. Evidence is presented above that DG bur can be related to DASA even for mi-
croscopic cavities. Thus DG bur can be represented by
in which b is the proportionality coefficient between DG solv and ASA for alkanes
in transfer experiments from the vapor phase to aqueous solution. Figure 6.3
shows good linearity between DG solv and ASA for several alkanes, and b is rea-
sonably constant. However, the value of b changes substantially (from 5.5 to 24
cal mol1 Å2 [21]) when volume-corrected transfer free energy is used. Evaluation
of DGvdw depends sensitively on parameters that are difficult to determine experi-
mentally, and there is little discussion in the literature of how mutations cause
DGvdw to vary. See, however, the discussion of DGvdw values for selected proteins
by Makhatadze and Privalov [61] and note that the number of van der Waals con-
tacts is being discussed in mutational studies of packing [70].
For the process of forming the hydrophobic core, DGvdw can be estimated ap-
proximately in the following manner. First, consider the empirical relation between
DG UN and DASA given by the liquid–liquid transfer model.
In Eq. (7), DASA is the net value of ASA buried upon folding and B is the propor-
tionality factor between liquid–liquid transfer free energy and ASA. Combining
Eqns (5)–(7) gives:
Data for the melting of solid crystalline alkanes show that DH is substantial and
favors the solid alkane, but DG for melting is small because there is a substantial
DS that favors the liquid alkane [72]. These results have been used to argue that a
liquid hydrocarbon should model satisfactorily the hydrophobic core of a protein
[72], and therefore the van der Waals interactions need not be considered explicitly.
6.2.11
Evidence Supporting a Role for van der Waals Interactions in Forming the
Hydrophobic Core
An important role for van der Waals interactions in protein folding is suggested by
the unusual close-packed arrangement of the side chains in folded proteins. This
argument was pointed out in 1971 by Klapper, who compares the protein core to
a wax ball rather than an oil drop [73], and in 1977 by Richards, who compares
the protein core to an organic crystal [50]. The fraction of void space in folded pro-
teins is quite small and comparable to that in close-packed spheres, whereas there
is a sharp contrast between the small void space in folded proteins and the large
void space in nonpolar liquids, as pointed out by these authors. In 1994 Chothia
and coworkers [74] made a detailed study of the volumes of aliphatic side chains
inside proteins and found they are substantially smaller than in aqueous solution.
Water molecules are nevertheless rather well packed around hydrocarbons in aque-
ous solution and hydrocarbons occupy larger volumes in nonpolar solvents than in
water, as pointed out by Klapper [73], who found good agreement between experi-
mental data and predictions based on scaled particle theory.
Stites and coworkers have made extensive mutational experiments on the hydro-
phobic core of staphylococcal nuclease (SNase) [70, 75] including many single,
double, triple, and quadruple mutants. The authors measure interaction energies
by taking the difference between the DG values of two single mutants and the
double mutant, and so forth. Their overall results suggest that the packing arrange-
ment of nonpolar side chains in the core is an important determinant of SNase
stability because favorable interaction energies are associated with certain pairs
of residues [75]. They tested this interpretation by constructing seven multiple
mutants that were predicted to be hyperstable [75]. The X-ray structures of five mu-
tants were determined. The mutants uniformly show high temperatures for ther-
mal unfolding ðTm Þ. The largest increase in Tm is 22.9 C and the increase in Tm
correlates with the number of van der Waals contacts. The overall results indicate
that increased protein stability is correlated with an increased number of van der
Waals contacts, and certain pairs of buried side chains make characteristic contri-
butions to the number of van der Waals contacts.
A novel approach to analyzing the thermodynamics of burial of nonpolar side
chains in protein folding was undertaken by Varadarajan and coworkers [76, 77].
They measure DDH of folding for mutants with varying extents of burial of nonpo-
lar surface area. Note that DDH should be small near 25 C for burial of nonpolar
surface area, according to liquid–liquid transfer experiments with hydrocarbons
(see Figures 6.4A and 6.4B). On the other hand, van der Waals packing interactions
6.3 Peptide Solvation and the Peptide Hydrogen Bond 145
are enthalpy-driven and larger values of DDH are expected if the folding thermody-
namics are dominated by packing interactions. The authors are able to measure
DH directly at fixed temperatures by titration calorimetry for their ribonuclease
S system (which is composed of two dissociable species, S-peptide þ S-protein) by
titrating unfolded S-peptide versus folded S-protein. They determine the X-ray
structures of the mutant RNase S species and measure DH for folding S-peptide
at varying temperatures. They find substantial DDH values for the mutants and
conclude that van der Waals packing interactions are responsible; see also the
study of cavity mutants of T4 lysozyme by Matthews and coworkers [59].
Zhou and Zhou [78, 79] present an interesting analysis of the energetics of bury-
ing different side-chain types. They construct a database of mutational experiments
on protein folding from the literature and extract the ‘‘transfer free energy’’ contri-
buted by each amino acid side chain to the unfolding free energy: the transfer pro-
cess is from the folded protein to the unfolded protein in aqueous solution. Then
the authors simulate the mutational results by a theoretical analysis based on a po-
tential of mean force, and plot the results as DDG versus DASA for each side-chain
type. The resulting plots are similar for the various nonpolar amino acids, and the
slopes are in the range 25–30 cal mol1 Å2 . Interestingly, the plots for polar
amino acids are similar in character but have substantially smaller slopes. The au-
thors conclude that it is energetically favorable to fold polar as well as nonpolar
side chains, but folding polar side chains contributes less to stability. There should
be an energetic penalty if the polar moiety of a side chain is buried without making
an H-bond, because the polar group interacts favorably with water (see below).
6.3
Peptide Solvation and the Peptide Hydrogen Bond
6.3.1
History
Interest in the role of the peptide hydrogen bond in stabilizing protein folding took
off when Pauling and Corey proposed structures for the a-helix and the parallel
and antiparallel b-sheets, and when Schellman analyzed [6] the factors that should
determine the stability of the helix backbone in water. He estimated the net en-
thalpy change for forming a peptide H-bond in water as 1.5 kcal mol1 , based
on heat of dilution data for urea solutions [80], and he concluded that an isolated
a-helix backbone should be marginally stable in water and might or might not be
detectable [6]. Klotz and Franzen [81] investigated by infrared spectroscopy wheth-
er H-bonded dimers of N-methylacetamide are detectable in water and concluded
that they have at best marginal stability: see also a later study of the dimerization
of d-valerolactam [82].
Determination of the first X-ray structures of proteins sparked interest in the
problem of whether the peptide sequence for a protein helix can form the helix in
water. Initial results based on circular dichroism (CD) (or optical rotatory disper-
146 6 Weak Interactions in Protein Folding
sion) spectroscopy were disappointing. In 1968 Epand and Scheraga [83] examined
two peptides obtained by cyanogen bromide cleavage from the highly helical pro-
tein sperm whale myoglobin and found very low values for the possible helix con-
tents. In 1969 Taniuchi and Anfinsen [84] cleaved staphylococcal nuclease, which
has 149 residues, between residues 126 and 127 and found that both fragments 1–
126 and 127–149 (each of which has a helix of modest size) were structureless by
CD as well as by other physical criteria. In 1969–1971 Klee obtained tantalizing re-
sults, suggestive of some helix formation at low temperatures, with the ‘‘S-peptide’’
(residues 1–20) [85] and ‘‘C-peptide’’ (residues 1–13) [86] of ribonuclease A. (The
RNase A helix contains residues 3–12.) The helix problem languished for 11 years,
perhaps because of the disappointing results found in the laboratories of Anfinsen
and Scheraga. In 1982, Bierzynski et al. [87] used NMR as well as CD spectroscopy
to reinvestigate the claim of Brown and Klee [86] and confirmed that partial helix
formation occurs. Bierzynski et al. found that helix formation is strongly pH de-
pendent and the pK values indicate that both a His residue and a Glu residue are
required for helix formation, thus implicating specific side-chain interactions [87].
Several kinds of helix-stabilizing side-chain interactions were later studied, such as
salt bridges, amino–aromatic interactions, and charge–helix dipole interactions
[88]. The helix formed by C-peptide or S-peptide has basic properties in common
with the protein helix of RNase A: both the peptide and protein helices have the
same two helix-stabilizing side-chain interactions and the same helix termination
signals. Today hundreds of peptides with sequences derived from proteins form
partly stable helices in water [89]. The helix termination signals are well-studied
[90] and consistent results are found from protein and peptide helices. The AGA-
DIR algorithm of Muñoz and Serrano [89] fits the helix contents of peptides by sta-
tistically evaluating the many parameters describing side-chain interactions and
helix propensities.
In 1989 Marqusee et al. [91] found that alanine-rich peptides form stable helices
in water without the aid of specific side-chain interactions. Of the 20 natural amino
acids, only alanine has this ability [92]. Because alanine has only a aCH3 group for
a side chain, this result strongly suggests that the helix backbone itself has measur-
able stability in water and that side chains longer than alanine somehow destabi-
lize the helix. In 1985 Dyson and coworkers used NMR to detect a reverse turn
conformation in a nine-residue peptide [93], and later studied the specific se-
quence requirements for forming this class of turn in peptides [94]. In 1994 Ser-
rano and coworkers [95] found the b-hairpin conformation in a protein-derived
peptide, and other b-hairpin sequences were soon found. Thus, H-bonded second-
ary structures derived from proteins can in general be studied in peptides in aque-
ous solution.
Determination of the energetic nature of the peptide H-bond was a chief motiva-
tion for the pioneering development in 1979 of a molecular force field by Lifson
and coworkers [96]. They concluded that the peptide H-bond is represented to a
good first approximation by an electrostatic model based on partial charges resid-
ing on the peptide CO and NH groups. Their analysis indicates that the peptide
6.3 Peptide Solvation and the Peptide Hydrogen Bond 147
CO and NH groups interact with water as well as with each other. Solvation of the
peptide polar groups has been analyzed by using amides such as N-methylaceta-
mide (NMA) as models for the peptide group.
6.3.2
Solvation Free Energies of Amides
Values of DGðpolÞ are given for four amides in Table 6.2, taken from Avbelj et al.
[100]. Although the correction term DGðnpÞ is obtained by the approximate proce-
dure of using experimental data for nonpolar solutes, the correction term is fairly
148 6 Weak Interactions in Protein Folding
Tab. 6.2. Solvation free energy data for amides as models for the free peptide groupa.
small (9.0 kJ mol1 for N-methylacetamide) compared with the observed solvation
free energy (42.1 kJ mol1 ) or the value of DGðpolÞ (51.2 kJ mol1 ).
The value of b (Eq. (6)) used by Sitkoff et al. [99] (to obtain values for DGðnpÞ in
Eq. (10)) is based on transfer free energies that are not volume-corrected [21, 99].
The resulting values of DGðpolÞ (obtained from Eq. (9) for a database of model
compounds) have been used to calibrate the PARSE parameter set of the DelPhi
algorithm [99] (see Section 6.3.5). For this reason, the values of DGðnpÞ in Table
6.1 are based on transfer free energies that are not volume-corrected.
The enthalpy of interaction between the polar groups and water, DHðpolÞ, has
been obtained [100] from calorimetric data in the literature after approximating
DHðnpÞ by experimental data for alkanes, as before.
The value for DHðobsÞ in Eq. (11) is found by combining results from two calori-
metric experiments. The first experiment yields the heat of vaporization of the liq-
uid amide and the second experiment yields the heat of solution of the liquid
amide in water. The value of DHðnpÞ is found from the heats of solution, plotted
against ASA, of gaseous hydrocarbons in water, for which the heats of solution de-
pend chiefly [24] on the favorable van der Waals interaction energy of the hydrocar-
bon solute with water.
Some interesting conclusions may be drawn from Table 6.2: (1) The four differ-
ent amides have similar values for DGðpolÞ and also for DHðpolÞ. Although there
6.3 Peptide Solvation and the Peptide Hydrogen Bond 149
is a threefold difference between the values of polar ASA for acetamide and N,N-
dimethylacetamide, their values of DGðpolÞ differ by less than 10%. Thus, the
computationally convenient approximation of assuming proportionality between
DGðpolÞ and polar ASA [101] is not a satisfactory assumption for amides. The po-
lar groups of an amide interact strongly with water even when they are only partly
exposed. (2) The value of DGðpolÞ is nearly equal to the value of DHðpolÞ for each
amide, and so the TDSðpolÞ term must be small by comparison. This conclusion is
surprising if one expects a large entropy change when water interacts with polar
groups. It is not surprising, however, after considering the entropies of hydration
of crystalline hydrate salts [102] or the prediction [103] made from the Born equa-
tion that DGðpolÞ is almost entirely enthalpic. The basic reason why the entropy
term looks small is that the enthalpy term is large. (3) The values of DGðpolÞ are
impressively large for amides. NMA is sometimes taken as a model for studying
the interaction between water and the peptide group, and its value of DGðpolÞ is
51.2 kJ mol1 . This is a huge value for only one peptide group: DG for folding
a 100-residue protein is typically 40 kJ mol1 or less. Evidently solvation of the pep-
tide group is a major factor in the energetics of folding: see Wolfenden [97] and
Makhatadze and Privalov [61].
6.3.3
Test of the Hydrogen-Bond Inventory
Protein chemists often use the H-bond inventory to interpret results of mutational
experiments when the experiment involves a change in the number of hydrogen
bonds [104, 105]. Experimental data for amides provide a direct test of the H-bond
inventory [106]. A key feature of the H-bond inventory is that water (W) is treated
as a reactant and the change in the total number of W W H-bonds is included
in the inventory. When discussing the formation of a peptide H-bond (CO HN)
in water, the H-bond inventory takes the form of Eq. (12).
CO W þ NH W ¼ CO HN þ W W ð12Þ
The peptide NH and CO groups are predicted to make one H-bond each to water
in the unfolded peptide, and the CO and NH groups are predicted not to interact
further with water after the peptide H-bond is made. Gas phase calculations of H-
bond energies indicate that all four H-bond types in Eq. (12) have nearly the same
energy [107], 25 G 4 kJ mol1 , and consequently the net enthalpy change pre-
dicted for forming a peptide H-bond in water is 0 G 4 kJ mol1 . The enthalpy of
sublimation of ice has been used to give an experimental estimate of 21
kJ mol1 for the enthalpy of the W W H-bond [45].
When a dry molecule of amide (Am) in the gas phase becomes hydrated upon
solution in water, the H-bond inventory model postulates that the CO and NH
groups each make one H-bond to water and one W W H-bond is broken:
Thus, the H-bond inventory predicts a net increase of one H-bond when a dry
amide becomes solvated, with a net enthalpy change of 25 G 4 kJ mol1 . But
the experimental enthalpy change for solvating the amide polar groups is @50
kJ mol1 (Table 6.2) and the H-bond inventory fails badly.
6.3.4
The Born Equation
In 1920 Max Born wanted to know why atom-smashing experiments can be visual-
ized in Wilson’s cloud chamber [108]: an ion leaves a track of tiny water droplets as
it passes through supersaturated water vapor. He computed the work of charging a
spherical ion in vacuum and in water, and he treated water as a continuum solvent
with dielectric constant D. He gave the favorable change in free energy for transfer-
ring the charged ion from vacuum to water as:
in which q is the charge and r is the radius of the ion [109]. This simple calculation
answers Born’s question: the free energy change is enormous, of the order of 420
kJ mol1 for a monovalent ion transferred to water. Consequently the ion easily
nucleates the formation of water droplets. Inorganic chemists – notably W. M.
Latimer – realized that Born’s equation provides a useful guide to the behavior of
solvation free energies. Note that the solvation free energy predicted by Born’s
equation is inversely proportional to the ionic radius, but which radius should be
used? Ions are strongly solvated and the solvated radius is subject to argument. Ra-
shin and Honig [103] argued that the covalent radius is a logical choice and gives
consistent results for different monovalent anions and cations. They also point out
that one can obtain the enthalpy of solvation from Born’s equation and it predicts
that, if the solvent is water with a high dielectric constant (near 80), the solvation
free energy is almost entirely enthalpy. Quite recently the solvation free energies of
monovalent ions have been analyzed by a force field employing polarizable water
molecules [110], with excellent agreement between theory and experiment.
6.3.5
Prediction of Solvation Free Energies of Polar Molecules by an Electrostatic Algorithm
[112]. The latter treatment includes the effect of electrostatic solvation (the ‘‘Born
term’’) as well as the electrostatic interaction between the two charged carboxylate
groups.
Various electrostatic algorithms are in current use [99, 113] today, including one
based on using Langevin dipoles [113] to treat the polarization of water molecules
in the vicinity of charged groups. The focus here is on the DelPhi algorithm of
Honig and coworkers [99], because the PARSE parameter set of DelPhi has been
calibrated against a database of experimental solvation free energies for small mol-
ecules that includes amides [99]. Thus, DelPhi may plausibly be used to predict the
solvation free energies of the polar CO and NH groups of peptides in various con-
formations. There are no adjustable parameters and the predicted values of elec-
trostatic solvation free energy (ESF) may be compared directly with experiments if
suitable data are available. The DelPhi algorithm uses a low dielectric ðD ¼ 2Þ cav-
ity to represent the shape of the solute while the partial charges of the solute are
placed on a finely spaced grid running through the cavity, and the solvent is repre-
sented by a uniform dielectric constant of 80 [99]. The results are calculated by
Poisson’s equation if the solvent does not contain mobile ions, or by the Poisson-
Boltzmann equation if it does.
6.3.6
Prediction of the Solvation Free Energies of Peptide Groups in Different Backbone
Conformations
hinder access of solvent to the peptide group and reduce the negative ESF value.
Consequently, helix and b-structure propensities of the different amino acids
should depend on the ESF values of the peptide groups [100, 116–118]. The differ-
ent helix propensities of the amino acids are often attributed instead to the loss
in side-chain entropy when an unfolded peptide forms a helix [119]. Because ESF
is almost entirely enthalpic (see Table 6.2), these alternative explanations can be
tested by determining if the helix propensity differences are enthalpic or entropic.
Temperature-dependence results [120] measured for the nonpolar amino acids (see
also Ref. [63]) indicate that the helix propensity differences are largely enthalpic.
Table 6.3 gives some ESF values for alanine peptide groups in different backbone
conformations. There are several points of interest: (1) A peptide group in a non-H-
bonded alanine peptide has substantially different ESF values in the three major
backbone conformations b; aR , and PII (polyproline II). Consequently any analysis
based on group additivity that predicts the overall enthalpy change by assigning
constant energetic contributions (independent of backbone conformation) to the
polar peptide CO and NH groups cannot be valid. The overall enthalpy change con-
6.3 Peptide Solvation and the Peptide Hydrogen Bond 153
tains contributions from both ESF and the local intrachain electrostatic potential
ðElocal Þ [116–118], both of which depend on backbone conformation. (2) Because
ESF depends on the accessibility of water to a peptide group, the N-terminal and
C-terminal peptide groups of an all-alanine peptide have more negative ESF values
than interior peptide groups. (3) Replacement of Ala by a larger or more bulky res-
idue such as Val [100] changes the ESF not only at the substitution site but also at
neighboring sites (Avbelj and Baldwin, to be published). (4) N-Methylacetamide,
whose DGpol (or ESF) is 51.2 kJ mol1 (Table 6.2) is a poor model for the interac-
tion with water of a free peptide group. There is a 16 kJ mol1 difference between
the ESF of NMA and that of an alanine peptide group in the b-conformation
(Tables 6.2 and 6.3).
6.3.7
Predicted Desolvation Penalty for Burial of a Peptide H-bond
folding by considering mutational data for H-bonds made between specific pairs of
side chains. The ESF studies considered here emphasize the role of context:
solvent-exposed peptide H-bonds should stabilize folding but buried H-bonds
should detract from stability.
The ESF perspective provides a simple explanation for why alanine, alone of the
20 amino acids in proteins, forms a stable helix in water. The H-bonded, solvent-
exposed peptide group in an alanine helix has the most stabilizing ESF value of
any nonpolar amino acid except glycine, and glycine fails to form a helix because
of the exceptional flexibility of the glycine peptide linkage.
The interaction energy between water and the peptide group is a critical quantity
in predicting the structures of membrane proteins from amino acid sequences
[126]. Accurate values are needed for the energy of desolvation both of the free
and H-bonded forms of the peptide group. The solvent n-octanol, which has been
used extensively to model the interiors of water-soluble proteins (see Sections
6.2.2 and 6.2.3), has also been used as an experimental model for the lipid bilayer
environment of membrane proteins, when measuring transfer free energies of
peptides between water and a membrane-like solvent. The small value of the trans-
fer free energy found for the free glycine peptide group transferred from water to
water-saturated n-octanol (4.8 kJ mol1 [15]) emphasizes the role of the water con-
tained in this solvent, which increases its attraction for the peptide group. Cyclo-
hexane has been used as an alternative solvent that provides a nearly water-free en-
vironment when partitioning model compounds [16]. The transfer free energies of
amides are much larger in the cyclohexane/water pair than the 4.8 kJ mol1 deter-
mined for the peptide group in octanol/water: 25 and 21 kJ mol1 for acetamide
and propionamide, respectively [16].
6.3.8
Gas–Liquid Transfer Model
The gas–liquid transfer model has been used in Section 6.2.8 to discuss the respec-
tive roles of van der Waals interactions and burial of nonpolar surface area in pro-
tein folding (Scheme 6.1). The gas–liquid transfer model can be adapted to discuss
the roles of peptide H-bonds and peptide solvation in folding. Scheme 6.2 is writ-
ten for a simple folding reaction involving only a single peptide H-bond.
Here the conformation of the unfolded form U must be considered because
changes in both local electrostatic free energy ðElocal Þ and ESF contribute to DH
Scheme 6.2
6.3 Peptide Solvation and the Peptide Hydrogen Bond 155
and they depend on backbone conformation. In order to form the peptide H-bond,
a change in peptide backbone conformation must usually take place that involves
substantial changes in Elocal and ESF [116]. The unfolded form UðgÞ normally ex-
ists as a complex equilibrium mixture of conformations and UðdÞ represents the
new conformation needed to make the peptide H-bond.
In principle, Scheme 6.2 may be adapted to predict the enthalpy change for
forming an alanine peptide helix, whose experimental value is known [123, 124],
by assigning provisional values to the different steps. Step 1, desolvating the free
peptide group, has ESF ¼ 36 kJ mol1 if the free peptide conformation is an ex-
tended b-strand (Table 6.2), and this ESF value gives the contribution to DH be-
cause ESF and DHðpolÞ are the same within error for amides (Table 6.2). Likewise,
solvating the alanine helix in step 4 can be assigned a contribution to DH equal to
the ESF of the solvent-exposed, H-bonded peptide group, 10.5 kJ mol1 [100]. In
step 3, making the peptide H-bond in the gas phase, the H-bond energy has been
calculated by quantum mechanics to be 28 kJ mol1 for the H-bonded NMA
dimer [122]. Step 3 also contains, however, an additional unknown quantity, the
van der Waals interaction energy made in forming the helix backbone. In step 1
there is also an additional problem. The ESF depends not only on the appropriate
mixture of backbone conformations (values for aR ; b, and PII are given in Table
6.3), but also on the tendency of a peptide to bend back on itself, which reduces
the exposure to solvent of the peptide groups [100]. Likewise, there are unknown
– and possibly quite large – changes in Elocal in both steps 2 and 3. Thus, the puz-
zle of predicting DH even for a simple folding reaction, like that of forming an ala-
nine peptide helix is far from being completely understood.
Nevertheless, the gas–liquid transfer model provides the background for a sim-
ple interpretation of the enthalpy of forming the alanine helix. The observed en-
thalpy of helix formation, DH(H-C), can be written as a sum of three terms, and
the enthalpy of interaction with water of the helix or of the coil can be approxi-
mated by the ESF value, as explained above.
In Eq. (15), H refers to helix, C to coil (the unfolded peptide), W to water and hb to
the peptide H-bond. DH(H-C) is the observed enthalpy of helix formation in water
(4.0 kJ mol1 , see above). It equals DHðhbÞ, the enthalpy of forming the peptide
H-bond (27.6 kJ mol1 [122]), plus DH(H-W), the enthalpy of interaction be-
tween water and the helix (10.5 kJ mol1 (see Table 6.3), minus DH(C-W), the
enthalpy of interaction between water and the coil (unknown, for reasons ex-
plained above). The unknown term DH(C-W) is found by solving the equation to
be 27:6 10:5 þ 4:0 ¼ 34:1 kJ mol1 . This is a reasonable value for the en-
thalpy of interaction between water and the coil, in view of the ESF values given
in Table 6.3 for different unfolded conformations and given also that bending
back of the unfolded peptide on itself is likely to reduce the negative ESF value.
156 6 Weak Interactions in Protein Folding
Acknowledgments
I thank Franc Avbelj, David Baker, David Chandler, Ken Dill, Pehr Harbury,
B.-K. Lee and John Schellman for discussion, and David Chandler, Alan Gross-
field, George Makhatadze and Yaoqi Zhou for sending me their papers before
publication.
An important paper [127] was overlooked in writing section 6.2.11 on the role of
van der Waals interactions in forming the hydrophobic cores of proteins. Loladze,
Ermolenko and Makhatadze [127] made a set of large-to-small mutations (e.g.,
Val ! Ala) in ubiquitin and used scanning calorimetry to measure DH; TDS and
DG for mutant unfolding. In agreement with a related study by Varadarajan, Ri-
chards and coworkers [76, 77], they find that large-to small mutations are charac-
terized chiefly by unfavorable enthalpy changes, and not by the expected unfavor-
able entropy changes. Surprisingly, they find favorable entropy changes for these
mutants, indicating that some kind of loosening of the structure occurs, perhaps
at residues lining the cavity surface. Thus, they conclude that burial of nonpolar
surface area stabilizes folding primarily by a favorable enthalpy change and not by
the favorable change in TDS predicted by the liquid–liquid transfer model. The
measurements are reported at 50 C [127], where the favorable free energy change
for burial of nonpolar surface area given by the liquid–liquid transfer model (Sec-
tion [6.2.9]) is approximately 30% enthalpy and 70% TDS . The changes in confor-
mational entropy that occur in large-to-small mutants [127] invalidate testing the
relation between buried ASA and DG for folding unless the enthalpy changes are
also measured and analyzed. There is the further problem of a dependence of
DDG on the size of the cavity, discussed by Matthews and coworkers [59]. Makha-
tadze and coworkers find values of DCp in these experiments that are too small to
be determined accurately [128], in agreement with predictions by the liquid–liquid
transfer and other models.
References
K. A., & Rose, G. D. (1992). Hydrogen bonding and accessible surface area in
bonding in globular proteins. J. Mol. proteins. Nature 248, 338–339.
Biol. 226, 1143–1159. 19 Reynolds, J. A., Gilbert, D. B., &
8 Kendrew, J. C., Dickerson, R. E., Tanford, C. (1974). Empirical
Strandberg, B. E. et al. (1960). correlation between hydrophobic free
Structure of myoglobin. A three- energy and aqueous cavity surface
dimensional Fourier synthesis at 2 Å area. Proc. Natl Acad. Sci. USA 71,
resolution. Nature 185, 422–427. 2925–2927.
9 Pauling, L., Corey, R. B. & Branson, 20 Lee, B. & Richards, F. M. (1971). The
H. R. (1951). The structure of pro- interpretation of protein structures:
teins: two hydrogen-bonded helical estimation of static accessibility. J.
configurations of the polypeptide Mol. Biol. 55, 379–400.
chain. Proc. Natl Acad. Sci. USA 37, 21 Sharp, K. A., Nicholls, A.,
205–211. Friedman, R., & Honig, B. (1991).
10 Tanford, C. (1980). The Hydrophobic Extracting hydrophobic free energies
Effect, 2nd edn. John Wiley & Sons, from experimental data: relationship
New York. to protein folding and theoretical
11 Tanford, C. (1997). How protein models. Biochemistry 30, 9686–9697.
chemists learned about the 22 Hermann, R. B. (1977). Use of
hydrophobic factor. Protein Sci. 6, solvent cavity area and number of
1358–1366. packed molecules around a solute in
12 Southall, N. T., Dill, K. A., & regard to hydrocarbon solubilities and
Haymet, A. D. J. (2002). A view of the hydrophobic interactions. Proc. Natl
hydrophobic effect. J. Phys Chem. B Acad. Sci. USA 74, 4144–4145.
106, 521–533. 23 Ben-Naim, A. & Marcus, Y. (1984).
13 Nozaki, Y. & Tanford, C. (1971). The Solvation thermodynamics of nonionic
solubility of amino acids and two solutes. J. Chem. Phys. 81, 2016–2027.
glycine peptides in aqueous ethanol 24 Lee, B. (1991). Solvent reorganization
and dioxane solutions. J. Biol. Chem. contribution to the transfer thermo-
246, 2211–2217. dynamics of small nonpolar molecules.
14 Fauchère, J.-L. & Pliska, V. (1983). Biopolymers 31, 993–1008.
Hydrophobic parameters p of amino- 25 Lee, B. (1995). Analyzing solvent
acid side chains from partitioning of reorganization and hydrophobicity.
N-acetyl-amino-acid amides. Eur. J. Methods Enzymol. 259, 555–576.
Med. Chem. 18, 369–375. 26 Widom, B. (1982). Potential-
15 Wimley, W. C., Creamer, T. P., & distribution theory and the statistical
White, S. H. (1996). Solvation mechanics of fluids. J. Phys. Chem. 86,
energies of amino acid side chains 869–872.
and backbone in a family of host-guest 27 Jorgensen, W. L., Gao, J., &
pentapeptides. Biochemistry 35, 5109– Ravimohan, C. (1985). Monte Carlo
5124. simulations of alkanes in water.
16 Radzicka, A. & Wolfenden, R. Hydration numbers and the
(1988). Comparing the polarities of hydrophobic effect. J. Phys. Chem. 89,
the amino acids: side-chain dis- 3470–3473.
tribution coefficients between the 28 Pratt, L. R. & Chandler, D. (1977).
vapor phase, cyclohexane, 1-octanol Theory of the hydrophobic effect.
and neutral aqueous solution. J. Chem. Phys. 67, 3683–3704.
Biochemistry 27, 1644–1670. 29 Gallichio, E., Kubo, M. M., & Levy,
17 Hermann, R. B. (1972). Theory of R. M. (2000). Enthalpy-entropy and
hydrophobic bonding. II. The cor- cavity decomposition of alkane
relation of hydrocarbon solubility in hydration free energies: Numerical
water with solvent cavity surface results and implications for theories of
area. J. Phys. Chem. 76, 2754–2759. hydrophobic solvation. J. Phys. Chem.
18 Chothia, C. (1974). Hydrophobic B 104, 6271–6285.
158 6 Weak Interactions in Protein Folding
7
Electrostatics of Proteins: Principles, Models
and Applications
Sonja Braun-Sand and Arieh Warshel
7.1
Introduction
The foundation of continuum (classical) electrostatic theory was laid in the early
eighteenth century and formulated rigorously by the Maxwell equations. Unfortu-
nately, this formal progress and the availability of analytical solutions of a few sim-
ple cases have not provided proper tools for calculations of electrostatic energies in
proteins. Here one is faced with electrostatic interactions in microscopically small
distances, where the concepts of a dielectric constant are problematic at best. Fur-
thermore, the irregular shape of protein environments made the use of analytical
models impractical. Nevertheless, the realization that some aspects of protein
action must involve electrostatic interactions led to the emergence of simplified
phenomenological models [1, 2] that could provide in some cases a useful insight
(see below).
The availability of X-ray structures of proteins and the search for concrete quan-
titative results led to the emergence of modern, microscopically based studies of
electrostatic effects in proteins (see below). These studies and subsequent numeri-
cal continuum studies led to the gradual realization that the electrostatic energies
provide by far the best structure function correlator for proteins and other macro-
molecules (Section 7.6). However, the overwhelming impact of macroscopic
concepts still leads to major confusions in the field and to oversimplified assump-
tions. Here the problem can be reduced by focusing on relevant validation studies
(Section 7.5).
This review will consider the above issues and try to lead the reader from con-
cepts to models and to applications, pointing out pitfalls and promising directions.
General background material can also be found in the following reviews [3–9].
7.2
Historical Perspectives
The basis for all electrostatic theories is Coulomb’s law, which expresses the re-
versible work of bringing two charges (Q i and Q j ) together by:
Where the free energy is given in kcal mol1 , the distance in Å, and Q in atomic
units (au). Manipulations of this led to the general macroscopic equation [10]:
‘E ¼ 4pr ð2Þ
where E is the macroscopic electric field and r is the charge density (rðrÞ ¼
P
i Q i ðdðr ri ÞÞ). Further manipulation, using the fact that the field E can be ex-
pressed as a gradient of a scalar potential, U, gives:
E ¼ ‘U ð3Þ
Assuming that the effects which are not treated explicitly can be represented by a
dielectric constant, e, leads to:
When ions are presented in the solution around our system and when the ion dis-
tribution is described by the Boltzmann distribution we obtain rather some approx-
imation, the linearized Poisson–Boltzmann (PB) equation:
where k is the Debye–Hückel screening parameter. The major problem with these
‘‘rigorous’’ equations is their physical foundation. Not only that the continuum as-
sumptions are very doubtful on a molecular level, but the nature of the dielectric
constant in this equation is far from being obvious when applied to heterogeneous
environments. Nevertheless, the early history of protein electrostatics has been
marked by the emergence of phenomenological models [2, 11]. Perhaps the most
influential model was the Tanford Kirkwood (TK) model [2] that describes the pro-
tein as a sphere with a uniform dielectric constant. The validity of this model was
not obvious since it was formulated before the elucidation of the structure of pro-
teins. However, subsequent analysis by Warshel and coworkers [12] indicates that
the model ignores one of the most important physical aspects of protein electro-
statics, the self-energy of the charged groups (the energy of forming these charges
in the given protein site). This problem was not obvious at the time of the formu-
lation of the TK model, when it was thought that the ionizable groups reside on
the surface of the protein). However, this assumption was found to be incorrect in
key cases. Nevertheless, the TK model is repeatedly invoked as a physically reason-
able model, where only the assumption of a spherical shape is considered as a
rough approximation. Interestingly, even today we find cases where the self-energy
7.2 Historical Perspectives 165
of charge formation, such as redox and ionization processes. This assumption may
have reflected unfamiliarity with the fact that electrostatic free energies involve
entropy changes, and that averaging over configurations is an integral part of the
proper evaluation of electrostatic free energies rather than an inherent dynamic ef-
fect. In some important cases, such as enzyme reactions, it was impossible to as-
sess the importance of electrostatic effects without developing quantitative models.
Here the use of macroscopic estimates led repeatedly to the conclusion that electro-
static effects cannot play a major role in catalysis (see, for example, Ref. [24]).
7.3
Electrostatic Models: From Microscopic to Macroscopic Models
7.3.1
All-Atom Models
Fig. 7.1. The three main options for The simplified microscopic model replaces the
representing the solvent in computer time average dipole of each solvent molecule
simulation approaches. The microscopic by a point dipole, while the macroscopic model
model uses detailed all-atom representation is based on considering a collection of
and evaluates the average interaction between solvent dipoles in a large volume element as a
the solvent residual charges and the solute polarization vector.
charges. Such calculations are expensive.
heat flow by stochastic boundaries [34], has been solved by the surface constraint
all-atom solvent (SCAAS) model [35].
It is important to reemphasize here that the problem is not the well-known bulk
contribution from the surrounding of the simulation sphere [36], which has been
implemented repeatedly in simulation studies (see, for example, Refs [37] and
[38]), but the polarization at the microscopic boundaries of the simulation sphere.
With reasonable boundary conditions and proper long-range treatment, one may
use all-atom models for evaluation of electrostatic free energies. Here the most ob-
vious approach is the free energy perturbation (FEP) approach [39, 40] which in-
volves a gradual change of the solute charge from zero to its actual value Q ¼ Q 0
using a mapping potential of the form (see, for example, Refs [39] and [41]):
Although the use of the FEP and LRA approaches is very appealing, it still involves
major convergence problems and a relatively small number of systematic studies
(see, for example, Refs [39] and [43–45]).
7.3.2
Dipolar Lattice Models and the PDLD Approach
The most natural simplification of all-atom polar models involves the representa-
tion of molecules or groups as dipoles. Dipolar models have a long history in the
theory of polar solvents and in providing the fundamental microscopic basis of
electrostatic theories. Simulations with sticky dipoles [46] and formal studies [47]
addressing the origin of the dipolar representation have been recently reported.
The nature of the Brownian dipole lattice (BDL) and Langevin dipole (LD) models
and their relationship to continuum models has been established [48]. This study
indicated that continuum solvation models are the infinite dipole density limit of
a more general dipolar representation, and that their linearity is a natural conse-
quence of being at that limit (although the discreteness of dipole lattice models is
not equal to that of continuum models [49]). It was also found that the linearity of
a continuum model is a direct consequence of being the infinite density limit of a
dipolar model. An instructive conclusion of the above studies is the finding that
7.3 Electrostatic Models: From Microscopic to Macroscopic Models 169
the solvation behavior of the LD model is identical to that of the more rigorous
BDL model. This is significant since a version of the LD model, the protein dipoles
Langevin dipoles (PDLD) model, has been used extensively to treat electrostatics in
macromolecules [3]. PDLD is probably the first to properly treat macromolecular
electrostatics.
At this point it might be useful to readdress the criticism of the PDLD model by
some workers who have little difficulties accepting macroscopic models as a realis-
tic description of macromolecules. Some of this criticism involved selective citation
of incorrect arguments (see discussion in footnote 21 of Ref. [43]). Unfortunately
the criticism continues despite the obvious conclusions of repeated rigorous stud-
ies of dipolar lattices [49] and related models (for example, see the recent excellent
study of Borgis and coworkers [50]). An instructive example is the suggestion [51]
that the PDLD model of Ref. [14] did not evaluate self-energies and that this model
is in fact macroscopic in nature and that it is physically equivalent to the use of a
finite-difference grid within DC methods. These assertions are incorrect both in
presentation of facts and in interpretation of the physics of dipolar solvents. It
should be clear by now that a grid of dipoles with finite spacing or the equivalent
system of polar molecules on a lattice is not equivalent to the numerical grid used
in evaluating the electrostatic potential in DC approaches (see the clear analysis of
footnote 26 of Ref. [43]).
Another more recent example is the attempt to support the argument that the
LD and DC models are equivalent, which was advanced by Ref. [8]. This work
stated ‘‘the LD model is not only analogous to the continuum model (as stated
many times in the literature), but identical to the continuum model in its discrete
approximation,’’ continuing ‘‘to our knowledge this rigorous equivalence was only
pointed out very recently [50].’’ Not only does this reflect a clear misunderstanding
of the conclusions of Ref. [50], but it also overlooks the rigorous finding of Ref.
[49], where it was found that the LD and DC models have different structural fea-
tures even at the limit of infinitely small spacing. Only with a regular cubic grid of
linearized Langevin dipoles and with a Clausius-Mossotti based local polarizability
one can move, at the limit of infinitely small spacing, from the LD to the DC limit.
In considering the difference between the LD and continuum philosophies, it
should be realized that the original LD model, with its finite spacing, has always
been a simplified microscopic model. Insisting on the clear physical features of
the model (i.e., the clear identification of each grid point as a discrete particle) has
been the reason for the fact that the PDLD model did not fall into all the traps that
plagued continuum studies of proteins. For example, looking at the microscopic
interactions led immediately to the explicit considerations of the protein perma-
nent dipoles rather than to the futile attempt of representing them by a low dielec-
tric constant. At any rate, saying that the LD model (or other dipolar model) is
equal to a continuum model is like saying that an all-atom model is equal to a con-
tinuum model. Finally, if the reader is still confused at this point by the suggestion
that the LD model is a continuum model [8, 51], it is useful to point out that mac-
roscopic models involve scaling by a dielectric constant but no such scaling is used
in the PDLD model.
170 7 Electrostatics of Proteins: Principles, Models and Applications
7.3.3
The PDLD/S-LRA Model
The PDLD model and other microscopic models do not involve any dielectric con-
stant. While this is a conceptual strength it is also sometimes a practical ‘‘weak-
ness’’ since they involve the balance between very large contributions, which is
hard to obtain without a perfect convergence. In some cases it is possible to as-
sume that the balance must exist and to represent this balance by scaling the
results with a dielectric constant (see Section 7.4). Such a philosophy leads to
semi-macroscopic models, which are obviously more stable (but not necessarily
more reliable) than the corresponding microscopic models. A promising way to ex-
ploit the stability of the semi-macroscopic models, and yet to keep a clear physical
picture, is the semi-macroscopic version of the PDLD model (the PDLD/S model
[52]). The PDLD/S model takes the PDLD model and scales the corresponding
contributions by assuming that the protein has a dielectric constant ep . Now, in or-
der to reduce the unknown factors in ep it is useful to move to the PDLD/S-LRA
model, where the PDLD/S energy is evaluated within the LRA approximation with
an average on the configurations generated by MD simulation of the charged and
the uncharged states (see Refs [19, 37, 43]). In this way one uses MD simulations
to automatically generate protein configurations for the charged and uncharged
forms of the given ‘‘solute’’ and then average these contributions according to Eq.
(9). This treatment expresses the free energy of moving a charge from water to the
protein active site, which is given by [42]:
w!p
DDGsol ¼ 12½hDU w!p ðQ ¼ 0 ! Q ¼ Q 0 ÞiQ 0 þ hDU w!p ðQ ¼ Q 0 ÞiQ¼0 ð10Þ
where
1 1 p 1
DU w!p ¼ ½DDGwp ðQ ¼ 0 ! Q ¼ Q 0 Þ DGwQ þ DUQm ðQ ¼ Q 0 Þ
ep ew ep
ð11Þ
Here DGwQ is the solvation energy of the given charge in water, DDGwp is the change
in the solvation energy of the entire protein (plus its bound charge) upon changing
this charge from its actual value (Q 0 ) to zero. DUQm is the microscopic electrostatic
interaction between the charge and its surrounding protein polar groups (for sim-
plicity we consider the case when the interaction with the protein ionizable groups
is treated in a separate cycle (see, for example, Ref. [43]) with the effective dielectric
constant eeff of Section 7.3.5. Now we bring here the explicit expression of Eq. (11),
since to the best of our knowledge it provides the clearest connection between mi-
croscopic quantities such as DDGwp and DUQm and semi-macroscopic treatments.
In fact, the PDLD/S treatment without the LRA treatment converges to the PB
treatment and thus provides the clearest connection between PB treatments and
microscopic treatments, allowing one to demonstrate the exact nature of eeff (see,
for example, Ref. [19] and Section 7.4). At any rate, it is important to keep in mind
7.3 Electrostatic Models: From Microscopic to Macroscopic Models 171
that the LRA considerations of Eq. (10) treat the protein reorganization explicitly,
and this reduces the uncertainties about the magnitude of ep . It is also important
to mention that the recently introduced molecular mechanics PB surface area
(MM-PBSA) model [53] is basically an adaptation of the PDLD/S-LRA idea of MD
generation of configurations for implicit solvent calculations, while only calculat-
ing the average over the configurations generated with the charged solute (the first
term of Eq. (9)).
7.3.4
Continuum (Poisson-Boltzmann) and Related Approaches
Continuum solvent models have become a powerful tool for calculations of solva-
tion energies of large solutes in solutions. Here one solves the Poisson equation or
related equations [54] numerically [15, 16]. However, these treatments need more
than the solvent dielectric, es , to be useful. Solvation energy depends on the chosen
solute cavity radius much more strongly than it does on es . In addition to solute
size, the cavity radius must absorb the effect of local structure near the solute. Con-
sistent studies [49] showed that the cavity radius depends on the degree of solvent
discreteness, even though the solute ‘‘size,’’ as defined by an exclusion radius, is
invariant. Also, a ‘‘cavity-in-a-continuum’’ description of solvation is physically in-
consistent for discrete (i.e., real) solvents, leading to Born radii that depend on
purely nonstructural factors [49, 55]. Thus, one cannot treat the cavity size as a
transferable parameter that can be used in different dielectrics or in a given dielec-
tric under even moderately different conditions.
PB approaches for studies of protein electrostatics represent the solvent and the
protein by two dielectric constants (ep and es , respectively) and solve numerically
Eqs (5) or (6). Although these approaches became extremely popular, they involve
significant conceptual and practical problems. Purely macroscopic (continuum)
models are entirely inadequate for proteins because they miss the crucial local po-
larity defined largely by the net orientations of protein permanent dipoles rather
than by a dielectric constant [7, 52]. The effect of these dipoles (sometimes called
‘‘back field’’; see Ref. [43]) is now widely appreciated. Thus, most current ‘‘con-
tinuum’’ models are no longer macroscopic but semi-macroscopic and therefore
potentially reliable. Their reliability depends, however, on properly selecting the
‘‘dielectric constant’’ ep (see below) for the relevant protein region. Despite recent
suggestions that the reliability of continuum methods can be increased by config-
urational averaging [19], such averaging still does not capture the crucial protein
reorganization effect (see Section 7.4).
The popularity of continuum models may be due to the belief that the for-
mulation of the PB equation must reflect some fundamental physics and that the
criticism of such models only address minor points that can be ‘‘fixed’’ in one way
or another. The discussion of conceptually trivial issues such as the linear approxi-
mation of the second term in Eq. (6) often created the feeling that this is the true
problem in such models. In fact, the real unsolved problem is the fact that the re-
sults depend critically on ep , whose value cannot be determined uniquely (see Sec-
172 7 Electrostatics of Proteins: Principles, Models and Applications
tion 7.4). Another reason for the acceptance of PB models has been the use of
improper validation, addressing mainly trivial cases of surface groups where all
macroscopic models would give the same results (see Section 7.5). At any rate, PB
models (see, for example, Ref. [56]) are becoming more and more microscopic,
borrowing ideas from the semi-macroscopic PDLD/S-LRA model.
7.3.5
Effective Dielectric Constant for Charge–charge Interactions and the GB Model
X
N X
DG ¼ DGi þ DGij ð12Þ
i¼1 i>j
where the first and second terms correspond to the first and second steps, respec-
tively. DGi , which represents the change in self-energies of the i-th ionized group
upon transfer from water to the protein site, can be evaluated by the PDLD/S-LRA
approach with a relatively small ep . However, the charge–charge interaction terms
(the DGij ) are best reproduced by using a simple macroscopic Coulomb’s-type law:
where DG is given in kcal mol1 , rij in Å and eij is a relatively high distance-
dependent dielectric constant [3, 12]. The dielectric constant eij can be described by
(e.g., Ref. [12]):
Although the fact that eij is large has been supported by mutation experiments
(e.g., [57, 58]) and conceptual considerations [3], the use of Eq. (13) is still consid-
ered by many as a poor approximation for PB treatments (without realizing that
the PB approach depends entirely on an unknown ep ). Interestingly, the so-called
generalized Born (GB) model, whose usefulness is now widely appreciated [59], is
basically a combination of Eq. (13) and the Born energy of the individual charges.
Thus, the GB model is merely a version of an earlier treatment [3]. More specifi-
cally, as pointed out originally by Warshel and coworkers [3, 60], the energy of an
ion pair in a uniform dielectric medium can be written as [3]:
where the free energies are given in kcal mol1 and the distances in Å. DGsol is the
solvation energy of the ions and DGy sol is the solvation of the ions at infinite separa-
tion. Equation (15) gives (see also Ref. [3]):
DGsol ¼ DGy
sol ð332Q i Q j =rij Þð1 1=eÞ
where the ai and a j are the Born’s radii of the positive and negative ions, re-
spectively. Equation (16) with some empirical modifications leads exactly to the
GB treatment [61]. Thus, the widely accepted GB treatment is in fact a glorified
Coulomb’s law treatment [7]. Nevertheless, it is instructive to note that the GB
approximation can also be obtained by assuming a local model where the vacuum
electric field, E0 , and the displacement vector, D, are identical [62]. This model,
which can be considered as a ‘‘local Langevin Dipole model’’ [50] or as a version
of the noniterative LD model, allows one to approximate the energy of a collection
of charge and to obtain (with some assumptions about the position dependence of
the dielectric constant) the GB approximation.
It should be noted that the simple derivations of the GB approximation are valid
for homogeneous media with a high dielectric (e.g., the solvent around the pro-
tein). This can be problematic when one considers models where the protein has
a low dielectric constant and when we are dealing with transfer of charges from
water to the protein environment. Fortunately, it is reasonable to use a large e for
the second term in Eq. (15), even for protein interiors (see discussion of eij of Eq.
(14)). Here again, the main issue is the validity of Eq. (13) and not its legitimiza-
tion by the seemingly ‘‘rigorous’’ GB formulation. It is also important to realize
þ
that a consistent treatment should involve the replacement of DGy sol by the DG
and DG of the ions in their specific protein site when ep is different from ew .
7.4
The Meaning and Use of the Protein Dielectric Constant
As demonstrated in the validation studies of Ref. [19], the value of the optimal di-
electric constant depends drastically on the model used. This finding might seem a
questionable conclusion to readers who are accustomed to the view that macro-
scopic models have a universal meaning. This might also look strange to some
workers who are experienced with microscopic statistical mechanics, where e is
evaluated in a unique way from the fluctuations of the total dipole moment of the
system (see below). Apparently, even now, it is not widely recognized that ep has
little to do with what is usually considered as the protein dielectric constant. That
is, as was pointed out and illustrated by the conceptual analysis of Warshel and
Russell [3], the value of the dielectric constant of proteins is entirely dependent
on the method used to define this constant and on the model used. Subsequent
studies [5, 19, 63] clarified this fact and established its validity. As stated in these
174 7 Electrostatics of Proteins: Principles, Models and Applications
studies the dependence of ep on the model used can be best realized by considering
different limiting cases. That is, when all the interactions are treated explicitly,
e ¼ 1; when all but induced dipoles are included explicitly, e ¼ 2 [3]; and when
the solvent is not included explicitly, e > 40 (although this is a very bad model).
When the protein permanent dipoles are included explicitly but their relaxation
(the protein reorganization) and the protein induced dipoles are included implic-
itly, the value of ep is not well defined. In this case ep should be between 4 and 6
for dipole–charge interactions and >10 for charge–charge interactions [64, 65].
One may clearly argue that there is only one ‘‘proper’’ dielectric constant, which
is the one obtained from the fluctuation of the average dipole moment [6, 63, 66–
68]. However this protein dielectric constant (e ¼ e) is not a constant because it de-
pends on the site considered [63, 66]. More importantly (see above), e cannot be
equal to ep , where the simplest example is the use of explicit models where
ep ¼ 1. Detailed analysis of this issue is given in Ref. [69]. At any rate, it is still use-
ful to review what has been learned on the nature of e. Apparently it has been
found [63] that the value of e can be significantly larger than the traditional value
(e G 4) deduced from measurements of dry proteins and peptide powders [70, 71]
or from simulations of the entire protein, rather than a specific region (see, for ex-
ample, Ref. [63]). In fact, the early finding of e ¼ 4, which was obtained from a gas
phase study [71], ignored the large effect of the solvent reaction field (see analysis
and discussion in Ref. [63]). The larger than expected value of e was attributed to
the fluctuations of ionized side chains [8, 66, 67]. However, consistent simulations
[63] that included the reaction field of the solvent gave e G 10 in and near protein
active sites, even in the absence of the fluctuations of the ionized residues. This
established that the fluctuations of the protein polar groups and internal water
molecules contribute significantly to the increase in e. In view of this finding it
might be instructive to address recent suggestions [8] that the calculations of refer-
ence [63] did not converge. Apart from the problematic nature of this assertion it is
important to realize that the SCAAS model used in Ref. [63] converges much
faster than any of the periodic models (and the corresponding very large simula-
tion systems) used in the alternative ‘‘converging’’ studies mentioned in Ref. [8].
In fact the convergence of e in reference [63] was examined by longer runs and by
using alternative formulations. Obviously, there is a wide intuitive support for the
idea that e in protein active sites cannot be large unless it reflects the effect of ion-
ized surface groups. However, the scientific way to challenge the finding of Ref.
[63] is to repeat the calculations of e in the active site of trypsin (rather than the
whole protein) and to see if the conclusions of Ref. [63] reflect convergence prob-
lems. Unfortunately, no such attempt to repeat the calculations of Ref. [63] (or to
study e in other active sites with and without ionizable surface groups) has been
reported. Finally, it is important to realize that recent experimental studies [73,
74] provided a clear support to the finding that e is large in active sites even when
they are far from the protein surface.
In view of the above analysis, and despite the obvious interest in the nature of
e, it should be recognized that e is not so relevant to electrostatic energies in pro-
teins. Here one should focus on the energies of charges in their specific environ-
7.4 The Meaning and Use of the Protein Dielectric Constant 175
ment and the best ways for a reliable and effective evaluation of these energies.
Unfortunately, e does not tell us in a unique way how to determine the ep of semi-
macroscopic models. For example (see also above), obtaining ep ¼ 1 in a fully mi-
croscopic model, or ep ¼ 2 in a model with implicit induced dipoles has nothing
to do with the corresponding e. The best way to realize this point is to think of
charges in a water sphere (our hypothetical protein) surrounded by water. In such
a model e G 80, whereas microscopic models of the same system with implicit in-
duced dipoles will be best described by using ep ¼ 2. Of course, one may argue that
the entire concept of a dielectric constant is invalid in the heterogeneous environ-
ment of proteins [14]. However, because fully microscopic models are still not giv-
ing sufficiently precise results, it is very useful to have reasonable estimates using
implicit models and in particular semi-macroscopic models. Thus, it is justified to
look for optimal ep values after realizing, however, what these parameters really
mean. Even the use of a consistent semi-macroscopic model such as the PDLD/
S-LRA does not guarantee that the corresponding ep will approach a universal
value. For example, in principal the PDLD/S-LRA model should involve ep ¼ 2, be-
cause all effects except the effect of the protein-induced dipoles are considered ex-
plicitly. However, the configurational sampling by the LRA approach is not perfect.
The protein reorganization energy is probably captured to a reasonable extent [69],
but the change in water penetration on ionization of charged residues is probably
not reproduced in an accurate way. This problem is particularly serious when we
have ionized groups in nonpolar sites in the interior of the proteins. In such cases
we have significant changes in water penetration and local conformations during
the ionization process, (as indicated by the very instructive study of Ref. [75]) but
these changes might not be reproduced in standard simulation times.
The fact that eeff for charge–charge interactions is large has been pointed out
by several workers [12, 76, 77]. However, this important observation was not always
analyzed with a clear microscopic perspective. Mehler and coworkers [76, 78] ar-
gued, correctly, that eeff ðrÞ is a sigmoid function with a large value at relatively
short distances based on classical studies of eðrÞ in water [79–81]. Similarly, it
has been assumed by Jonsson and coworkers [77] that electrostatic interactions in
proteins can be described by using a large eeff for charge–charge interactions. Now,
although we completely agree that eeff is large in many cases and we repeatedly
advanced this view [3], we believe that the reasons for this are much more subtle.
That is, the finding of a particular behavior of e in water might be not so relevant
to the corresponding behavior in proteins (in principle, eeff in proteins can be very
different from eeff in water). Here the remarkable reason why eeff is large, even in
proteins, was rationalized in Ref. [3], where it was pointed out that the large eeff
must reflect the compensation of charge–charge interactions and solvation energy
in proteins (see, for example, figure 4 in Ref. [64]). It seems to us that the under-
standing of this crucial compensation effect is essential for the understanding of
eeff in proteins.
Finally, it might be useful to comment on the perception that somehow the mac-
roscopic dielectric of a given region in the protein can still be used to provide a
general description of the energetics of changes in the same region. The funda-
176 7 Electrostatics of Proteins: Principles, Models and Applications
Fig. 7.2. A single macroscopic dielectric a hypothetical site with a relatively fixed and
constant (which is evaluated by some freely rotated dipoles. Using the same ep for
microscopic prescription) cannot be used both dipoles will underestimate the effect of ma
to reproduce the energetics of charges in and overestimate the effect of mb .
proteins in a unique way. The figure considers
mental problem with such an assertion is demonstrated in Figure 7.2. This figure
considers a typical site with an internal site with two dipolar groups, one fixed (pre-
organized) and the other free to rotate. Since this is an internal site, the leading
term in semi-macroscopic treatments will be the UQ m =ep term of Eq. (11). Now,
one of the best ways to estimate ep is to calculate the reorganization energy (l) for
the given charges [8, 69]. Using this ep (for UQ m =ep Þ for the interaction between
site (a) to the charge will underestimate the corresponding free energy contribu-
tion, while using it for site (b) will overestimate this interaction. The same problem
will occur if we use e evaluated from the dipolar fluctuations in the site that in-
cludes the dipoles and the charge. Furthermore, if the site involves an ion pair in-
stead of a single ionized group we will need a different ep to reproduce the energy
of this ion pair.
7.5
Validation Studies
As discussed above, the conceptual and practical problems associated with the
proper evaluation of electrostatic energies in proteins are far from trivial. Here the
general realization of the problems associated with different models has been slow
in part because of the use of improper validation approaches. For example, al-
though it has been realized that pK a value calculations provide an effective way
for validating electrostatic models, [3, 39, 44, 72, 75, 78, 82, 83] most studies have
focused mainly on the pK a values of surface groups. Unfortunately, these pK a
values can be reproduced by many models, including those that are fundamentally
incorrect [7]. Thus, the commonly used benchmarks may lead toward unjustified
7.5 Validation Studies 177
conclusions about the nature of electrostatic effects in proteins. It has been pointed
out [7, 19, 75] that ionizable groups in protein interiors should provide a much
more discriminating benchmark because these groups reside in a very heteroge-
neous environment, where the interplay between polar and nonpolar components
is crucial. For example, an ionizable group in a true nonpolar environment will
have an enormous pK a shift, and such a shift cannot be reproduced by models
that assume a high dielectric in the protein interior. Similarly, a model that de-
scribes the protein as a uniform low dielectric medium, without permanent di-
poles, will not work in cases where the environment around the ionizable group
is polar [3].
The problems with nondiscriminating benchmarks were established in Ref. [19]
and are reillustrated in Figure 7.3. This figure considers the pK a values of acidic
and basic groups and correlates the results obtained by the modified TK (MTK)
model with the corresponding observed results. The figure gives the impression
Fig. 7.3. The correlation between the with ep ¼ 40 and eeff ¼ 80. As seen from the
calculated and observed pKa values gives a figure, we obtain seemingly impressive results
nondiscriminative benchmark that only for this benchmark, but when the results are
involves surface groups. The figure considers displayed as DpKa , it becomes clear that we
Asp, Glu, and Lys residues on the surfaces of have simply a poor benchmark where DpKa A 0
lysozyme and ribonuclease but includes only and any model that produces a small pKa shift
obs
those residues with pK app < 1. The calculated looks like an excellent model.
pKa values are evaluated by the MTK(S) model
178 7 Electrostatics of Proteins: Principles, Models and Applications
that the MTK is an excellent model, but the same agreement would be obtained for
any model that uses a large e since most surface groups have very small DpK a so
that Asp and Glu have pK a A 4 and all Lys have pK a A 10. Thus, as emphasized in
the inset in the figure, we are dealing with a very small amount of relevant infor-
mation and probably the best correlation would have been obtained with the trivial
null model when DpK a ¼ 0 for all surface groups.
In view of the above example, it is very important to validate electrostatic models
by using discriminative benchmarks (DBM) with DpK a > 2 and to use the same
philosophy for other electrostatic properties. The use of a DBM in pK a calculations
[19] has been found to be extremely useful in demonstrating the range of validity
and the proper dielectric of different models.
7.6
Systems Studied
7.6.1
Solvation Energies of Small Molecules
Modeling of almost any biological process requires one to know the solvation
free energy of the relevant reacting system in aqueous solutions. Thus for example,
calculations of binding energies compare the solvation energy of the ligand in its
protein site to the corresponding solvation energy in solution. Early attempts to es-
timate solvation energies (see, for example, Refs [84] and [85]) were based on the
use of the Born’s or Onsager’s model with basically arbitrary cavity radius. The first
attempts to move toward quantitative evaluation of solvation energies can be di-
vided into two branches. One direction involves attempts to examine quantum me-
chanically the interaction between the solute and a single solvent molecule [86, 87].
The other direction that turns out to be more successful involves the realization
that quantitative evaluation of solvation free energies requires parameterization of
the solute solvent van der Waals interaction in a complete solute solvent system
[33] and evaluation of the interaction between the solvent and many (rather than
one) solvent molecules.
Although such an empirical approach was considered initially as having ‘‘too
many parameters’’ it was realized eventually that having an atom-solvent parame-
ter for each type of the solute atoms is the key requirement in any quantitative
semiempirical solvation model. The same number of parameters exists now in
more recent continuum solvation models [54, 88, 89] and in all-atom models [35,
40, 41, 90]. At any rate, it took significant time (see Ref. [91]) until the realization
that with any solvation model with a given set of solute charges, the most crucial
step is the parameterization of the solute–solvent repulsion term (or the corre-
sponding atomic radius).
Now assuming that we have reasonable atomic parameters the next requirement
is to obtain converging results with the given modeling method (all-atom solvent
simplified solvent model, PB model, GB model, etc.). Here the general progress is
7.6 Systems Studied 179
7.6.2
Calculation of pK a Values of Ionizable Residues
The electrostatic energy of ionizable residues in proteins plays a major role in most
biological processes including enzymatic reactions, proton pumps, and protein sta-
bility. Calculation of pK a values of ionizable groups in proteins provides a stringent
test of different models for electrostatic energies in proteins [3]. Calculations of
pK a values by all-atom FEP approaches have been reported in only a few cases
[39, 45]. Recent works include studies of the pK a of metal-bound water molecules
[99] and proton transfer in proteins [100]. All-atom LRA calculations [43, 44] were
also reported: the error range of the all-atom models is still somewhat disappoint-
ing, although the inclusion of proper long-range treatments and induced dipoles
leads to some improvements [43, 45]. Remaining problems might reflect insuf-
ficient sampling of protein configurations and solvent penetration. PDLD/S-LRA
calculations [19, 43] gave encouraging results with ep ¼ 4. Although the LRA treat-
ment accounts at least formally for the protein reorganization, the limit of ep ¼ 2
(which would be expected from a ‘‘perfect’’ model with implicitly induced dipoles)
has not been reached, since apparently some of this reorganization and/or water
penetration is not captured in the computer time used in the simulations.
Semi-macroscopic ‘‘continuum’’ (i.e., PB) models, (which now include the pro-
tein permanent dipoles and the self-energy term [12]) also often give reasonable
results (e.g., [72, 101, 102]). Yet most PB models do not treat the effect of the pro-
tein reorganization explicitly, and the so-called multi-conformation continuum
electrostatic (MCCE) model [56] (which has been inspired by the LRA approach)
only considers the reorganization of polar side chains instead of performing a
180 7 Electrostatics of Proteins: Principles, Models and Applications
proper LRA treatment on all the protein coordinates. Another problem with cur-
rent PB models is the assumption that the same ep should be used in accounting
for the effect of the protein dipoles and in providing the screening for charge-
charge interactions [19, 43]. This requirement led to inconsistency that can only
be demonstrated by the use of discriminating benchmarks for both self-energies
and charge–charge interactions. Although it is extremely important to continue to
use DBM with large DpK a in attempts to further explore the validity of different
electrostatic models, it is important to realize that the trend in pK a changes can
be captured by models that focus more on charge–charge interactions than on the
self-energy term or by models that estimate the self-energy term by empirical con-
siderations [78].
It might be useful to point here to the perception that the need for averaging
over the protein configurations is the main problem in continuum calculations of
pK a values [103–105]. First, the obvious need for proper configurational averaging
in free energy calculations should not be confused, as done in several cases (e.g.,
[106]), with dynamics effects. Second, and perhaps more importantly, most at-
tempts of configurational averaging involve only one charge state. This corre-
sponds to only one charge state in the LRA treatment (Eq. (9)) and thus to incorrect
estimates of the relevant charging free energy. Here it would help to recognize that
a proper LRA averaging, rather than a simple average on one charge state, is essen-
tial for formally correct pK a calculations.
Overall, we believe that a more complete consensus about the validity of different
models for pK a calculations will be obtained when the microscopic models start to
give stable quantitative results. Here the comparison of the microscopic and semi-
macroscopic models (as was done in Ref. [43]) will be particularly useful.
7.6.3
Redox and Electron Transport Processes
can usually be described well using Coulomb’s law with a large effective dielectric
constant [12, 57, 58].
Microscopic estimates of protein reorganization energies have been reported [57,
69, 120] and used very effectively in studies of the rate constants of biological elec-
tron transport. This also includes studies of the nuclear quantum mechanical ef-
fect, associated with the fluctuations of the protein polar groups (for review see
Ref. [121]). PB studies of redox proteins have progressed significantly since the
early studies that considered the protein as a nonpolar sphere (see above). These
studies (e.g., [108, 111, 122–124]) started to reflect a gradual recognition of the im-
portance of the protein permanent dipoles, although some confusion still exists
(see discussion in Refs [108] and [114]).
Recent studies reported quantum mechanical evaluation of the redox potentials
in blue copper proteins [125], but even in this case it was found that the key prob-
lem is the convergence issue, and at present semi-macroscopic models give more
accurate results than microscopic models.
7.6.4
Ligand Binding
A reliable evaluation of the free energy of ligand binding can potentially play a ma-
jor role in designing effective drugs against various diseases (e.g., [126]). Here we
have interplay between electrostatic, hydrophobic, and steric effects, but accurate
estimates of the relevant electrostatic contributions are still crucial.
In principle, it is possible to evaluate binding free energies by performing FEP
calculations of the type considered in Eqs (7) and (8), for ‘‘mutating’’ the ligand to
‘‘nothing’’ in water and in the protein active site. This approach, however, encoun-
ters major converence problems, and at present the reported results are disappoint-
ing except in cases of very small ligands. Alternatively one may study in simple
cases the effect of small ‘‘mutations’’ of the given ligand [40], for example replace-
ment of NH2 by OH. However, when one is interested in the absolute binding of
medium-size ligands, it is essential to use simpler approaches. Perhaps the most
useful alternative is offered by the LRA approach of Eq. (9), and augmented by es-
timates of the nonelectrostatic effects. That is, the LRA approach is particularly ef-
fective in calculating the electrostatic contribution to the binding energy [42, 127,
128]. With this approximation one can express the binding energy as
p p w w nonelec
DGbind ¼ 12½hUelec; l il þ hUelec; l il 0 hUelec; l il hUelec; l il 0 þ DGbind ð17Þ
p
where Uelec; l is the electrostatic contribution for the interaction between the ligand
and its surrounding; p and w designate protein and water, respectively; and l and l 0
designate the ligand in its actual charge form and the ‘‘nonpolar’’ ligand where all
the residual charges are set to zero. In this expression the terms hUelec; l Uelec; l 0 i,
which are required by Eq. (9), are replaced by hUelec; l i since Uelec; l 0 ¼ 0. Now, the
evaluation of the nonelectrostatic contribution DGnonelec
bind is still very challenging,
182 7 Electrostatics of Proteins: Principles, Models and Applications
since these contributions might not follow the LRA. A useful option, which was
used in Refs [42] and [127], is to evaluate the contribution to the binding free en-
ergy from hydrophobic effects, van der Waals, and water penetration. Another pow-
erful option is the so-called linear interaction energy (LIE) approach [93]. This
approach adopts the LRA approximation for the electrostatic contribution but ne-
glects the hUelec; l il 0 terms. The binding energy is then expressed as:
p w p w
DGbind A abhUelec; l il hUelec; l il c þ bbhUvdW; l il hUvdW; l il c ð18Þ
7.6.5
Enzyme Catalysis
7.6.6
Ion Pairs
Ion pairs play a major role in biological systems and biological processes. This in-
cludes the ‘‘salt bridges’’ that control the Bohr effect in hemoglobin [152], ion pair
type transition states [132], ion pairs that help in maintaining protein stability (see
Section 7.6.9), ion pairs that play crucial roles in signal transduction (Section 7.6.6)
and ion pairs that play a major role in photobiological processes [153]. Reliable
analysis of the energetics of ion pairs in proteins presents a major challenge and
has been the subject of significant conceptual confusions (see discussions in Refs
[3] and [12]). The most common confusion is the intuitive assumption that ion
pairs are more stable in nonpolar environments rather than less stable in nonpolar
environments (see analysis in Refs [154] and [155]). Excellent examples of this con-
fusion are provided even in recent studies (e.g., [156]) and in most of the desolva-
tion models (e.g., [157–159]) which are based on the assumption that enzymes
destabilize their ground states by placing them in nonpolar environments. How-
ever, as has been demonstrated elsewhere (e.g., [132, 150]) enzymes do not provide
nonpolar active sites for ion pair type ground states (they provide a very polar envi-
ronment that stabilizes the transition state).
In general it is important to keep in mind that ion pairs are not stable in non-
polar sites in proteins. Thus most ion pairs in proteins are located on the surface,
and those key ion pairs that are located in protein interiors are always surrounded
by polar environments [132]. Forcing ion pairs between amino acids to be in non-
polar environments usually results in the conversion of the pair to its nonpolar
form (by a proton transfer from the base to the acid).
The energetics of specific ion pairs in proteins has been studied by microscopic
FEP calculations [160, 161] and by semi-macroscopic calculations [19, 37, 52, 162,
163]. However, we only have very few systematic validation studies. An instructive
example is the study of Hendsch and Tidor [164] and its reanalysis by Schutz
and Warshel [19]. That is, Ref. [164] used a PB approach with a low value of
ep . This might have led to nonquantitative results. That is, in the case of T4 lyso-
zyme, the calculated energies for Asp70 0 . . . His31þ ! Asp70 . . . His31þ and
for Asp70 . . . His31 0 ! Asp70 . . . His31þ were 0.69 kcal mol1 and 0.77
kcal mol1 , respectively. This should be compared with the corresponding ob-
184 7 Electrostatics of Proteins: Principles, Models and Applications
served values (4.7 and 1.9 kcal mol1 ) or the values obtained in Ref. [19] (4.7
and 2.2 kcal mol1 ). Similarly, the estimate of 10 kcal mol1 destabilization of
the Glu11-Arg45 ion pair in T4 lysozyme seems to be a significant overestimate rel-
ative to the result of @3 kcal mol1 obtained by the current PDLD/S-LRA study. It
is significant that the Asp70 . . . His31þ ion pair does not appear to destabilize the
protein. That is, the energy of moving the ion pair from infinite separation in wa-
ter to the protein site (relative to the corresponding energy for a nonpolar pair) was
found here to be around zero compared with the 3.46 kcal mol1 obtained in Ref.
[164]. The origin of difference between the two studies can be traced in part to the
estimate of the stabilizing effect of the protein permanent dipoles (the VQm term in
Ref. [19] and DGprotein in Ref. [164]). The @5 kcal mol1 VQm contribution found
by the PDLD/S-LRA calculation corresponds to a contribution of only @0:26
kcal mol1 in Ref. [164]. This might be due to the use of an approach that does
not allow polar groups to reorient in response to the charges of the ion pair. It is
exactly the stabilizing effect of the protein polar groups that plays a crucial role in
stabilizing ion pairs in proteins. Clearly, the estimate of the energetics of ion pairs
depends crucially on the dielectric constant used and models that do not consider
the protein relaxation explicitly must use larger ep than the PDLD/S-LRA model.
Obviously, it is essential to have more systematic validation studies considering
internal ion pairs with well-defined structure and energy. In this respect, it is use-
ful to point out that the instructive statistical analysis reported so far (e.g., [165])
may involve similar problems as those mentioned in Section 7.5, since it does not
consider a discriminative database; it includes mainly surface groups rather than
ion pairs in protein interiors.
7.6.7
Protein–Protein Interactions
7.6.8
Ion Channels
7.6.9
Helix Macrodipoles versus Localized Molecular Dipoles
The idea that the macroscopic dipoles of a-helices provide a critical electrostatic
contribution [183, 184] has gained significant popularity and appeared in many
186 7 Electrostatics of Proteins: Principles, Models and Applications
proposals. The general acceptance of this idea and the corresponding estimates
(see below) is in fact a reflection of a superficial attitude. That is, here we have a
case where the idea that microscopic dipoles (e.g., hydrogen bonds and carbonyls)
play a major role in protein electrostatics [3] is replaced by a problematic idea that
macrodipoles are the source of large electrostatic effects. The main reason for the
acceptance of the helix-dipole idea (except the structural appeal of this proposal)
has probably been due to the use of incorrect dielectric concepts. That is, estimates
of large helix-dipole effects [184–188] involved a major underestimate of the corre-
sponding dielectric constant and the customary tendency to avoid proper validation
studies. Almost all attempts to estimate the magnitude of the helix-dipole effect
have not been verified by using the same model in calculations of relevant observ-
ables (e.g., pK a shift and enzyme catalysis). The first quantitative estimate of the
effect of the helix dipole [189] established that the actual effect is due to the first
few microscopic dipoles at the end of the helix and not to the helix macrodipole.
It was also predicted that neutralizing the end of the helix by an opposing charge
will have a very small effect and this prediction was confirmed experimentally
[190].
An interesting recent examples of the need for quantitative considerations of the
helix-dipole effect has been provided by the KcsA Kþ channel. The instructive
study of Roux and MacKinnon [191] used PB calculations with ep ¼ 2 and obtained
an extremely large effect of the helix dipoles on the stabilization of the Kþ ion in
the central cavity (@20 kcal mol1 ). However, a recent study [182] that used a
proper LRA procedure in the framework of the PDLD/S-LRA approach gave a
much smaller effect of the helix macrodipole (Figure 7.4). Basically, the use of
ep ¼ 2 overestimates the effect of the helix dipole by a factor of 3 and the effect is
rather localized on the first few residues.
7.6.10
Folding and Stability
The study of protein folding by simulation approaches dates back to the introduc-
tion of the simplified folding model in 1975 [192]. This model, which forms the
base for many of the current folding models, demonstrated for the first time that
the folding problem is much simpler than previously thought. The simplified fold-
ing model and the somewhat less physical model developed by Go [193], which are
sometimes classified as off-lattice and on-lattice models, respectively, paved the way
for major advances in studies of the folding process and the free energy landscape
of this process (e.g., [194–197]). Further progress has been obtained in recent years
from all-atom simulation of peptides and small proteins (e.g., [198–200]). Some of
these simulations involved the use of explicit solvent models and most involved the
use of implicit solvation models ranging from GB type models [198], to the Lange-
vin dipole model [200]. Interesting insights have been provided by the idea of
using a simplified folding model as a reference potential for all-atom free energy
simulation [200], but the full potential of this approach has not been exploited
7.6 Systems Studied 187
Fig. 7.4. Examination of the effect of the helix dielectric treatment used. It is shown that
dipoles of the KcsA ion channel (upper panel) the use of ep ¼ 2 drastically overestimates
on a Kþ ion in the central cavity. The lower the contribution of the macrodipoles, which
panel presents the contribution of the residues is evaluated more quantitatively with the
in the four helices as a function of the PDLD/S-LRA treatment.
yet. Another promising direction has been offered by the use of a large amount of
distributed computing resources [201].
At present it seems that simplified folding models describe quite well the aver-
age behavior of all-atom models (e.g., [200]), although more systematic studies are
needed. The use of implicit solvent models also looks promising but here again
there is a clear need for detailed validation studies focusing on observed electro-
static energies.
Despite the progress in studies of the folding process, we do not have yet a
clear understanding of the factors that determined protein stability. Early insight-
188 7 Electrostatics of Proteins: Principles, Models and Applications
ful considerations [20] argued that electrostatic effects must play a major role in
controlling protein stability. Other workers emphasized hydrophobic effects and di-
sulfide bridges, yet no consensus has been reached. Here it might be useful to con-
sider the current situation with regards to the role of electrostatic contributions to
thermal stability. Experimental studies of mesophilic, thermophilic, and hyperther-
mophilic proteins have provided an excellent benchmark for studies of the relation-
ship between electrostatic energy and protein stability (e.g., [202, 203]). In general,
the number of charged residues increases in hyperthermophiles, and thus can be
considered as a stabilizing factor. However, the effect of ionized residues [164] and
polar groups [204] has been thought to lead to destabilization bases on continuum
studies (e.g., [164]). A notable example is the study of Hendsch and Tidor [164].
This study concluded that internal salt bridges tend to destabilize proteins, but as
discussed in Section 7.6.6, this study did not reproduce the relevant observed ener-
gies. On the other hand, some studies (e.g., [205–208]) seem to support the idea
that charged residues may help to optimize protein stability. The problem in reach-
ing clear conclusions is closely related to the uncertainty about the value of ep in
PB and related treatments. Here we have a competition between desolvation penal-
ties, stabilization by local protein dipoles, and charge–charge interactions. The sit-
uation is relatively simple when one deals with a hypothetical hydrophobic protein
(the original TK model). In this case, as was shown first in Ref. [12], both isolated
ions and ion pairs become unstable in the ‘‘protein’’ interiors. The situation be-
comes, however, much more complex in real proteins where charges are stabilized
by polar groups (e.g., [17]). Here the balance between charge–charge interactions
and self-energy may depend drastically on the assumed ep . As stated in previous
sections, it seems to us that different ep should be used for charge–charge interac-
tions and self-energies. The above point can be clarified by writing the contribution
of the ionizable residues to the folding energy as:
X ðfÞ p
DGelec elec
fold ¼ DGf DGelec
uf ¼ 2:3RT Q i ðpK i; int pHÞ
i
0 1
1 X ð f Þ ð f Þ@ 332 A
X ðuf Þ
þ Qi Qj ð f Þ ð f Þ
Q i ðpK wa; i pHÞ
2 i0j r eeff ðr Þ i
ij ij
0 1
1 X ðuf Þ ðuf Þ @ 332 A
Q Qj ð19Þ
2 i>j i 80r
ðuf Þ
ij
where f and uf designate, respectively, the folded and unfolded form of the pro-
tein, and the Qs are the charges of the ionized groups in their particular configura-
tion. The above expression is based on our previous description of the electrostatic
free energies of ionization states in proteins [209] and on the simplified approxi-
mation that considered the charges in the unfolded state as being fully solvated by
water. If we assume that the same groups are ionized in the folded state, we can
get a simpler expression:
7.7 Concluding Remarks 189
2 3
X p
X 1 1
DGelec
fold A 2:3RT Q i ðpK i; int pK wi; w Þ þ 166 Q i Q j4 ðfÞ ðfÞ
ðuf Þ
5 ð20Þ
i i>j rij eij 80rij
The first term in Eq. (20) represents the change of self-energy upon moving the
given charge from water to the protein active site, and the second term represents
the effect of charge–charge interaction. Now, in general the ep for the first term
and the eeff of the second term are quite different, and this makes the analysis
complicated. At any rate, our preliminary analysis [210] evaluated the relative fold-
ing energies of two proteins expressed from the cold shock protein B (CspB)
family, expressed by the mesophilic bacterium Bacillus subtilis [211] and the
thermophilic bacterium Bacillus caldolyticus. This study obtained different trends
with different ep and eeff , and reproduced the increased stability of the thermophilic
protein with ep ¼ 8 and eeff ¼ 20. It seems to us, however, that this issue must
be subjected to very careful validation studies, focusing in particular on the repro-
duction of the observed pK a values of the proteins under consideration. Relevant
benchmarks can be obtained in part from available experimental studies.
7.7
Concluding Remarks
Electrostatic effects play a major role in almost all biological processes. Thus, accu-
rate evaluation of electrostatic energies is a key to quantitative structure function
analysis [18]. Here the key to further progress is expected to depend on wider rec-
ognition of the importance of proper validation studies and on the increase in ac-
curacy of microscopic calculations. That is, in many cases the focus on trivial prop-
erties of surface groups lead to entirely incorrect views of the validity of different
models. Another worrisome direction is the tendency to validate different implicit
models by their ability to reproduce PB results. For example, GB calculations are
frequently verified by comparing them to the corresponding PB calculations. This
implies a belief that the PB results represent the correct electrostatic energies, over-
looking the fact that these calculated energies depend critically on an undefined di-
electric constant. Consistent validations and developments of reliable models must
involve comparisons to relevant experimental results. Proper verification of semi-
macroscopic models might also involve a comparison to microscopic results but
in many cases these results do not reach proper convergence. Here the wider real-
ization of the importance of long-range effects, boundary conditions and conver-
gence issues are expected to help in increasing the accuracy of microscopic models.
The continuing increase in computational resources will also play a key role in pro-
viding more accurate electrostatic energies.
Recent studies of electrostatic energies in proteins have repeatedly focused on
the formal validations of PB and related approaches, while focusing on the contri-
bution of the solvent around the protein. This focus, however, seems to overlook
and sometimes ignore a key issue. That is, the PB and GB approaches provide ex-
190 7 Electrostatics of Proteins: Principles, Models and Applications
cellent models for the effect of the solvent around the protein. The problem, how-
ever, is the effect of the protein and the proper ep. Here we cannot use any formal
justification since our results depend drastically on a somewhat arbitrary parame-
ter (i.e., ep ).
Although this review presents a somewhat critical discussion of different semi-
macroscopic approaches, it does not mean that such approaches are not useful.
That is, as long as the difficulties of obtaining reliable results by microscopic
approaches persist one must resort to semi-macroscopic approaches. Such ap-
proaches are also essential in studies of problems that require fast evaluation of
the relevant electrostatic energies (e.g., the folding problem). The results obtained
by semi-macroscopic models are expected to become more accurate with a better
understanding of the protein dielectric ‘‘constant’’.
Acknowledgments
This work was supported by NIH grants GM 40283 and 24492, and NSF grant
MCB-0003872. We are grateful to Mitsunori Kato for the work presented in Figure
7.4.
References
channel from microscopic free energy folding. Annu. Rev. Biophys. Biol.,
perturbation calculations. Biochim. 1983; 12: 183–210.
Biophys. Acta Protein Struct. Mol., 194 Onuchic, J. N., P. G. Wolynes, Z.
2001; 36446: 1–9. Luthey-Schulten, and N. D. Socci,
182 Burykin, A., M. Kato, and A. Toward an outline of the topography
Warshel, Exploring the origin of the of a realistic protein-folding funnel.
ion selectivity of the KcsA potassium Proc. Natl Acad. Sci. USA, 1995; 92:
channel. Proteins, 2003; 52: 412–426. 3626–3630.
183 Wada, A., The alpha-helix as an 195 Godzik, A., J. Skolnick, and A.
electric macro-dipole. Adv. Biophys., Kolinski, Simulations of the folding
1976; 9: 1–63. pathway of triose phosphate
184 Hol, W. G. J., P. T. V. Duijnen, and isomerase-type-alpha/beta barrel
H. J. C. Berendson, Alpha helix proteins. Proc. Natl Acad. Sci. USA,
dipole and the properties of proteins. 1992; 89(7): 2629–2633.
Nature, 1978; 273: 443–446. 196 Karanicolas, J. and C. L. Brooks,
185 Roux, B., S. Berneche, and W. Im, The structural basis for biphasic
Ion channels, permeation and kinetics in the folding of the WW
electrostatics: insight into the function domain from a formin-binding
of KcsA. Biochemistry, 2000; 39(44): protein: Lessons for protein design?
13295–13306. Proc. Natl Acad. Sci. USA, 2003;
186 Daggett, V. D., P. A. Kollman, and 100(7): 3954–3959.
I. D. Kuntz, Free-energy perturbation 197 Dill, K. A., Dominant forces in
calculations of charge interactions protein folding. Biochemistry, 1990; 29:
with the helix dipole. Chem. Scripta, 7133–7155.
1989; 29A: 205–215. 198 Simmerling, C., B. Strockbine, and
187 Gilson, M. and B. Honig, The A. E. Roitberg, All-atom structure
energetics of charge–charge prediction and folding simulations of
interactions in proteins. Proteins, 1988; a stable protein. J. Am. Chem. Soc.,
3: 32–52. 2002; 124(38): 11258–11259.
188 Vanduijnen, P. T., B. T. Thole, and 199 Snow, C. D., N. Nguyen, V. S.
W. G. J. Hol, Role of the active-site Pande, and M. Gruebele, Absolute
helix in papain, an abinitio molecular- comparison of simulated and
orbital study. Biophys. Chem., 1979; experimental protein-folding
9(3): 273–280. dynamics. Nature, 2002; 420(6911):
189 A˚qvist, J., H. Luecke, F. A. Quiocho, 102–106.
and A. Warshel, Dipoles localized at 200 Fan, Z. Z., J. K. Hwang, and A.
helix termini of proteins stabilize Warshel, Using simplified protein
charges. Proc. Natl Acad. Sci. USA, representation as a reference potential
1991; 88(5): 2026–2030. for all-atom calculations of folding free
190 Lodi, P. J. and J. R. Knowles, Direct energies. Theor. Chem. Acc., 1999; 103:
evidence for the exploitation of an 77–80.
alpha-helix in the catalytic mechanism 201 Shirts, M. and V. S. Pande,
of triosephosphate isomerase. Bio- Computing – Screen savers of the
chemistry, 1993; 32(16): 4338–4343. world unite! Science, 2000. 290(5498):
191 Roux, B. and R. MacKinnon, The 1903–1904.
cavity and pore helices the KcsA Kþ 202 Hollien, J. and S. Marqusee,
channel: Electrostatic stabilization of Structural distribution of stability in a
monovalent cations. Science, 1999; thermophilic enzyme. Proc. Natl Acad.
285: 100–102. Sci. USA, 1999; 96(24): 13674–13678.
192 Levitt, M. and A. Warshel, 203 Usher, K. C., A. F. A. De la Cruz, F.
Computer simulation of protein W. Dahlquist, R. V. Swanson, M. I.
folding. Nature, 1975; 253(5494): 694– Simon, and S. J. Remington, Crystal
698. structures of CheY from Thermotoga
193 Go, N., Theoretical-studies of protein maritima do not support conventional
200 7 Electrostatics of Proteins: Principles, Models and Applications
8
Protein Conformational Transitions as Seen
from the Solvent: Magnetic Relaxation
Dispersion Studies of Water, Co-solvent,
and Denaturant Interactions
with Nonnative Proteins
Bertil Halle, Vladimir P. Denisov, Kristofer Modig,
and Monika Davidovic
8.1
The Role of the Solvent in Protein Folding and Stability
During the course of three billion years of evolution, proteins have adapted to their
aqueous environments by exploiting the unusual physical properties of water [1]
in many ways. The solvent therefore play a central role in the noncovalent inter-
actions responsible for the unique three-dimensional structures of native globular
proteins. The most important example of this is the hydrophobic effect [2, 3], which
provides the main driving force for protein folding (see Chapter 6). The thermody-
namic stability of native proteins, conferred by numerous but weak noncovalent
interactions, is marginal [4]. Even a small variation in external conditions can
therefore cause denaturation, that is, a partial or complete loss of the native poly-
peptide conformation. Proteins can be denatured by a variety of perturbations [5]:
thermal (see Chapter 4), pressure (see Chapter 5), electrostatic (see Chapter 7), or
solvent composition (see Chapters 3 and 24). Because each perturbation alters the
balance of stabilizing forces in a different way, a given protein may adopt a variety
of nonnative states with distinct properties.
A fundamental understanding of protein folding and stability, including the
mechanism of action of denaturants and co-solvents, must be based on studies of
the structure, solvation, and energetics of nonnative proteins at a level of detail
comparable with what has been achieved for native proteins [6]. Because of their
complexity, nonnative proteins need to be examined from different vantage points
using several techniques. Most experimental studies have focused on the proper-
ties of the polypeptide chain, such as its degree of compactness and secondary
structure content. Inferences about solvation have usually been indirect – where
the peptide chain is not, there is solvent – or have relied on uncertain premises.
The current view of nonnative protein solvation derives largely from calorimetric
(see Chapter 4), volumetric, and other macroscopic measurements, the interpreta-
tion of which is highly model dependent.
Computer simulations can, in principle, give a detailed picture of nonnative
protein solvation (see Chapter 32), but suffer from two serious limitations. First,
because molecular dynamics trajectories cannot (yet) be extended to the time scales
where proteins unfold, the experimentally studied nonnative equilibrium states of
proteins are inaccessible. Second, protein conformational equilibria are governed
by small free energy differences involving hundreds of noncovalent interaction
sites. Even minor inaccuracies in the (mostly empirical) force field model can thus
produce unacceptably large accumulated errors. At the other extreme, qualitative
insights into protein folding are sought from ‘‘minimalist’’ models. Because such
models describe the solvent implicitly or, at best, in a highly idealized manner, they
cannot be expected to capture the subtle thermodynamics of real proteins.
As one of the few methods that directly probes water molecules interacting with
proteins, water 17 O magnetic relaxation dispersion (MRD) [7, 8] has been used ex-
tensively to study both the internal and surface hydration of native proteins [9–11].
In recent years, the MRD method has also been applied to nonnative proteins,
yielding new information about the interaction of water ( 17 O and 2 H MRD) and
co-solvent ( 2 H and 19 F MRD) molecules with proteins denatured by heat, cold,
acid, urea, guanidinium chloride, or trifluoroethanol. In the following, we summa-
rize the results of these MRD studies, which, in most cases, indicate that nonnative
proteins are more structured and less solvent exposed than commonly believed.
Relevant aspects of MRD methodology and data analysis are described at the end
of this chapter. More comprehensive and technical accounts of biomolecular MRD
are available [7, 8].
8.2
Information Content of Magnetic Relaxation Dispersion
solvent exposure of the protein in different states. If water is the only solvent com-
ponent, then y W ¼ 1 and NaW is proportional to the water-accessible surface area
AW W
P . Consequently, Na should increase when the protein unfolds. However, the
MRD profile does not monitor solvent accessibility directly; an observed change in
Na ra may also be due to a variation in the retardation factor ra. For native proteins,
the average ta is dominated by a relatively small number of water molecules lo-
cated in surface pockets, where the geometric constraints prevent the cooperative
motions that are responsible for the fast rotational and translational dynamics in
bulk water [9–11]. If these surface pockets are disrupted in the unfolded protein,
ra will be smaller than for the native protein. On the other hand, water molecules
that penetrate a (partially) unfolded protein, perhaps in the form of hydrogen-
bonded water chains or small clusters, may be more strongly motionally retarded
than water molecules at the fully exposed (convex) surface of the native protein. Be-
cause of this interpretational ambiguity, the quantity Na ra can only provide bounds
on the solvent exposure (based on an assumption about ra ) or on the rotational
retardation (based on an assumption about Na ). On the other hand, penetrating
solvent molecules may become long-lived, in which case they contribute to the pa-
rameter Nb Sb2.
For solvent-denatured proteins, we must also deal with the complication of
preferential solvation [14, 15]. In other words, NaS is determined not only by the
solvent-accessible area of the protein, but also by the competition for this area by
water and co-solvent molecules. The occupancy y S, which describes this competi-
tion, can be estimated from the known activities of both solvent species and the
(usually unknown) co-solvent binding constant (which may differ between native
and nonnative states).
The third parameter provided by an MRD profile is the correlation time tb ,
which is the characteristic time for orientational randomization of long-lived sol-
vent molecules trapped within the protein. In general, this randomization occurs
by two parallel and independent processes: rotational diffusion of the protein, with
rotational correlation time tR , and escape of the solvent molecule from the protein,
with mean residence time tS (the inverse of the dissociation rate constant). Mathe-
matically, this is expressed as tb ¼ tR tS /ðtR þ tS Þ. Therefore, if tS g tR , as is often
the case, the observed correlation time tb can be identified with the rotational cor-
relation time tR of the protein. For native globular proteins, tR is usually described
by the Stokes-Einstein-Debye equation, tR ¼ h0 VH /ðkB TÞ, where h0 is the viscosity
of the solvent (not the solution!) and VH is an effective hydrodynamic volume. If
the protein were a smooth sphere that did not perturb the solvent (ra ¼ 0), then
VH would simply be the protein volume. However, a real protein sweeps out a
larger volume when it rotates than does a compact sphere of the same volume. Fur-
thermore, it perturbs the motion of adjacent solvent molecules. The former effect
can be handled by molecular hydrodynamics, where the hydrodynamic equations
are solved for the actual protein structure, specified in atomic detail [16]. The sec-
ond effect can be incorporated by assigning a higher viscosity to the hydration layer
[17]. An empirical approach [17], which predicts tR with similar accuracy, is to set
8.3 Thermal Perturbations 205
VH ¼ 2V0 , where V0 is the protein volume obtained from the partial specific vol-
ume, vP , that is, V0 ¼ vP MP /NA .
MRD exploits trapped solvent molecules as intrinsic probes of protein rotational
diffusion. It would thus appear that tR cannot be determined for an unfolded pro-
tein without long-lived solvent molecules. However, water 2 H MRD monitors not
only water molecules but also the labile hydrogens in the protein that exchange
sufficiently rapidly (usually meaning residence times < 1 ms) with water hydro-
gens. Under most conditions (pH and temperature being the critical variables),
the labile hydrogen contribution is sufficient to produce a 2 H dispersion even in
the absence of long-lived water molecules. For native proteins, tR can be deter-
mined by several other techniques besides MRD, for example, 15 N spin relaxation
and fluorescence depolarization.
Under conditions where tb ¼ tR , which is always the case for the labile-hydrogen
contribution to the 2 H dispersion, the correlation time yields the hydrodynamic
volume of the protein. However, for extensively unfolded proteins, VH is an appar-
ent volume that cannot be related to protein structure in a simple way. The Stokes-
Einstein-Debye relation between tR and VH is only valid for a rigid and nearly
spherical protein. In the random-coil limit, the rotational dynamics would have to
be described by a distribution of correlation times, as in the Rouse-Zimm theory
[18]. Because a flexible protein exhibits internal rotational modes with shorter cor-
relation times than for the rigid protein, the apparent VH does not necessarily
increase when the protein expands. This complication limits the diagnostic value
of tR . Another confounding factor is that unfolding leads to increased exposure of
nonpolar groups, which may cause the denatured protein to self-associate. How-
ever, these complications notwithstanding, the correlation time tb always provides
a lower bound on the residence time tS for the long-lived solvent molecules re-
sponsible for the dispersion. As such, it defines the time scale on which the local
structural integrity of the protein is probed.
8.3
Thermal Perturbations
8.3.1
Heat Denaturation
The first water 2 H and 17 O MRD study of heat denaturation examined bovine pan-
creatic ribonuclease A (RNase A) [19], which is known to undergo a reversible and
cooperative thermal unfolding at neutral or acidic pH. However, some studies have
indicated that the conformational transition may not be truly two-state. The ther-
mally denatured state of RNase A appears to be relatively compact and is further
unfolded on reduction of the four disulfide bonds.
MRD profiles have been recorded at seven temperatures in the interval 4–65 C
at pH 2.0 and 4.0 (all pH values quoted in this chapter are uncorrected for H2 O/
206 8 Protein Conformational Transitions as Seen from the Solvent
0.5 17
O
R1 / Rbulk
x
0 20
1 40
10 60 ° C)
ν0 (MH 100 T (
z)
1.0
2
H
R1 / Rbulk
x
0 20
1 40
10 60 °C)
ν0 (MH
z)
100 T (
Fig. 8.1. Water 17 O (top) and 2 H (bottom) R1 R bulk , normalized by R bulk to remove most
MRD profiles from 3.8 mM solution of of the trivial (viscosity-related) temperature
ribonuclease A in D2 O at pH 2.0 and seven dependence. The 17 O dispersion curves are
temperatures from 4 to 65 C [19]. The vertical three-parameter fits according to Eq. (8).
axis shows the excess relaxation rate R1x ¼
D2 O isotope effects). Figure 8.1 shows the MRD data obtained at pH 2, where the
transition midpoint Tm is 31 C according to CD measurements. For the native
protein, observed at low temperatures, the MRD profiles yield Nb Sb2 ¼ 2:3 G 0:1,
which requires at least three long-lived water molecules (since Sb2 a 1). The crystal
structure of RNase A reveals six water molecules partly buried in surface pockets,
but no deeply buried water molecules. The 17 O MRD profile from the native pro-
tein must be due to some or all of these six water molecules. The absence of 17 O
dispersion at high temperatures indicates a loss of persistent native structure at the
locations of these hydration sites.
If the conformational transition involves only two states, then Nb Sb2 yields the
fraction native protein at any temperature (see Eq. 17). Figure 8.2a compares this
with the native-state fraction derived from CD measurements. The MRD and CD
results agree on the width (DT) of the transition, but Tm is 5–10 C higher in the
8.3 Thermal Perturbations 207
2
NβSβ2
1
a
0
3
10-3 Nαρα
1
b
0
40
30
VH (nm3)
20
10
c
0
0 10 20 30 40 50 60 70
T (°C)
Fig. 8.2. Temperature dependence of MRD VH from tb for the labile-hydrogen contribution
parameters derived from water 17 O and 2 H to the 2 H dispersion at pH 4.0 [19]. The curves
MRD profiles from 3.8 mM solution of correspond to a two-state transition with
ribonuclease A in D2 O: a) Na ra from að 17 OÞ at transition midpoint and width as determined
pH 2.0; b) Nb Sb2 from bð 17 OÞ at pH 2.0; and c) by CD measurements at 222 and 275 nm.
MRD data. This shift indicates that the transition is not truly two-state and that the
first step does not affect the tertiary structure that maintains the 3–6 long-lived hy-
dration sites. On the other hand, the MRD data rule out a scenario with an inter-
mediate state extensively permeated by long-lived water molecules.
Figure 8.2b shows the surface hydration parameter Na ra derived from the 17 O
MRD profiles. The solid curve is a fit based on the two-state model with Tm and
DT fixed at the values determined by CD measurements. Since the rotational retar-
dation factor ra involves a ratio of correlation times that may have different activa-
tion energies, it is expected to vary with temperature (dashed curves in Figure
208 8 Protein Conformational Transitions as Seen from the Solvent
surements. Thus all temperatures but the two highest refer to the native state,
where VH ¼ 27:6 G 1:0 nm 3 , in reasonable agreement with the value 30.5 nm 3 de-
termined by 15 N relaxation at pH 6.4, 25 C (and sixfold lower protein concentra-
tion) [24]. Remarkably, VH is not significantly different at 65 C, where the protein
is 94% denatured according to CD. Presumably, this invariance is a result of com-
pensating effects on the rotational dynamics of polypeptide expansion and flexibil-
ity (see Section 8.2).
In summary, the MRD data show that the 3–6 long-lived hydration sites in native
RNase A are disrupted in the thermally denatured state. While the denatured pro-
tein lacks structural integrity on the 10 ns time scale at these sites, it is much more
compact (less solvent exposed) than the maximally unfolded disulfide-intact pro-
tein. This picture is consistent with other observations, indicating substantial resid-
ual structure in the heat-denatured state of RNase A.
8.3.2
Cold Denaturation
For most proteins, the temperature-dependent free energy difference, DGðTÞ, be-
tween denatured and native states exhibits negative curvature at all (accessible)
temperatures, with maximum thermal stability at or near ambient temperature.
Such a parabolic stability profile is predicted by macroscopic thermodynamics if
the isobaric heat capacity difference, DCP , is positive and temperature-independent.
This simplified analysis also predicts that DGðTÞ ¼ 0, not only at the heat denatura-
tion midpoint Tm , but also at a lower temperature, Tm0 , identified as the midpoint
of an exothermic cold denaturation [25]. Heat denaturation is readily understood
as the result of thermal excitation of the numerous disordered configurations of
the polypeptide chain. While the molecular details of cold denaturation remain ob-
scure, it is usually attributed to a weakening of the hydrophobic effect caused by
the temperature-dependent structure of bulk water [25]. If this notion is correct,
cold denaturation might hold the key to understanding the major driving force for
protein folding.
Studies of cold denaturation are hampered by the fact that, for most proteins, Tm0
is predicted to be well below the freezing point of water. This obstacle can be cir-
cumvented by increasing Tm0 and/or preventing ice formation. The most common
strategy is to expose the protein to a second (nonthermal) perturbation, typically a
few kilobars of hydrostatic pressure or a moderate denaturant concentration, which
lowers the DGðTÞ curve, thereby raising Tm0 and lowering Tm . Both pressure eleva-
tion and co-solvent addition also depress the freezing point of the water (to 19 C
at 2 kbar). However, this approach has a serious drawback: even if the secondary
perturbation does not denature the protein at ambient temperature, the relative im-
portance of the two simultaneous perturbations is difficult to establish. Thus, for ex-
ample, the term pressure-assisted cold denaturation merely indicates that tempera-
ture was the experimental variable, but does not imply that the low temperature is
more important than the high pressure in determining the properties of the result-
ing nonnative state. In other words, it is important to distinguish denaturation oc-
curring at low temperature from denaturation caused by the low temperature.
210 8 Protein Conformational Transitions as Seen from the Solvent
To study cold denaturation per se for proteins with Tm0 < 0 C, the aqueous sol-
vent must be maintained in a metastable, supercooled state. By careful water treat-
ment and judicious choice of (small) sample containers to eliminate ice nucleation
sites, water can be supercooled to about 20 C [26]. A more robust method is to
subdivide the protein solution into micrometer-sized emulsion droplets, the vast
majority of which remain unfrozen down to the homogeneous nucleation temper-
ature of 38 C (for H2 O) [27]. While sample emulsification interferes with most
scattering and spectroscopic measurements, it poses little or no problem for MRD
studies.
Cold denaturation of bovine b-lactoglobulin (bLG) has been observed by several
methods, using denaturants [28–30] or elevated pressure [31] to bring Tm0 into a
more accessible range. This 162-residue b-barrel protein is dimeric at physiological
pH, but is essentially monomeric at pH < 3 in the absence of salt [32]. Remark-
ably, bLG is more stable towards urea denaturation at pH 2 than at neutral pH.
Calorimetric, 1 H NMR and CD studies of bLG at pH 2 in the presence of 4 M
urea (which raises Tm0 to about 20 C) indicate that cold denaturation involves
an intermediate with disrupted tertiary structure, which unfolds extensively at
temperatures close to 0 C [28, 29]. However, SAXS, hydrogen exchange and
heteronuclear NMR measurements at 0 C (pH 2.5, 4 M urea) indicate a rather
compact structure (27% increase of the radius of gyration) with a residual, native-
like b-hairpin structure (stabilized by one of the two disulfide bonds) [30]. All these
studies, as well as the MRD studies to be discussed, refer to isoform A of bLG.
Crystal structures of bLG identify two water molecules trapped in small cavities.
In addition, the b barrel contains a large, elongated, hydrophobic cavity that can
bind retinol, fatty acids and other nonpolar ligands. One crystal structure indicates
that the ligand-free cavity is occupied by a hydrogen-bonded string of five water
molecules [33]. Consequently, we expect that bLG contains at least two, but possi-
bly more, long-lived water molecules. This structure-based prediction is consistent
with 17 O MRD studies of native bLG (0.9 mM, pH 2.5) at 10 C, yielding Nb Sb2 ¼
2:2 G 0:2 [34]. In addition, the MRD results reveal several water molecules with a
correlation time of 2–3 ns, tentatively assigned to partially buried hydration sites in
small surface pockets and/or in the large binding pocket.
Lowering of the temperature to 1 C does not affect Nb Sb2 significantly (see Fig-
ure 8.3), indicating that the native protein structure remains intact at this tem-
perature. However, addition of 4 M urea virtually eliminates the 17 O dispersion at
1 C, as expected from earlier studies of urea-assisted cold denaturation [28–30].
The MRD result, Nb Sb2 ¼ 0:3 G 0:2, indicates a native fraction on the order of
10% under these conditions. In the absence of urea and at temperatures down to
1 C, the rotational correlation time tR of bLG, derived from the 2 H MRD profile
(with a large contribution from labile hydrogens), agrees quantitatively (after h/T
scaling) with expectations based on hydrodynamic modeling [17] and 15 N relax-
ation [35]. This provides further evidence for a native structure under these condi-
tions. For the denatured protein in 4 M urea at 1 C, the 2 H correlation time tb is
30% shorter than the tR expected for the native protein, even though SAXS mea-
surements indicate a twofold volume expansion in the denatured state [30]. This
8.3 Thermal Perturbations 211
2
NT (R1 – Rbulk – α)/(ωQ2 τR)
1.5
1.0
0.5
0
0.01 -5
0
0.1 5
ν0 τ 10
1 15 )
R 20 T (°C
Fig. 8.3. Water 17 O MRD profiles from 0.9 R1 – R bulk a, normalized by oQ2 tR /NT to
mM solutions of b-lactoglobulin A (pH 2.5, remove the trivial dependence on viscosity
52% D2 O) at 19.4, 10.2 and 1.1 C [34]. The and temperature. This reduced quantity
solvent is salt-free water (open circles) or 4 M approaches Nb Sb2 at low frequencies. The curves
urea (solid circles). The vertical axis shows are three-parameter fits according to Eq. (8).
the b-dispersion part of the relaxation rate,
shortening of the correlation time shows that the denatured protein does not
undergo rigid body rotational diffusion. Rather, it exhibits internal reorientational
motions on a range of time scales (shorter than the rigid body tR ), as expected for
labile hydrogens in unfolded polypeptide segments.
To examine whether bLG can be cold denatured in the absence of urea, water 2 H
and 17 O MRD profiles have been recorded on emulsified samples down to 20 C
[34]. Control experiments yield identical MRD profiles before and after emulsifica-
tion, justifying the neglect of bLG interactions with the interface of the 10 mm di-
ameter emulsion droplets. The MRD data show no evidence of a cold denaturation,
but indicate that the native structure persists down to at least 20 C (see Figure
8.4). Indeed, after normalization to remove the trivial temperature dependence via
tR , the 17 O MRD profiles exhibit little variation from 27 C to 20 C. The 2 H pro-
files do have reduced amplitude at low temperatures, but this can be accounted for
by the slower labile hydrogen exchange.
In summary, MRD shows that bLG is extensively denatured in 4 M urea at
1 C, in agreement with previous studies. However, even though the conven-
tional thermodynamic analysis indicates that the midpoint temperature for cold
denaturation of bLG in the absence of urea should be about 20 C, the MRD data
indicate that the native protein structure persists down to this temperature. This
suggests that the standard thermodynamic analysis is incomplete and that the ob-
served denaturation at low temperature in the presence of urea may not be driven
primarily by the temperature dependence of the water structure, which is the usual
explanation of cold denaturation.
212 8 Protein Conformational Transitions as Seen from the Solvent
3
17
O
6
2
5 27 °C H
NT (R1 – Rbulk – α)/(ωQ2 τR)
4
–20 °C
3
0
10-4 10-3 10-2 10-1 100 101
ν0 τR
Fig. 8.4. Water 17 O (top) and 2 H (bottom) [34]. The relaxation rate has been reduced as
MRD profiles from emulsified 0.8 mM in Figure 8.3. The curves are three-parameter
solutions of b-lactoglobulin A (pH 2.7, 50– fits (to the 27 C data in the case of 17 O)
100% D2 O) at 27 C (open circles), 10 C according to Eq. (8).
(solid diamonds) and 20 C (solid circles)
8.4 Electrostatic Perturbations 213
8.4
Electrostatic Perturbations
Ionizable side chains contribute to protein stability via their mutual Coulomb inter-
actions and via ionic solvation effects, that is, the charge-dependent part of side-
chain interactions with the polarizable environment, including water, co-solvents,
salt and buffer ions, and the rest of the protein (see Chapter 7). This contribution
to protein stability can be manipulated by changing the net charge of side chains,
by means of pH titration or site-directed mutagenesis, or by adding salt to the sol-
vent. In contrast to thermal or solvent denaturation, pH titration or mutation of an
ionizable side chain is a localized perturbation. However, because the Coulomb in-
teraction is long-ranged, the effect of the perturbation is not necessarily local.
Whereas some proteins remain in the native state even at extreme pH values,
others undergo a cooperative transition to a compact globular state with disordered
(‘‘molten’’) side chains, but with native-like secondary structure and backbone fold
(see Chapter 23). This so-called molten globule (MG) state is thought to be an equi-
librium analog of a universal kinetic intermediate on the protein folding pathway
[36]. If the native protein is regarded as an (irregular) crystal, one would expect the
MG state to be favored by high temperature. However, in most reported studies,
the MG state has been induced by nonthermal perturbations. In particular, many
proteins adopt MG-like states when some or all carboxylate groups are neutralized.
MG proteins are remarkably compact. Typically, the radius of gyration is only
10–30% larger than for the native state [36]. Nevertheless, if the MG is modeled
as a uniformly expanded version of the native protein, this corresponds to a vol-
ume expansion factor of 1.3–2.2. Even for a small protein, the inferred difference
in volume between the native and MG states translates into several hundred water
molecules. However, it is not clear whether the additional volume of the MG pro-
tein is, in fact, occupied by water. This issue, which is of vital importance for the
understanding of the MG state, has been addressed by water 17 O MRD studies of
the acid MG states of three proteins [37].
Among these proteins is a-lactalbumin (aLA), which at pH 2 adopts a conforma-
tion that has come to be regarded as the paradigmatic molten globule [36]. The
crystal structure of native (human) aLA shows six potentially long-lived water mol-
ecules. Two of these coordinate the bound Ca 2þ ion, residing in a cavity between
the a and b domains that traps three additional water molecules. The 17 O MRD
profile (see Figure 8.5) from native (bovine) aLA yields Nb Sb2 ¼ 3:4 G 0:2, indicat-
ing a residence time between 10 ns and a few microseconds (at 27 C) for at least
four, and perhaps all six, of these water molecules. For the MG state at pH 2,
Nb Sb2 ¼ 1:8 G 0:2, corresponding to at least two long-lived water molecules. This
observation demonstrates that the structural integrity of the MG state is sufficient
to trap at least two long-lived water molecules. Because the Ca 2þ ion is no longer
bound at pH 2, the difference in Nb Sb2 between the native and MG states might be
fully accounted for by the loss of the two calcium-coordinated water molecules.
However, since the MRD data do not reveal water locations, we cannot say if the
long-lived water molecules in the MG state correspond to native hydration sites.
214 8 Protein Conformational Transitions as Seen from the Solvent
60
Native
40
R1 (s–1)
MG
x
20
0
1 10 100
ν0 (MHz)
Fig. 8.5. Water 17 O MRD profiles from 1.9 curves are three-parameter fits according to Eq.
mM solutions of bovine a-lactalbumin (D2 O, (8). The presence of the MG state at pH 2.0
27 C) at pH 8.4 (native) and 2.0 (molten was verified by the characteristic CD profile
globule, MG) [37]. The vertical axis shows the and loss of the upfield methyl resonances in
excess relaxation rate R1x ¼ R1 R bulk . The the 1 H NMR spectrum.
The convergence at high frequencies of the native and MG profiles (see Figure
8.5) implies that the two states have very similar surface hydration parameter. In-
deed, the fits yield Na ra ¼ ð1:8 G 0:1Þ 10 3 for the native state and ð1:9 G 0:1Þ
10 3 for the MG state. From the solvent-accessible area of the native protein, we
estimate Na ¼ 450 and thus ra ¼ 4:0 G 0:3, a typical value for native globular
proteins. Low-temperature MRD studies of protein hydration and 17 O relaxation
studies of small-solute hydration suggest that ra ¼ 1–2 for the vast majority of the
water molecules in contact with the protein surface [11]. The ra values of 4–5 typ-
ically found for native proteins must therefore be dominated by a small subset of
hydration sites with strongly rotationally retarded water molecules. If these special
hydration sites are abolished in the MG state, ra would be substantially smaller
than for the native protein. In this case, the invariance of Na ra observed for aLA
must be accidental, the reduction of ra being precisely canceled by an increase of
Na due to a greater solvent exposure in the MG state. This additional solvent expo-
sure could reflect extensive water penetration of the MG protein. This scenario
would be consistent with a strictly geometric interpretation of the 35–60% reported
increase of the hydrodynamic volume for the MG state, leading to the conclusion
that 270 water molecules penetrate the interior of the aLA molten globule [38].
However, later measurements have indicated a significantly smaller expansion of
the aLA MG (see below). Moreover, as the hydrodynamic volume reflects the vis-
8.4 Electrostatic Perturbations 215
cous energy dissipation associated with protein motion, it depends, not only on
protein geometry, but also on water dynamics at the protein surface, as manifested
in ra and in the local viscosity [17].
The observed invariance of Na ra may also be interpreted, in a more restrictive
way, as resulting from the invariance of each of the quantities Na and ra . In this
interpretation, the expansion of the MG must be explained without invoking water
penetration. Recent reports indicate that the aLA MG is even more compact than
previously thought. From NMR diffusion measurements a volume expansion of
20 G 4% [39] or 22 G 5% [40] has been reported and the fluorescence anisotropy
decay of the three tryptophan residues indicates even smaller values of 12–18%
[41]. These results should be compared with the typical 15% expansion on melting
of crystals of small organic molecules [42]. Considering also that native proteins
are typically 4% more densely packed than small-molecule crystals [42], a 20% ex-
pansion of aLA upon ‘‘melting’’ of the side chains seems plausible. In this view,
which also appears to be consistent with calorimetric data [43, 44], the weakening
of the attractive van der Waals interactions in the protein core is compensated by
the configurational entropy associated with alternative side chain packings without
the need to invoke a significant amount of water penetration.
Although the individual factors Na and ra in the product Na ra cannot be deter-
mined from MRD data alone, the interpretational ambiguity for aLA might be re-
solved by investigating other proteins. If the Na ra invariance observed for aLA is
an accidental result of large compensating changes in Na and ra , then it is unlikely
to be a universal phenomenon. On the other hand, if the generic MG state has
an expanded and dynamic but dry core and a native-like surface, then neither
Na nor ra would change much in the native to MG transition and their product
should be nearly invariant for most MG proteins. This latter scenario is supported
by the MRD data available so far [37]. Thus, for human carbonic anhydrase (CA)
in form II (homologous with the bovine B form), the 17 O MRD profiles yield
Na ra ¼ ð7:0 G 0:4Þ 10 3 for the native state at pH 9.0 and ð6:9 G 0:3Þ 10 3 for
the MG state at pH 3.0 [37]. With Na ¼ 770 estimated from the solvent-accessible
area of the native protein, this yields a rotational retardation factor ra of 9:1 G 0:5,
twice as large as for most other native proteins. This large value is attributed to the
unusual prevalence in CA of hydrated surface pockets and, possibly, to internal
motions of the large number of long-lived buried water molecules (Nb Sb2 ¼ 8–9
for both the native and the MG state). The observed state invariance of the large
Na ra value (for a protein of this size) suggests that strongly motionally retarded wa-
ter molecules occupy surface pockets also in the MG state of CA.
A different picture emerged from a 1 H NMR study of (bovine) CA in the native
(pH 7.2), MG (pH 3.9) and urea-denatured (pH 7.2) states [45]. Magnetization
transfer between water and protein protons was observed on the 100 ms time scale
in the MG state, but not in the native or denatured states. This finding was taken
as evidence for a much greater hydration (due to extensive water penetration) for
the MG state than for the other two states. However, a more plausible explanation
is that the observed magnetization transfer is relayed by the roughly 50 hydroxyl
and carboxyl protons present in CA at pH 3.9, which exchange with water protons
216 8 Protein Conformational Transitions as Seen from the Solvent
50
40 Native
MG
30
R1 (s–1)
x
Denatured
20
10
0
1 10 100
ν0 (MHz)
Fig. 8.6. Water 17 O MRD profiles from 2.1 parameter fits according to Eq. (8) or its
mM solutions of equine apomyoglobin (D2 O, bilorentzian generalization. The presence of
27 C) at pH 5.9 (native), 4.2 (molten globule, the MG state at pH 4.2 was verified by the
MG) and 2.2 (denatured) [37]. The vertical characteristic CD profile and loss of the upfield
axis shows the excess relaxation rate methyl resonances in the 1 H NMR spectrum.
R1x ¼ R1 R bulk . The curves are three- or five-
within 100 ms at pH 3.9, but are either titrated (carboxyl) or exchange too slowly
(hydroxyl) at pH 7.2 [46].
Also for apomyoglobin (apoMb), the 17 O MRD profiles for the native (pH 5.9)
and MG (pH 4.2) states converge at high frequencies (see Figure 8.6), yielding
Na ra ¼ ð2:1 G 0:4Þ 10 3 for the native state (corresponding to ra ¼ 4:0 G 0:8)
and ð2:2 G 0:2Þ 10 3 for the MG state [37]. The MG state of apoMb has a high
helix content and native-like fold [47], but, with about 70% larger hydrodynamic
volume than native apoMb, it is less compact than the aLA MG. A uniform volume
expansion by 70% can hardly be achieved without water penetration. However,
the observed 20% increase in hydrodynamic radius [48] and radius of gyration
[49] could be produced by a smaller expansion of the core (as for aLA) and some
protrusion of polypeptide segments from fraying helices and termini [49]. Because
water does not penetrate the core and water in contact with protruding polypep-
tide segments should not be much rotationally retarded (ra ¼ 1–2, as for small
molecules), this picture of the MG structure is compatible with the observed Na ra
invariance.
Myoglobin has an exceptionally large amount of internal cavities, some of
which contain disordered water molecules [50]. The exchange of these buried wa-
ter molecules among internal hydration sites may be responsible for the observed
high-frequency dispersion with a correlation time of a few nanoseconds. The MRD
profiles in Figure 8.6 were thus fitted with a bilorentzian dispersion function. It is
8.4 Electrostatic Perturbations 217
apparent from the MRD profiles that the number of long-lived water molecules is
comparable in the native (at least three) and MG (at least two) states.
At pH 2 in the absence of salt, apoMb is extensively unfolded with 50% larger
radius of gyration than in the native state [49]. This contrast with aLA, which
forms an MG under these conditions, presumably due to the conformational
constraints imposed by the four disulfide bonds (there are no disulfide bonds in
apoMb). Another factor contributing to this difference may be the 80% higher sur-
face charge density (ratio of net charge to solvent-accessible area) in apoMb. The
17
O MRD profile from apoMb at pH 2 (see Figure 8.6) exhibits a small dispersion
(Nb Sb2 ¼ 0:3 G 0:1), probably due to a small fraction of coexisting MG state. Even
though the acid-denatured state should be much more solvent exposed (larger Na )
than the native or MG states, it has a smaller Na ra of ð1:8 G 0:1Þ 10 3 . This im-
plies that, on average, water penetrating the unfolded protein is considerably less
rotationally retarded (smaller ra ) than water at the surface of the native or MG pro-
teins. With Na A 1300 computed from the solvent-accessible area of the fully ex-
tended polypeptide chain [51] we obtain a lower bound of ra A 1:4, similar to the
rotational retardation for small organic molecules [11]. Because the polypeptide
chain cannot be fully solvent exposed even in the denatured state of apoMb, ra
should be somewhat larger, but still substantially smaller than the value of 4–5 typ-
ical for native proteins. The 15% reduction in Na ra on acid denaturation of apoMb
contrast with the 30% increase on heat denaturation of RNase A at pH 2 (see Sec-
tion 8.3.1). This difference may result from the four disulfide bonds in RNase A,
which force the denatured protein to adopt more compact conformations, where
the penetrating water molecules are more strongly rotationally retarded.
In summary, 17 O MRD data from three proteins show that the MG state has the
same value for the surface hydration parameter Na ra as in the native state. The
possibility that, for all three proteins, this invariance is the result of an accidental
near-perfect compensation of a large increase in Na (due to extensive water pene-
tration of the MG structure) and a corresponding decrease in ra seems unlikely
because the native (and, presumably, MG) structures of the proteins are very differ-
ent: apoMb is mainly a-helical, CA mainly b-sheet, and aLA contains one a and one
b domain. Furthermore, the rotational retardation factor ra for native CA is a factor
2.3 larger than for the other two proteins. An accidental compensation would
therefore require an exceptionally large water penetration of the MG state of CA.
For these reasons, we believe that the MRD data support the picture of a dry mol-
ten globule, with little or no water penetration of the protein core and with native-
like surface hydration. This view is further supported by the finding that the MG
and native states have a comparable number of long-lived water molecules trapped
within the protein structure.
To the extent that the MG state plays the role of a kinetic folding intermediate,
the MRD results also provide new insights into the mechanism of protein folding.
Notably, they indicate that the cooperative transition from the MG to the native
state is not accompanied (or driven) by water expulsion. Consequently, this transi-
tion only involves the locking in of the side chains in the native conformation, with
the consequent entropy loss being compensated mainly by more favorable van der
218 8 Protein Conformational Transitions as Seen from the Solvent
Waals interactions in the more densely packed protein core. Furthermore, the pres-
ence of some long-lived water molecules in the MG state suggests that some or all
of the internal water molecules that stabilize the native protein by extending the
hydrogen bond framework are already in place when the native tertiary structure
is fully formed. This underscores the important role of internal water molecules,
not only in stabilizing the native protein, but also in shaping the energy landscape
that governs the folding process.
8.5
Solvent Perturbations
Early work suggested that whereas thermally denatured proteins tend to be rela-
tively compact and may retain some native-like structure, proteins exposed to high
concentrations of denaturants like urea or guanidinium chloride (and disulfide re-
duction) are essentially structureless random coils [5]. This view of solvent dena-
tured proteins has received support from SAXS measurements, showing that, for
many proteins, solvent denaturation roughly doubles the radius of gyration. The
picture of the denatured state as a fully unfolded, and thus fully solvent-exposed,
polypeptide chain is also implicit in the widely used solution transfer approach to
protein stability [23]. However, because a theta solvent does not exist for a hetero-
polymer, it is clear that the random-coil model is, at best, an approximation. In fact,
a growing body of experimental evidence for the survival of significant amounts of
residual structure under the harshest denaturing conditions [52, 53] suggests that
the random-coil model is, in many cases, far from reality.
The phenomenon of solvent denaturation raises two fundamental questions: (i)
What is the structure (in a statistical sense) of the denatured protein? and (ii) What
is the mechanism whereby denaturants destabilize native proteins? Because pro-
tein stability results from the free energy difference between native and denatured
states, the second problem cannot be fully solved without an answer to the first
question [54]. While detailed information about polypeptide conformation in dena-
tured states is beginning to emerge, our knowledge about the other side of the coin
– the solvent – remains incomplete and indirect. Solvent denaturation is obviously
a result of altered protein–solvent interactions, but there is little consensus about
the precise thermodynamic and structural role of the solvent. At the high denatur-
ant concentrations commonly used, nearly all water molecules interact directly
with the denaturant. Solvent denaturation thus interferes directly with the main
driving force for protein folding – the hydrophobic effect [2–5]. Moreover, a mixed
solvent in contact with a protein is, in general, spatially inhomogeneous. In other
words, the protein may exhibit preferential solvation, preferential solvent penetra-
tion, and differential denaturant ‘‘binding’’ in the native and denatured states [14,
15]. Through these complex phenomena, the structural and mechanistic aspects of
solvent denaturation are inextricably linked. To make progress, we need to take a
closer look at the solvent.
8.5 Solvent Perturbations 219
8.5.1
Denaturation Induced by Urea
Despite the widespread use of urea in studies of protein stability and folding ther-
modynamics [2, 5, 23] (see Chapter 3), the molecular mechanism whereby urea
unfolds proteins has not been established. In particular, it is not clear whether
urea acts directly by binding to the polypeptide, indirectly by perturbing solvent-
mediated hydrophobic interactions, or by a combination of these mechanisms.
The direct mechanism is made plausible by the structural similarity between urea
and the peptide group, suggesting that urea–peptide interactions, like peptide–
peptide interactions, can compete favorably with water–peptide interactions. If
this is the case, then solvent denaturation can be driven simply by the exposure
of more binding sites in the denatured protein [55]. The indirect mechanism is
supported by the observation that urea enhances the solubility of not-too-small
nonpolar solutes or groups [56, 57] and, by implication, weakens the hydrophobic
stabilization of the folded protein.
The available structural data on urea–protein interactions are limited and, with
few exceptions [58], are restricted to native proteins [59–61]. Explicit solvent simu-
lations cannot access the time scales on which solvent-induced protein unfolding
takes place and have only provided information about urea–protein interactions in
the native state or in partially unfolded states at very high temperatures [62, 63].
The MRD method is an ideal tool for investigating solvent denaturation. By exploit-
ing different nuclear isotopes, water–protein and denaturant–protein interactions
can be monitored simultaneously across the unfolding transition. So far, the MRD
method has been used to study urea denaturation of two proteins, b-lactoglobulin
(see Section 8.3.2) and intestinal fatty acid-binding protein (I-FABP) [64].
I-FABP is a 15 kDa cytoplasmic protein with a b clam structure composed of 10
antiparallel strands that enclose a very large (500–1000 Å3 ) internal cavity. Lipids
are thought to enter the cavity via a small ‘‘portal’’ lined by two short a-helices. In
the apo form, the cavity is occupied by 20–25 water molecules [65]. The internal
and external hydration of native I-FABP in both apo and holo forms have been
characterized in detail by 17 O and 2 H MRD [66, 67]. A three-parameter fit (see
Eq. (8)) to the low-frequency part (b dispersion) of the 17 O MRD profile at 27 C
yields Nb Sb2 ¼ 2:4 G 0:3 and tb ¼ 6:8 G 0:5 ns [64, 66]. The correlation time tb
agrees quantitatively with the (h/T scaled) rotational correlation time tR of native
I-FABP as determined by 15 N NMR relaxation [68] and fluorescence depolarization
[69].
The MRD profile also exhibits a high-frequency dispersion (labeled g), which
is only partly sampled at 27 C. The g dispersion is produced by the 20–25 water
molecules in the binding cavity, which exchange among internal hydration sites on
a time scale of 1 ns. This intra-cavity exchange has also been seen in molecular
simulations [70, 71]. Because these water molecules remain in the cavity for pe-
riods longer than tR , they also contribute to the b dispersion. Therefore [67],
Nb Sb2 ¼ NI SI2 þ NC SC2 AC2 , where the subscripts I and C refer to singly buried water
220 8 Protein Conformational Transitions as Seen from the Solvent
100
80
R1 (s–1)
60
x
40
20
0
1 0
10 2
4
ν0 ( 100 6
MH 8 M)
z) 1000 10 CU (
17
Fig. 8.7. Water O MRD profiles from 2.4 mM solution of apo
I-FABP at pH 7.0 (52% D2 O, 27 C) and 10 different urea
concentrations in the range 0–8.6 M [64]. The vertical axis
shows the excess relaxation rate R1x ¼ R1 R bulk .
molecules and water molecules in the large binding cavity, respectively. The local,
root-mean-square order parameters for these two classes of hydration sites are de-
noted by SI and SC , while the anisotropy parameter A C is, roughly speaking, a
measure of the nonsphericity of the cavity shape [67]. A detailed analysis [65–67]
indicates that the b dispersion has contributions from the 20–25 cavity water mol-
ecules as well as from an isolated water molecule (known as W135) buried near a
hydrophobic cluster at the turn between b-strands D and E. This internal water
molecule is conserved across the family of lipid-binding proteins and must there-
fore contribute importantly to the stability of the native protein [72]. Because the
hydrophobic cluster at the D–E turn forms early on the folding pathway, W135
can be used as an MRD marker for this (un)folding event.
Figure 8.7 shows 17 O MRD profiles from the apo form (without bound fatty acid)
of I-FABP at 10 different urea concentrations, CU ¼ 0–8.6 M [64]. An examination
of the dispersion parameters shows that the maximum seen in Figure 8.7 is due to
an increase of Nb Sb2 by nearly one unit in the range 0–3 M urea, where CD data
indicate that the protein is fully native (see Figure 8.8a). Whereas the CD data are
well described by a two-state model, the MRD data thus signal the presence of an
intermediate state. This result conforms with the detection, by 1 H– 15 N HSQC
NMR, of an intermediate with maximum population in the range of 2.0–3.5 M
urea [73].
The MRD data yield a significantly higher denaturation midpoint (C1/2 ¼ 6:5 M)
than the CD data (5.1 M). Again, this is consistent with 1 Ha 15 N HSQC NMR spec-
tra [73], demonstrating that native-like structural elements persist up to 6.5 M
urea, where CD and fluorescence data suggest that the protein is fully denatured.
The observation of a substantial 17 O dispersion at 6.5 M urea implies that the
residual protein structure is sufficiently permanent to trap water molecules for pe-
8.5 Solvent Perturbations 221
3 1
Fraction native
0.8
NβSβ2
2
0.6
1 0.4
a 0.2
0 0
10
τβ x η(0)/η(CU) (ns)
b
0
0 2 4 6 8
CU (M)
Fig. 8.8. Urea concentration dependence of a fit based on the standard two-state linear free
MRD parameters for apo I-FABP, derived from energy model. b) The correlation time tb ,
the 17 O MRD profiles in Figure 8.7 [64]. a) viscosity-corrected to remove the effect of urea
Nb Sb2 and the two-state fit (dashed curve). The on the hydrodynamic friction. The dashed line
solid curve and the right-hand axis refer to the corresponds to the independently determined
apparent fraction native protein, derived from rotational correlation time of native I-FABP.
216 and 222 nm CD data (12 mM I-FABP) and
riods longer than 10 ns. This residual structure may be related to the equilibrium
folding intermediate detected at CU ¼ 4–7 M by 19 F NMR on fluorinated Trp82
[74], the backbone NH of which donates a hydrogen bond to the long-lived internal
water molecule (W135) in the D–E turn. The vanishing of Nb Sb2 at 8.6 M urea (see
Figure 8.8a) therefore indicates that the binding cavity has collapsed and that the
hydrophobic cluster at the D–E turn has disintegrated.
In contrast to Nb Sb2 , the correlation time tb , when corrected for the variation in
solvent viscosity, is virtually independent of urea concentration (see Figure 8.8b).
In the absence of urea, tb can be identified with the rotational correlation tR of
the protein. The viscosity-corrected tb should therefore reflect any global changes
in protein structure upon denaturation. However, the disappearance of the 17 O dis-
persion at 8.6 M urea shows that there are no long-lived water molecules in the
fully denatured protein. For a two-state denaturation, the frequency-dependent
part of R1 would thus be entirely due to the native protein fraction (see Eq. (17)).
The invariance of tb in Figure 8.8b then indicates that the overall structure of the
native state is essentially independent of urea concentration. This conclusion is
consistent with studies of BPTI [60] and hen lysozyme [59, 61], showing that the
native protein structure is virtually unaffected by high urea concentrations. On the
222 8 Protein Conformational Transitions as Seen from the Solvent
other hand, MRD and other data reveal intermediate states in the urea denatura-
tion of I-FABP. The observed invariance of tb thus also requires the hydrodynamic
volume of the intermediate species to be similar to the native state.
The third piece of information obtained from a three-parameter fit (see Eq. (8))
to (the low-frequency part of ) the 17 O MRD profile is the surface hydration param-
eter Na ra. For native I-FABP, the interpretation of this quantity is complicated by
water exchange among hydration sites in the large binding cavity, which contrib-
utes to Na ra to roughly the same extent as the rotational retardation of some 460
water molecules in contact with the external protein surface. However, in the dena-
tured state of I-FABP, present at 8.6 M urea, the binding cavity has collapsed.
Under these conditions, Na ra is governed by solvent exposure, preferential solva-
tion, and rotational retardation. While it is impossible to separate these effects,
some conclusions can be drawn with the aid of independent data. Thus, if the
urea-binding constant KU is assumed to lie in the range 0.05–0.2 M1 , we obtain
NS rS ¼ ð4:6–9:4Þ 10 3 (see Section 8.7.2.6; the subscript S refers to the external
protein surface). If the urea denatured state of I-FABP resembled a fully solvent-
exposed polypeptide chain, we would expect rS A 1:3 [11, 37] and NS A 1250
(based on the solvent-accessible area of the fully unfolded protein). The NS rS value
predicted for the fully unfolded polypeptide chain is thus a factor 3–6 smaller than
the experimental result. Because NS cannot exceed the estimate for the unfolded
chain, it follows that the rotational retardation factor rS for urea denatured
I-FABP is substantially larger than expected for a fully unfolded structure. This
is expected to be the case if the denatured protein is penetrated by water strings
or clusters that interact simultaneously with several polypeptide segments. In other
words, the MRD data indicate that denatured I-FABP is considerably more com-
pact than a random coil even in 8.6 M urea. This conclusion is in line with recent
reports of residual native-like structure in 8 M urea for staphylococcal nuclease [52]
and hen lysozyme [53].
Because the expected value of NS rS is similar for the native (460 4:0 ¼
1:8 10 3 ) and fully unfolded (1250 1:3 ¼ 1:6 10 3 ) states, it is clear that this
quantity is not related to protein structure in a simple way. Nevertheless, one can
easily imagine denatured protein structures where NS rS would have much smaller
or larger values. Experimental NS rS values thus provide valuable information, even
if a unique structural interpretation is precluded. For example, if the relative
change in NS rS on going from the native to the denatured state varies substantially
among different proteins (of similar size) or for different denaturation agents, then
we can expect a corresponding variation in the properties of the denatured states.
For the two urea denatured proteins that have been investigated by MRD, bLG and
I-FABP, the relative change in NS rS is 100–300%. This may be contrasted with the
much smaller changes of 30% for heat denatured RNase A (see Section 8.3.1) and
15% for acid-denatured apoMb (see Section 8.4). While the molecular basis of
these results remains to be clarified, the very much larger change in NS rS for
urea-denatured proteins can hardly be explained by more extensive unfolding or
solvent exposure (larger NS ). More likely, the dramatic increase in NS rS reflects a
stronger rotational retardation (larger rS ) of water molecules that penetrate urea
8.5 Solvent Perturbations 223
5.8 M
R1 (s–1)
4
x
3.1 M
2
0
1 10 100
ν0 (MHz)
Fig. 8.9. Urea 2 H MRD profiles from 2.3 mM to the same urea/protein mole ratio
solutions of apo I-FABP at pH 7.0 (52% D2 O, (NUT ¼ 1470, corresponding to CU ¼ 3:1 M) to
27 C) and two different urea concentrations remove the trivial dependence on urea
[64]. The vertical axis shows the excess urea concentration. The curves are three-parameter
2
H relaxation rate R1x ¼ R1 R bulk , normalized fits according to Eq. (8).
224 8 Protein Conformational Transitions as Seen from the Solvent
about denaturant–protein interactions. As for the water dispersions, the urea corre-
lation time tb can be identified with the tumbling time tR of the protein. In princi-
ple, the species giving rise to the urea dispersion could be either bound urea or
labile hydrogens in the protein. However, at pH 7 hydrogen exchange between pro-
tein and urea, whether direct or water mediated, is slow on the relaxation time
scale [60, 64]. Consequently, the urea 2 H dispersion, observed at all investigated
urea concentrations from 3.1 to 7.5 M, provides direct evidence for urea binding
to I-FABP with a residence time between 10 ns and 200 ms.
The fits to the urea 2 H profiles yield Nb Sb2 values in the range 0.5–1.0. These
dispersions might result from urea molecules trapped in the binding cavity,
but then Nb Sb2 should decrease with increasing urea concentration and vanish at
CU ¼ 7:5 M, where the CD data indicate that the cavity is disrupted (see Figure
8.8a). Since Nb Sb2 actually increases (slightly) with CU , this possibility can be ruled
out. It can therefore be concluded that both the native and denatured forms of
I-FABP contain at least one specific urea-binding site. This finding raises the pos-
sibility that strong urea binding contributes significantly to the unfolding ther-
modynamics, thus compromising the linear extrapolation method widely used to
determine the stability of the native protein in the absence of urea [23].
The high urea concentrations needed to denature proteins implies that weak
binding to many sites is involved. Information about such interactions is contained
in the MRD parameter Na ra , which can be attributed to urea molecules in short-
lived (< 1 ns) association with the external protein surface. Taking into account
exchange averaging over coexisting native and denatured proteins (see Eq. (17))
as well as preferential solvation effects (see Eq. (18)), it can be shown [64] that
the Na ra results are consistent with urea-binding constants in the range 0.05–
0.2 M1 and a similar rotational retardation for urea and water in the denatured
protein.
To summarize, the urea 2 H and water 17 O MRD data support a picture of the
denatured state where much of the polypeptide chain participates in clusters that
are more compact and more ordered than a random coil but nevertheless are pene-
trated by large numbers of water and urea molecules. These solvent-permeated
clusters must be sufficiently compact to allow side chains from different polypep-
tide segments to come into hydrophobic contact, while, at the same time, permit-
ting solvent molecules to interact favorably with peptide groups and with charged
and polar side chains. The exceptional hydrogen-bonding capacity and small size of
water and urea molecules are likely to be essential attributes in this regard. In such
clusters, many water and urea molecules will simultaneously interact with more
than one polypeptide segment and their rotational motions will therefore be more
strongly retarded than at the surface of the native protein. While the hydrogen-
bonding capacity per unit volume is similar for water and urea, the 2.5-fold larger
volume of urea reduces the entropic penalty for confining a certain volume of sol-
vent to a cluster. The energetics and dynamics of solvent included in clusters is ex-
pected to differ considerably from solvent at the surface of the native protein. This
view is supported by the slow water and urea rotation in the denatured state, as
deduced from the MRD data.
8.5 Solvent Perturbations 225
8.5.2
Denaturation Induced by Guanidinium Chloride
60
αLA HEWL RNase A
50
40
R x (s )
–1
1
30
20 4M
10
2 10 50 2 10 50 2 10 50
ν (MHz)
0
Fig. 8.10. Water 17 O MRD profiles from native solvent was D2 O and the GdmCl concentration
and denatured states of bovine a-lactalbumin for the denatured states was 4.4 M (aLA),
(1.9 mM aLA, pH 8.4 native, pH 7.0 7.0 M (HEWL), or 4.0 M (RNase A). The verti-
denatured), hen lysozyme (7.0 mM HEWL, cal axis shows the excess relaxation rate
pH 4.4 native, pH 5.0 denatured), and R1x ¼ R1 R bulk , scaled to the same water/
ribonuclease A (3.8 mM RNase A, pH 4.0), all protein mole ratio (NT ¼ 28 500) for all three
at 27 C [37]. The MRD profiles refer to the proteins. The curves are three-parameter fits
native (open circles), denatured (filled circles) according to Eq. (8).
and reduced-denatured (crosses) protein. The
that the hydrodynamic volume of the native protein expands by 120 G 10% in the
disulfide-intact denatured state (6 M GdmCl) and by 380 G 20% on disulfide re-
duction (without GdmCl) [40].
As judged by the 17 O MRD data in Figure 8.10, the reduced–denatured states of
the three proteins are very similar with regard to hydration and are therefore likely
to have similar configurational statistics. (The smaller excess relaxation for HEWL
can be attributed to the higher GdmCl concentration. When HEWL and aLA are
both studied at 7 M GdmCl, nearly identical MRD profiles are obtained for the de-
natured states.) However, the disulfide-intact reduced states are markedly different.
For the structurally homologous aLA and HEWL, disulfide reduction has virtually
no effect on the surface hydration parameter Na ra (see Figure 8.10). This is an un-
expected result, because disulfide reduction is accompanied by a large expansion of
the hydrodynamic volume of aLA [40]. Because the invariance of Na ra under disul-
fide reduction has only been observed for two structurally related proteins, and is
not seen for RNase A, it may result from an accidental compensation of a sizeable
increase in Na by a corresponding decrease in ra . For RNase A, Na ra increases by
42 G 2% when the four disulfide bonds are broken (see Figure 8.10). The MRD
8.5 Solvent Perturbations 227
3 a
NβSβ2
6
b
10-3 Nαρα
0
0 1 2 3 4 5 6 7
CG (M)
Fig. 8.11. Guanidinium chloride concentration two-state fit based on Eqs (17) and (18) and
dependence of a) Nb Sb2 and b) Na ra , derived the standard linear free energy model. The
from 17 O MRD profiles for 1.9 mM disulfide- dashed curves show the effect on Na ra for the
intact a-lactalbumin at pH 7 and 27 C [37]. native and denatured states of replacing water
The solid curves resulted from a simultaneous by GdmCl.
data thus indicate that the hydration of the disulfide-intact denatured state is more
native-like for RNase A than for the other two proteins (smaller Na and/or ra ). This
is consistent with reports of substantial residual helix content and aromatic side
chain clustering in GdmCl denatured RNase A.
For a quantitative analysis of MRD data from solvent-denatured proteins, it is
necessary to take preferential solvation effects into account (see Section 8.7.2.6).
This requires MRD data to be recorded at several denaturant concentrations. In
the case of GdmCl, this has been done for aLA (see Figure 8.11) [37]. The analysis
of these data shows that the variation of the MRD parameters Nb Sb2 and Na ra with
the GdmCl concentration is well described by the standard two-state linear free en-
ergy model [23]. The coincidence of the Nb Sb2 and Na ra transitions shows that the
release of the long-lived water molecules from the native protein occurs in the
same cooperative unfolding event as the influx of short-lived water molecules into
the expanded denatured state. The apparent two-state behavior also shows that the
MG intermediate, which is maximally populated at GdmCl concentrations of 2–3
M, cannot have a much larger number (Nb ) of long-lived water molecules than the
native state. The fit to the combined Nb Sb2 and Na ra data yields a midpoint GdmCl
concentration of 2:6 G 0:2 M and an m-value of 4:8 G 1:7 kJ mol1 M1 . The close
agreement of these values with those derived from the far-UV CD measurements
228 8 Protein Conformational Transitions as Seen from the Solvent
[75] indicates that the major hydration changes upon GdmCl denaturation are cor-
related with the disruption of secondary structure.
The fit in Figure 8.11 yields an average GdmCl binding constant K G ¼
0:16 G 0:07 M1 , in good agreement with calorimetrically determined values for
HEWL and RNase A [76, 77]. For the denatured state (see the upper dashed curve
in Figure 8.11b), we obtain NS rS ¼ 5:6 10 3 . This value is a factor 3.8 larger than
expected for the fully hydrated polypeptide chain (based on the maximum solvent-
accessible area and rS ¼ 1:3). Water molecules interacting with the GdmCl dena-
tured state of aLA are thus much more dynamically perturbed (larger rS ) than
water molecules in contact with a fully exposed polypeptide chain. This suggests
that a substantial fraction of the water molecules that penetrate the denatured pro-
tein interact strongly with several polypeptide segments and, therefore, that the
structure of the denatured state contains relatively compact domains. The finding
that NS rS is unaffected by reduction of the four disulfide bonds in aLA (see Figure
8.10) implies that the compact domains are not due to topological constraints, al-
though the loss of the residual dispersion suggests that these domains become
more dynamic. As viewed from the solvent, the GdmCl denatured state of aLA ap-
pears to be similar to the urea-denatured state of I-FABP (see Section 8.5.1).
In summary, the MRD data show that loss of specific internal water sites and in-
flux of external solvent are concomitant with disintegration of secondary structure
(as seen by CD), consistent with the view that disruption of a-helices and b-sheets
is promoted by water acting as a competitive hydrogen bond partner [78]. The large
rotational retardation inferred for the denatured state suggests that water penetrat-
ing the GdmCl denatured protein differs substantially from the hydration shell of
a fully exposed polypeptide chain. Taken together with the residual dispersion
from the denatured state of all three investigated proteins, this suggests that even
strongly solvent-denatured proteins contain relatively compact domains. This is
consistent with the finding that mutations can exert their destabilizing effects di-
rectly on the denatured state [54]. Furthermore, for aLA and HEWL, disulfide
bond cleavage appears to affect the flexibility of the denatured protein more than
its compactness. The MRD data thus suggest that, even under extremely denatur-
ing conditions, proteins are far from the idealized random-coil state.
8.5.3
Conformational Transitions Induced by Co-solvents
60
17O
50
30 0%
20 30%
10
0
1 10 100
ν0 (MHz)
2 2H
30%
16%
1.5
0%
R1x (s–1)
0.5
0
1 10 100
ν0 (MHz)
Fig. 8.12. Water 17 O (top) and 2 H (bottom) protein mole ratio (NTW ¼ 103 600) for all
MRD profiles from 0.5 mM solutions of b- samples. The curves are three-parameter ( 17 O,
lactoglobulin A (25–50% D2 O, 4 C, pH 2.4) 30% TFE) or five-parameter (all other profiles)
at the indicated TFE concentrations [91]. The fits according to Eq. (8) or its bilorentzian
vertical axis shows the excess relaxation rate generalization.
R1x ¼ R1 R bulk , scaled to the same water/
profile can thus provide the tumbling time (and hydrodynamic volume) of the pro-
tein even in the absence of long-lived water molecules.
Analysis of the 4 C data in Figure 8.12 shows that the effect of adding 16% TFE
is essentially to increase the viscosity (thereby increasing tb ), whereas Nb Sb2 is
hardly affected. This indicates that bLG retains a compact native-like structure at
16% TFE and 4 C, consistent with the finding that the 1 Ha 15 N heteronuclear
single-quantum coherence (HSQC) NMR spectrum of bLG in 15% TFE is close
8.5 Solvent Perturbations 231
0.2
19F
0.15
21%
7%
R1x (s–1)
0.1
0.05
0
1 10 100
ν0 (MHz)
Fig. 8.13. Trifluoroethanol 19 F MRD profiles to the same TFE/protein mole ratio
from 1.0 mM solutions of b-lactoglobulin A (NTTFE ¼ 970) for both samples. The curves are
(50% D2 O, 4 C, pH 2.4) at the indicated TFE three-parameter fits according to Eq. (8) and
concentrations [91]. The vertical axis shows the the dashed lines indicate the high-frequency
excess relaxation rate R1x ¼ R1 R bulk , scaled (a) plateau.
232 8 Protein Conformational Transitions as Seen from the Solvent
that the residence time of TFE molecules bound to bLG is in the range 5–10 ns at
27 C. The 19 F parameter Nb Sb2 is 4.5 and 6 at the two concentrations, indicating
that bLG contains at least five such long-lived TFE molecules. (Since Sb2 may be
well below its rigid-binding limit of 1, the number Nb might be an order of magni-
tude larger.)
According to CD and high-resolution NMR studies [82, 83, 85, 86], bLG retains
its native structure at 7% TFE, whereas at 21% TFE the I and H states are present
at roughly equal populations without any coexisting native protein. The 19 F MRD
results thus suggest that Nb Sb2 is nearly the same in the native and TFE-induced
states. However, the TFE-binding sites are not necessarily the same in the different
protein states. In fact, judging by the strong variation of the 2 H and 17 O profiles
with TFE concentration, this is highly improbable. A more likely scenario is that
the native state has a relatively small number of long-lived binding sites with large
Sb2 , whereas the I and H states are penetrated by a considerably larger number of
less ordered TFE molecules.
As for water, TFE residence times in the nanosecond range cannot be ration-
alized solely in terms of favorable hydrogen bonds and other noncovalent inter-
actions. These interactions are also present in the bulk solvent, where molecular
motions take place on a picosecond time scale. Instead, long residence times result
from trapping of solvent molecules within the protein structure. In the native pro-
tein at 7% TFE, the most likely sites for long-lived TFE molecules are the small
surface pockets and large binding cavity seen in the crystal structures. If the water
molecules that occupy these pockets in the absence of TFE have residence times of
order 100 ps, TFE binding should be accompanied by a reduction of Na ra in the
17
O profile. Although 17 O data at 7% TFE are not available, such a reduction is in-
deed seen at 16% TFE.
Addition of 30% TFE reduces Na ra ( 17 O) by a factor 4 (at both temperatures),
even though the expanded state of bLG should have a larger solvent-accessible
area than the native state. If this is due to replacement of water by TFE at the pro-
tein surface, it should be accompanied by an increase in Na ra ( 19 F). This is indeed
observed: Na ra ( 19 F) increases from 150 G 20 at 7% TFE to 350 G 90 at 21% TFE.
Because TFE is a poor hydrogen bond acceptor and interacts only weakly with pep-
tides, we expect ra to be smaller for TFE than for water. The Na ra ( 19 F) values are
therefore consistent with several hundred TFE molecules at the protein surface.
On purely geometrical grounds, we estimate that about 225 TFE molecules are
needed to cover the surface of native bLG. In the expanded state, this number
would be larger. On the other hand, if water and TFE were uniformly distributed
throughout the solvent, then only about 30 TFE molecules would make contact
with the (native) protein surface at 21% TFE. To account for Na ra ð 19 FÞ ¼ 350, we
would then need a retardation factor of about 10, which is clearly unrealistic. The
19
F MRD data thus indicate that bLG has a strong preference for solvation by TFE.
This is in accord with the two- to threefold TFE enrichment in the surface layer of
peptides in 30% TFE seen in several simulations [87–89].
In summary, the transformations N ! I ! H at 27 C are found to be accompa-
nied by a progressive expansion of the protein and loss of specific long-lived hydra-
8.7 Experimental Protocols and Data Analysis 233
tion sites. The observation of 17 O and 19 F dispersions when the protein is in the H
state demonstrates long-lived association of water (> 40 ns) and TFE (5–10 ns)
with the expanded protein. These long residence times indicate that solvent mol-
ecules penetrate the protein matrix. The number of long-lived internal TFE mol-
ecules is @5/Sb2 , with Sb2 in the range 0–1. In the N ! H transition, the surface
hydration parameter decreases by a factor 4, while the corresponding TFE parame-
ter increases by a factor 2 between 7 and 21% TFE. The MRD data are consistent
with a strong accumulation of TFE at the surface as well as in the interior of the
protein. The 19 F MRD data also demonstrate long-lived binding of several TFE
molecules in the native protein, presumably by displacement of water molecules
in deep surface pockets. At 4 C, bLG is much less affected by TFE. The protein
remains in the native state in the presence of 16% TFE, but adopts a nonnative
structure at 30% TFE. This nonnative structure is not penetrated by long-lived
water molecules, but presumably contains internal TFE molecules.
8.6
Outlook
During the past 5 years, the MRD method has been used to study proteins in a
variety of nonnative states induced by thermal, electrostatic, and solvent perturba-
tions. By focusing on the solvent, the MRD method provides a unique perspective
on protein folding and stability and a valuable complement to techniques that
probe the conformation and dynamics of the polypeptide chain. While the quanti-
tative interpretation of MRD data is not always straightforward, the underlying the-
oretical framework of nuclear spin relaxation is rigorous. In general, the MRD
method yields clear-cut information about long-lived protein–solvent interactions,
while inferences about the more transient surface solvation tend to be more model
dependent. Given the high complexity of the investigated systems, one cannot hope
that a single technique will provide all the answers. MRD applications to nonnative
proteins are still in an early exploratory phase and further studies are needed to es-
tablish the generality of the (sometimes unexpected) results obtained so far. MRD
studies of nonnative proteins should not only be extended to a larger set of pro-
teins, they can also be used to examine other types of denaturants and co-solvents
as well as different perturbations, including pressure-denatured proteins.
8.7
Experimental Protocols and Data Analysis
8.7.1
Experimental Methodology
The term magnetic relaxation dispersion (MRD) refers to the frequency depen-
dence of spin relaxation rates arising from thermal molecular motions. In a typical
234 8 Protein Conformational Transitions as Seen from the Solvent
that the static field is varied during the course of an individual relaxation experi-
ment. The principal limitations of FC-MRD are the relatively low maximum attain-
able field and the time required to switch the field, factors that have so far pre-
cluded FC-MRD studies of fast-relaxing nuclei. Due to their different strengths
and weaknesses, the MF-MRD and FC-MRD techniques complement each other.
The initial nonequilibrium polarization required for a longitudinal relaxation
experiment can be created either by manipulating the spin state populations with
a coherent radiofrequency magnetic field without altering the spin energy levels,
as in an MF-MRD experiment at a fixed magnetic field, or by changing the level
spacing without altering the populations, as in the cyclic field variation performed
in FC-MRD. In either case, the nonequilibrium state should preferably be estab-
lished (by the radiofrequency pulse or the static field switch) in a time that is short
compared with the longitudinal relaxation time T1 . Field switching can be accom-
plished mechanically or electronically. The mechanical approach, where the sam-
ple is pneumatically shuttled between two magnetic fields, has the advantage that
the detection field can be provided by a high-field cryomagnet, with consequent
improvements in sensitivity and resolution. However, the long shuttling time (typ-
ically on the order of 100 ms) limits the mechanical approach to samples with
relatively slow relaxation. With electronic switching of the magnet current, the
maximum field is usually in range 0.5–2 T, but the field switching rate may be
100 T s1 or higher, allowing measurements of T1 values in the millisecond range.
lar relaxation of 17 O allows for efficient signal averaging and, in combination with
the small magnetic moment, makes 17 O much less susceptible than 1 H to para-
magnetic impurities.
For MRD studies of denaturants and co-solvents, the nuclides 2 H and 19 F have
been used so far. The 2 H isotope is particularly convenient when the studied mol-
ecule can be deuterated at nonlabile positions, as with DMSO-d 6 in H2 O [95].
However, even species with labile deuterons can be studied, for example, urea-d 4
in D2 O [64], provided that the (pH- and temperature-dependent) hydrogen ex-
change with water is not too fast. Fluorinated co-solvents, like trifluoroethanol
[91], can be studied by 19 F MRD. Like 1 H, the 19 F nuclide has a predominantly
dipolar relaxation mechanism. Because T1 is long (several seconds) for these spin-
1/2 nuclei, paramagnetic O2 makes a significant contribution to the dispersion
profile. This contribution can be eliminated by O2 purging of the protein solution
with Ar or N2 gas.
8.7.2
Data Analysis
X fbk
R1x ðo0 Þ 1 R1 ðo0 Þ R bulk ¼ fa ðR a R bulk Þ þ ð1Þ
k
tk þ 1/R1bk ðo0 Þ
where o0 ¼ 2pn0 is the resonance frequency in angular frequency units. The first
(a) term in Eq. (1) refers to nuclei residing in molecules that have so short-lived
encounters with the protein that the local relaxation rate is frequency independent
throughout the investigated frequency range. The second (b) term is due to nuclei
in molecules that associate with the protein for sufficiently long periods (usually
meaning > 1 ns) that their local relaxation rate is frequency dependent within
the investigated frequency range. Such long-lived encounters generally involve
8.7 Experimental Protocols and Data Analysis 237
well-defined sites, each characterized by a mean residence time t k and a local lon-
P
gitudinal relaxation rate R1k ðo0 Þ. At any time, a fraction fb ¼ k fbk of the nuclei
reside in such sites. The simple form of the second term in Eq. (1) is valid provided
that fbk f 1 and R1bk g R bulk . This is nearly always the case in MRD studies of pro-
tein solutions.
This expression is exact for 2 H and an excellent approximation for 17 O. The spec-
tral density function, JðoÞ, is the cosine transform of the time autocorrelation func-
tion for the fluctuating electric field gradient. For a small molecule rigidly bound to
a protein undergoing spherical-top rotational diffusion, it has the Lorentzian form
2 tbk
Jbk ðoÞ ¼ oQk ð3Þ
1 þ ðotbk Þ 2
The effective correlation time tbk is determined by the residence time tk in site k
and the rotational correlation time tR of the biomolecule according to
1 1 1
¼ þ ð4Þ
tbk tR tk
The quadrupole frequency oQ depends on the spin quantum number, the quadru-
pole coupling constant, and the asymmetry parameter of the electric field gradient
tensor. For 2 H and 17 O in water, oQ ¼ 8:7 10 5 s1 and 7:6 10 6 s1 , respectively
[7]. Equation (3) is readily generalized to nonspherical proteins with symmetric-top
rather than spherical-top rotational diffusion. However, for globular proteins, the
effect of anisotropic rotational diffusion on the shape of the relaxation dispersion
is usually negligible.
In general, small molecules are not rigidly attached to the protein on the time
scale (tR ) of protein tumbling, but undergo restricted local rotational motions on
time scales short compared with tR . If these local motions are much faster than
the global motion (with correlation time tbk ) and the highest investigated MRD fre-
quency, then the generalization of Eq. (3) is [96]
( )
2 2 loc 2 tbk
Jbk ðoÞ ¼ oQk ð1 Sbk Þtbk þ Sbk ð5Þ
1 þ ðotbk Þ 2
loc
where tbk is an effective correlation time for the local restricted rotation. Further-
more, Sbk is the generalized second-rank orientational order parameter for site k,
238 8 Protein Conformational Transitions as Seen from the Solvent
which has a maximum value of unity for a rigidly attached molecule. When this
spectral density is substituted into Eq. (2), we obtain
$ %
loc 0:2 0:8
R1bk ðo0 Þ ¼ Rbk þ bk tbk þ ð6Þ
1 þ ðo0 tbk Þ 2 1 þ ð2o0 tbk Þ 2
bNT
Nb Sb2 ¼ ð9Þ
oQ2
P
Here, Nb ¼ k Nbk is the total number of small molecules in long-lived (> 1 ns)
association with the protein and Sb is their root-mean-square order parameter. Be-
cause 0 a Sb2 a 1, the quantity Nb Sb2 furnishes a lower bound on the number Nb.
The number Na of perturbed but short-lived solvent molecules is usually very
much larger than the number Nb of long-lived molecules. We can then neglect the
8.7 Experimental Protocols and Data Analysis 239
aNT
Na r a ¼ ð10Þ
R bulk
1
tR f tk f ð11Þ
bk tR
which may be said to define the MRD residence time window. Of course, sites that
do not obey these inequalities may still contribute to the dispersion, but do so with
loc
less than the maximum contribution fbk bk tR . If Rbk is neglected, violation of the
right-hand (fast-exchange) inequality in Eq. (11) reduces the effective dispersion
amplitude parameter and the effective correlation time by the relative amounts
1/2
b k0 1 þ tR /tk
¼ ð12Þ
bk 1 þ tR /tk þ bk tR tk
0
tbk 1
¼ ð13Þ
tR ½ð1 þ tR /tk Þð1 þ tR /tk þ bk tR tk Þ 1/2
For residence times on the central plateau of the MRD window, only lower
and upper bounds on tk can be established, as expressed by the inequalities in Eq.
(11). On the wide flanks of the MRD window, however, tk can be quantitatively de-
termined. On the short-tk flank, this requires independent information about tR . If
the tb value deduced from the MRD profile is much smaller than tR , then it can be
240 8 Protein Conformational Transitions as Seen from the Solvent
directly identified with the residence time tk without the need for an accurate esti-
mate of tR . In contrast, determination of residence times comparable to tR requires
knowledge of the precise value of tR . This can be obtained from 1 H or 2 H MRD
data, which generally contain contributions from labile protein hydrogens (with
tk g tR ).
Longer residence times can be determined on the long-tk flank of the MRD
window. According to Eqs (12) and (13), tk can be obtained from either b k0 or tbk 0
,
0 0 1/2
since b k /b k ¼ tbk /tR ¼ ð1 þ bk tR tk Þ in this regime. To determine tk at a given
temperature, tR and bk must be known. From a variable-temperature MRD study,
bk (assumed to be temperature independent) and the activation enthalpy of tk can
be determined if tR is taken to scale as h/T (as expected for rotational diffusion). If
tR is known at one temperature, tk is obtained at all investigated temperatures [97].
19
8.7.2.4 F Relaxation
19
The F nuclei in a partially fluorinated co-solvent molecule, such as trifluoroetha-
nol (TFE), are relaxed by fluctuating magnetic dipole–dipole interactions involving
19
Fa 19 F and 19 Fa 1 H pairs. In place of Eq. (2), we then have [91, 92]
FF FF FH FH
R1bk ðo0 Þ ¼ 0:2Jbk ðoF Þ þ 0:8Jbk ð2oF Þ þ 0:1Jbk ðoH oF Þ þ 0:3Jbk ðoF Þ
FH
þ 0:6Jbk ðoH þ oF Þ ð14Þ
where oF and oH are the 19 F and 1 H angular resonance frequencies. The dipolar
FF FH
spectral density functions Jbk ðoÞ and Jbk ðoÞ are given by Eq. (5), but with the
quadrupole frequency oQ replaced by the dipole-coupling frequencies oFF and
oFH . For the TFE molecule, the 19 F amplitude parameters a and b ¼ b FF þ b FH
are given by [91]
R bulk
a¼ Na ra ð15Þ
NTTFE
3 o 2FF
b FF ¼ 2 Nb Sb2 ð16aÞ
2 NTTFE
o 2FH
b FH ¼ 2 Nb Sb2 ð16bÞ
NTTFE
where NTTFE is the TFE/protein mole ratio and Na ; ra ; Nb , and Sb now refer to TFE
rather than to water. The factor 2 in Eq. (16) appears because each 19 F nucleus is
dipole coupled to two other 19 F nuclei in the CF3 group or to two protons in the
adjacent CH2 group. The factor 3/2 in Eq. (16a) reflects cross-relaxation within a
homonuclear pair of dipole-coupled nuclei. The dipole-coupling frequencies oFF
and oFH in Eq. (16) refer to bound TFE molecules that tumble with the protein
and are averaged over the much faster internal rotation of the CF3 group. From
the molecular geometry of TFE, one obtains oFF ¼ 3:39 10 4 rad s1 and
oFH ¼ 2:39 10 4 rad s1 .
8.7 Experimental Protocols and Data Analysis 241
where xN is the fraction native protein. It follows from Eq. (8), that the high-
frequency amplitude parameter a may be expressed as a population-weighted
average in the same way. In principle, the observed (b) dispersion is a population-
weighted superposition of dispersions from the native and unfolded states. In prac-
tice, the rotational correlation times (tR ) of the native and (partially) unfolded states
are often sufficiently similar that the dispersion profile can be accurately described
by a single Lorentzian. Also the amplitude parameter b can then be expressed as a
population-weighted average. Of course, if the long-lived solvent sites have been
lost in the unfolded state, then only the native state contributes to the b parameter.
KD a Dc
yD ¼ ð18Þ
a xW þ KD a Dc
242 8 Protein Conformational Transitions as Seen from the Solvent
References
9
Stability and Design of a-Helices
Andrew J. Doig, Neil Errington, and Teuku M. Iqbalsyah
9.1
Introduction
Proteins are built of regular local folds of the polypeptide chain called secondary
structure. The a-helix was first described by Pauling, Corey, and Branson in 1950
[1], and their model was quickly supported by X-ray analysis of hemoglobin [2].
Irrefutable proof of the existence of the a-helix came with the first protein crystal
structure of myoglobin, where most secondary structure is helical [3]. a-Helices
were subsequently found in nearly all globular proteins. It is the most abundant
secondary structure, with A30% of residues found in a-helices [4]. In this review,
we discuss structural features of the helix and their study in peptides. Some earlier
reviews on this field are in Refs [5–11].
9.2
Structure of the a-Helix
favourable. One reason why polypeptides may have been selected as the polymer of
choice for building functional molecules is that the sterically most stable confor-
mations also give strong hydrogen bonds.
The residues at the N-terminus of the a-helix are called N 0 -N-cap-N1-N2-N3-N4
etc., where the N-cap is the residue with nonhelical f; c angles immediately pre-
ceding the N-terminus of an a-helix and N1 is the first residue with helical f; c an-
gles [12]. The C-terminal residues are similarly called C4-C3-C2-C1-C-cap-C 0 etc.
The N1, N2, N3, C1, C2, and C3 residues are unique because their amide groups
participate in i; i þ 4 backbone–backbone hydrogen bonds using either only their
CO (at the N-terminus) or NH (at the C-terminus) groups. The need for these
groups to form hydrogen bonds has powerful effects on helix structure and stabil-
ity [13]. The following rules are generally observed for N-capping in a-helices: Thr
and Ser N-cap side chains adopt the gauche rotamer, hydrogen bond to the N3
NH and have c restricted to 164 G 8 . Asp and Asn N-cap side chains either adopt
the gauche rotamer and hydrogen bond to the N3 NH with c ¼ 172 G 10 , or
adopt the trans rotamer and hydrogen bond to both the N2 and N3 NH groups
with c ¼ 107 G 19 . With all other N-caps, the side chain is found in the gaucheþ
rotamer so that the side chain does not interact unfavorably with the N-terminus
by blocking solvation and c is unrestricted. An i; i þ 3 hydrogen bond from N3
NH to the N-cap backbone CbO is more likely to form at the N-terminus when an
unfavorable N-cap is present [14].
Bonds between sp 3 hybridized atoms show preferences for dihedral angles of
þ60 (gauche ), 60 (gaucheþ ) or 180 (trans). Within amino acid side chains
the preferences for these rotamers varies between secondary structures [15–18].
Within the a-helix, rotamer preferences vary greatly between different positions of
N-cap, N1, N2, N3, and interior [14, 19]. At helix interior positions, the gaucheþ
rotamer is usually most abundant (64%), followed by trans (33%) with gauche
very rare (3%), though considerable variations are seen. For example, Ser and Thr
are gauche 19% and 14% of the time, respectively, as their hydroxyl groups can
hydrogen bond to the helix backbone in this conformation. The b-branched side
chains Val, Ile, and Thr are the most restricted with gaucheþ very common.
9.2.1
Capping Motifs
The amide NH groups at the helix N-terminus are satisfied predominantly by side
chain hydrogen bond acceptors. In contrast, carbonyl CO groups at the C-terminus
are satisfied primarily by backbone NH groups from the sequence following the
helix [13]. The presence of such interactions would therefore stabilize helices. A
well-known interaction is helix capping, defined as specific patterns found at or
near the ends of helices [12, 20–24]. The patterns involve hydrogen bonding and
hydrophobic interactions as summarized in Table 9.1.
A common pattern of capping at the helix N-terminus is the capping box. Here,
the side chain of the N-cap forms a hydrogen bond with the backbone of N3 and,
9.2 Structure of the a-Helix 249
N-capping
p-XXp N-cap ! N3 Capping box Reciprocal H-bonds between residues 24, 25
at N-cap and N3, where S/T at N-cap
and E at N3 are the most frequently
observed residues
hp-XXph N 0 ! N4 Expanded capping Reciprocal H-bonds between residues 26, 298
box/Hydrophobic at N-cap and N3 accompanied by
staple hydrophobic interactions between
residues at N 0 and N4
hP-XXVh N 0 ! N4 Pro-box motif N-cap Pro residues are usually 28
associated with Ile and Leu at N 0 ,
Val at N3 and a hydrophobic
residue at N4
C-capping
hXp-XGh C3 ! C 00 Schellman motif H-bonds between C 00 NH and the C3 CO 31, 223
at and between C 0 NH and C2 CO,
respectively accompanied by hydrophobic
interaction between C3 and C 00
hXp-XGp C2 ! C 00 aL motifs As The Schellman motif but H-bond 32
between C 0 NH and C3 CO where C 00 polar
X-Pro C-cap ! C 0 Pro-capping motif A stabilizing electrostatic interaction 34
of the residues at positions C-cap
and Pro at C 0 with the helix macrodipole
reciprocally, the side chain of N3 forms a hydrogen bond with the backbone of the
N-cap [25]. The definition of the capping box was expanded by Seale et al. [26] to
include an associated hydrophobic interaction between residues N 0 and N4 and is
also known as a ‘‘hydrophobic staple’’ [27]. A variant of the capping box motif,
termed the ‘‘big’’ box with an observed hydrophobic interaction between nonpolar
side chain groups in residues N4 and N 00 (not N 0 ) [26].
The Pro-box motif involves three hydrophobic residues and a Pro residue at the
N-cap [28]. The N-cap Pro residue is usually associated with Ile or Leu at position
N 00 , Val at position N3, and a hydrophobic residue at position N4. This motif is ar-
guably very destabilizing since it discourages helix elongation due to the low N-cap
preference of Pro [29] and the poor helix propagation parameter value of Val [30].
This suggests that the Pro-box motif will not specially contribute to protein stability
but to the specificity of its fold.
The two primary capping motifs found at helix C-termini are the Schellman and
the aL motifs [31–33]. The Schellman motif is defined by a doubly hydrogen-
bonded pattern between backbone partners, consisting of hydrogen bonds between
250 9 Stability and Design of a-Helices
9.2.2
Metal Binding
In general, the stabilization of folded forms can be achieved by reducing the en-
tropy by cross-linking, such as via metal ions. Another way to stabilize helix confor-
mations, especially in short peptides, is to introduce an artificial nucleation site
composed of a few residues fixed in a helical conformation. For example, the
calcium-binding loop from EF-hand proteins saturated with a lanthanide ion
promotes a rigid short helical conformation at its C-terminus region [35].
Metal ions play a key structural and functional role in many proteins. There
is therefore interest in using peptide models containing metal complexes. In stabi-
lizing proteins, the rule of thumb in metal–ligand binding is that ‘‘hard metals’’
prefer ‘‘hard ligands’’. For example Ca and Mg prefer ligands with oxygen as the
coordinating atoms (Asp, Glu) [36]. In contrast, soft metals, such as Cu and Zn,
bind mostly to His, Cys, and Trp ligands and sometimes indirectly via water mole-
cules [37].
In studies of helical peptide models, soft ligands have been mostly used. In the
presence of Cd ions, a synthetic peptide containing Cys-His ligands i; i þ 4 apart
at the C-terminal region promoted helicity from 54% to 90%. The helicity of a sim-
9.2 Structure of the a-Helix 251
9.2.3
The 310 -Helix
9.2.4
The p-Helix
In contrast to the widely occurring a- and 310 -helices, the p-helix is extremely rare.
The p-helix is unfavorable for three reasons: its dihedral angles are energetically
unfavorable relative to the a-helix [51, 52], its three-dimensional structure has a
1 Å hole down the center that is too narrow for access by a water molecule, result-
ing in the loss of van der Waals interactions, and a higher number of residues
(four) must be correctly oriented before the first i; i þ 5 hydrogen bond is formed,
making helix initiation more entropically unfavorable than for a- or 310 -helices.
The p-helix may be of more than theoretical interest [53–57].
252 9 Stability and Design of a-Helices
9.3
Design of Peptide Helices
9.3.1
Host–Guest Studies
Extensive work from the Scheraga group has obtained helix/coil parameters using a
host–guest method. Long random copolymers were synthesized of a water-soluble,
nonionic guest (poly[N 5 -(3-hydroxypropyl)-l-glutamine] (PHPG) or poly[N 5 -(4-
hydroxybutyl)-l-glutamine] (PHBG)), together with a low (10–50%) content of the
guest residue. Using the s and s Zimm-Bragg helix/coil parameters (see below) for
the host homopolymer, it was possible to calculate those for the guest using helix/
coil theory as a function of temperature. The results from the host–guest work
are in disagreement with most of those from short peptides of fixed sequence (see
below).
9.3.2
Helix Lengths
9.3.3
The Helix Dipole
The secondary amide group in a protein backbone is polarized with the oxygen
negatively charged and hydrogen positively charged. In a helix the amides are
all oriented in the same direction with the positive hydrogens pointing to the
N-terminus and negative oxygens pointing to the C-terminus. This can be regarded
as giving a positive charge at the helix N-terminus and a negative charge at the he-
lix C-terminus [90–92]. In general, therefore, negatively charged groups are stabi-
254 9 Stability and Design of a-Helices
9.3.4
Acetylation and Amidation
A simple, yet effective, way to increase the helicity of a peptide is to acetylate its N-
terminus [22, 98]. This is readily done with acetic anhydride/pyridine, after com-
pletion of a peptide synthesis, but before cleavage of the peptide from the resin
and deprotection of the side chains. Acetylation removes the positive charge that
is present at the helix terminus at low or neutral pH; this would interact unfavor-
9.3 Design of Peptide Helices 255
ably with the positive helix dipole and free N-terminal NH groups. The extra CO
group from the acetyl group can form an additional hydrogen bond to the NH
group, putting the acetyl at the N-cap position. This has a strong stabilizing effect
by approximately 1.0 kcal mol1 compared with alanine [29, 30, 99]. The acetyl
group is one of the best N-caps. We found only Asn and Asp to be more stabilizing
[29].
Amidation of the peptide C-terminus is achieved by using different types of resin
in solid-phase peptide synthesis, resulting in the replacement of COO with
CONH2 . This is similar structurally to acetylation: the helix is extended by one hy-
drogen bond and an unfavorable charge–charge repulsion with the helix dipole
is removed. The energetic benefit of amidation is rather smaller, however, with
the amide group being no better than Ala and in the middle if the C-cap residues
are ranked in order of stabilization effect [29]. As most helical peptides studied to
date are both acetylated and amidated, and acetylation is more stabilizing than
amidation, the helicity of peptides is generally skewed so that residues near the
N-terminus are more helical than those near the C-terminus. Acetylation and ami-
dation are often also beneficial in removing a pH titration that can obscure other
effects.
9.3.5
Side Chain Spacings
Side chains in the helix are spaced at 3.6 residues per turn of the helix. Side chains
spaced at i; i þ 3, i; i þ 4, and i; i þ 7 are therefore close in space and interactions
between them can affect helix stability. Spacings of i; i þ 2, i; i þ 5, and i; i þ 6 place
the side chain pairs on opposite faces of the helix, avoiding any interaction. Care
must therefore be taken to avoid unwanted interactions when designing helices.
For example, we studied these peptides to determine the energetics of the Phe-
Met interaction [100]:
F7M12 Ac-YGAAKAFAAKAMAAKAA-NH2
F8M12 Ac-YGAAKAAFAKAMAAKAA-NH2
The control peptide F7M12 has Phe and Met spaced i; i þ 5 where they cannot
interact; F8M12 moves the Phe one place towards the C-terminus so that Phe
and Met are spaced i; i þ 4 where they can form a noncovalent interaction. The in-
crease in helix content was used to derive the Phe-Met interaction energy. These
sequences are not ideal, however. F7M12 contains a Phe-Lys i; i þ 3 interaction
that is replaced by a Lys-Phe i; i þ 3 interaction in F8M12. If either of these are sta-
bilizing or nonstabilizing the Phe-Met i; i þ 4 energy will be in error. While this
problem is likely to be small, in this case at least, our more recent designs try to
minimize such potential problems. For example, the following peptides, designed
to measure Ile-Lys i; i þ 4 energies, do not introduce or remove an unwanted
i; i þ 3 spacing [101]:
256 9 Stability and Design of a-Helices
IKc Ac-AKIAAAAKIAAAAKAKAGY-NH2
IKi Ac-AKAIAAAKAIAAAKAKAGY-NH2
9.3.6
Solubility
http://www.site.uottawa.ca/~turcotte/resources/HelixWheel/
http://marqusee9.berkeley.edu/kael/helical.htm
9.3 Design of Peptide Helices 257
9.3.7
Concentration Determination
9.3.8
Design of Peptides to Measure Helix Parameters
Numerous peptides have been studied to measure the forces responsible for helix
formation (see below). In general, one or more peptides are synthesized that
contain the interaction of interest, while all other terms that can contribute to
helix stability are known. The helix content of the peptide is measured and the
statistical weight of the interaction is varied until predictions from helix/coil
theory match experiment. The weight of the interaction (and hence its free energy,
as RT ln(weight)) is then known. In practice it is wise to also synthesize a control
peptide that lacks the interaction, but is otherwise very similar. The helix content of
this peptide should be predictable from helix/coil theory using known parameters.
For example, Table 9.2 gives sequences of control and interaction peptides used to
measure Phe-Lys i; i þ 4 side-chain interaction energies [106]. The FK control pep-
tide has no i; i þ 4 side-chain interactions and all the helix/coil weights for these
residues are known. The helicity predicted by SCINT2 [30] for this peptide is
in close agreement with experiment. This is an important confirmation that the
helix/coil model and the parameters used are accurate. The discrepancy between
prediction and experiment for the FK interaction peptide is because this peptide
contains two Phe-Lys i; i þ 4 side-chain interactions whose weights are assumed to
be 1 (i.e., DG ¼ 0). If the Phe-Lys weight is changed to 1.3, the prediction agrees
with experiment. Hence DG (Phe-Lys) ¼ RT ln 1:3 ¼ 0:14 kcal mol1 .
It is important to minimize the error in determining helix/coil parameters. Er-
rors can be calculated by assuming an error in measurement of % helix (G3% is
258
Tab. 9.2. Peptides designed to measure the energetics of the Phe-Lys interaction.
Peptide name Sequence Experimental Predicted nelicity (no Phe-Lys weight DG (Phe-Lys)
helicity side-chain interaction) to fit experiment kcal molC1
reasonable) and refitting the results across this experimental error range. For the
case above, DG (Phe-Lys) was calculated for 38.1% and 44.1% helix, giving a range
of 0.05 to 0.18 kcal mol1 . This is a remarkably small error range, resulting
from the high sensitivity of helix content to a small change in helix stability. This
sensitivity can be maximized by including multiple identical interactions; here the
FK interaction peptide has two Phe-Lys interactions, for example. The helix con-
tents of the control and interaction peptides should be close to 50%. This may be
difficult to achieve if the residues being used have low helix preferences. The best
way to maximize sensitivity is to calculate it in advance using possible sequences,
helix/coil theory, an error of G3% and guessing a sensible value for the interaction
energy. The interaction of interest can be placed at various positions in the peptide,
its length can be changed by adding further AAKAA sequences, terminal residues
such as a Tyr can be moved, and solubilizing side chains can be changed. Peptide
design can be lengthy, considering sensitivity and solubility, but is a valuable pro-
cess. The complex nature of the helix/coil equilibrium with frayed conformations
highly populated means that considerable variations can be seen for apparently
small changes, such as moving an interaction towards one terminus of a sequence.
9.3.9
Helix Templates
A major penalty to helix formation is the loss of entropy arising from the require-
ment to fix three consecutive residues to form the first hydrogen bond of the helix.
Following this nucleation, propagation is much more favored as only a single resi-
due need be restricted to form each additional hydrogen bond. A way to avoid this
barrier is to synthesize a template molecule that facilitates helix initiation, by fixing
hydrogen bond acceptors or donors in the correct orientation for a peptide to bond
in a helical geometry. The ideal template nucleates a helix with an identical geom-
etry to a real helix. Kemp’s group applied this strategy and synthesized a proline-
like template that nucleated helices when a peptide chain was covalently attached
to a carboxyl group (Figure 9.1a) [107–111]. The template is in an equilibrium be-
tween cis and trans isomers of the proline-like part of the molecule. Only the trans
isomer can bond to the helix so helix content is determined by measuring the cis/
trans ratio by NMR. Bartlett et al. reported on a hexahydroindol-4-one template
(Figure 9.1b) [112] that induced 49–77% helicity at 0 C, depending on the method
of determination, in an appended hexameric peptide. Several other templates were
less successful and could only induce helicity in organic solvents [113–115]. Their
syntheses are often lengthy and difficult, partly due to the challenging requirement
of orienting several dipoles to act as hydrogen bond acceptors or donors.
9.3.10
Design of 310 -Helices
(a)
(b)
ric acid (Aib) [116–119]. The presence of steric interactions from the two methyl
groups on the a-carbon in AIB results in the 310 geometry being energetically
favored over the a. Peptides rich in Ca; a -disubstituted a-amino acids are readily
crystallized and many of their structures have been solved (reviewed in Refs [50]
and [120]). Aib is conformationally restricted so that shorter Aib-based helices are
more stable than Ala-based a-helices [121]. Many examples of helix stabilization or
310 -helix formation by Aib have been reported. The a=310 equilibrium has been
studied in peptides. Yokum et al. [122] synthesized a peptide composed of Ca; a -
disubstituted a-amino acids that forms mixed 310 -/a-helices in mixed aqueous/
organic solvents. Millhauser and coworkers have studied the 310 /a equilibrium in
peptides by electron spin resonance (ESR) and nuclear magnetic resonance (NMR)
[47, 49, 123, 124] and argued that mixed 310 -/a-helices are common in polyalanine-
based helices. Yoder et al. [125] studied a series of l-(aMe)-Val homopolymers and
showed that their peptides can form coil, 310 -helix, or a-helix depending on concen-
tration, peptide length, and solvent. Kennedy et al. [126] used Fourier transform
infrared (FTIR) spectroscopy to monitor 310 - and a-helix formation in poly(Aib)-
based peptides. Hungerford et al. [127] synthesized peptides that showed an a to
310 transition upon heating. 310 -Helices have also been proposed as thermody-
namic intermediates where helical peptides of moderate stability exist as a mixture
of a- and 310 -structures [45, 128].
Guidelines for the inclusion of the 20 natural amino acids in 310 -helices can be
taken from their propensities in proteins, both at interior positions [42] and at N-
caps [14]. These are similar, but not identical to a preferences. Side-chain interac-
tions in the 310 -helix are spaced i; i þ 3, with i; i þ 4 on opposite sides of the helix.
9.4 Helix Coil Theory 261
9.3.11
Design of p-helices
To our knowledge, no peptide has yet been made that forms p-helix. There are 4.4
side chains per turn of the p-helix, so i; i þ 5 interactions may weakly stabilize p-
helix over a. Given the other strongly destabilizing features of the p-helix, however,
we doubt whether this effect will ever be strong enough. We expect that no peptide
composed of the 20 natural amino acids will ever fold to a monomeric p-helix in
aqueous solution. We are willing to back up this prediction, but in view of the out-
come of the Paracelsus challenge [129, 130], we are taking the advice of George
Rose [131] and offering only an ‘‘I Met the p-Helix Challenge’’ T-shirt as a prize.
9.4
Helix Coil Theory
Peptides that form helices in solution do not show a simple two-state equilibrium
between a fully folded and fully unfolded structure. Instead they form a complex
mixture of all helix, all coil or, most frequently, central helices with frayed coil
ends. In order to interpret experiments on helical peptides and make theoretical
predictions on helices it is therefore essential to use a helix/coil theory that con-
siders every possible location of a helix within a sequence. The first wave of work
on helix/coil theory was in the late 1950s and early 1960s and this has been re-
viewed in detail by Poland and Scheraga [132]. In 1992, Qian and Schellman
[133] reviewed current understanding of helix/coil theories. Our review covered
the extensive development of helix/coil theory since this date [134].
The simplest way to analyze the helix/coil equilibrium, still occasionally seen, is
the two-state model where the equilibrium is assumed to be between a 100% helix
conformation and 100% coil. This is incorrect and its use gives serious errors. This
is because helical peptides are generally most often found in partly helical confor-
mations, often with a central helix and frayed, disordered ends, rather than in the
fully folded or fully unfolded states.
9.4.1
Zimm-Bragg Model
The two major types of helix/coil model are (i) those that count hydrogen bonds,
principally Zimm-Bragg (ZB) [135] and (ii) those that consider residue conforma-
262 9 Stability and Design of a-Helices
Fig. 9.2. Zimm-Bragg and Lifson-Roig codes and weights for the a-helix.
tions, principally Lifson-Roig [136]. In the ZB theory the units being considered
are peptide groups and they are classified on the basis of whether their NH groups
participate in hydrogen bonds within the helix. The ZB coding is shown in Figure
9.2. A unit is given a code of 1 (e.g., peptide unit 5 in Figure 9.2) if its NH group
forms a hydrogen bond and 0 otherwise. The first hydrogen-bonded unit proceed-
ing from the N-terminus has a statistical weight of ss, successive hydrogen bonded
units have weights of s and nonhydrogen-bonded units have weights of 1. The
s-value is a propagation parameter and s is an initiation parameter. The most
fundamental feature of the thermodynamics of the helix/coil transition is that the
initiation of a new helix is much more difficult than the propagation of an existing
helix. This is because three residues need to be fixed in a helical geometry to form
the first hydrogen bond while adding an additional hydrogen bond to an existing
helix require that only one residue is fixed. These properties are thus captured in
the ZB model by having s smaller than s. The statistical weight of a homopoly-
meric helix of a N hydrogen bonds is ss N1 . The cost of initiation, s, is thus paid
only once for each helix while extending the helix simply multiplies its weight by
one additional s-value for each extra hydrogen bond.
9.4.2
Lifson-Roig Model
In the LR model each residue is assigned a conformation of helix (h) or coil (c),
depending on whether it has helical f; c angles. Every conformation of a peptide
of N residues can therefore be written as a string of N c’s or h’s, giving 2 N con-
formations in total. Residues are assigned statistical weights depending on their
conformations and the conformations of surrounding residues. A residue in an h
conformation with an h on either side has a weight of w. This can be thought of
as an equilibrium constant between the helix interior and the coil. Coil residues
are used as a reference and have a weight of 1. In order to form an i; i þ 4 hydro-
gen bond in a helix three successive residues need to be fixed in a helical confor-
mation. M consecutive helical residues will therefore have M–2 hydrogen bonds.
The two residues at the helix termini (i.e., those in the center of chh or hhc con-
formations) are therefore assigned weights of v (Figure 9.2). The ratio of w to v
gives the approximately the effect of hydrogen bonding (1.7:0.036 for Ala [30]
9.4 Helix Coil Theory 263
Equations for partition functions are determined as follows: Write a table of the
conformations and assigned statistical weights for residues. For the Lifson-Roig
model, the weights of a residue depend on its conformation and the weights of its
two neighboring residues as follows:
ccc 1
cch 1
chc v
chh v
hcc 1
hch 1
hhc v
hhh w
hh hc ch cc
0 1
hh w v 0 0
hc B
B0 0 1 1C
C
M¼ B C
ch @ v v 0 0A
cc 0 0 1 1
For each triplet, the state of the left-most residue is shown at the start of each row
in the matrix. The state of the right-most residue in each triplet is shown at the
end of the top of each column. The state of the center residue of each triplet is
shown as the barred residue in both the rows and columns. When the states of
the center residues differ (i.e., one is c while the other is h), the entry in the matrix
is zero. Otherwise, the matrix gives the weight, taken from the table. This appar-
ently simple change in presentation now (amazingly) allows the generation of the
partition function (Z) for a polypeptide of N residues as:
0 1
0
B1C
B C
Z ¼ ð0 0 1 1 ÞM N B C
@1A
1
The end vectors are present to ensure that the first and last residues in the peptide
cannot have a w-weighting. A w-weighting indicates that the residue is between two
residues that hydrogen bond to each other and this is impossible for terminal resi-
dues. Further extensions to helix/coil theory are dealt with in the same way, by
defining residue weight assignments, rewriting as a matrix and determining end
vectors. The work may be made easier by combining any identical columns. For
9.4 Helix Coil Theory 265
example, the Lifson-Roig matrix above has identical third and fourth columns so
can be simplified to a 3 3 matrix:
hh hc cðh W cÞ
hh 0w v 01
M¼ hc @0 0 1A
cðh W cÞ v v 1
Here, h W c shows the residues being combined. The matrix entries for the com-
bined positions are the entries for the noncombined entries, with zero being dis-
counted if combined with a nonzero entry. New end vectors are now required, as
their lengths must be the same as the order of the square matrix.
9.4.3
The Unfolded State and Polyproline II Helix
9.4.4
Single Sequence Approximation
(i.e., assigned statistical weights of zero). As peptide length increases, the approxi-
mation is no longer valid since multiple helical segments can be long enough to
overcome the initiation penalty. The single sequence approximation will also break
down when a sequence with a high preference for a helix terminus is within the
middle of the chain. The error from using the single sequence approximation will
therefore show a wide variation with sequence and could be potentially serious if a
sequence has a high preference to populate more than one helix simultaneously.
Conformations with two or more helices may also often include helix–helix tertiary
interactions that are ignored in all helix/coil models.
9.4.5
N- and C-Caps
In the original LR model weights are assigned to residues in the center of hhh trip-
lets (a weight of w for propagation) or in the center of chh or hhc triplets (a weight
of v for initiation). Residues in all other triplets have weights of 1. LR-based models
have been extended by assigning weights to additional conformations. N-capping
can therefore be included in LR theory be assigning a weight of n to the central
residue in a cch triplet [99]. Similarly, the C-cap is the first residue in a nonhelical
conformation (c) at the C-terminus of a helix. C-cap weights (c-values) are assigned
to central residues in hcc triplets. Application of the model to experimental data
where the C-terminal amino acid of a helical peptide was varied, allowed the deter-
mination of the c-values and hence free energies of C-capping (as RT ln c) of all
the amino acids [29].
A problem with the original definitions of the capping weights above is that
they apply to isolated h or hh conformations that are best regarded as part of the
random coil. A helical hydrogen bond can only form when a minimum of three
consecutive h residues are present. Andersen and Tong [151] and Rohl et al. [9]
therefore changed the definition of the N-cap to apply only to the c residue in a
chhh quartet. The N-cap residues in chc or chhc conformations have weights of
zero in the Andersen-Tong model and 1 in the Rohl et al. model.
9.4.6
Capping Boxes
The N-terminal capping box [25] includes a side chain–backbone hydrogen bond
from N3 to the N-cap (i; i 3). This is included in the LR model by assigning a
weight of w r to the chhh conformation, where r is the weight for the Ser back-
bone to Glu side chain bond [9].
9.4.7
Side-chain Interactions
As helices have 3.6 residues per turn, side chains spaced i; i þ 3 or i; i þ 4 are close
in space. Side-chain interactions are thus possible when four or five consecutive
9.4 Helix Coil Theory 267
residues are in a helix. They are included in the LR-based model by giving a weight
of w q to hhhh quartets and w p to hhhhh quintets. The side-chain interaction is
between the first and last side chains in these groups; the w weight is maintained
to keep the equivalence between the number of residues with a w weighting and
the number of backbone helix hydrogen bonds [100].
Scholtz et al. [152] used a model based on the one-helical-sequence approx-
imation of the LR model to quantitatively analyze salt bridge interactions in
alanine-based peptides. Only a single interaction between residues of any spacing
was considered, though this was appropriate for the sequences they studied. Sha-
longo and Stellwagen [153] also proposed incorporating side-chain interaction en-
ergies into the LR model, using a clever recursive algorithm.
9.4.8
N1, N2, and N3 Preferences
The helix N-terminus shows significantly different residue frequencies for the N-
cap, N1, N2, N3, and helix interior positions [12, 19, 154, 155]. A complete theory
for the helix should therefore include distinct preferences for the N1, N2, and
N3 positions. In the original LR model, the N1 and C1 residues are both
assigned the same weight, v. Shalongo and Stellwagen [153] separated these as vN
and vC . Andersen and Tong [151] did the same and derived complete scales for
these parameters from fitting experimental data, though some values were tenta-
tive. The helix initiation penalty is vN vC and so vN and vC values are all small
(A 0.04).
We added weights for the N1, N2, and N3 (n1, n2, and n3) positions as follows
[156]: The n1 value is assigned to a helical residue immediately following a coil res-
idue. The penalty for helix initiation is now n1.v, instead of v 2 , as v remains the C1
weight. An N2 helical residue is assigned a weight of n2.w, instead of w. The
weight w is maintained in order to keep the useful definition of the number of res-
idues with a w weighting being equal to the number of residues with an i; i þ 4
main chain–main chain hydrogen bond. The n2 value is an adjustment to the
weight of an N2 residue that takes into account the structures that can be adopted
by side chains uniquely at this position. Similarly, an N3 residue is now assigned
the weight n3.w, instead of w.
9.4.9
Helix Dipole
Helix-dipole effects were added to the LR model by Scholtz et al. [152], though they
used the one-sequence approximation so that only one or no dipoles in total are
present. In LR models helix-dipole effects are subsumed within other energies.
For example, N-cap, N1, N2, and N3 energies will include a contribution from the
helix-dipole interaction so the energy of interaction of charged groups at this posi-
tion with the dipole should not be counted in addition.
268 9 Stability and Design of a-Helices
9.4.10
310 - and p-Helices
The Lifson-Roig formalism can easily be adapted to describe helices of other coop-
erative lengths [157]. The fundamental difference between a 310 -helix and an a-
helix is that the 310 -helix has an i; i þ 3 hydrogen-bonding pattern rather than the
i; i þ 4 pattern characteristic of the a-helix. For a given number of units in helical
conformations, a 310 -helix will consequently have one more hydrogen bond than
an a-helix. To include this difference in the 310 -helix theory, one of the a-helical-
initiating residues (i.e., the central unit of either the h a h a c or the ch a h a triplet)
must become a 310 -helix-propagating residue. We arbitrarily chose to assign the
propagating statistical weight, wt , to the central unit of the h t h t c triplet such that
helix propagating unit i is associated with the hydrogen bond formed between the
CO of peptide i 2 and the NH of peptide i þ 1. The remainder of the statistical
weights applicable to the a-helix/coil theory are maintained.
The models described above for the a-helix/coil and 310 -helix/coil transitions can
be combined to describe an equilibrium including pure a-helices, pure 310 -helices,
and mixed a-/310 -helices [157]. In this model, three conformational states are pos-
sible, 310 -helical (h t ), a-helical (h a ), and coil (c). The pure 310 -helix and the mixed
a-/310 -helix models were subsequently extended to include side-chain interactions
[158]. Sheinerman and Brooks [128] independently produced a model for the
a/310 /coil equilibrium, based on the ZB formalism, rather than the LR model.
They similarly extend the classification of conformations from a/coil to a/310 /coil.
In a p-helix, formation of an i; i þ 5 hydrogen bond requires that four units be
constrained to the p-helical conformation, h p . The p subscript designates the con-
formation and weights describing the p-helix, whose dihedral angles are distinct
from a- and 310 -helices. Assigning statistical weights to individual units requires
consideration of the conformations of the unit itself and its three nearest neigh-
bors [157]. The initiating statistical weight, vp , is assigned to a helical unit when
one or more of its two N-terminal and nearest C-terminal neighbors are in the
coil conformation. The definition of helix-initiating units as the two N-terminal
and one C-terminal units of each helical stretch is again arbitrary. Units in a p-
helical conformation with three helical neighbors are assigned the propagating
statistical weight, wp . A p-helix propagating residue, i, is thus associated with the
hydrogen bond between the NH of residue i þ 2 and the CO of residue i 3.
9.4.11
AGADIR
were excluded [100, 159]. These were corrected in a later version, AGADIRms,
which considers all possible conformations [160]. If AGADIR and LR models are
both applied to the same data, to determine a side-chain interaction energy, for ex-
ample, the results are similar, showing that the models are now not significantly
different [160, 161]. The treatment of the helix/coil equilibrium differs in a num-
ber of respects from the ZB and LR models and these have been discussed in detail
in by Muñoz and Serrano [160]. The minimal helix length in AGADIR is four res-
idues in an h conformation, rather than three. The effect of this assumption is to
exclude all helices which contain a single hydrogen bond; only helices with two or
more hydrogen bonds are allowed. In practice, this probably makes little difference
as chhhc conformations are usually unfavorable and hence have low populations.
Early versions of AGADIR considered that residues following an acetyl at the N-
terminus or preceding an amide at the C-terminus were always in a c conforma-
tion; this was changed to allow these to be helical [162].
The latest version of AGADIR, AGADIR1s-2 [162], includes terms for electro-
statics [162], the helix dipole [162, 163], pH dependence [163], temperature [163],
ionic strength [162], N1, N2, and N3 preferences [164], and capping motifs such
as the capping box, hydrophobic staple, Schellman motif, and Pro-capping
motif [162]. The free energy of a helical segment, DG helical-segment , is given by
DG helical-segment ¼ DGInt þ DGHbond þ DGSD þ DGdipole þ DGnonH þ DGelectrost ,
which are terms for the energy required to fix a residue in helical angles (with
separate terms for N1, N2, N3, and N4), backbone hydrogen bonding, side-chain
interactions excluding those between charged groups, capping, and helix-dipole
interactions, respectively. Electrostatic interactions are calculated with Coulomb’s
equation. Helix-dipole interactions were all electrostatic interactions between the
helix dipole or free N- and C-termini and groups in the helix. Interactions of the
helix dipole with charged groups located outside the helical segment were also
included. pH-dependence calculations considered a different parameter set for
charged and uncharged side chains and their pK a values. The single sequence ap-
proximation (see above) is used again, unlike in AGADIRms. This means that it
must not be used for full protein sequences, though this has been done, even if
they do not have any tertiary interactions.
AGADIR is at present the only model that can give a prediction of helix content
for any peptide sequence, thus making it very useful. It can also predict NMR
chemical shifts and coupling constants. In order to do this it must include esti-
mates of all the terms that contribute to helix stability, notably the 400 possible
i; i þ 4 side-chain interactions. Since only a few of these interactions have been
measured accurately, the terms used cannot be precise. Further determination of
energetic contributions to helix stability is therefore still needed.
9.4.12
Lomize-Mosberg Model
Lomize and Mosberg also developed a thermodynamic model for calculating the
stability of helices in solution [165]. Interestingly, they extended it to consider hel-
ices in micelles or a uniform nonpolar droplet to model a protein core environ-
270 9 Stability and Design of a-Helices
ment. Helix stability in water is calculated as the sum of main chain interactions,
which is the free energy change for transferring Ala from coil to helix, the differ-
ence in energy when replacing an Ala with another residue, hydrogen bonding and
electrostatic interactions between polar side chains and hydrophobic side-chain in-
teractions. An entropic nucleation penalty of two residues per helix is included.
Different energies are included for N-cap, N1–N3, C1–C3, C-cap, hydrophobic
staples, Schellman motifs, and polar side-chain interactions, based on known em-
pirical data at the time (1996). Hydrophobic interactions were calculated from de-
creases in nonpolar surface area when they are brought in contact. Helix stability
in micelles or nonpolar droplets are found by calculating the stability in water then
adding a transfer energy to the nonpolar environment.
9.4.13
Extension of the Zimm-Bragg Model
Following the discovery of short peptides that form isolated helices in aqueous so-
lution, Vásquez and Scheraga extended the ZB model to include helix-dipole and
side-chain interactions [166]. The model is very general as it can include interac-
tions of any spacing within a single helix. It was applied to determine i; i þ 4 and
i; i þ 8 interactions. Long-range interactions, beyond the scope of LR models, can
thus be included. Roberts [167] and Gans et al. [88] also refined the ZB model to
include side-chain interactions.
9.4.14
Availability of Helix/Coil Programs
9.5
Forces Affecting a-Helix Stability
9.5.1
Helix Interior
Since the advent of crystal structures of proteins it had been noticed that some
amino acids appeared frequently in a-helices and others less frequently [168, 169].
9.5 Forces Affecting a-Helix Stability 271
For example alanine and leucine are abundant, whereas proline and glycine appear
rarely. As more of this kind of information became available a helix propensity
scale was derived [170] and eventually allowed prediction of the location of a-helices
(and other structures) in folded proteins from their sequence [171].
Different approaches have been used in order to determine the helical pro-
pensity or preference of individual amino acids. Scheraga and coworkers used a
host–guest strategy (see Section 9.3.1) to derive values for the helical preference
of various amino acid residues [172, 173]. This has been carried out for all 20 nat-
urally occurring amino acids [174]. This work has been criticized as the host side
chains can interact with each other [175]. The introduction of a guest residue thus
removes host–host interactions and replaces them with PHBG–guest or PHPG–
guest side-chain interactions that may obscure the intrinsic helix propensities. We
believe that helix propensities are best evaluated in an alanine background, where
no side-chain interactions can affect the helix stability.
Rohl et al. [30] used many alanine-based peptides with the general sequences Ac-
(AAKAA)m Y-NH2 (or with Q instead of K) to measure interior helix propensities.
Substitutions in the helix interior and subsequent measures of helicity using CD
spectroscopy in both water and 40% (v/v) trifluoroethanol (TFE) allowed both the
calculation of the Lifson-Roig w parameter and stabilization energy for all 20
amino acids (see Table 9.3). Kallenbach and coworkers also used synthetic peptides
Ala 0.00
Argþ 0.21
Leu 0.21
Met 0.24
Lysþ 0.26
Gln 0.39
Glu 0.40
Ile 0.41
Trp 0.49
Ser 0.50
Tyr 0.53
Phe 0.54
Val 0.61
His 0.61
Asn 0.65
Thr 0.66
Cys 0.68
Asp 0.69
Gly 1.00
Pro 3.16
9.5.2
Caps
Serrano and Fersht [20] explored the capping preferences at the N-cap by mutating
Thr residues at the N-cap of two helices in barnase. They found that negatively
274 9 Stability and Design of a-Helices
Glu 1.35
Met 1.2
Leu 1.14
Ile 1.14
Trp 1.11
Phe 1.09
Ala 1.07
Argþ 1.03
Tyr 1.02
Cys 0.99
Gln 0.98
Val 0.95
Lysþ 0.94
His 0.85
Thr 0.82
Asn 0.78
Asp 0.78
Ser 0.76
Gly 0.59
Pro 0.19
charged residues were favored at the N-cap with a rank order of Asp > Thr >
Glu > Ser > Asn > Gly > Gln > Ala > Val. Interestingly, their result conflicted
with the statistical survey result that Asn is one of the most frequently found
N-caps in proteins [12]. Experimentally Asn destabilized the helices by 1.3
kcal mol1 relative to Thr. The rank order of amino acids N-cap preferences in T4
lysozyme was found to be Thr > Ser > Asn > Asp > Val ¼ Ala > Gly [21]. They
suggested that Asn can be inherently as good an N-cap as Ser or Thr, but it re-
quires a change in backbone dihedral angles of N-cap residues, which might be al-
tered in native proteins as the results of tertiary contacts. Indeed Asn is the most
stabilizing residue at N-cap in a peptide model in the absence of tertiary contacts
and other side-chain interactions (see below).
The Kallenbach group [187] substituted several amino acids at the N-cap position
in peptide models in the presence of a capping box. They found that Ser and Arg
are the most stabilizing residues with DDG relative to Ala of 0.74 and 0.58
kcal mol1 , respectively, whilst Gly and Ala are less stabilizing. The results are
in agreement with the results of Forood et al. [23], who found that the trend in a-
helix inducing ability at the N-cap is Asp > Asn > Ser > Glu > Gln > Ala. A more
comprehensive work to determine the preferences for all 20 amino acids at the N-
cap position used peptides with a sequence of NH2 -XAKAAAAKAAAAKAAGY-
CONH2 [22, 29, 99]. N-Capping free energies ranged from Asn (best) to Gln
(worst) (Table 9.6).
9.5 Forces Affecting a-Helix Stability 275
Tab. 9.6. Amino acid propensities at N- and C-terminal positions of the helix.
Residue DDG relative to Ala for transition from coil to the position (kcal molC1 )
N-cap N1 N2 N3 C3 C2 C1 C-cap CO
A 0 0 0 0 0 0 0 0 0 0 0 0 0
C 0.2
C 1.4 1.0 0.9
D 0.5 0.7 0.2 0.3
D 1.6 0 0.2 1.1
E 1.0 0.2 0.4 0.3
E 0.7 0.1 0.4 0.6 0.5
F 0.7 1.4 0.9 1.3 0.6 0.1
G 1.2 1.0 0.7 0.4 0.8 2.1 1.0 0.6 0.4 0.1 1.1
H 0.7 0.7 0.8 2.6
Hþ 0.2 0.9
I 0.5 0.5 0.4 0.6 0.5 0.7 0.5 0.2 0.2 0.4 0.5 1.5
Kþ 0.1 0.7 0.9 0.9 0.1 0.1
L 0.7 0.4 0.2 0.5 0.5 0.8 0.4 0.1 0.1
M 0.3 0.5 0.1 0.7 0.3 0.7 0.4 0.1 0.3 0.1
N 1.7 0.6 1.7 0.7 0.7 0.5 0.7 0.4 0.3 0.1 0.4
P 0.4 0.6 0.5 1.2
Q 2.5 0.5 0.3 0.5 0.3 1.2 0.2 0.2 0.02 0.2 0.05 0.5 0.1
Rþ 0.1 0.7 0.8 0.4 0.2
S 1.2 0.4 0.4 0.7 0.5 1.1 0.6 0.6 0.5 0.7 0.5 0.8 0.3
T 0.7 0.5 0.5 0.5 0.5 1.2 0.6 0.8 0.6 0.5 0.8 1.1
V 0.1 0.6 0.5 0.4 0.4 0.3 0.4 0.7 0.6 0.9 1.6
W 1.3 0.4 0.8 4.0 0.7
Y 0.9 1.2 2.2
a NH
2 -XAKAAAAKAAAAKAAGY-CONH2
b CH
3 CO-XAAAAQAAAAQAAGY-CONH2
c CH
3 CO-XAAAAAAARAAARGGY-NH2
d CH
3 CO-AXAAAAKAAAAKAAGY-CONH2
e CH
3 CO-AXAAAAAARAAARGGY-NH2
f CH
3 CO-AAXAAAAKAAAAKAGY-CONH2
g CH
3 CO-AAXAAAAARAAARGGY-NH2
h NH
2 -YGGSAKEAAARAAAAXAA-CONH2
i Substitution of residue 32 (C2 position) of a-helix of ubiquitin.
j NH
2 -YGGSAKEAAARAAAAAXA-CONH2
k NH
2 -YGGSAKEAAARAAAAAAX-CONH2
l CH
3 CO-YGAAKAAAAKAAAAKAX-COOH
m Substitution 0
of residue 35 (C position) of a-helix of ubiquitin.
We have used a similar approach using peptide models to probe the preferences
at N1 [95], N2 [96], and N3 [97] using peptides with sequences of CH3 CO-XAAAA-
QAAAAQAAGY-CONH2 , CH3 CO-AXAAAAKAAAAKAAGY-CONH2 and CH3 CO-
AAXAAAAKAAAAKAGY-CONH2 , respectively. The results have given N1, N2,
276 9 Stability and Design of a-Helices
and N3 preferences for most amino acids for these positions (Table 9.6) and these
agree well with preferences seen in protein structures, with the interesting excep-
tion of Pro at N1. Petukhov et al. similarly obtained N1, N2, and N3 preferences for
nonpolar and uncharged polar residues by applying AGADIR to experimental heli-
cal peptide data, and found almost identical results [164, 188]. The complete se-
quences of peptides used can be seen in the table footnote. In general, at N1, N2,
and N3, Asp and Glu as well as Ala are preferred, presumably because negative
side chains interact favorably with the helix dipole or NH groups while Ala has
the strongest interior helix preference.
Although it is also unique in terms of the presence of unsatisfied backbone
hydrogen bonds, the C-terminal region is less explored experimentally. The C-
terminus of the a-helix tends to fray more than the N-terminus, making C-terminal
measurements less accurate. Preferences at the C-cap position differ from those at
the N-cap. At the N-terminus, the helix geometry favors side chain-to-backbone hy-
drogen bonding, so polar residues are preferred [14, 19]. At the C-terminus unsa-
tisfied backbone hydrogen bonds are fulfilled by interactions with backbone groups
upstream of the helix. Zhou et al. [48] found that Asn is the most favored residue
at the C-cap followed by Gln > Ser@Ala > Gly@Thr. Forood et al. [23] tested a lim-
ited number of amino acids at the C-terminus (C1) finding a rank order of Arg >
Lys > Ala. Doig and Baldwin [29] determined the C-capping preferences for all
20 amino acids in a-helical peptides. The thermodynamic propensities of some
amino acids at C 0 , C-cap, C1, C2, and C3 are also included in Table 9.6 [189, 190].
9.5.3
Phosphorylation
9.5.4
Noncovalent Side-chain Interactions
Many studies have been performed on the stabilizing effects of interactions be-
tween amino acid side chains in a-helices. These studies have identified a number
of types of interaction that stabilize the helix including salt bridges [83, 86, 88, 152,
194–198], hydrogen bonds [152, 198–200], hydrophobic interactions [100, 201–
9.5 Forces Affecting a-Helix Stability 277
9.5.5
Covalent Side-chain interactions
Lactam (amide) bonds formed between NH3 þ and CO2 side chains can stabilize a
helix, acting in a similar way to disulfide bridges in a protein by constraining the
side chains to be close, reducing the entropy of nonhelical states [205]. Lactam
bridges between Lys-Asp, Lys-Glu, and Glu-Orn spaced i; i þ 4 have been intro-
duced into analogs of human growth hormone releasing factor [206], and proved
to be stabilizing with Lys-Asp most effective. The same Lys-Asp i; i þ 4 lactam was
stabilizing in other helical peptide systems [207–210], while Lys-Glu i; i þ 4 lactam
bridges were less effective [211]. Two overlapping Lys-Asp lactams were even more
stabilizing [212]. The effect of the ring size formed by the lactam was investigated
by replacing Lys with ornithine or (S)-diaminopropionic acid. A ring size of 21 or
22 atoms was most stabilizing (a Lys-Asp i; i þ 4 lactam is 20 atoms) [206]. Lactams
between side chains spaced i; i þ 7 [213, 214] or i; i þ 3 [214]; [215], spanning two
or one turns of the helix have also been reported. i; i þ 7 disulfide bonds have been
introduced into alanine-based peptides, using (d)- and (l)-2-amino-6-mercaptohex-
anoic acid derivatives [216]. Strongly stabilizing effects were observed.
Some interesting recent work has shown that helix formation can be reversibly
photoregulated. Two cysteine residues are cross-linked by an azobenzene derivative
which can be photoisomerized from trans to cis, causing a large increase or de-
crease in the helix content of the peptide, depending on its spacing [217–219].
9.5.6
Capping Motifs
ently in aqueous solution but it is formed in 30% TFE [223]. This might be due to
the C-terminus being very frayed and the increase of helical content contributed
from this motif is small. Energetically this motif is not very favorable due to the
entropic cost of fixing a Gly residue at the position C 0 . The Schellman motif is be-
lieved to be a consequence of helix formation and does not involve a-helix nuclea-
tion [224]. The aL motif seems to be more stable than the alternative Schellman
motif [221].
9.5.7
Ionic Strength
Electrostatic interactions between charged side chains and the helix macrodipole
can stabilize the helix [92, 102, 225]. The interactions are potentially quite strong,
but are alleviated by the screening effects of water, ions, and nearby protein atoms.
In theory, increasing ionic strength of the solvent (up to 1.0 M) should stabilize the
helix through interactions with a-helix dipole moments by shifting the equilibrium
between a-helix and random coil, which has a random orientation of the peptide
dipoles [226]. The energetics of the interaction between fully charged ion pairs
can be diminished by added salt and completely screened at 2.5 M NaCl [197,
227]. In peptides containing side chain-to-side-chain interactions, the effect of ion
pairs and charge/helix-dipole interactions cannot be clearly separated. There are,
however, indications that the interactions of charged residues with the helix macro-
dipole are less affected than those between charged side chains [227, 228]. In
coiled-coil peptides, salt also affects hydrophobic residues by strengthening their
interactions at the coiled-coil interface. This can be explained through alterations
of the peptide–water interactions at high salt concentration. However, this requires
a strong kosmotropic anion to accompany the screening cation [229].
9.5.8
Temperature
Thermal unfolding experiments show that the helix unfolds with increasing tem-
perature [230–232]. There is no sign of cold denaturation, as seen with proteins.
Enthalpy and entropy changes for the helix/coil transition are difficult to deter-
mine as the helix/coil transition is very broad, precluding accurate determination
of high- and low-temperature baselines by calorimetry [230]. Nevertheless, isother-
mal titration calorimetric studies of a series of peptides that form helix when bind-
ing a nucleating La 3þ , find DH for helix formation to be 1.0 kcal mol1 [135,
233], in good agreement with the earlier work.
9.5.9
Trifluoroethanol
Peptides with sequences of helices in proteins usually show low helix contents in
water. An answer to this problem is to add TFE (2,2,2-trifluoroethanol) to induce
280 9 Stability and Design of a-Helices
helix formation [234–237]. For many peptides, the concentration of TFE used to
increase the helix content is only up to 40% [234–236, 238, 239]. TFE may act by
shielding CO and NH groups from the water solvent while leading to hydrogen
bond formation between them. The conformational equilibrium thus shifts toward
more compact structures, such as the a-helical conformation [184]. The mecha-
nism involves interaction between TFE and water with several interpretations.
One view suggests that TFE indirectly disrupts the solvent shell on a-helices [240,
241]. Another view proposes that TFE destabilizes the unfolded species and there-
by indirectly enhances the kinetics and thermodynamics of folding of the coiled
coil [242]. A more compromising view suggests that TFE forms clusters in water
solution, which at lower concentration pulls the water molecules from the surface
of proteins. At higher concentration, TFE clusters associate with appropriate hydro-
phobic side chains reducing their conformational entropy and switch the confor-
mation at TFE concentration > 40% [243].
The propagation propensities of all amino acids increase variably in 40% TFE
relative to water. The propagation propensities of the nonpolar amino acids in-
crease greatly in 40% TFE whilst other amino acids propensity increase uniformly.
However, glycine and proline are strong helix breakers in both in water and 40%
TFE solvents [30]. In addition, 40% TFE dramatically alters electrostatic (and polar)
interactions and increases the dependence of helix propensities on the sequence
[244].
9.5.10
pKa Values
9.5.11
Relevance to Proteins
Many of the features studied in peptide helices are also applicable to proteins and
can be used to rationally modify protein stability or to design new helical proteins.
Helices in proteins are often found on the surface with one face exposed to solvent
and the other buried in the protein core. Helix propensities and side-chain interac-
tions measured in peptides are thus directly applicable to the solvent-exposed face.
Substitutions at buried positions are much more complex and tertiary interactions
also make major contributions to stability. Tertiary interactions at helix termini are
rare; nearly all side-chain interactions are local [14]. Preferences for capping sites
and the first and last turn of the helix are therefore applicable to most protein he-
lices. The feature of protein helices of amphiphilicity, reflected in possession of
a hydrophobic moment [249], is irrelevant to monomeric isolated helices.
9.6
Experimental Protocols and Strategies
9.6.1
Solid Phase Peptide Synthesis (SPPS) Based on the Fmoc Strategy
Solid phase synthesis was first described by Merrifield [250]. It depends on the at-
tachment of the C-terminal amino acid residue to a solid resin, the stepwise addi-
tion of protected amino acids to a growing peptide chain covalently bound to the
resin and finally the cleavage of the completed peptide from the solid support.
This cyclical process is repeated until the chain assembly process is completed
(Figure 9.3).
To select the appropriate resin for the synthetic target, several key features of the
linker have to be taken into consideration, for example racemization-free synthetic
attachment of the first amino acid residue and resulting C-terminal functionality
[251]. A key choice is whether to have a C-terminal amide or carboxyl group, as
this is determined by the resin.
The Fmoc (9-fluorenylmethoxycarbonyl) group is the a-amino-protecting group,
which is widely used in solid state peptide synthesis for its adaptability to linkers,
resin, and cleavage chemistries [252, 253]. Other amino-protecting groups, e.g.,
BOC (t-butyloxycarbonyl), are also commonly used, but only the strategy using
Fmoc will be covered here. The procedure can be done either manually or automat-
ically in a peptide synthesizer. Here we describe the procedure of manual peptide
synthesis using Fmoc chemistry on a 0.1 mmol scale. There are many other meth-
ods in peptide synthesis using different procedures and chemicals [251].
8 Rink amide resin 100–200 mesh (other resins can be used depending on the
C-terminal functionality required, for example Wang resin is used to give C-
terminal carboxyl).
8 Deprotecting agent: 20% piperidine in DMF (dimethylformamide).
8 Activating/coupling agents:
0.45 M HBTU (2-(1H-benzotriazole-1-yl)-1,1,3,3-tetramethyluronium hexa-
fluoro phosphate) dissolved in a solution of HOBt (1-hydroxy-1H-benzotriazole)
in DMF. HATU, HCTU or TBTU are popular alternatives
2 M DIEA (N,N,diisoprophylethylamine)
9.6 Experimental Protocols and Strategies 283
8 NMP (N-methylpyrrolidone)
8 DCM (dichloromethane)
8 Diethyl ether
8 Pyridine
8 Acetic anhydride
8 TFA (trifluoroacetic acid)
8 TIS (triisopropyl silane) as scavengers (the choice of scavenger depends on pep-
tide sequence)
8 Fmoc-protected amino acids. Certain side chains contain functional groups that
can react with free amine groups during the formation of the amide bond.
Therefore, it is important to mask the functional groups of the amino acid side
chain. The following side chain protecting groups are compatible with TFA cleav-
age (see Section 9.6.1.4):
0:1 mmol
¼ 0:15 g of resin
0:67 mmol
activate the carboxyl group of the amino acid and couple it to the resin. The first
amino acid to use is the one at the C-terminus. Use a 10 molar excess of the
amino acid compared with the resin, for example 0.300 g of protected amino
acid with MW of 300 g mol1 to give 10 mmol of amino acid.
6. Transfer the solution to unprotected resin in the funnel. Leave it for 0.5–3 h,
depending on the difficulty of coupling (note: b-branched amino acids some-
times need longer coupling time, i.e., overnight). Swirl the mixture frequently.
7. Repeat step 4.
8. If necessary, do a Kaiser test (see below), a quantitative ninhydrin procedure
[254, 255], on a small amount of the peptide resin to check the completeness
of the coupling process. Wash the beads with methanol before conducting the
Kaiser test.
Continue cyclical process (repeat steps 3–8) of Fmoc deprotection and coupling un-
til the last residue left with a free N-terminus.
1. Add 2–3 mL methanol and 2–3 drops acetic acid to a test tube, and dry using a
vacuum evaporator.
2. Add a few resin beads to the test tube.
3. Add 4 drops phenol/ethanol (80 g in 20 mL), 8 drops KCN/pyridine (1 mM di-
luted with 100 mL), and 4 drops ninhydrin/ethanol (500 mg in 10 mL). Do the
same to a test tube without resin as a blank.
4. Incubate at 100 C for 5 min.
5. After heating is completed, remove the tube and immediately add 4.8 mL of
60% ethanol for a final volume of 5 mL, vortex and let the beads settle to the
bottom of the tube.
6. Pipette the sample into cuvette and read the absorbance of the sample (includ-
ing the blank) at 570 nm. Use 60% ethanol to zero the spectrophotometer.
7. Calculate the mmol g1 of amine as follows:
For an easier qualitative observation, the beads remain white and the solution yel-
low after heating (step 4) indicates a complete reaction. In contrast, a dark blue
color on the beads and in the solution indicates an incomplete reaction (positive
test). If free amino groups remain, the resin should be subjected again to coupl-
ing reaction (repeat Fmoc deprotection and coupling procedure using the same
amino acid) or acetylation (see Section 9.6.1.4) to increase the purity of the final
product.
9.6.2
Peptide Purification
9.6.2.2 Method
1. Wash the column thoroughly by running through 95% acetonitrile for at least
20 min.
2. When using a UV detector, set the detector wavelength at 214 nm.
3. Equilibrate the column with 5% H2 O for 10 min before the purification
run.
4. Load the crude peptide sample dissolved in deionized water and start a linear
gradient run. The amount of peptide loaded depends on its concentration. A
typical run usually consist of two eluents which are mixed using a linear gradi-
ent with a flow rate which will give a 0.5% to 1.0% change per minute.
5. From the chromatogram obtained, optimize the conditions for further purifi-
cation. This allows the change in flow rate and eluents composition across the
elution time. If necessary, the collected fractions can be further purified using
isocratic conditions.1
6. Inject sample again and start a new linear gradient run with optimized condi-
tions.
1) For example, if a peptide elutes out at 30% isocratic run (e.g., 28% (0.1% TFA) aqueous
aqueous acetonitrile (0.1% TFA) (from linear acetonitrile). The peptide peak will come out
gradient run), a lower concentration of 4–5 min after the solvent peak under isocratic
acetonitrile is chosen as the eluent for conditions.
9.6 Experimental Protocols and Strategies 287
9.6.3
Circular Dichroism
Circular dichroism (CD) is a very useful technique for obtaining secondary struc-
tural information on proteins and peptides. This area has been extensively re-
viewed in the past two decades in relation to both folding and secondary structure
determination. Greenfield [256] has reviewed methods of obtaining structural in-
formation from CD data. This covers the advantages and pitfalls of each technique
and experimental considerations, such as how the wavelength range of data acqui-
sition affects the precision of conformation determination, how precisely must the
protein concentration be determined for each method to give reliable answers and
what computer resources are necessary to use each method. Manning [257] has re-
viewed various applications of CD spectroscopy to the study of proteins including
analysis of denaturation curves, calculation of secondary structure composition, ob-
servation of changes in both secondary and tertiary structure and characterization
of folding intermediates. The excellent book Circular Dichroism and the Conforma-
tional Analysis of Biomolecules [258] covers a wide range of applications of CD in de-
tail, including a review of CD applied to a-helices by Kallenbach and coworkers [8].
In CD, the sample is illuminated with left and right circularly polarized light and
the difference in absorbance of each component of the light is measured. Typically
this is expressed as change in absorbance (DA) or as the ellipticity of the beam ex-
iting the sample (e, in millidegrees). This signal is proportional to the concentra-
tion of the sample, the optical path length of the cell or cuvette and the proportion
of the sample exhibiting chiral behavior.
2) The crude peptide obtained from the Sonication can be applied if necessary to
synthesis will contain many by-products increase the rate of dissolution. If the peptide
which are a result of deletion or truncated is still insoluble (e.g., peptides containing
peptides as well as side products stemming multiple hydrophobic amino acid residues),
from cleaved side chains or oxidation during addition of a small amount of dilute
the cleavage and deprotection process. (approximately 10%) acetic acid (for basic
3) The peptides mass can be verified by low peptides) or aqueous ammonia (for acidic
resolution, electrospray ionization (ESI) or peptides) can facilitate dissolution of the
MALDI mass spectrometry. ESI allows the peptide. For long-term storage of peptides,
peptide sample, which is dissolved in lyophilization is highly recommended.
acetonitrile/water (0.1% TFA) after HPLC Lyophilized peptides can be stored for years
purification, to be directly analyzed without at temperatures of 20 C or lower with
further treatment. For small molecules little or no degradation. Peptides in
(< 2000 Da) ESI typically generates singly or solution are much less stable. Peptides
doubly charged gaseous ions, while for large containing methionine, cysteine, or
molecules (> 2000 Da) the ESI process tryptophan residues can have limited storage
typically gives rise to a series of multiply time in solution due to oxidation. These
charged species. peptides should be dissolved in oxygen-free
4) It is recommended to first dissolve the solvents or stored with a reducing agent such
peptide in sterile distilled or deionized water. as DTT.
288 9 Stability and Design of a-Helices
Two conditions are necessary for obtaining a circular dichroism signal: The sam-
ple must be optically active or chiral and there must be a chromophore near to the
chiral center.
9.6.4
Acquisition of Spectra
Scanning parameters
The criteria for selecting scanning parameters for CD spectrophotometers are es-
sentially those for more common UV-visible spectrophotometers. There are five ba-
sic selections to be made:
9.6.5
Data Manipulation and Analysis
CD data are usually expressed as an ellipticity (y) or delta absorbance (DA). These
are easily interconvertable as DA ¼ y=32980. Ellipticity is also converted to molar
ellipticity ([y]) as follows:
½y ¼ y=ð10 C lÞ
½y ¼ yM=ð1000 c lÞ
Fig. 9.4. Typical a-helical spectrum from CD min with 4 accumulations (averages). Model
spectroscopy. Measured using a Jasco J-810 peptide – at 90 mM in 1 mm path length cell.
spectrophotometer at 0.1 C, data resolution Sequence: Ac-AKAAAAKAAAASAAAAKAAGY-
0.2 nm, Bandwidth 1 nm, scan speed 20 nm/ NH2 .
290 9 Stability and Design of a-Helices
can be used for this purpose. The equation below is a long-established equation for
this purpose.
fH ¼ ðyOBS yC Þ=ðy H yC Þ
Here fH is the average fraction of helix content, yC is the mean residue ellipticity
when the peptide is 100% coil, y H is the mean residue ellipticity when the peptide
is 100% helical and yOBS is the observed mean residue molar ellipticity from CD
measurements. yC and y H can be calculated from empirical formulae below de-
rived by Luo and Baldwin [259] where T is temperature ( C) and Nr is the chain
length in residues. The value of 3 in the length correction (1–3/Nr ) is the number
of backbone CO groups in a carboxyamidated peptide that are not hydrogen
bonded.
yC ¼ 2220 53T
y H ¼ ð44000 þ 250TÞð1 3=Nr Þ
cited state of the peptide. With model peptides the nonaromatic residues are com-
monly modeled as alanine to simplify the calculations (see, e.g., Andrew et al.
[106]).
Other methods for estimation of secondary structure content involve deconvolu-
tion of entire CD spectra. These methods are based upon data from high-quality
structure files for which secondary structure content is accurately known or use
iterative fitting methods.
Dichroprot (available at http://dicroprot-pbil.ibcp.fr/Documentation_dicroprot.
html) is a commonly used program using deconvolution/fitting analysis. It is
based mainly upon the work of Sreerama and Woody [264]. They derived a self-
consistent procedure fitting sums of spectra of different secondary structures to
the experimental spectrum to obtain estimates of secondary structure content.
9.6.6
Aggregation Test for Helical Peptides
9.6.6.2 Method
1. Record UV CD spectra of the buffer.
2. Record UV CD spectra of each peptide.
3. Subtract the average buffer reading from the average peptides readings.
4. Convert the unit of ellipticity (mdeg) to mean residual ellipticity.
5. Aggregation does not occur if the mean residual ellipticities of the peptides are
independent of concentration.
9.6.7
Vibrational Circular Dichroism
Circular dichroism can also be performed in the vibrational region of the spec-
trum. In vibrational CD (VCD) the characteristic transitions of the amide group
are resolved well from most other transition types [265]. VCD has important prop-
erties due to the local character of vibrational excitations. Overlap with aromatic
modes is avoided as aromatic side chains are locally nonchiral and do not couple
strongly to amides. Vibrational couplings are also short-range in nature, mostly to
the next residue and dipole moments are not strong. This leads to more local struc-
ture sampling than in conventional CD [258, 266]. For peptides, empirical correla-
tions between secondary structures and bandshapes were first developed over 20
years ago, mainly for peptides in nonaqueous solvents. A key determination was
that the amide band VCD depended on peptide chain conformation and not on
other functional groups. The main bandshape criterion was the helical chirality of
the backbone [267]. Proteins with a high a-helical content tend to have negative
signals in the amide II band with maximum negativity at around 1515–1520
cm1 . As b-sheet content increases a three-featured pattern (þ þ) appears in
the amide I band and amide II intensity drops [268].
VCD equipment is now available commercially [269] though most measure-
ments have been performed with modified dispersive or FTIR absorbance spec-
trometers. The commercial instruments give high-quality data for nonaqueous
solutions of small molecules but aqueous solution work is highly demanding, es-
pecially in determining baselines and related artifacts [265]. Improvements are
continually being made in these instruments in terms of polarization modulation
and birefringence effects [269, 270] and so data continue to improve.
9.6.8
NMR Spectroscopy
The full theory and practise of NMR spectroscopy is well beyond the scope of this
review. Many reviews of this technique and its application to secondary structure
determination exist already. Secondary structure determination by NMR techni-
ques does not require a full three-dimensional structural analysis as does X-ray
crystallography. Knowledge of the amide and a proton chemical shifts are in prin-
ciple all that is necessary, although if this information is available it is likely that
nearly complete assignments of side chain protons are also available. While obtain-
9.6 Experimental Protocols and Strategies 293
ing the sequential resonance assignments is a laborious task, the NMR method is
perhaps the most powerful method of secondary structure determination without a
crystal structure. Unlike CD and other kinds of spectroscopy, NMR secondary
structure determination does not give an average secondary structure content but
assigns secondary structures to different parts of the peptide chain.
Several parameters from NMR are needed for unambiguous structure as-
signments. A mixture of coupling constants, nuclear Overhauser effect (NOE) dis-
tances, amide proton exchange rates and chemical shifts may be necessary.
Coupling constants are through-bond or scalar (often called J-coupling) quan-
tities, J-couplings between pairs of protons separated by three or fewer covalent
bonds can be measured. The value of a three-bond J-coupling constant contains in-
formation about the intervening torsion angle. Unfortunately in general, torsion
angles cannot be unambiguously determined from such information since as
many as four different torsion angle values correlate with a single coupling con-
stant value. Similar relationships can be determined between the three-bond cou-
pling constant between the a proton and the b proton(s) yielding information on
the value of the side chain dihedral angle w1. Constraints on the dihedral angles f
and w1 are important structural parameters in the determination of protein three-
dimensional structures by NMR.
The three-bond coupling constant between the intraresidual a and amide pro-
tons ( 3 JHaHN) is the most useful for secondary structure determinations as it can
be directly related to the backbone dihedral angle f (Table 9.8). Unfortunately there
is no three-bond proton coupling that can be related to the angle c.
dues, and separated by four or more residues in the polypeptide sequence. A net-
work of these short interproton distances form the backbone of three-dimensional
structure determination by NMR.
A number of short (< 5 Å) distances are fairly unique to secondary structural ele-
ments. For example, a-helices are characterized by short distances between certain
protons on sequentially neighboring residues (e.g., between backbone amide pro-
tons, dNN, as well as between b protons of residue i and the amide protons of res-
idue i þ 1, dBN). Helical conformations result in short distances between the a
proton of residue i and the amide proton of residues i þ 3 and to a lesser extent
i þ 4 and i þ 2. These i þ 2; i þ 3, and i þ 4 NOEs are collectively referred to as me-
dium range NOEs while NOEs connecting residues separated by more than five
residues are referred to as long-range NOEs. Extended conformations (e.g., b-
strands), on the other hand, are characterized by short sequential, dAN, distances.
The formation of sheets also results in short distances between protons on adja-
cent strands (e.g., dAA and dAN).
13
9.6.8.3 C NMR
Spera and Bax [272] have used 13 C NMR to study secondary structure assignment,
helix distribution, capping parameters, and thermal transitions of helical peptide
systems. Spera and Bax derived an empirical relationship between backbone con-
formation and in combination with high-resolution X-ray structures for which 13 C
chemical shift assignments were available. Stellwagen and coworkers [273–275]
9.6 Experimental Protocols and Strategies 295
9.6.9
Fourier Transform Infrared Spectroscopy
For a-helical structures the mean frequency was found to be 1652 cm1 for the
amide I and 1548 cm1 for the amide II absorption [283]. The half-width of the
a-helix bands depends on the stability of the helix. For the most stable helices, the
half-width of about 15 cm1 corresponds with a helix/coil transition free energy of
more than 300 cal mol1 . Other a-helices display half-widths of 38 cm1 and helix/
coil transition free energies of about 90 cal mol1 . The pioneering work of Byler
and Susi [284] on spectral deconvolution has allowed much more detailed analysis
of FTIR spectra in terms of secondary structure assignment.
There is a wealth of information that can be used to derive structural informa-
tion by analyzing the shape and position of bands in the amide I region of the
spectrum. The presence of a number of amide I band frequencies have been corre-
lated with the presence of a-helical, antiparallel and parallel b-sheets and random
coil structures [285–287]. It is now well accepted that absorbance in the range
from 1650 to 1658 cm1 is generally associated with the presence of a-helix in
aqueous environments. Precise interpretation of bands in this region are difficult
because there is significant overlap of the helical structures with random struc-
tures. One way to resolve this issue is to exchange the hydrogen from the peptide
NaH with deuterium. If the protein contains a significant amount of random
structure, the HaD exchange will result in a large shift in the position of the ran-
dom structure (will now absorb at around 1646 cm1 ) and only a minor change in
the position of the helical band. Beta-sheet vibrations have been shown to absorb
between 1640 and 1620 cm1 (though there is considerable disagreement on as-
signments for parallel and antiparallel b-sheets). Absorbances centered at 1670
cm1 have been assigned to b-turns, or simply ‘‘turns’’ by some investigators.
9.6.10
Raman Spectroscopy and Raman Optical Activity
Raman spectroscopy is based on the Raman effect, which is the inelastic scattering
of photons by molecules. The Indian physicist, C. V. Raman, discovered the effect
in 1928 [288]. The Raman effect comprises a very small fraction, about 1 in 10 7 , of
the incident photons. In Raman scattering, the energies of the incident and scat-
tered photons are different. Raman spectroscopy is another vibrational mode tech-
nique as the incoming photon promotes a vibrational oscillation from the ground
state to a virtual, nonstationary state or, if the vibration is already in the virtual
state, it can cause relaxation to the ground state. Raman spectroscopy comes in var-
ious guises such as difference, Fourier transform and resonance Raman spectros-
copy, details of which can be found elsewhere [289, 290]. As with FTIR spectros-
copy, we are measuring and assigning vibration bands with these techniques and
it is the amide bands from peptide bonds that give most information regarding sec-
ondary structure (see Tables 9.9 and 9.10 for details). It is the precise positions of
the amide I and III bands that are sensitive to secondary structure [291]. Correla-
tions between these frequencies are widely noted in the literature (see, for exam-
ple, Ref. [292]). Table 9.10 gives some examples of helix-related bands.
The amide III vibration and the Ca H bending vibration in UV Raman resonance
spectroscopy are very sensitive to the amide backbone configuration [293]. The var-
9.6 Experimental Protocols and Strategies 297
ious contributing signals are mixed when c is around 120 , i.e., for ‘‘random coil’’
and b-sheet conformations, but are unmixed at c values around 60 (a-helix confor-
mation). This may allow the extraction of amide c angles from UV resonance
Raman spectra [293].
Raman optical activity (ROA) measures vibrational optical activity. This is due to
the small difference in the intensity of Raman scattering from chiral molecules in
right and left circularly polarized incident light [142]. The technique is routinely
applicable to biomolecules in aqueous environments but requires relatively large
amounts of sample (typically 20–100 mg mL1 ) [142]. It can provide new informa-
tion about solution dynamics as well as structure which is complementary to that
obtained from more conventional optical spectroscopic techniques described above.
Reviews of the theory of ROA can be found in the literature [294, 295] and are be-
yond the scope of this review.
ROA gives useful information regarding protein/peptide secondary structure in
aqueous solution as the signals in the spectrum tend to be dominated by bands
from the peptide backbone. Additionally it is sensitive to dynamic aspects of the
structure and so can give information on mobile regions and nonnative states.
Side chain bands are also present in the spectra though much less prominently.
The extended amide III band [296, 297] is very useful in ROA as it is very sensitive
to geometry, being composed of coupling between NaH and Ca -H deformations. A
good description and review of secondary structure assignments from ROE spectra
is given by Barron et al. [142].
9.6.11
pH Titrations
9.6.11.2 Method
1. Record UV CD spectra of the buffer at 222 nm.
2. Record UV CD spectra of the peptide at the starting pH. Typical parameters of
CD measurement are as in Aggregation test protocol.
3. Record UV CD spectra after pH adjustments that are made by the addition of
either NaOH or HCl. Record volume of acid or alkali added.
4. Subtract the average buffer reading from the average peptides readings.
5. Convert the unit of ellipticity (mdeg) to mean residual ellipticity. Note the con-
centration will decrease as acid or alkali is added.
6. Plot[y]222 versus pH.
7. Fit the data to a Henderson-Hasselbach equation depending on the number of
titrable groups in the peptide. For one apparent pK a , the value is given by:
1 1
½y222nm ¼ ½y222; high_pH 1 þ ½y 222; low_pH
1 þ 10 pHpKa 1 þ 10 pHpKa
where ½y222; highpH and ½y222; lowpH are the molar ellipticities measured at 222 nm
at the titration end points at high and low pH.
The equation for two different pK a values is:
1 1
½y222nm ¼ ½y222; mid_pH 1 þ ½y222; low_pH
1 þ 10 pHpKa1 1 þ 10 pHpKa1
1
þ ½y222; ðmid_pHhigh_pHÞ 1
1 þ 10 pHpKa2
where, pK a 1 and pK a 2 are the pK a values measured for the acid–base equilibrium
at low and high pH, respectively. ½y222; highpH ; ½y222; midpH , and ½y222; lowpH are the
molar ellipticities measured at 222 nm at the titration end points at high, mid,
and low pH. ½y222mid_pHhigh_pH is the change in molar ellipticity associated with
pK a 2.
References 299
Acknowledgments
We thank all our coworkers in this field, namely Avi Chakrabartty, Carol Rohl,
Buzz Baldwin, Tod Klingler, Ben Stapley, Jim Andrew, Eleri Hughes, Simon Penel,
Duncan Cochran, Nicoleta Kokkoni, Jia Ke Sun, Jim Warwicker, Gareth Jones, and
Jonathan Hirst. Current work in our lab on helices is supported by the Wellcome
Trust (grant references 057318 and 065106). TMI thanks the Indonesian govern-
ment for a scholarship.
References
1 Pauling, L., Corey, R. B., and (1999). Forming stable helical peptides
Branson, H. R. (1951). The structure using natural and artificial amino
of proteins: two hydrogen-bonded acids. Tetrahedron 55, 11711–43.
helical configurations of the poly- 11 Serrano, L. (2000). The relationship
peptide chain. Proc. Natl Acad. Sci. between sequence and structure in
USA 37, 205–211. elementary folding units. Adv. Protein
2 Perutz, M. F. (1951). New X-ray Chem. 53, 49–85.
evidence on the configuration of 12 Richardson, J. S. and Richardson,
polypeptide chains. Nature 167, 1053– D. C. (1988). Amino acid preferences
1054. for specific locations at the ends of a
3 Kendrew, J. C., Dickerson, R. E., helices. Science 240, 1648–52.
Strandberg, B. E. et al. (1960). 13 Presta, L. G. and Rose, G. D. (1988).
Structure of myoglobin. Nature 185, Helix signals in proteins. Science 240,
422–427. 1632–41.
4 Barlow, D. J. and Thornton, J. M. 14 Doig, A. J., MacArthur, M. W.,
(1988). Helix geometry in proteins. Stapley, B. J., and Thornton, J. M.
J. Mol. Biol. 201, 601–19. (1997). Structures of N-termini of
5 Scholtz, J. M. and Baldwin, R. L. helices in proteins. Protein Sci. 6, 147–
(1992). The mechanism of a-helix 55.
formation by peptides. Annu. Rev. 15 MacGregor, M. J., Islam, S. A., and
Biophys. Biomol. Struct. 21, 95–118. Sternberg, M. J. E. (1987). Analysis
6 Baldwin, R. L. (1995). a-helix of the relationship between side-chain
formation by peptides of defined conformation and secondary structure
sequence. Biophys. Chem. 55, 127–35. in globular proteins. J. Mol. Biol. 198,
7 Chakrabartty, A. and Baldwin, R. L. 295–310.
(1995). Stability of a-helices. Adv. 16 Swindells, M. B., MacArthur,
Protein Chem. 46, 141–76. M. W., and Thornton, J. M. (1995).
8 Kallenbach, N. R., Lyu, P., and Intrinsic f; c propensities of amino
Zhou, H. (1996). CD spectroscopy acids derived from the coil regions of
and the helix-coil transition in peptides known structures. Nat. Struct. Biol. 2,
and polypeptides. In Circular Dichroism 596–603.
and the Conformational Analysis of 17 Dunbrack, R. and Karplus, M.
Biomolecules G. D. Fasman, editor, (1994). Conformational analysis of
Plenum Press, New York, 201–59. the backbone-dependent rotamer
9 Rohl, C. A. and Baldwin, R. L. preferences of protein side-chains.
(1998). Deciphering rules of helix Nat. Struct. Biol. 1, 334–9.
stability in peptides. Meth. Enzymol. 18 Dunbrack, R. L. (2002). Rotamer
295, 1–26. libraries in the 21(st) century. Curr.
10 Andrews, M. J. I. and Tabor, A. B. Opin. Struct. Biol. 12, 431–40.
300 9 Stability and Design of a-Helices
19 Penel, S., Hughes, E., and Doig, the amino acids measured in alanine-
A. J. (1999). Side-chain structures in based peptides in 40 volume percent
the first turn of the a-helix. J. Mol. trifluoroethanol. Protein Sci. 5, 2623–
Biol. 287, 127–43. 37.
20 Serrano, L. and Fersht, A. R. (1989). 31 Schellman, C. (1980). The aL
Capping and a-helix stability. Nature conformation at the ends of helices.
342, 296–9. Protein Folding 53–61.
21 Bell, J. A., Becktel, W. J., Sauer, U., 32 Aurora, R., Srinivasan, R., and
Baase, W. A., and Matthews, B. W. Rose, G. D. (1994). Rules for a-helix
(1992). Dissection of helix capping termination by glycine. Science 264,
in T4 lysozyme by structural and 1126–30.
thermodynamic analysis of six amino 33 Aurora, R. and Rose, G. D. (1998).
acid substitutions at Thr 59. Helix capping. Protein Sci. 7, 21–38.
Biochemistry 31, 3590–6. 34 Prieto, J. and Serrano, L. (1997).
22 Chakrabartty, A., Doig, A. J., and C-capping and helix stability: the Pro
Baldwin, R. L. (1993). Helix capping C-capping motif. J. Mol. Biol. 274,
propensities in peptides parallel those 276–88.
in proteins. Proc. Natl Acad. Sci. USA 35 Siedlecka, M., Goch, G., Ejchart,
90, 11332–6. A., Sticht, H., and Bierzynski, A.
23 Forood, B., Feliciano, E. J., and (1999). a-Helix nucleation by calcium-
Nambiar, K. P. (1993). Stabilization of binding peptide loop. Proc. Natl Acad.
a-helical structures in short peptides Sci. USA 96, 903–8.
via end capping. Proc. Natl Acad. Sci. 36 Jernigan, R., Raghunathan, G., and
USA 90, 838–42. Bahar, I. (1994). Characterisation of
24 Dasgupta, S. and Bell, J. A. (1993). interactions and metal-ion binding-
Design of helix ends. Amino acid sites in proteins. Curr. Opin. Struct.
preferences, hydrogen bonding and Biol. 4, 256–63.
electrostatic interactions. Int. J. Pept. 37 Alberts, I. L., Nadassy, K., and
Protein Res. 41, 499–511. Wodak, S. J. (1998). Analyisis of zinc
25 Harper, E. T. and Rose, G. D. (1993). binding sites in protein crystal
Helix stop signals in proteins and structures. Protein Sci. 7, 1700–16.
peptides: the capping box. Biochemistry 38 Ghadiri, M. R. and Choi, C. (1990).
32, 7605–9. Secondary structure nucleation in
26 Seale, J. W., Srinivasan, R., and peptides – transition-metal ion
Rose, G. D. (1994). Sequence stabilized a-helices. J. Am. Chem. Soc.
determinants of the capping box, a 112, 1630–2.
stabilizing motif at the N-termini of a- 39 Kise, K. J. and Bowler, B. E. (2002).
helices. Protein Sci. 3, 1741–5. Induction of helical structure in a
27 Muñoz, V. and Serrano, L. (1995). heptapeptide with a metal cross-link:
The hydrophobic-staple motif and a Modification of the Lifson-Roig Helix-
role for loop residues in a-helix Coil theory to account for covalent
stability and protein folding. Nat. cross-links. Biochemistry 41, 15826–
Struct. Biol. 2, 380–385. 37.
28 Viguera, A. R. and Serrano, L. 40 Ruan, F., Chen, Y., and Hopkins, P.
(1999). Stable proline box motif at the B. (1990). Metal ion enhanced helicity
N-terminal end of a-helices. Protein in synthetic peptides containing
Sci. 8, 1733–42. unnatural, metal-ligating residues.
29 Doig, A. J. and Baldwin, R. L. (1995). J. Am. Chem. Soc. 112, 9403–4.
N- and C-capping preferences for all 41 Cline, D. J., Thorpe, C., and
20 amino acids in a-helical peptides. Schneider, J. P. (2003). Effects of
Protein Sci. 4, 1325–36. As(III) binding on a-helical structure.
30 Rohl, C. A., Chakrabartty, A., and J. Am. Chem. Soc. 125, 2923–9.
Baldwin, R. L. (1996). Helix propa- 42 Karpen, M. E., De Haset, P. L., and
gation and N-cap propensities of Neet, K. E. (1992). Differences in the
References 301
amino acid distributions of 310 -helices 54 Lee, K. H., Benson, D. R., and
and a-helices. Protein Sci. 1, 1333–42. Kuczera, K. (2000). Transitions from
43 Baker, E. N. and Hubbard, R. E. a to p helix observed in molecular
(1984). Hydrogen bonding in globular dynamics simulations of synthetic
proteins. Prog. Biophys. Mol. Biol. 44, peptides. Biochemistry 39, 13737–47.
97–179. 55 Weaver, T. M. (2000). The p-helix
44 Némethy, G., Phillips, D. C., Leach, translates structure into function.
S. J., and Scheraga, H. A. (1967). A Protein Sci. 9, 201–6.
second right-handed helical structure 56 Morgan, D. M., Lynn, D. G., Miller-
with the parameters of the Pauling- Auer, H., and Meredith, S. C.
Corey a-helix. Nature 214, 363–5. (2001). A designed Zn2þ-binding
45 Millhauser, G. L. (1995). Views of amphiphilic polypeptide: Energetic
helical peptides – a proposal for the consequences of p-helicity. Bio-
position of 310 -helix along the chemistry 40, 14020–9.
thermodynamic folding pathway. 57 Fodje, M. N. and Al-Karadaghi, S.
Biochemistry 34, 3872–7. (2002). Occurrence, conformational
46 Bolin, K. A. and Millhauser, G. L. features and amino acid propensities
(1999). a and 310 : The split personality for the p-helix. Protein Eng. 15, 353–8.
of polypeptide helices. Acc. Chem. Res. 58 Baldwin, R. L. (2003). In search of
32, 1027–33. the energetic role of peptide hydrogen
47 Miick, S. M., Martinez, G. V., Fiori, bonds. J. Biol. Chem. 278, 17581–8.
W. R., Todd, A. P., and Millhauser, 59 Epand, R. M. and Scheraga, H. A.
G. L. (1992). Short alanine-based (1968). Biochemistry 7, 2864–72.
peptides may form 310 -helices and not 60 Taniuchi, J. H. and Anfinsen, C. B.
a-helices in aqueous solution. Nature (1969). J. Biol. Chem. 244, 2864–72.
359, 653–5. 61 Spek, E. J., Gong, Y., and Kallen-
48 Zhou, H. X. X., Lyu, P. C. C., bach, N. R. (1995). Intermolecular
Wemmer, D. E., and Kallenbach, N. interactions in a helical oligo- and
R. (1994). Structure of C-terminal poly(L-glutamic acid) at acidic pH.
a-helix cap in a synthetic peptide. J. Am. Chem. Soc. 117, 10773–4.
J. Am. Chem. Soc. 116, 1139–40. 62 Brown, J. E. and Klee, W. A. (1971).
49 Fiori, W. R. and Millhauser, G. L. Biochemistry 10, 470–6.
(1995). Exploring the peptide 310 - 63 Bierzynski, A., Kim, P. S., and
helix/a-helix equilibrium with double Baldwin, R. L. (1982). A salt bridge
label electron spin resonance. stabilizes the helix formed by isolated
Biopolymers 37, 243–50. C-peptide of ribonuclease A. Proc. Natl
50 Toniolo, C. and Benedetti, E. Acad. Sci. USA 79, 2470–4.
(1991). The polypeptide 310 -helix. 64 Kim, P. S., Bierzynski, A., and
Trends Biochem. Sci. 16, 350–3. Baldwin, R. L. (1982). A competing
51 Ramachandran, G. N. and salt-bridge suppresses helix formation
Sasisekharan, V. (1968). by the isolated C-peptide carboxylate
Conformation of polypeptides and of Ribonuclease A. J. Mol. Biol. 162,
proteins. Adv. Protein Chem. 23, 283– 187–99.
437. 65 Kim, P. S. and Baldwin, R. L. (1984).
52 Low, B. W. and Grenville-Wells, H. A helix stop signal in the isolated S-
J. (1953). Generalized mathematical peptide of ribonuclease A. Nature 307,
relations for polypeptide chain helixes. 329–34.
The coordinates for the p helix. Proc. 66 Shoemaker, K. R., Kim, P. S., Brems,
Natl Acad. Sci. USA 39, 785–801. D. N. et al. (1985). Nature of the
53 Shirley, W. A. and Brooks, C. L. charged-group effect on the stability of
(1997). Curious structure in the C-peptide helix. Proc. Natl Acad.
‘‘canonical’’ alanine based peptides. Sci. USA 82, 2349–53.
Proteins: Struct. Funct. Genet. 28, 59– 67 Strehlow, K. G. and Baldwin, R. L.
71. (1989). Effect of the substitution Ala-
302 9 Stability and Design of a-Helices
107 Kemp, D. S., Boyd, J. G., and structure and properties of an a-helix
Muendel, C. C. (1991). The helical cap template derived from N-[(2S)-2-
s-constant for alanine in water derived chloropropionyl]-(2S)-Pro-(2R)-Ala-
from template-nucleated helices. (2S,4S)-4-thioPro-OMe which initiates
Nature 352, 451–4. a-helical structures. Tetrahedron 54,
108 Kemp, D. S., Allen, T. J., and Oslick, 15793–819.
S. L. (1995). The energetics of helix 116 Aleman, C. (1997). Conformational
formation by short templated peptides properties of a-amino acids
in aqueous solution. 1. Characteriza- disubstituted at the a-carbon. J. Phys.
tion of the reporting helical template Chem. 101, 5046–50.
Ac-HE1(1). J. Am. Chem. Soc. 117, 117 Sacca, B., Formaggio, F., Crisma,
6641–57. M., Toniolo, C., and Gennaro, R.
109 Groebke, K., Renold, P., Tsang, (1997). Linear oligopeptides. 401. In
K. Y., Allen, T. J., McClure, K. F., search of a peptide 310 -helix in water.
and Kemp, D. S. (1996). Template- Gazz. Chim. Ital. 127, 495–500.
nucleated alanine-lysine helices are 118 Tanaka, M. (2002). Design and
stabilized by position-dependent conformation of peptides containing
interactions between the lysine side a,a-disubstituted a-amino acids. J.
chain and the helix barrel. Proc. Natl Synth. Org. Chem. Japan 60, 125–36.
Acad. Sci. USA 93, 4025–9. 119 Lancelot, N., Elbayed, K., Raya, J. et
110 Kemp, D. S., Oslick, S. L., and Allen, al. (2003). Characterization of the 310 -
T. J. (1996). The structure and helix in model peptides by HRMAS
energetics of helix formation by short NMR spectroscopy. Chem. Eur. J. 9,
templated peptides in aqueous 1317–23.
solution. 3. Calculation of the helical 120 Karle, I. L. and Balaram, P. (1990).
propagation constant s from the Structural characteristics of a-helical
template stability constants t/c for Ac- peptide molecules containing Aib
Hel(1)-Ala(n)-OH, n ¼ 1–6. J. Am. residues. Biochemistry 29, 6747–56.
Chem. Soc. 118, 4249–55. 121 Banerjee, R. and Basu, G. (2002). A
111 Kemp, D. S., Allen, T. J., Oslick, short Aib/Ala-based peptide helix is as
S. L., and Boyd, J. G. (1996). The stable as an Ala-based peptide helix
structure and energetics of helix double its length. ChemBiochem 3,
formation by short templated peptides 1263–6.
in aqueous solution. 2. 122 Yokum, T. S., Gauthier, T. J.,
Characterization of the helical Hammer, R. P., and McLaughlin,
structure of Ac-Hel(1)-Ala(6)-OH. M. L. (1997). Solvent effects on the
J. Am. Chem. Soc. 118, 4240–8. 310 -/a-helix equilibrium in short
112 Austin, R. E., Maplestone, R. A., amphipathic peptides rich in a,a-
Sefler, A. M. et al. (1997). Template disubstituted amino acids. J. Am.
for stabilization of a peptide a-helix: Chem. Soc. 119, 1167–8.
Synthesis and evaluation of conforma- 123 Millhauser, G. L., Stenland, C. J.,
tional effects by circular dichroism and Hanson, P., Bolin, K. A., and van de
NMR. J. Am. Chem. Soc. 119, 6461–72. Ven, F. J. M. (1997). Estimating the
113 Arrhenius, T. and Sattherthwait, relative populations of 310 -helix and a-
A. C. (1990). Peptides: Chemistry, helix in Ala-rich peptides. A hydrogen
Structure and Biolog y: Proceedings exchange and high field NMR study.
of the 11th American Peptide J. Mol. Biol. 267, 963–74.
Symposium, ESCOM, Leiden. 124 Fiori, W. R., Lundberg, K. M., and
114 Muller, K., Obrecht, D., Knier- Millhauser, G. L. (1994). A single
zinger, A. et al. (1993). Perspectives carboxy-terminal arginine determines
in Medicinal Chemistry, pp. 513–531, the amino-terminal helix
Verlag Chemie, Weinheim. conformation of an alanine-based
115 Gani, D., Lewis, A., Rutherford, T. peptide. Nat. Struct. Biol. 1, 374–7.
et al. (1998). Design, synthesis, 125 Yoder, G., Polese, A., Silva, R. A.
References 305
147 Shi, Z. S., Woody, R. W., and 158 Sun, J. K. and Doig, A. J. (1998).
Kallenbach, N. R. (2002). Is Addition of side-chain interactions to
polyproline II a major backbone 310 -helix/coil and a-helix/310 -helix/coil
conformation in unfolded proteins? theory. Protein Sci. 7, 2374–83.
Adv. Protein Chem. 62, 163–240. 159 Muñoz, V. and Serrano, L. (1994).
148 Pappu, R. V. and Rose, G. D. (2002). Elucidating the folding problem of
A simple model for polyproline II helical peptides using empirical
structure in unfolded states of alanine- parameters. Nat. Struct. Biol. 1, 399–
based peptides. Protein Sci. 11, 2437– 409.
55. 160 Muñoz, V. and Serrano, L. (1997).
149 Shi, Z., Olson, C. A., Rose, G. D., Development of the multiple sequence
Baldwin, R. L., and Kallenbach, N. approximation within the AGADIR
R. (2002). Polyproline II structure in a model of a-helix formation: compari-
sequence of seven alanine residues. son with Zimm-Bragg and Lifson-
Proc. Natl Acad. Sci. USA 99, 9190–5. Roig formalisms. Biopolymers 41,
150 Rucker, A. L., Pager, C. T., 495–509.
Campbell, M. N., Qualis, J. E., and 161 Fernández-Recio, J., Vásquez, A.,
Creamer, T. P. (2003). Host-guest Civera, C., Sevilla, P., and Sancho,
scale of left-handed polyproline II J. (1997). The tryptophan/histidine
helix formation. Proteins: Struct. Funct. interaction in a-helices. J. Mol. Biol.
Genet. 53, 68–75. 267, 184–97.
151 Andersen, N. H. and Tong, H. 162 Lacroix, E., Viguera, A. R., and
(1997). Empirical parameterization of Serrano, L. (1998). Elucidating the
a model for predicting peptide helix/ folding problem of a-helices: local
coil equilibrium populations. Protein motifs, long-range electrostatics, ionic-
Sci. 6, 1920–36. strength dependence and prediction of
152 Scholtz, J. M., Qian, H., Robbins, NMR parameters. J. Mol. Biol. 284,
V. H., and Baldwin, R. L. (1993). The 173–91.
energetics of ion-pair and hydrogen- 163 Muñoz, V. and Serrano, L. (1995).
bonding interactions in a helical Elucidating the folding problem of
peptide. Biochemistry 32, 9668–76. helical peptides using empirical
153 Shalongo, W. and Stellwagen, E. parameters. II. Helix macrodipole
(1995). Incorporation of pairwise effects and rational modification of the
interactions into the Lifson-Roig helical content of natural peptides.
model for helix prediction. Protein Sci. J. Mol. Biol. 245, 275–96.
4, 1161–6. 164 Petukhov, M., Muñoz, V., Yumoto,
154 Argos, P. and Palau, J. (1982). N., Yoshikawa, S., and Serrano, L.
Amino acid distribution in protein (1998). Position dependence of non-
secondary structures. Int. J. Pept. polar amino acid intrinsic helical
Protein Res. 19, 380–93. propensities. J. Mol. Biol. 278, 279–89.
155 Kumar, S. and Bansal, M. (1998). 165 Lomize, A. L. and Mosberg, H. I.
Dissecting a-helices: position-specific (1997). Thermodynamic model of
analysis of a-helices in globular secondary structure for a-helical
proteins. Proteins 31, 460–76. peptides and proteins. Biopolymers 42,
156 Sun, J. K., Penel, S., and Doig, A. J. 239–69.
(2000). Determination of a-helix N1 166 Vásquez, M. and Scheraga, H. A.
energies by addition of N1, N2 and N3 (1988). Effect of sequence-specific
preferences to helix-coil theory. Protein interactions on the stability of helical
Sci. 9, 750–4. conformations in polypeptides.
157 Rohl, C. A. and Doig, A. J. (1996). Biopolymers 27, 41–58.
Models for the 310 -helix/coil, p-helix/ 167 Roberts, C. H. (1990). A hierarchical
coil, and a-helix/310 -helix/coil nesting approach to describe the
transitions in isolated peptides. Protein stability of a helices with side chain
Sci. 5, 1687–96. interactions. Biopolymers 30, 335–47.
References 307
206 Campbell, R. M., Bongers, J., and solvent conditions for solid-phase
Felxi, A. M. (1995). Rational design, cyclizations. Tetrahedron Lett. 37,
synthesis, and biological evaluation of 2173–6.
novel growth-hormone releasing-factor 215 Luo, P. Z., Braddock, D. T.,
analogs. Biopolymers 37, 67–8. Subramanian, R. M., Meredith, S.
207 Chorev, M., Roubini, E., McKee, C., and Lynn, D. G. (1994). Structural
R. L. et al. (1991). Cyclic parathyroid- and thermodynamic characterization
hormone related protein antagonists – of a bioactive peptide model of
lysine 13 to aspartic acid 17 [I to apolipoprotein-E–side-chain lactam
(I þ 40] side-chain to side-chain bridges to constrain the conformation.
lactamization. Biochemistry 30, 5968– Biochemistry 33, 12367–77.
74. 216 Jackson, D. Y., King, D. S.,
208 Osapay, G. and Taylor, J. W. (1992). Chmielewski, J., Singh, S., and
Multicyclic polypeptide model Schultz, P. G. (1991). General
compounds. 2. Synthesis and approach to the synthesis of short a-
conformational properties of a highly helical peptides. J. Am. Chem. Soc.
a-helical uncosapeptide constrained by 113, 9391–2.
3 side-chain to side-chain lactam 217 Kumita, J. R., Smart, O. S., and
bridges. J. Am. Chem. Soc. 114, 6966– Woolley, G. A. (2000). Photo-control
73. of helix content in a short peptide.
209 Bouvier, M. and Taylor, J. W. (1992). Proc. Natl Acad. Sci. USA 97, 3803–8.
Probing the functional conformation 218 Flint, D. G., Kumita, J. R., Smart,
of Neuropeptide-T through the design O. S., and Wolley, G. A. (2002).
and study of cyclic analogs. J. Med. Using an azobenzene cross-linker to
Chem. 35, 1145–55. either increase or decrease peptide
210 Kapurniotu, A. and Taylor, J. W. helix content upon trans-to-cis
(1995). Structural and conformational photoisomerization. Chem. Biol. 9,
requirements for human calcitonin 391–7.
activity – design, synthesis, and study 219 Kumita, J. R., Flint, D. G., Smart,
of lactam-bridged analogs. J. Med. O. S., and Woolley, G. A. (2002).
Chem. 38, 836–47. Photo-control of peptide helix content
211 Osapay, G. and Taylor, J. W. (1990). by an azobenzene cross-linker: steric
Multicyclic polypeptide model interactions with underlying residues
compounds. 1. Synthesis of a tricyclic are not critical. Protein Eng. 15, 561–9.
amphiphilic a-helical peptide using an 220 Jiménez, M. A., Muñoz, V., Rico, M.,
oxime resin, segment-condensation and Serrano, L. (1994). Helix stop
approach. J. Am. Chem. Soc. 112, and start signals in peptides and
6046–51. proteins. The capping box does not
212 Bracken, C., Gulyas, J., Taylor, necessarily prevent helix elongation.
J. W., and Baum, J. (1994). Synthesis J. Mol. Biol. 242, 487–96.
and nuclear-magnetic-resonance 221 Kallenbach, N. R. and Gong, Y. X.
structure determination of an a- (1999). C-terminal capping motifs in
helical, bicyclic, lactam-bridged model helical peptides. Bioorg. Med.
hexapeptide. J. Am. Chem. Soc. 116, Chem. 7, 143–51.
6431–2. 222 Petukhov, M., Yumoto, N., Murase,
213 Chen, S. T., Chen, H. J., Yu, H. M., S., Onmura, R., and Yoshikawa, S.
and Wang, K. T. (1993). Facile (1996). Factors that affect the
synthesis of a short peptide with a stabilization of a-helices in short
side-chain-constrained structure. peptides by a capping box.
J. Chem. Res. (S) 6, 228–9. Biochemistry 35, 387–97.
214 Zhang, W. T., and Taylor, J. W. 223 Viguera, A. R. and Serrano, L.
(1996). Efficient solid-phase synthesis (1995). Experimental analysis of the
of peptides with tripodal side-chain Schellman motif. J. Mol. Biol. 251,
bridges and optimization of the 150–60.
310 9 Stability and Design of a-Helices
224 Gong, Y., Zhou, H. X., Guo, M., and alanine based peptide. J. Am. Chem.
Kallenbach, N. R. (1995). Structural Soc. 123, 9235–8.
analysis of the N- and C-termini in a 233 Goch, G., Maciejczyk, M.,
peptide with consensus sequence. Oleszczuk, M., Stachowiak, D.,
Protein Sci. 4, 1446–56. Malicka, J., and Bierzynski, A.
225 Armstrong, K. M. and Baldwin, R. (2003). Experimental investigation of
L. (1993). Charged histidine affects a- initial steps of helix propagation in
helix stability at all positions in the model peptides. Biochemistry 42,
helix by interacting with the backbone 6840–7.
charges. Proc. Natl Acad. Sci. USA 90, 234 Nelson, J. W. and N. R., K. (1986).
11337–40. Stabilization of the ribonuclease S-
226 Scholtz, J. M., York, E. J., Stewart, peptide a-helix by trifluoroethanol.
J. M., and Baldwin, R. L. (1991). Proteins: Struct. Funct. Genet. 1, 211–
A neutral water-soluble, a-helical 17.
peptide: The effect of ionic strength 235 Nelson, J. W. and N. R., K. (1989).
on the helix-coil equilibrium. J. Am. Persistence of the a-helix stop signal
Chem. Soc. 113, 5102–4. in the S-peptide in trifluoroethanol
227 Smith, J. S. and Scholtz, J. M. solutions. Biochemistry 28, 5256–61.
(1998). Energetics of polar side-chain 236 Sonnichsen, F. D., Van Eyk, J. E.,
interactions in helical peptides: Salt Hodges, R. S., and Sykes, B. D.
effects on ion pairs and hydrogen (1992). Effect of trifluoroethanol on
bonds. Biochemistry 37, 33–40. protein secondary structure: an NMR
228 Lockhart, D. J. and Kim, P. S. and CD study using a synthetic actin
(1993). Electrostatic screening of peptide. Biochemistry 31, 8790–8.
charge and dipole interactions with 237 Waterhous, D. V. and Johnson, W.
the helix backbone. Science 260, 198– C. (1994). Importance of environment
202. in determining secondary structure in
229 Jelesarov, I., Durr, E., Thomas, R. proteins. Biochemistry 33, 2121–8.
M., and Bosshard, H. B. (1998). Salt 238 Jasanoff, A. and Fersht, A. R.
effects on hydrophobic interaction and (1984). Quantitative determination of
charge screening in the folding of a helical propensities. Analysis of data
negatively charged peptide to a coiled from trifluoroethanol titration curves.
coil (leucine zipper). Biochemistry 37, Biochemistry 33, 2129–35.
7539–50. 239 Albert, J. S. and Hamilton, A. D.
230 Scholtz, J. M., Marqusee, S., (1995). Stabilization of helical
Baldwin, R. L. et al. (1991). domains in short peptides using
Calorimetric determination of the hydrophobic interactions. Biochemistry
enthalpy change for the a-helix to coil 34, 984–90.
transition of an alanine peptide in 240 Walgers, R., Lee, T. C., and
water. Proc. Natl Acad. Sci. USA 88, Cammers-Goodwin, A. (1998). An
2854–8. indirect chaotropic mechanism for the
231 Yoder, G., Pancoska, P., and stabilization of helix conformation of
Keiderling, T. A. (1997). peptides in aqueous trifluoroethanol
Characterization of alanine-rich and hexafluoro-2-propanol. J. Am.
peptides, Ac-(AAKAA)n-GY-NH2 (n) Chem. Soc. 120, 5073–9.
1–4), using vibrational circular 241 Luo, P. and Baldwin, R. L. (1999).
dichroism and fourier transform Interaction between water and polar
infrared conformational determination groups of the helix backbone: An
and thermal unfolding. Biochemistry important determinant of helix
36, 5123–33. propensities. Proc. Natl Acad. Sci. USA
232 Huang, C. Y., Klemke, J. W., 96, 4930–5.
Getahun, Z., DeGrado, W. F., and 242 Kentsis, A. and Sosnick, T. R. (1998).
Gai, F. (2001). Temperature- Trifluoroethanol promotes helix
dependent helix-coil transition of an formation by destabilizing backbone
References 311
264 Sreerama, N. and Woody, R. W. 273 Shalongo, W., Dugad, L., and
(1993). A self-consistent method for Stellwagen, E. (1994). Distribution of
the analysis of protein secondary helicity within the model peptide
structure from circular dichroism. Acetyl(AAQAA)3 amide. J. Am. Chem.
Anal. Biochem. 209, 32–44. Soc. 116, 8288–93.
265 Keiderling, T. A. (2002). Protein and 274 Shalongo, W., Dugad, L., and
peptide secondary structure and Stellwagen, E. (1994). Analysis of the
conformational determination with thermal transitions of a model helical
vibrational circular dichroism. Curr. peptides using C-13 NMR. J. Am.
Opin. Struct. Biol. 6, 682–8. Chem. Soc. 116, 2500–7.
266 Keiderling, T. A. (2002). In Circular 275 Park, S.-H., Shalongo, W., and
Dichroism: Principles and Applications Stellwagen, E. (1998). Analysis of N-
(Berova, N., Nakanishi, K., and terminal Capping using carbonyl-
Woody, R. A., eds), pp. 621–666. carbon chemical shift measurements.
Wiley-VCH, New York. Proteins: Struct. Funct. Genet. 33, 167–
267 Tanaka, T., Inoue, K., Kodama, T., 76.
Kyogoku, Y., Hayakawa, T., and 276 Wuthrich, K. (1986). NMR of
Sugeta, H. (2001). Conformational Proteins and Nucleic Acids, John Wiley
study on poly[g-(a-phenethyl)-L- and Sons, New York.
glutamate] using vibrational circular 277 Kilby, P. M., van Eldick, L. J., and
dichroism spectroscopy. Biopolymers Roberts, G. C. K. (1995). Nuclear
62, 228–34. magnetic resonance assignments and
268 Baumruk, V., Pancoska, P., and secondary structure of bovine S100b
Keiderling, T. A. (1996). Predictions protein. FEBS Lett. 363, 90–6.
of secondary structure using statistical 278 Campbell, I. D. and Dwek, R. A.
analyses of electronic and vibrational (1984). Biological Spectroscopy.
circular dichroism and Fourier Benjamin Cummings, Menlo Park,
transform infrared spectra of proteins CA.
in H2 O. J. Mol. Biol. 259, 774–91. 279 Brey, W. S. (1984). Physical Chemistry
269 Nafie, L. A. (2000). Dual polarization and its Biological Applications.
modulation: a real-time, spectral Academic Press, New York.
multiplex separation of circular 280 Venyaminov, S.-Y. and Kalnin, N. N.
dichroism from linear birefringence (1990). Quantitative IR Spectro-
spectral intensities. Appl. Spectros. 54, photometry of peptide compounds
1634–45. in water (H2 O) solutions. 2. Amide
270 Hilario, J., Drapcho, D., Curbelo, absorption-bands of polypeptides
R., and Keiderling, T. A. (2001). and fibrous proteins in a-coil, b-coil,
Polarization modulation Fourier and random coil conformations.
transform infrared spectroscopy with Biopolymers 30, 1243–57.
digital signal processing: comparison 281 Susi, H., Timasheff, S. N., and
of vibrational circular dichroism Stevens, L. (1967). Infra-red spectrum
method. Appl. Spectros. 55, 1435–47. and protein conformations in aqueous
271 Rohl, C. A. and Baldwin, R. L. solutions. J. Biol. Chem. 242, 5460–6.
(1997). Comparison of NH exchange 282 Krimm, S. and Bandekar, J. (1986).
and circular dichroism as techniques Vibrational spectroscopy and
for measuring the parameters of the conformation of peptides, polypeptides
helix-coil transition in peptides. and protein. Adv. Protein Chem. 38,
Biochemistry 36, 8435–42. 181–364.
272 Spera, S. and Bax, A. (1991). 283 Nevskaya, N. A. and Chirgadze,
Empirical correlation between protein Y. N. (1976). Infrared spectra and
backbone conformation and Ca and resonance interactions of amide-I and
Cb 13 C nuclear magnetic resonance II vibration of a-helix. Biopolymers 15,
chemical shifts. J. Am. Chem. Soc. 113, 637–48.
5490–2. 284 Byler, D. M. and Susi, H. (1986).
References 313
10
Design and Stability of Peptide b-sheets
Mark S. Searle
10.1
Introduction
The pathway by which the polypeptide chain assembles from the unfolded, ‘‘disor-
dered’’ state to the final active folded protein has been the subject of intense inves-
tigation. Hierarchical models of protein folding emphasize the importance of local
interactions in restricting the conformational space of the polypeptide chain in the
search for the native state. These nuclei of structure promote interactions between
different parts of the sequence leading ultimately to a cooperative rate-limiting step
from which the native state emerges [1–4]. Designed peptides that fold autono-
mously in water (a-helices and b-sheets) have proved extremely valuable in probing
the relationship between local sequence information and folded conformation (the
stereochemical code) in the absence of the tertiary interactions found in the native
state of proteins. This has allowed intrinsic secondary structure propensities to be
investigated in isolation, and enabled the nature and strength of the weak interac-
tions relevant to a wide range of molecular recognition phenomena in chemistry
and biology to be put on a quantitative footing.
While the literature is rich in studies of a-helical peptides [5–8], water-soluble,
nonaggregating monomeric b-sheets have emerged relatively recently [9–11]. For
reasons of design and chemical synthesis, these are almost exclusively antiparallel
b-sheets, although others have used nonnatural linkers to engineer parallel strand
alignments [12–14]. Here we focus on contiguous antiparallel b-sheet systems.
Autonomously folding b-hairpin motifs, consisting of two antiparallel b-strands
linked by a reverse b-turn (Figure 10.1), represent the simplest systems for probing
weak interactions in b-sheet folding and assembly, although more recently a num-
ber of three- and four-stranded b-sheet structures have been described. From this
growing body of data, key factors have come into focus that are important in ratio-
nal design. The following will be considered: the role of the b-turn in promoting
and stabilizing antiparallel b-sheet formation; the role of intrinsic backbone f; c
propensities in preorganizing the extended conformation of the polypeptide chain;
the role of cooperativity in b-sheet folding and stability and in the propagation of
(a) O R H O R H
N N R
N N
H ψ R φ O R H O R H O O
N
N
R O H R O H HN
R O H R N N
N N R
H R O H R O
G-bulged
(c) non-native strand alignment type I turn
M1 H O I3 H O V5 H O N7 P8
N N N N
O
+H3N N N N
O Q2 H O F4 H O K6 H O N
H
H V16 O H L14 O H I12 O H
D9
H
N N N N N O
H N N N
O H E15 O H T13 O H T11 O G10
Fig. 10.1. a) Beta-strand alignment and which the native TLTGK turn sequence has
interstrand hydrogen bonds in an antiparallel been replaced by the sequence NPDG. b)
b-hairpin peptide; R groups represent amino Native strand alignment giving rise to a type I
acid side chains, main chain f and c angles NPDG turn. c) Nonnative strand alignment
are shown. N-terminal b-hairpin of ubiquitin in with a G-bulged type I turn (NPDGT).
10.2
b-Hairpins Derived from Native Protein Sequences
The early focus on peptides excised from native protein structures provided the
first insights into autonomously folding b-hairpins, preceding the more rational
approach to b-sheet design. Peptides derived from tendamistat [15], B1 domain of
316 10 Design and Stability of Peptide b-sheets
protein G [16, 17], ubiquitin [18, 19], and ferredoxin [20] showed that these se-
quences could exist in the monomeric form without aggregating, but that in most
cases they showed a very limited tendency to fold in the absence of tertiary con-
tacts. The use of organic co-solvents appeared to induce native-like conformation
[17, 18, 20]. The study by Blanco et al. [20] of a peptide derived from the B1 do-
main of protein G provided the first example of native-like folding in water of a
fully native peptide sequence. In contrast, the studies of hairpins isolated from
the proteins ubiquitin and ferredoxin, which are structurally homologous to the G
B1 domain (all form an a=b-roll fold) showed these to be unfolded in water [18,
20]. More recent studies of the native ubiquitin peptide have now revealed evidence
for a small population of the folded state in water [21], while a b-turn mutation has
been shown to significantly enhance folding [22].
The apparent lack of evidence for folding of the peptide derived from residues
15–23 of tendamistat (YQSWRYSQA) [15], and from the N-terminal sequence of
ubiquitin (MQIFVKTLTGKTITLEV) [19] led to partial redesign of the sequence to
enhance folding through modification to the b-turn sequence by introducing an
NPDG type I turn, which is the most common type I turn sequence in proteins.
Thus, in the former peptide SWRY was replaced by NPDG [15], and in the latter
the G-bulged type I turn TLTGK was replaced by NPDG [19]. By introducing this
tight two-residue loop across PD it was envisaged that both hairpins would be sta-
bilized. This was certainly the case, however, the most striking observation was that
both peptides folded into nonnative conformations with a three-residue G-bulged
type I turn (PDG) reestablished across the turn (Figure 10.1b and c). These initial
studies in rational redesign led to strikingly irrational results, prompting a much
more systematic approach to b-hairpin design. The above results revealed that the
b-turn sequence, which dictates the preferred backbone geometry, appears to be an
important factor in dictating the b-strand alignment. From the protein folding
viewpoint, as demonstrated by the redesigned ubiquitin hairpin sequence, it is evi-
dent that one important role of the native turn sequence may be to preclude the
formation of nonnative conformations that may be incompatible with formation
of the native state.
10.3
Role of b-turns in Nucleating b-hairpin Folding
diastereomeric type I 0 and II 0 turns have the f and c angles complementary to the
right-handed twisted orientation of the b-strands. These conclusions appear to ra-
tionalize, at least in part, the above observations with b-turn modifications intro-
duced into the b-hairpins of tendamistat and ubiquitin. The introduced NPDG
type I turn is not compatible with the right-handed twist of the b-strands resulting
in a refolding to a more flexible G-bulged type I turn.
The work of Gellman et al. [25] has shown the importance of backbone f; c
angle preferences for the residues in the turn sequence by comparing the stabil-
ity of a number of b-hairpin peptides derived from the ubiquitin sequence
(MQIFVKSXXKTITLVKV) containing either XX ¼ l-Pro-Xaa or d-Pro-Xaa. Replac-
ing l-Pro with d-Pro switches the twist from left-handed (type I or II) to right-
handed (type I 0 or II 0 ), making the latter compatible with the right-handed twist
of the two b-strands. NMR data (Ha shifts and long-range nuclear Overhauser ef-
fects (NOEs)) indicate that each of the d-Pro containing peptides showed a signifi-
cant degree of folding, whereas the l-Pro analogs appeared to be unfolded. Similar
conclusions were drawn from studies of a series of 12-mers containing XG turn
sequences, with X ¼ l-Pro or d-Pro [26].
A number of natural l-amino acids are commonly found in the aL region of con-
formational space and are compatible with the type I 0 or II 0 turn conformation.
Statistical analyses from a number of groups have identified Xaa-Gly as a fa-
vored type I 0 turn. A number of studies have used the Asn-Gly sequence to design
b-hairpin motifs and have demonstrated a high population of the folded struc-
ture with the required turn conformation and strand alignment [27–32]. The
work by de Alba et al. [28] identified two possible conformations of the peptide
ITSYNGKTYGR. The NOE data appear to be compatible with rapidly inter-
converting conformations involving an YNGK type I 0 turn and a NGKT type II 0
turn, each giving a distinct pattern of cross-strand NOEs. Ramirez-Alvarado et al.
[27] also described an NG-containing 12-mer (RGITVXGKTYGR; X ¼ N), which
they subsequently extended to a series of hairpins to examine the correlation be-
tween hairpin stability and the database frequency of occurrence of residues in
position X [29]. Using nuclear magnetic resonance (NMR) and circular dichroism
(CD) measurements they concluded that X ¼ Asn > Asp > Gly > Ala > Ser in
promoting hairpin folding, in agreement with the intrinsic f; c preferences of
these residues. Despite extensive analysis of NG type I 0 turns in the protein data-
base (PDB), it is still not entirely clear why Asn at the first position is so effective in
promoting turn formation. There is no evidence for specific side chain to main
chain hydrogen bonds that might stabilise the desired backbone conformation,
although specific solvation effects cannot be ruled out [30].
Evidence that the NG turn is able to nucleate folding in the absence of cross-
strand interactions was demonstrated in a truncated analog of one designed 16-
residue b-hairpin sequence KKYTVSINGKKITVSI (b1 in Figure 10.2), in which
the sequence was shortened to SINGKKITVSI, lacking the N-terminal six residues
[30]. Evidence from NOE data showed that the turn was significantly populated
with interactions observed between Ser6 Ha $ Lys11 Ha and Ile7 NH $ Lys10
318 10 Design and Stability of Peptide b-sheets
(a)
O K1 H O Y3 H O V5 H O I7 H
N N N N N8
N N N N
H O X2 H O T4 H O X6 H O O
O H X15 O H T13 O H X11 O H HN
N N N N H
-O N N N H G9
I16 O H V14 O H I12 O H K10 O
(b)
H O I7 H
N N N8
N
O S6 H O O
O H S15 O H T13 O H X11 O H HN
N N N N H
-O N N N H G9
I16 O H V14 O H I12 O H K10 O
Fig. 10.2. Structure of the 16-residue b-hairpin (1–5). The resulting peptide still shows the
sequence b1, and mutants b2; b3 and b4 in ability to fold around the turn sequence as
which cross-strand salt bridges (Lys-Glu) have evident from medium range NOEs (indicated
been introduced at position X2 -X15 and X6 -X11 . by arrows) that show that the INGK sequence
In b), hairpin b1 has been truncated by is adopting a type I 0 NG turn.
removing the N-terminal hydrophobic residues
NH (Figure 10.2b) that are only compatible with a folded type I 0 turn around NG.
Two destabilized b-hairpin mutants (KKYTVSINGKKITKSK with electrostatic re-
pulsion between the N- and C-terminal Lys residues, and KKATASINGKKITVSI
with the loss of key hydrophobic residues in one strand) showed no evidence
from NMR chemical shift data for cross-strand interactions; however, careful exam-
ination of NOE data revealed evidence for NG turn nucleation [30]. Titration with
organic co-solvent showed both peptides to fold significantly, indicating that the
turn sequence probably already predisposes the peptide to form a b-hairpin but
that favorable cross-strand interactions are required for stability.
The work of de Alba et al. convincingly illustrated this principle in a series of
six hairpin sequences (10-mers) where strand residues were conserved but turn
sequences varied [28]. Using a number of NMR criteria they were able to show
that changes in turn sequence could result in a variety of turn conformations in-
cluding two residue 2:2 turns, 3:5 turns and 4:4 turns (see earlier nomenclature
10.4 Intrinsic f; c Propensities of Amino Acids 319
[22–24]), with different pairings of amino acid side chains. As with the earlier
examples cited with the turn modification described for hairpins derived from ten-
damistat and ubiquitin, the bulged-type I turn (3:5 turn) appears to be an intrinsi-
cally stable turn with the necessary right-handed twist. Together these data strongly
support a model for hairpin folding in which the turn sequence strongly dic-
tates its preferred conformation, and that strand alignment, cross-strand interac-
tions and subsequently conformational stability are dictated by the specificity of
the turn.
10.4
Intrinsic f, c Propensities of Amino Acids
(a) (b)
180 180
120 120
60 60
ψ 0 ψ 0
-60 -60
-120 -120
-180 -180
-180 -120 -60 0 60 120 180 -180 -120 -60 0 60 120 180
φ φ
Fig. 10.3. Ramachandran plots of residue (f; c: 60 ; 0 is the aL region of conformational
backbone f and c angles taken from a space mainly occupied by Gly). The distribu-
database of 512 high-resolution protein X-ray tion in b) shows that residues have a natural
structures showing: a) residues in regular propensity to occupy the a and b regions of
b-sheet ðf; c: 120 ; 120 Þ and a-helix conformational space even when they are not
ðf; c: 60 ; 60 Þ, and b) residues in the involved in regular protein secondary structure.
irregular coil regions of the same structures Taken from Ref. [36].
Fig. 10.4. Effects of neighboring residues on a residue that prefers the b-sheet region – Ile,
residue b-propensities within the triplets XfX Val, Phe, or Tyr). The effects of neighboring
calculated from the data in Figure 10.3b. The residues on the average b-propensity is shown
b-propensity of residue f is a measure of the in (a), specific effects on Ser (b), Val (c) and
number of times a particular residues is found Lys (d) are also shown. While Val is relatively
in the b-region of the Ramachandran plot as a insensitive to the nature of the flanking
fraction of the total distribution between a- and residues, the smaller Ser residue can be forced
b-space ½b=ða þ bÞ. The context dependence to adopt a higher b-propensity if it has bulky
of the b-propensity is estimated by considering neighbors. Thus, the intrinsic b-propensity of a
the nature of the neighboring residue (X) particular residue is highly context dependent.
(X ¼ any residue, a is a residue that prefers Taken in part from Ref. [36].
the a-helical region – Asp, Glu, Lys, or Ser, b is
10.5
Side-chain Interactions and b-hairpin Stability
There is a strong case, at least in the context of isolated peptide fragments, that the
origin of the specificity of b-hairpin folding is largely dictated by the conforma-
tional preferences of the turn sequence. However, the stability of the folded state
has been attributed to interstrand hydrogen bonding and/or hydrophobic interac-
tions, though which dominates is still a matter of debate. Ramirez-Alvarado et al.
[27] reported that the population of the folded state of the hairpin RGITVNGK-
TYGR was significantly diminished by replacing residues on the N-terminal
322 10 Design and Stability of Peptide b-sheets
strand, and then the C-terminal strand, by Ala. The loss of stability was attributed
firstly to a reduction in hydrophobic surface burial, but also due to the intrinsically
lower b-propensity of Ala, the latter contributing through an adverse conforma-
tional entropy term. To compensate for this de Alba et al. [44] described a family
of hairpins derived from the sequence IYSNSDGTWT. The effects of residue sub-
stitutions in the first three positions was examined while maintaining the overall b-
character of the two strands. Several favorable cross-strand pair-wise interactions
were identified that were apparent in earlier, and more recent, PDB analysis of b-
sheet interactions [45, 46]. For example, Thr-Thr and Tyr-Thr cross-strand pairs
produced stabilizing interactions, whilst Ile-Thr and Ile-Trp had a destabilizing ef-
fect. Other studies of Ala substitution in one b-strand have similarly highlighted
hydrophobic burial as a key factor in conformational stability [30], while the obser-
vation of large numbers of side chain NOEs have been used as evidence for hydro-
phobic stabilization in water (see below) [19, 25–27, 30].
10.5.1
Aromatic Clusters Stabilize b-hairpins
(a)
W Y
1 V F
W Y
2 V F
W Y
3 V F
(b)
Fig. 10.5. Interplay between hydrophobic the turn, has the smallest energetic cost of
cluster and turn position in a family of forming the cluster which then significantly
isomeric b-hairpins. The distance of the WVYF stabilizes this core motif, but leads to fraying
cluster from the turn is shown in (a) for of the N- and C-termini. In contrast, in the
peptides 1, 2, and 3. A statistical mechanical case of peptide 3, there is a large energy
model was used to estimate the free energy of penalty in ordering the peptide backbone to
folding for the three peptides according to the bring the residues of the cluster together.
number of peptide hydrogen bonds formed Overall the core hairpin is less stable but more
and the relative position of the stabilizing residues are involved in ordered structure.
hydrophobic cluster (b). Thus, peptide 1, Adapted from Ref. [47].
which has the hydrophobic cluster closest to
324 10 Design and Stability of Peptide b-sheets
sequence extension in this study was limited to an all Thr extension or an alternat-
ing Ser-Thr (ST)n extension the conclusions should be viewed with caution. Since
cross-strand Ser/Ser and Thr/Thr pairings do not bury very much hydrophobic sur-
face area, it is not surprising perhaps that cross-strand side-chain interactions may
only just compensate for the entropic cost of organizing the peptide backbone.
Studies with other more favorable pairings may be enlightening.
Undoubtedly the most successfully designed structural motif to date has been
the tryptophan zipper (trpzip) motif [49], whose stability exceeds substantially all
those already described. The design is based around stabilizing nonhydrogen-
bonded cross-strand Trp-Trp pairs (Table 10.1).
The most successful designs involved two such Trp-Trp pairs which NMR struc-
tural analysis reveals are interdigitated in a zipper-like manner stabilized through
face–face offset p-stacking giving a compact structure (Figure 10.6). This arrange-
ment of the indole rings results in a pronounced signature in the CD spectrum
with intense exciton-coupling bands at 215 and 229 nm indicative of interactions
between aromatic chromophores in a highly chiral environment (see Figure 10.7).
The high sensitivity of the CD bands permits thermal denaturation curves to
be determined with high sensitivity, in contrast to other b-sheet systems where
only small changes in the CD spectrum at 216 nm are evident, resulting in poor
signal-to-noise. The thermal unfolding curves are sigmoidal and reversible, fitting
to a two-state folding model (Figure 10.7).
The trpzip hairpins 1, 2, and 3 show high Tm values (see Table 10.1) with folding
strongly enthalpy driven. Changing the turn sequence (GN versus NG versus d-
PN) has a significant impact on stability. The unfolding curve for trpzip2 with the
NG type I 0 turn gives the most cooperative unfolding transition and appears to be
the most stable of the three at room temperature, despite other studies that suggest
that the unnatural d-PN (type II 0 ) turn is the most stabilizing [25]. Clearly, context-
dependent factors are at work. Also, the strongly enthalpy-driven signature for fold-
ing is different to that reported previously for other systems [32], reflecting differ-
ences in the nature of the stabilizing interactions which in the case of the trpzip
peptides involve p–p stacking interactions. Despite the fact that the trpzip hairpins
represent a highly stabilized motif, no such examples have been found in the pro-
tein structure database. The authors suggest that on steric grounds it may be diffi-
cult to accommodate the trpzip motif within a multistranded b-sheet or through
packing against other structural elements.
10.5 Side-chain Interactions and b-hairpin Stability 325
Fig. 10.6. NMR structures of trpzips 1 and 2. 90 for the end-on view of the indole rings. The
A) Ensemble of 20 structures of trpzip1 backbone carbonyl of residue 6 is labeled to
(residues 2–11) showing the relative orienta- illustrate the different turn geometries of the
tions of the indole rings. B) Overlay of trpzip two hairpins (type II 0 and type I 0 ). Taken from
1 and 2 aligned to the peptide backbone of Ref. [49].
residues 2–5 and 8–11 (top), and rotated by
10.5.2
Salt Bridges Enhance Hairpin Stability
(a)
(b)
(c)
secondary structure has been less well investigated in terms of a detailed quantita-
tive description. To this end, peptide b1 (Figure 10.2) was mutated to introduce
Lys-Glu salt bridges at two positions within the hairpin involving substitution of
Ser ! Glu: one salt bridge is postioned adjacent to the b-turn, while the other in-
volves residues close to the N- and C-termini (see Figure 10.2) [54]. Using NMR
chemical shift data from Ha resonances we have estimated the net contribution to
stability from these two interactions. Although the contribution to stability in each
case is small, Ha chemical shifts are extremely sensitive to small shifts in the pop-
ulation of the folded state, as evident from the DdHa data in Figure 10.8. On this
basis, the individual contributions of these two interactions was estimated (K2-E15,
peptide b2, and E6-K11, peptide b3) compared with their K2-S15 and S6-K11 coun-
terparts (peptide b1) and found to be similar (1.2 and 1.3 kJ mol1 at 298K).
When the two salt bridges are introduced simultaneously into the hairpin
sequence (b4), the energetic contribution of the two interactions together
(3.6 kJ mol1 ) is significantly greater than the sum of the individual interactions.
This effect is readily apparent from the large increase in DdHa values in Figure
10.8, indicating that the contribution of a given interaction appears to depend on
the relative stability of the system in which the interaction is being measured. Sim-
ilar observations have been reported using a disulfide cyclized b-hairpin scaffold
[55]. The indication is that the strength of the interaction appears to depend on
the degree of preorganization of the b-hairpin template that pays varying degrees
of the entropic cost in bringing the pairs of side chains together.
This is more clearly illustrated by the schematic representation shown in Figure
10.9a that shows the thermodynamic cycle indicating the effects on stability of the
introduction of each mutation. Thus, the energetic contribution of the S15/E15
mutation could be measured either from b1 $ b2 or from b3 $ b4. The values ob-
tained are quite different, the latter suggesting that the interaction is more favor-
able (1.2 versus 2.3 kJ mol1 ). This appears to correlate with the fact that the
stability of b3 is greater than b1 and that the degree of preorganization of the for-
mer determines the energetic contribution of the interaction. The same principle is
evident when determining the energetics of the E6-K11 interaction. This can be es-
timated from b1 $ b3 or b2 $ b4. Again, the energetics are quite different (1.3
versus 2.4 kJ mol1 ) with the larger contribution from the b2 $ b4 pair, where
the intrinsic stability of the b2 reference state is higher.
The NMR data show that the folded conformation of b4 is highly populated
(> 70%) giving rise to an abundance of cross-strand NOEs (Figure 10.9b) [55]. On
the basis of 173 restraints (NOEs and torsion angle restraints from 3 JNH-Ha values)
we have calculated a family of structures compatible with the NMR data (Figure
10.9c). The large number of van der Waals contacts between hydrophobic residues
(V, I, and Y) evident from the NOE data leads us to conclude that hydrophobic in-
teractions still provide the overall driving force for folding with structure and
stability further consolidated by additional Coulombic interactions from the salt
bridges. CD melting curves for hairpin b4 in water and various concentrations
of methanol (Figure 10.10) show the same characteristics described above from
temperature-dependent NMR data for b1: a shallow melting curve for b4 in water
328 10 Design and Stability of Peptide b-sheets
Fig. 10.8. Effects of salt bridges (Lys-Glu) on Lys-Ser cross-strand pairs at the X2 -X15 and
b-hairpin stability. Ha chemical shift deviation X6 -X11 mutation sites (see Figure 10.2). a) b1
from random coil values (DdHa values) for b- versus b 2; b) b1 versus b3, and c) b1 versus
hairpin peptides b 2 to b4 compared with those b4. All data at 298K and pH 5.5. Taken from
of the reference hairpin peptide b1 containing Ref. [54].
10.5 Side-chain Interactions and b-hairpin Stability 329
(a)
(b)
(c)
Fig. 10.9. a) Thermodynamic cycle showing dependent, showing some degree of cooper-
the context-dependent energetic contribution ativity between the two mutation sites
to hairpin stability of each electrostatic according to the degree of preorganization of
interaction in the b1 to b4 family of hairpins the hairpin structure. b) Shematic illustration
(Figure 10.2) by comparing relative hairpin of the abundance of cross-strand NOEs in
stabilities at 298K determined from NMR hairpin peptide b4. c) Family of five NMR
chemical shift data; DDG values are shown for structures of hairpin b4 showing the peptide
Lys-Glu interactions at pH 5.5. The stabilizing backbone alignment, with some fraying at the
contribution of each salt bridge is context N- and C-termini. Taken from Ref. [54].
330 10 Design and Stability of Peptide b-sheets
Fig. 10.10. CD melting curves for b4 at (see Ref. [54]). The change in heat capacity on
various concentrations (% v/v) of MeOH, as folding in 30% and 50% aqueous methanol is
indicated. The nonlinear least squares fit is assumed to be zero; there is evidence for cold-
shown in each case from which thermo- denaturation in water and 10% methanol.
dynamic data for folding were determined Taken from Ref. [54].
with evidence for cold denaturation is consistent with the hydrophobic interaction
again providing the dominant driving force for folding (DH ¼ þ11:9 kJ mol1 and
DS ¼ þ38 J K1 mol1 ), while folding becomes strongly enthalpy driven in 50%
aqueous methanol (DH ¼ 39:8 kJ mol1 and DS ¼ 106 J K1 mol1 ).
10.6
Cooperative Interactions in b-sheet Peptides: Kinetic Barriers to Folding
Folding kinetics for a b-hairpin derived from the C-terminus of the B1 domain of
protein G have been described from measurements of tryptophan fluorescence fol-
lowing laser-induced temperature-jump [56, 57]. Kinetic analysis of this peptide
(and a dansylated analogue) reveals a single exponential relaxation process with
time constant 3:7 G 0:3 ms. The data indicate a single kinetic barrier separating
folded and unfolded states, consistent with a two-state model for folding. Subse-
quently, the authors developed a statistical mechanical model based on these obser-
vations, describing the stability in terms of a minimal numbers of parameters: loss
of conformational entropy, backbone stabilization by hydrogen bonding and forma-
10.7 Quantitative Analysis of Peptide Folding 331
tion of a stabilizing hydrophobic cluster between three key residues. This model
seems sufficient to reproduce all of the features observed experimentally, with a
rough, funnel-like energy landscape dominated by two global minima representing
the folded and unfolded states. The formation of the hydrophobic cluster appears
to be a key folding event. Nucleation by the turn seems most likely, consistent with
experimental measurements of loop formation on the timescale of @1 ms [58].
However, simulation studies by others suggest that folding may proceed by hydro-
phobic collapse followed by rearrangement to form the hydrophobic cluster, with
hydrogen bonds then propagating outward from the cluster in both directions
[59]. Such a model does not appear to require a turn-based nucleation event.
10.7
Quantitative Analysis of Peptide Folding
through a disulfide bridge seem to work equally well. Such an approach has been
used effectively to measure the thermodynamics of folding of a short hairpin carry-
ing a motif of aromatic residues [62]. The cyclic analogs show a much higher sta-
bility than their acyclic counterparts, including significant protection from amide
H/D exchange due to enforced interstrand hydrogen bonding. While peptide cycli-
zation seems a worthwhile approach to defining the fully folded state, there may
also be some caveats to this approach that have not been tested. When conforma-
tional constraints (b-turns) are imposed at both ends of the structure this may af-
fect the intrinsic twist of the two b-strands, resulting in a more pronounced twist
in the cyclic analog than in the acyclic hairpin. Further, the latitude for conforma-
tional dynamics in the fully folded state will also be different. Both of these factors
are likely to influence Ha shifts chemical shifts to some degree.
10.8
Thermodynamics of b-hairpin Folding
The number of b-hairpin model systems is expanding rapidly with a greater focus
now on quantitative analysis and thermodynamic characterization. The thermody-
namic signature for the folding of b1 (Figure 10.2) presents an insight into the
nature of the stabilizing weak interactions in various solvent milieu [30, 32]. b1 ex-
hibits the property of cold denaturation, with a maximum in the stability curve oc-
curring at 298K as judged by changes in Ha chemical shift (Figure 10.11a). Such
pronounced curvature is clear evidence for entropy-driven folding accompanied by
a significant change in heat capacity. Both of these thermodynamic signatures are
hallmarks of the hydrophobic effect contributing strongly to hairpin stability. More
recent studies by Tatko and Waters [63] have also shown that cold denaturation
effects can be observed for simple model peptides, consistent with substantial
changes in heat capacity on folding. Further examination of folding in the pres-
ence of methanol co-solvent shows that the signature changes such that folding be-
comes strongly enthalpy driven, and that in 50% aqueous methanol the tempera-
ture–stability profile is indicative of the absence of any significant contribution of
the hydrophobic effect to folding [32]. The population of the folded state appears to
be enhanced by methanol, reflecting similar observations in helical peptides where
the phenomenon has been attributed to the effects of the co-solvent destabilizing
the unfolded peptide chain so promoting hydrogen bonding interactions in the
folded state.
It is unlikely that the folded state of a model b-hairpin peptide resembles a
b-sheet in a native protein, with the former sampling a much larger number of
conformations of similar energy stabilized by a fluctuating ensemble of transient
interactions. IR analysis of the amide I band of b1 does not show significant dif-
ferences in the region expected for b-sheet formation (A 1630 cm1 ) from data on
a nonhydrogen-bonded short reference peptide [64]. However, this band does ap-
pear under aggregating conditions (Figure 10.11b). In other cases, IR spectral fea-
10.8 Thermodynamics of b-hairpin Folding 333
0.45
50% MeOH
0.40
(ii)
0.35
0.30
20% MeOH
0.25 (iii)
0.20
water
0.15
260 280 300 320 340 1720 1680 1640 1600 1560
[68]. In contrast with the above data, where the stabilizing hydrophobic interac-
tions involved aliphatic side chains, here an aromatic cluster is responsible for fold-
ing. A strongly enthalpy-driven transition is also apparent for the trpzip motifs
described above [49] all of which is consistent with p–p interactions stabilizing
b-hairpin structures through fundamentally electrostatic interactions rather than
through only solvophobic effects. Studies of a designed b-hairpin system, also
carrying the same motif of three aromatic residues, demonstrate qualitatively simi-
liar enthalpy-driven folding [62], while the thermodynamics of the N-terminal hair-
pin component of a designed three-stranded antiparallel b-sheet enables similar
conclusions to be drawn (see further below) [69]. The difference in the thermo-
dynamic signature for aliphatic versus aromatic side-chain interactions has been
suggested to have its origins in enthalpy–entropy compensation effects, such that
enthalpy-driven interactions may be fundamentally a consequence of tighter inter-
facial interactions, giving rise to stronger electrostatic (enthalpic) interactions [62].
More recent work by Tatko and Waters [70] probing the energetics of aromatic–
aliphatic interactions in hairpins suggests that there is a preference for the self-
association of aromatics over aromatic–aliphatic interactions, and that the unique
nature of this interaction may impart some degree of sequence selectivity. The lim-
ited data available from such systems suggest that the thermodynamic driving
force for b-hairpin folding is highly dependent on the nature of the side-chain in-
teractions involved. Further quantitative analysis of peptide folding is required to
substantiate these hypotheses.
10.9
Multistranded Antiparallel b-sheet Peptides
The natural extension of the earlier studies on b-hairpin peptides was to design
three- and four-stranded antiparallel b-sheets using the design principles already
discussed, focusing on the importance of turn sequence in defining stability and
strand alignment, and employing motifs of interacting side chains already identi-
fied to impart stability. An overriding question concerns the extent to which
cooperative interactions perpendicular to the strand direction are important in sta-
bilizing these structures (see Figure 10.12a); in other words, how good is a pre-
organized b-hairpin motif at templating the interaction of a third strand. Several
studies have attempted to address this important question. The earliest study
described a 24-residue peptide incorporating two NG turns (KKFTLSINGKKY-
TISNGKTYITGR) that showed little evidence for folding in water but was signifi-
cantly stabilised in aqueous methanol solutions [71, 72]. By comparison with a
16-residue b-hairpin analog consisting of the same C-terminal sequence (GKKY-
TISNGKTYITGR), it was possible to show that the Ha shift perturbations for the
C-terminal b-hairpin were greater in the presence of the interactions of the third
strand. Subsequent design strategies, incorporating the d-Pro-Gly loop, together
with other Asn-Gly-containing turn sequences have illustrated that it is possible to
design structures that fold in water [69, 73–75], some of which show a degree of
10.9 Multistranded Antiparallel b-sheet Peptides 335
(a) N
N
co-operativity
(i) N
N
C
C
C
C
N
N co-operativity
(ii) N
N
C
C C
C
(b)
C
C
N
N
C
C
N
N
Fig. 10.12. a) Models for the folding of b- state and partially folded states in which the
hairpin and three-stranded b-sheet peptides C-terminal b-hairpin is formed but the N-
illustrating the possibility of cooperative terminus is disordered, the N-terminal b-
interactions being propagated parallel i) and hairpin is formed but the C-terminus is
perpendicular ii) to the b-strand direction. b) disordered, and the fully folded state. The
Four-state model for the folding of a three- preformed hairpins can act as templates for
stranded antiparallel b-sheet peptide showing the folding of the third strand.
the presence at equilibrium of the unfolded
(a)
(b)
Fig. 10.13. a) Mean NMR structure of the isolated C-terminal hairpin peptide
fully folded state of the three-stranded (G 9 KYTVSING 17 KKITVSI) is also shown. A
antiparallel b-sheet structure of sequence larger Gly Ha splitting indicates a higher
KGEWTFVNG 9 KYTVSING 17 KKITVSI showing population of the folded hairpin. Different
the core cluster of hydrophobic residues profiles for Gly9 and Gly17 in the three-
(underlined). b) Temperature-dependent stranded sheet show that the peptide cannot
stability profiles for the various b-hairpin be folding via a simple two-state model
components of the three-stranded sheet involving only random coil and fully folded
using the Ha splitting of Gly9 and Gly17 in peptide. The data have been fitted assuming
the two b-turns: Gly9 (filled circles), Gly17 the four-state model shown in Figure 10.12b.
(open circles). Gly17 (open squares) in the Reproduced from Ref. [69].
338 10 Design and Stability of Peptide b-sheets
(a)
V O H S O H R O
N N N NH2
N N N
O O H E O H F O H
S
NH S
H O K H O I H O
N N N
N N N
O K H O F H O T H O O
HN
V O H T O H T O H
N N N N
N N N
O O H E O H Y O H K O
NH
H O K H O L H O
N N N D D D
N N NH2 P P P-cc
O O H O I H O Q
(b)
V O H T O H T O
N N N
N N N c(DP)2-II
O O H E O H Y O H
HN
NH
H O K H O L H O O
N N N
N N N
O O H O I H O Q
Fig. 10.14. Structure of the four-stranded b-sheet peptide
D D D
P P P-cc (a), and the cyclic reference peptide c(D P)2 -II (b)
used for estimating limiting values for the fully folded state of
the C-terminal hairpin of D PD PD P-cc. Adapted from the work of
Ref. [78].
rather than a native-like ‘‘crystalline’’ array of hydrogen bonds, may explain why
cooperative interactions have only a small effect on overall stability because only a
small energy barrier separates the folded from partially or fully unfolded states.
Several of the above studies have attempted to address the issue of whether co-
operative interactions are propagated orthogonal to the strand direction as the
number of b-strands increases. Gellman and coworkers have examined the influ-
ence of strand number on antiparallel b-sheet stability in designed three- and
four-stranded structures (see Figure 10.14), using the d-Pro-Gly b-turn sequence
to define b-turn position and strand length. The results are not dissimilar to those
described above; a third strand stabilizes an existing two-stranded b-sheet by
@2 kJ mol1 , while a fourth strand seems to confer a similar small stability incre-
ment [79].
10.10 Concluding Remarks: Weak Interactions and Stabilization of Peptide b-Sheets 339
10.10
Concluding Remarks: Weak Interactions and Stabilization of Peptide b-Sheets
References
1 Honig, B. J. Mol. Biol. 1999; 293, 12 Kemp, D. S., Bowen, B. R., Meundel,
283–93. C. C. J. Org. Chem. 1990; 55, 4650–7.
2 Brockwell, D. J., Smith, D. A., 13 Nowick, J. S., Smith, E. M., Pairish,
Radford, S. E. Curr. Opin. Struct. Biol. M. Chem. Soc. Rev. 1996; 401–15.
2000; 10, 16–25. 14 Diaz, H., Tsang, K. Y., Choo, D.,
3 Baldwin, R. L., Rose, G. Trends Espina, J. R., Kelly, J. W. J. Am.
Biochem. Sci. 1999; 24, 26–76. Chem. Soc. 1993; 115, 3790–1.
4 Baldwin, R. L., Rose, G. Trends 15 Blanco, F. J., Jimenez, M. A.,
Biochem. Sci. 1999; 24, 77–83. Herranz, J., Rico, M., Santoro, J.,
5 Chakrabartty, A., Baldwin, R. Adv. Nieto, J. L. J. Am. Chem. Soc. 1993;
Protein Chem. 1995; 46, 141–76. 115, 5887–9.
6 Munoz, V., Serrano, L. J. Mol. Biol. 16 Blanco, F. J., Rivas, G., Serrano, L.
1995; 245(3), 275–96. Nat. Struct. Biol. 1994; 1, 584–90.
7 O’Neil, K. T., Degrado, W. F. Science 17 Blanco, F. J., Jimenez, M. A.,
1990; 250(4981), 646–51. Pineda, A., Rico, M., Santoro, J.,
8 Hill, R., Degrado, W. J. Am. Chem. Nieto, J. L. Biochemistry 1994; 33,
Soc. 1998; 120(6), 1138–45. 6004–14.
9 Gellman, S. H. Curr. Opin. Chem. 18 Cox, J. P. L., Evans, P. A., Packman,
Biol. 1998; 2, 717–25. L. C., Williams, D. H., Woolfson,
10 Ramirez-Alvarado, M., Kortemme, D. N. J. Mol. Biol. 1993; 234, 483–
T., Blanco, F. J., Serrano, L. Bio-org. 92.
Med. Chem. 1999; 7, 93–103. 19 Searle, M. S., Williams, D. H.,
11 Searle, M. S. J. Chem. Soc. Perkin Packman, L. C. Nat. Struct. Biol. 1995;
Trans. 2001; 2, 1011–20. 2, 999–1006.
References 341
20 Searle, M. S., Zerella, R., Williams, 39 Frank, M., Clore, G. M., Gronen-
D. H., Packman, L. C. Protein Eng. born, A. Protein Sci. 1995; 4, 2605–
1996; 9, 559–65. 15.
21 Zerella, R., Evans, P. A., Ionides, 40 Buckler, D., Haas, E., Sheraga, H.
J. M. C. et al. Protein Sci. 1999; 8, Biochemistry 1995; 34, 15965–78.
1320–31. 41 Shortle, D. FASEB J. 1996; 10, 27–
22 Zerella, R., Chen, P. Y., Evans, 34.
P. A., Raine, A., Williams, D. H. Pro- 42 Klein-Seetharaman, J., Oikawa, M.,
tein Sci. 2000; 9, 2142–50. Grimshaw, S. B. et al. Science 2002;
23 Sibanda, B. L., Blundell, T. L., 295, 1719–22.
Thornton, J. M. J. Mol. Biol. 1989; 43 Shortle, D., Ackerman, M. S. Science
206, 759–77. 2001; 293, 487–9.
24 Hutchinson, E. G., Thornton, J. M. 44 De Alba, E., Rico, M., Jimenez, M. A.
Protein Sci. 1994; 3, 2207–16. Protein Sci. 1997; 6, 2548–60.
25 Haque, T. S., Gellman, S. H. J. Am. 45 Wouters, M. A., Curmi, P. M. G.
Chem. Soc. 1997; 119, 2303–4. Protein Struct. Funct. Genet. 1995; 22,
26 Stanger, H. E., Gellman, S. H. J. 119–31.
Am. Chem. Soc. 1998; 120, 4236–7. 46 Hutchinson, E. G., Sessions, R. B.,
27 Ramirez-Alvarado, M., Blanco, F. J., Thornton, J. M., Woolfson, D. N.
Serrano, L. Nat. Struct. Biol. 1996; 3, Protein Sci. 1998; 7, 2287–300.
604–12. 47 Espinosa, J. F., Munoz, V., Gellman,
28 De Alba, E., Jimenez, M. A., Rico, M. S. H. J. Mol. Biol. 2001; 306, 397–402.
J. Am. Chem. Soc. 1997; 119, 175–83. 48 Stanger, H. E., Syud, F. A.,
29 Ramirez-Alvarado, M., Blanco, F. J., Espinosa, J. F., Giriat, I., Muir, T.,
Niemann, H., Serrano, L. J. Mol. Gellman, S. H. Proc. Natl Acad. Sci.
Biol. 1997; 273, 898–912. USA 2001; 98, 12105–20.
30 Griffiths-Jones, S. R., Maynard, 49 Cochran, A. G., Skelton, N. J.,
A. J., Searle, M. S. J. Mol. Biol. 1999; Starovasnik, M. A. Proc. Natl Acad.
292, 1051–69. Sci. USA 2001; 98, 5578–83.
31 Maynard, A. J., Searle, M. S. J. 50 Karshikoff, A., Ladenstein, R.
Chem. Soc. Chem. Commun. 1997; Trends Biochem. Sci. 2001; 26, 550–6.
1297–8. 51 Kumar, S., Nussinov, R. Cell. Mol.
32 Maynard, A. J., Sharman, G. J., Life Sci. 2001, 58, 1216–33.
Searle, M. S. J. Am. Chem. Soc. 1998; 52 Takano, K., Tsuchimori, K., Yama-
120, 1996–2007. gata, Y., Yutani, K. Biochemistry 2000,
33 Swindells, M. B., Macarthur, 39, 12375–81.
M. W., Thornton, J. M. Nat. Struct. 53 Loladze, V. V., Ibarra-Molero, B.,
Biol. 1995; 2, 596–603. Sanchez-Ruiz, J., Makhatadze, G. I.
34 Fiebig, K. M., Schwalbe, H., Buck, Biochemistry 1999, 38, 16419–23.
M., Smith, L. J., Dobson, C. M. J. 54 Ciani, B., Jourdan, M., Searle,
Phys. Chem. 1996; 100, 2661–6. M. S. J. Am. Chem. Soc. 2003; 125,
35 Smith, L. J., Bolin, K. A., Schwalbe, 9038–47.
H., Macarthur, M. W., Thornton, 55 Russell, S. J., Blandl, T., Skelton,
J. M., Dobson, C. M. J. Mol. Biol. N. J., Cochran, A. G. J. Am. Chem.
1996; 255, 494–506. Soc. 2003; 125, 388–95.
36 Griffiths-Jones, S. R., Sharman, 56 Munoz, V., Thompson, P. A.,
G. J., Maynard, A. J., Searle, M. S. J. Hofrichter, J., Eaton, W. A. Nature
Mol. Biol. 1998; 284, 1597–609. 1997; 390, 196–199.
37 Jourdan, M., Griffiths-Jones, S. R., 57 Munoz, V., Henry, E. R., Hofrich-
Searle, M. S. Eur. J. Biochem. 2000; ter, J., Eaton, W. A. Proc. Natl
267, 3539–48. Acad. Sci. USA 1998; 95, 5872–9.
38 Neri, D., Billeter, M., Wider, G., 58 Hagen, S. J., Hofrichter, J., Szabo,
Wuthrich, K. Science 1992; 257, A., Eaton, W. A. Proc. Natl Acad. Sci.
1559–63. USA 1996; 93, 11615–17.
342 10 Design and Stability of Peptide b-sheets
11
Predicting Free Energy Changes of Mutations
in Proteins
Raphael Guerois, Joaquim Mendes, and Luis Serrano
The ability to predict with good accuracy the effect of mutations on protein stability
or complex formation is needed in order to design new proteins, as well as to un-
derstand the effect of single nucleotide polymorphisms on human health. In the
following sections we will summarize what is known and can be used for this pur-
pose. We will start by describing the forces that act on proteins, an understanding
of which is a prerequisite to understanding the following section dealing with pre-
diction methods. Finally we will briefly summarize other possible unexpected re-
sults of making mutants, such as changes in ‘‘in vivo’’ stability or inducing protein
aggregation.
11.1
Physical Forces that Determine Protein Conformational Stability
In this section, we will briefly discuss the physical forces that determine the con-
formational stability of proteins. This section is intended to serve as a basis for the
understanding of the following sections, and no effort has been made to compre-
hensively cover the relevant literature. However, many excellent reviews that thor-
oughly cover this topic have been written, several of which are cited in the titles
to the various subsections for the interested reader; more specific references are
given in the text.
11.1.1
Protein Conformational Stability [1]
Under physiological conditions, proteins exist in a biologically active state (the na-
tive (N) state), which is in dynamic equilibrium with a biologically inactive state
(the denatured (D) state). The relative thermodynamic stability of the N state with
respect to the D state is measured by the folding free energy: DGfold ¼ GN GD .
Although GN and GD are individually large numbers, DGfold is always small –
typically on the order of 5–20 kcal mol1 . This cancellation of large values is due
11.1.2
Structures of the N and D States [2–6]
be detected in the D state of some proteins using far-UV circular dichroism even
under relatively strong denaturing conditions, (ii) the D state of some proteins is
only slightly expanded relative to the N state and much more compact than would
be expected for a random coil, (iii) some single-point mutations can drastically alter
the compactness of the D state, and (iv) the free energy change arising from some
point mutations cannot be rationalized when a random coil model is assumed.
In relation to point (iv), point mutants have been found for which the folding
enthalpy is reduced to less than 50% of the wild-type value and the folding entropy
increases to a value that almost perfectly compensates the enthalpy change, such
that there is practically no difference in DGfold of the mutant and wild-type proteins
[8]. But by definition, there are no side chain–side chain interactions in a random
coil state and, therefore, the change in free energy of the D state accompanying
mutation depends only on the identity of the mutated residue in the mutant and
wild-type proteins, and not on the particular amino acid sequence; as such, under
the random coil assumption, the change in stability produced by a mutation can be
rationalized based solely on the structure of the N state. However, to explain the
changes in enthalpy and entropy in these mutants solely in terms of effects on
the native state requires (1) more than half of the chemical bonds stabilizing the
wild type N state to be broken in the mutant N state, and (2) an enormous increase
in the vibrational entropy of the mutant N state in order to avoid raising its free
energy. Crystallographic studies of the N state of many mutant proteins have invar-
iably shown only modest changes in atomic coordinates and thermal B-factors,
making this interpretation highly implausible.
A wealth of studies over the past decade or so have made considerable steps for-
ward in elucidating the structure of the D state. These studies are based on a vari-
ety of NMR experiments. In contrast to other spectroscopic and hydrodynamic
studies mentioned above, which yield data that correspond to a global property of
the protein, NMR spectroscopy provides information about individual residues.
The most important of these studies have directly observed the D state under phys-
iological conditions. This has been accomplished by destabilizing the N state rela-
tive to the D state, such that both exist in similar concentrations or the D state pre-
dominates. The destabilization has been achieved by particular pH conditions [9,
10], a single-point mutation [11, 12], or the deletion of residues at the N- and C-
termini [5, 13–15]. The conclusion of these studies is that considerable amounts
of secondary structure exist in the D state. This structure can be either native-like
or nonnative-like. It is not persistent but, rather, fluctuating. It can exist up to
high concentrations of chemical denaturants, but eventually melts out at very
high concentrations. Recent studies using paramagnetic probes or residual dipolar
couplings have found that, in addition to local secondary structure, a considerable
amount of nonlocal structure exists in the D state. For some proteins, the D state
appears to adopt the same overall topology of the N state. This nonlocal structure
persists even at very high concentrations of denaturant [16]. In the scope of protein
engineering it was shown that the specific destabilization of nonnative residual
structure in the denatured state could bring about a dramatic stabilization in an
SH3 domain [17].
346 11 Predicting Free Energy Changes of Mutations in Proteins
11.1.3
Studies Aimed at Understanding the Physical Forces that Determine Protein
Conformational Stability [1, 2, 8, 19–26]
A large number of studies have been undertaken to assess the relative importance
of different physical forces in stabilizing the N and D states of proteins. Two types
of study have been performed: (1) studies in which DGfold is decomposed into the
respective DHfold and DSfold components and these in turn into various component
forces, and (2) studies in which the change in DGfold produced by point mutations
is decomposed into component forces. These studies provided many important
insights and improved the understanding of protein conformational stability con-
siderably. However, some of their conclusions are still very controversial. The con-
troversy arises from several factors: (i) the large enthalpy–entropy compensation
accompanying folding leads to a very small DGfold , the consequence of which is
that a small error in the estimation of a particular force can lead to a stabilizing
force being interpreted as being destabilizing, or vice versa, (ii) the impossibility
of uniquely decomposing experimental thermodynamic quantities into intramolec-
ular and solvation contributions, (iii) the limited knowledge of the structural de-
tails of the D state and the lack of adequate models for it, and (iv) the considerable
plasticity of the N state, i.e., its ability to readjust structurally to accommodate
point mutations. In spite of the controversy, since DGfold is always small, even
forces that contribute only a few kcal mol1 make a large contribution to DGfold
and, thus, tip the balance in favor of the N state or of the D state. Therefore, as a
general conclusion that has emerged mainly from mutant studies, each protein
uses a different strategy to tweak this stability of the N state, taking advantage of
small increases in stability afforded by a particular type of force.
11.1.4
Forces Determining Conformational Stability [1, 2, 8, 19–27]
all collected together in the solvation free energy. Apart from the intramolecular in-
teraction energy and the solvation free energy, one must also take into account the
entropy of the N and D states when attempting to understand protein conforma-
tional stability.
11.1.5
Intramolecular Interactions
The intramolecular interactions that stabilize the N and D states of a protein are
predominantly nonbonded forces. Bonded forces are essential for maintaining the
covalent structure of the polypeptide chain, but are less important in determining
stability.
loss of entropy of the two side chains involved, formation of a second salt bridge
from a third residue to the already bridged pair requires only loss of the entropy
of one side chain. Salt bridge networks are very common in hyperthermophilic
proteins, and are thought to be important in stabilizing the N state of these pro-
teins [1].
Long-ranged electrostatic interactions between groups not directly involved in a
salt bridge with each other can also contribute to stabilize or destabilize the N state
relative to the D state. Thus, too many charges of the same type at the surface of
the protein destabilize the N state with respect to the D state, because in the more
expanded D state the like charges are further apart and repel each other less
strongly. On the other hand, some studies indicate that the distribution of charges
at the surface of the protein can be optimized such that the overall long-ranged
electrostatic interactions stabilize the N state [29–31].
It is frequently difficult to predict the effect of mutations that aim at increas-
ing protein stability by optimizing electrostatic interactions among charged groups
on the surface of the N state of the protein. In fact, the stability increase can be
considerably smaller than predicted and, in some cases, a decrease in stability is
observed when an increase was predicted. These results suggest that favorable
charge–charge interactions occur in the D state, and that the free energy of this
state may be decreased even more than that of the N state by the mutations
in charged residues that had aimed at improving the electrostatics of the N state
[32].
Hydrogen bonding The N state of proteins is extensively hydrogen bonded and in-
tramolecular hydrogen bonding is an important interaction in stabilizing this state.
In the D state, intramolecular hydrogen bonds are not persistent and, therefore,
probably do not contribute significantly to the stability of this state. Most hydrogen
bonds in proteins involve strong hydrogen bond donors and acceptors, and are of
type NaH O and OaH O. However, there is some structural evidence that the
a-hydrogen of the polypeptide backbone can participate in CaH O hydrogen
bonds, which, although somewhat weaker, could make a significant contribution
to the stability of the N state [33, 34].
N state of the protein through charge–dipole interactions between the charged res-
idues and the net dipole moment of the helix. Presumably, in these cases, the helix
is not formed or only partially formed in the D state and, therefore, the charge–
dipole interactions either do not exist or are much weaker in this state.
Aromatic interactions The aromatic groups of aromatic residues have a net elec-
tric quadrupole moment. This arises from the electronic distribution being concen-
trated above and below the aromatic plane, leading to a net negative charge above
and below this plane and a net positive charge within the plane and on the aro-
matic hydrogen atoms. The quadrupole moment is responsible for the particular
interactions in which aromatic groups are involved.
Aromatic groups in direct contact can interact with each other through
aromatic–aromatic interactions. Two stabilizing geometries are found in proteins:
in one, the hydrogen atoms at the edge of one group point towards the center of the
other, in the other the two aromatic planes are stacked on top of each other parallel
but off-centered. Intramolecular aromatic–aromatic interactions can be consider-
ably strong and can stabilize the N state of proteins significantly. In fact, aromatic
residues exist most frequently in the core of native proteins and are in contact with
each other more frequently than random.
Aromatic groups can also be involved in cation–p interactions within proteins
[35]. In this interaction, a positive ion is in close proximity to the negative partial
charge at the center of the aromatic ring, thus interacting favorably with the quad-
rupole moment. Intramolecular cation–p interactions can contribute considerably
to the stability of the N state of proteins.
The aromatic hydrogen atoms can also interact with the sulfur atom of Cys and
Met residues through a kind of weak hydrogen bond as has been experimentally
shown [36].
Aromatic groups can also participate in a variety of hydrogen bonds of differing
strengths [37].
11.1.6
Solvation
11.1.7
Intramolecular Interactions and Solvation Taken Together
and water, as a consequence of the much less dense, open structure of water,
which leads to fewer and weaker contacts than in the crystal [39]. They extend
this explanation to protein stability, arguing that the intramolecular van der Waals
interactions in the N state are stronger than the intermolecular van der Waals in-
teractions with water in the D state and, therefore, van der Waals interactions are
an important force in stabilizing the N state. However, based on molecular dynam-
ics simulations, Karplus and coworkers suggested a different interpretation of the
dissolution data [22]. They find that the intramolecular van der Waals interaction
energy in the crystal is approximately the same as the intermolecular van der
Waals energy of interaction between the organic molecule and water. Therefore,
they argue that van der Waals interactions hardly contribute to protein confor-
mational stability, and that the unfavorable enthalpy of dissolution results from
changes in water structure associated with cavity formation.
Although, the net contribution of aliphatic groups is to destabilize the D state
through the hydrophobic effect and to stabilize the N state through van der Waals
interactions, in some mutants an inverse hydrophobic effect has been observed for
very exposed hydrophobic residues in the N state [40]. This has been interpreted as
resulting from the hydrophobic residues being more buried in the D state than in
the N state.
Aromatic groups probably stabilize the N state more than they do the D state,
because they are most frequently buried in the N state establishing favorable intra-
molecular van der Waals and aromatic interactions, which strongly outweigh the
slightly favorable solvation when exposed in the D state. In fact, the very unfavor-
able free energies of transfer of aromatic compounds from the pure liquid into
water, in spite of their favorable solvation free energy, is thought to result from
the much more favorable aromatic–aromatic interactions in the pure liquid.
There is still controversy on the contribution of polar groups to protein confor-
mational stability. Some authors defend that the favorable solvation of these groups
when exposed to solvent in the D state almost exactly cancels the favorable intra-
molecular hydrogen bond and dipolar interactions they establish in the N state,
and therefore their net contribution to DGfold is practically zero. Other authors ar-
gue that the intramolecular interactions of the polar groups in the N state out-
weigh their favorable solvation in the D state, and conclude that polar group inter-
actions contribute about the same to DGfold as do the interactions of aliphatic and
aromatic groups. Whatever its net contribution to protein stability, hydrogen bond-
ing is always important for achieving a specific fold, since any unsatisfied hydro-
gen bond donor or acceptor group in the N state of the protein largely destabilizes
this state with respect to the D state, in which such groups establish hydrogen
bonds with water molecules.
11.1.8
Entropy
The entropy of the D state of proteins is considerably larger than that of the N
state, because it consists of a structurally much more diverse ensemble of confor-
mations. As such, entropy stabilizes the D state with respect to the N state. One
352 11 Predicting Free Energy Changes of Mutations in Proteins
11.1.9
Cavity Formation
the deleted group was hydrophobic, the exposure of polar groups to solvent, which
in the wild-type protein had unsatisfied hydrogen bonding, and the removal of an
exposed hydrophobic group will, in general, lead to a stabilization of the mutant.
11.1.10
Summary
11.2
Methods for the Prediction of the Effect of Point Mutations on in vitro Protein
Stability
11.2.1
General Considerations on Protein Plasticity upon Mutation
Over the past 20 years the protein engineering method was used as an efficient
means to probe the chemistry underlying protein structure and interactions. The
large amount of thermodynamic data obtained experimentally offers an appealing
support for the development and assessment of predictive methods that estimate
the effect of point mutation on protein stability. On average, single deletion or con-
servative point mutations can alter the stability of a protein by 10–20% of its total
354 11 Predicting Free Energy Changes of Mutations in Proteins
stability (the upper limit is around 50% of the total stability) and is supposed not to
cause too drastic changes in the three-dimensional structure. The observation that
the structure is little affected by single deletion or conservative point mutations has
been obtained from extensive works of different groups that have systematically
solved the structure of the mutated protein [47–52]. The mutation of buried resi-
dues is expected to be most deleterious to protein structure and is discussed briefly
here. As regards mutants in which a chemical group is deleted, slight adjustments
usually partially accommodate changes in side-chain volume, but only to a limited
degree [53–55]. In fact, it appears that full compensations are unlikely because it is
difficult to reconstitute the equivalent set of interactions that were present in the
wild-type structure (Figure 11.1). In the case of a new group inserted in a protein
core, the changes in structure tend to accommodate the introduced side chains by
rigid-body displacements of groups of linked atoms, achieved through relatively
small changes in torsion angles (Figure 11.2). It is rare, for instance, that a side
chain close to the site of substitution changes to a different rotamer [56].
L99
L99A
F153A
F153
A129
129W
Fig. 11.2. Representation of the structural perturbation of side-
chains in the core of the T4 lysozyme upon insertion of
chemical groups. The wild-type (PDB: 4LZM) is in yellow and
the mutant A129W (PDB: 1QTD) is in blue.
In a sense these observations are consistent with the fact that proteins in their
native state are in a deep energy well. Alternative structures are higher in energy
and not populated as long as the destabilization effect is limited. Only when sev-
eral side chains are mutated at one time are wider rearrangements observed in
the set of interactions that stabilize protein structure [55, 57]. In cases of extreme
perturbation (such as the insertion of amino acids inside a helix), experimental ob-
servations reveal an amazing plasticity of proteins that can successfully change and
adapt their structures (Figure 11.3, see p. 358) [58]. Such modifications are still far
from being predicted by computational methods. Yet, for the majority of the point
mutants considered by the predictive algorithms, the assumption that the structure
does not undergo large changes is a reasonable one, although some exceptions
have also been shown [59].
11.2.2
Predictive Strategies
A wide range of different strategies exist to account for the effect of point muta-
tions on protein stability [60]. The methods can be divided into three classes: (i)
statistically based methods (SEEF) that rely on the analysis of either sequence or
356 11 Predicting Free Energy Changes of Mutations in Proteins
11.2.3
Methods
tetrahedra whose edges represent all nearest neighbors of side chain centroids.
Quite strong correlations were obtained in the prediction of point mutations of hy-
drophobic core residues. Yet as discussed further, prediction of the effect of point
mutations on buried hydrophobic residues is not the most difficult to achieve. Sim-
ple counting methods were shown to perform quite well also [70]. The tessellation
strategy might be more difficult to extend to surface accessible positions and a
four-body potential method might be less successful for polar residues.
Statistical potentials based on the linear combination of database-derived poten-
tials such as the PoPMuSiC algorithm have also been shown to have good predic-
tive power, especially at fully exposed locations in proteins [71–73]. This program
(accessible at http://babylone.ulb.ac.be/popmusic/) was successfully used to guide
the engineering of an a1 -antitrypsin protein [74] and helped to identify that cation–
p interactions are likely to play a key role in the ability of proteins to undergo do-
main swapping [75]. The latter is an interesting example of how database-derived
potentials can reveal the role of specific interaction types that would have not been
detected with atomic potentials that do not explicitly integrate this type of interaction.
Insertion of
helix turn
[76, 77, 79, 80]). These terms have been introduced, either to follow experimental
data [64, 76, 81–86], or to correspond to the statistical analysis of the protein data-
base [84]. The latest version of one of these algorithms, AGADIR1s-2, includes:
local motifs, a position dependence of the helical propensities for some of the 20
amino acids, an electrostatic model that takes into consideration all electrostatic in-
teractions up to 12 residues in distance in the helix and random coil conformations
11.2 Methods for the Prediction of the Effect of Point Mutations on in vitro Protein Stability 359
and the effect of ionic strength [64]. This algorithm predicts with an overall stan-
dard deviation value of 6.6% (maximum helix is 100%), the CD helical content of
778 peptides (223 correspond to wild-type and modified protein fragments), as well
as the conformational shifts of the C H protons, and the 13 C and 15 N J coupling
values (web access at http://www.embl-heidelberg.de/Services/serrano/agadir/
agadir-start.html). The important point for our discussion here is that these meth-
ods, originally developed to predict the properties of peptides, could also be used to
predict the effect of mutation on protein stability. Several experimental results are
now confirming that the use of AGADIR could help understanding or designing
the effect of mutations on protein stability [43–45].
The development of an AGADIR-like program for the quantitative prediction of
b-hairpin stability has been somewhat less successful than in the case of helices.
Several nonaggregating peptides exhibiting a large amount of b-hairpin structure
were studied in the 1990s and allowed experimental analysis of the specific interac-
tions both at the turn regions and in the strands [87–91]. In general, a good corre-
lation was found between the experimental data and the statistical analysis of the
protein database. As more experimental information is available, the description of
hairpin formation in a beta/coil transition algorithm might be undertaken. How-
ever, due to the complexity of the interactions present in b-sheets, a residue-based
empirical description such as the one used in AGADIR is unlikely to be successful.
For example, the energetic contribution of a pairwise interstrand side-chain inter-
action appears to be dependent on the face of the b-sheet on which it is located,
and intrinsic propensities are likely to display a significant position dependence
along the strands as well as in the different turn positions, and will vary for the
different turn types. Furthermore parallel b-sheets will differ in their properties
from the antiparallel b-sheets. Taken together, these different contributions would
yield a huge number of parameters to be experimentally determined for any empir-
ical model of b-sheet formation.
As suggested in the next section, a rational approach to quantitatively predict the
effect of mutations on b-hairpin and b-sheet stability will be more successful if,
in addition to the residue based description used in AGADIR, an atomic-based
description is included. Simple peptide models adopting b-hairpin and conforma-
tions hence constitute crucial systems for testing and refining such algorithms.
network around the protein. When the structure of the mutant is not known, a fast
predictive method for positioning the water molecules is required. The one pro-
posed in [100] was for instance adapted in the FOLD-X method. It is important to
note that in their study, Kortemme et al. suggested that the method they used for
hydrogen bond calculation allowed an implicit treatment of the effect of these
structural water molecules [97].
The role and the way to compute the effects of hydrogen bonds in protein stabil-
ity has been a matter of debate for some years [101, 102]. In all the methods based
on protein engineering analysis, an explicit consideration of the hydrogen bond
with distance and angular constraints was used. Other recent studies also suggest
that, in order to score a unique structural conformation, the angular and distance
constraints inherent to the formation of hydrogen bonds are better modeled by a
specific expression than by a global term accounting for all the electrostatic interac-
tions [103, 104].
Different databases were used to test these potentials, which makes a ranking
comparison between the methods rather difficult. For the FOLDEF, SPMP, and
Kortemme et al. methods a precision ranging from 0.7 to 1 kcal mol1 of standard
deviation between prediction and experiments was obtained. The fourth method
(Lomize et al.) produced very low rmsd errors between theoretical and experimen-
tal (0.4 kcal mol1 and 0.57 kcal mol1 for the training database or the blind test
database respectively). Yet the mutant databases used in the latter studies were lim-
ited to fully buried residues (test carried out on solvent accessible residues led to
mean unsigned errors of 1.4 kcal mol1 ) and the blind test database was restricted
to 10 point mutants.
Interestingly, each method has specifically focused on at least one term of the
energy function: (i) the SPMP method has developed specific analyses for explicit
water molecules, (ii) the FOLD-X focused on the way entropy penalties are applied,
(iii) the Kortemme et al. method derived a distance and angular dependent ex-
pression for hydrogen bonds, and (iv) the Lomidze et al. method used a specific
Lennard Jones potential for each atom type interaction. A combination of these
specific developments may push forward the current limits of the predictive algo-
rithms. The way hydrogen bonds strength and charge–charge electrostatic interac-
tions are modulated with solvent accessibility and with long-range interactions is
probably one of the lines of improvement that can still be achieved in these meth-
ods. Yet, to be precise enough, this improvement will require the inclusion of side
chain or backbone conformational exploration as was already partly included in the
Kortemme et al. method.
An interesting test case to those questions has been proposed in the study of a
surface salt bridge K11/E34 in ubiquitin and of the mutant in the reverse orienta-
tion E11/K34 [31]. It is clearly shown that long-range interactions contribute to the
strength of these salt bridges and that a simple model can predict the effect of the
surrounding on this type of interaction. Another track for improvement is related
to the selection of the mutants tested in the fitting procedure. The fitting of the ad-
justable parameters tends to prioritize mutants with a larger DDG. Contexts or in-
teractions that involve smaller DDG or interactions that are little represented in the
362
Tab. 11.1. Summary of the implementation of the energetic factors included in four different methods for predicting free energy changes upon mutation.
Number of adjustable 9 8 8 18
parameters
Training database 54 mut. with X-ray 339 mut. 743 mut. 106 mut. with X-ray
structures structures
Blind test database 56 mut. with X-ray 667 mut. þ 82 in complexes 371 mut. þ 233 in complexes 10 mut. with X-ray
structures structures
Excluded mutations High B-factor residues None None Water-accessible residues
Accessibility ASA based Protein atomic density Protein atomic density ASA based
estimation from the atom volumic
contact parameter
Van der Waals None Set by the atom volumic Lennard-Jones with two Lennard-Jones like. Nine
contact (atomic density) parameters for attractive terms depending on
and repulsive terms atom types
Solvation ASA-based model (2 Derived from average of Lazaridis, and Karplus ASA-based model (five
11 Predicting Free Energy Changes of Mutations in Proteins
as follows:
X
1
0
DG ¼ GB GA ¼ RT lnheDH =RT il ; where DH 0 ¼ Hlþdl Hl ð2Þ
l¼0
This methodology has been applied to a wide range of free energy calculations
dedicated to the study of enzymatic reactions, conformational changes, binding of
proteins, nucleic acids or small molecules [108], and also of the effects of muta-
11.2 Methods for the Prediction of the Effect of Point Mutations on in vitro Protein Stability 365
tions on protein stability. Compared with other fields of application, the estimation
of protein stability is relatively difficult. Indeed, contrary to the analysis of binding
between macromolecules, the estimation of the DG of folding requires the energy
calculation of a highly flexible and still poorly characterized state, the denatured
state ensemble.
Since the early 1990s, molecular dynamics strategies coupled to free energy cal-
culations have been used to analyze the stability changes of about 30 different
point mutations in experimentally studied proteins [109–119]. The agreements be-
tween experimental and predictions have been good in the large majority of the
cases observed, although significant discrepancies could also be observed [120].
The importance of the denatured state in the free energy calculation has been
highlighted in many of the most recent calculations. It supports the experimental
observation of the existence of residual structure, often with native like properties,
in the denatured state of protein as reported in the first section. The systematic cal-
culations carried out on 10 different mutants of chymotrypsin inhibitor 2 (CI2),
with various conditions for the representation of the denatured protein emphasize
the importance of considering a proper denatured state structure ensemble [118].
This analysis illustrates that when the denatured state is taken into account in the
calculation, the predictions are much more successful (correlation going from 0.68
to 0.82). Besides, depending on the model, either an extended tripeptide or a more
realistic full-length simulated structure of the denatured state, clear improvements
in the simulations could be obtained. Hence, in the case of the hydrophobic muta-
tions considered there, the quality of the prediction was shown to depend on the
model of the denatured state.
Similarly, the works by Kitao’s group [114, 116] showed a better agreement be-
tween experimental values and theory when they changed their reference state
from a pentapeptide with an extended conformation to a more realistic pentapep-
tide model with the native like conformation.
As stated above, the advantage of molecular dynamics simulations or alternative
methods that are based on the basic principle of physics is that they allow a de-
tailed inspection of the origin of the stabilization or destabilization. Yet, the way
free energies can be divided into energy (such as Lennard Jones, electrostatic, etc.)
or residue terms is not a trivial issue. Several theories regarding the path depen-
dence of the free energy components have been advanced [107, 121–125]. Under
controlled conditions and proper choice of the l coupling parameter it seems pos-
sible to extract some meaningful information. Unexpected results were observed,
for instance, in the case of mutations involving apolar residues. The reason for
the loss of stability in the V57A mutant in CI2 or the L56F mutant in T4 lysozyme
could be interpreted as a larger variation in the electrostatic component. These par-
ticular mutants were not well predicted by the empirical-based methods and this
fact emphasizes the advantage of mixing various approaches to get further insight
in the origin of protein stabilization or destabilization.
Because it is based on molecular dynamics simulation, the free energy calcula-
tion method can account for the change in structure provoked by a mutation and
does not depend on the assumption that the structure is unchanged upon muta-
366 11 Predicting Free Energy Changes of Mutations in Proteins
tion. Yet compared with the less sophisticated empirical methods the computer
time required to run a calculation on one mutant is several orders of magnitude
larger. The most exhaustive study is still limited to eight to ten mutants. It would,
however, be of great interest to get results from the calculation of a larger number
of point mutants so that a statistical estimation of the error made by these methods
could be more easily made. To speed up and overcome the sampling problems
mixed strategies using Monte-Carlo sampling coupled to molecular dynamics
have been proposed [117].
It is also important to mention alternative strategies using optimized methods
for extensive sampling of the side chain conformational space. In this case how-
ever the backbone is considered rigid. Instead of extracting free energies from mo-
lecular dynamic simulation, these methods account for the conformational entropy
component by an extensive sampling of side chain conformations. Different levels
of sophistication in the force field can be included in these methods [126–128]. For
instance the method in Ref. [128] included an implicit solvation term that allowed
the calculation of a large number of point mutants not only at solvent-buried posi-
tions but also at solvent-exposed ones. This line of research is, to our view, one of
the most promising for the prediction of the effect of point mutation on protein
stability since it combines a high sampling quality and a force field that can de-
pend both on physical and statistical terms to achieve the highest prediction rate.
11.3
Mutation Effects on in vivo Stability
11.3.1
The N-terminal Rule
The N-end rule relates the in vivo half-life of a protein to the identity of its N-
terminal residue. Similar but distinct versions of the N-end rule operate in all
organisms examined, from mammals to fungi and bacteria. In eukaryotes, the N-
end rule pathway is a part of the ubiquitin system. In Escherichia coli, three genes
11.3 Mutation Effects on in vivo Stability 367
Arg 2 2
Lys 2 3
Phe 2 3
Leu 2 3
Trp 2 3
Tyr 2 10
His >600 3
Ile >600 30
Asp >600 3
Glu >600 30
Asn >600 3
Gln >600 10
Cys >600 >1200
Ala >600 >1200
Ser >600 >1200
Thr >600 >1200
Gly >600 >1200
Val >600 >1200
Pro ? ?
Met >600 >1200
have been identified, clpA, clpP, and aat, as part of the degradation machinery in-
volved in the N-terminal rule. In Table 11.2 we show data for the half-life of a pro-
tein having different N-terminal amino acids. For a review see Ref. [129].
11.3.2
The C-terminal Rule
In E. coli it has been found that when the ribosome encounters a problem in pro-
tein synthesis it attaches a 11 amino acid peptide (the ssrA degradation tag) to the
C-terminus of the stalled protein chain and then the protein is released and tar-
geted for degradation by the ClpX system [133]. Consequently it has been found
that proteins containing an unstructured C-terminal segment with a hydrophobic
amino acid at the C-terminus could be targeted for degradation by the ClpX ma-
chinery and that the efficiency of the degradation is related to the hydrophobic
composition of the last five amino acids [134].
368 11 Predicting Free Energy Changes of Mutations in Proteins
11.3.3
PEST Signals
Polypeptide regions rich in proline (P), glutamic acid (E), serine (S), and threonine
(T) (PEST) target intracellular proteins for destruction [135]. Attachment of PEST
sequences to proteins results in the fusion protein being degraded from 2- to al-
most 40-fold faster than the parental molecule [136].
11.4
Mutation Effects on Aggregation
References
57 Gassner, N. C., Baase, W. A., and structure prediction. Curr Opin Struct
Matthews, B. W. (1996). A test of the Biol 10, 139–45.
‘‘jigsaw puzzle’’ model for protein 67 Betancourt, M. R. and Thirumalai,
folding by multiple methionine D. (1999). Pair potentials for protein
substitutions within the core of T4 folding: choice of reference states and
lysozyme. Proc Natl Acad Sci USA 93, sensitivity of predicted native states to
12155–8. variations in the interaction schemes.
58 Vetter, I. R., Baase, W. A., Heinz, Protein Sci 8, 361–9.
D. W., Xiong, J. P., Snow, S., and 68 Thomas, P. D. and Dill, K. A. (1996).
Matthews, B. W. (1996). Protein An iterative method for extracting
structural plasticity exemplified by energy-like quantities from protein
insertion and deletion mutants in structures. Proc Natl Acad Sci USA 93,
T4 lysozyme. Protein Sci 5, 2399– 11628–33.
415. 69 Carter, C. W., Jr., LeFebvre, B. C.,
59 Consonni, R., Santomo, L., Fusi, P., Cammer, S. A., Tropsha, A., and
Tortora, P., and Zetta, L. (1999). A Edgell, M. H. (2001). Four-body
single-point mutation in the extreme potentials reveal protein-specific
heat- and pressure-resistant sso7d correlations to stability changes caused
protein from sulfolobus solfataricus by hydrophobic core mutations. J Mol
leads to a major rearrangement of the Biol 311, 625–38.
hydrophobic core. Biochemistry 38, 70 Serrano, L., Kellis, J. T., Jr., Cann,
12709–17. P., Matouschek, A., and Fersht, A.
60 Mendes, J., Guerois, R., and R. (1992). The folding of an enzyme.
Serrano, L. (2002). Energy estimation II. Substructure of barnase and the
in protein design. Curr Opin Struct contribution of different interactions
Biol 12, 441–6. to protein stability. J Mol Biol 224,
61 Rath, A. and Davidson, A. R. (2000). 783–804.
The design of a hyperstable mutant of 71 Gilis, D. and Rooman, M. (2000).
the Abp1p SH3 domain by sequence PoPMuSiC, an algorithm for
alignment analysis. Protein Sci 9, predicting protein mutant stability
2457–69. changes: application to prion proteins.
62 Wang, Q., Buckle, A. M., and Protein Eng 13, 849–56.
Fersht, A. R. (2000). Stabilization of 72 Gilis, D. and Rooman, M. (1997).
GroEL minichaperones by core and Predicting protein stability changes
surface mutations. J Mol Biol 298, upon mutation using database-derived
917–26. potentials: solvent accessibility
63 Bystroff, C. and Baker, D. (1998). determines the importance of local
Prediction of local structure in versus non-local interactions along the
proteins using a library of sequence- sequence. J Mol Biol 272, 276–90.
structure motifs. J Mol Biol 281, 565– 73 Gilis, D. and Rooman, M. (1996).
77. Stability changes upon mutation of
64 Lacroix, E., Viguera, A. R., and solvent-accessible residues in proteins
Serrano, L. (1998). Elucidating the evaluated by database-derived
folding problem of alpha-helices: local potentials. J Mol Biol 257, 1112–26.
motifs, long-range electrostatics, ionic- 74 Gilis, D., McLennan, H. R.,
strength dependence and prediction of Dehouck, Y., Cabrita, L. D.,
NMR parameters. J Mol Biol 284, 173– Rooman, M., and Bottomley, S. P.
91. (2003). In vitro and in silico design of
65 Pedelacq, J. D., Piltch, E., Liong, E. alpha1-antitrypsin mutants with
C. et al. (2002). Engineering soluble different conformational stabilities.
proteins for structural genomics. Nat J Mol Biol 325, 581–9.
Biotechnol 20, 927–32. 75 Dehouck, Y., Biot, C., Gilis, D.,
66 Lazaridis, T. and Karplus, M. (2000). Kwasigroch, J. M., and Rooman, M.
Effective energy functions for protein (2003). Sequence-structure signals of
References 373
137 Dobson, C. M. (2002). Getting out of analysis of the propensity for amyloid
shape. Nature 418, 729–30. formation by a globular protein.
138 Bousset, L., Thomson, N. H., EMBO J 19, 1441–9.
Radford, S. E., and Melki, R. (2002). 141 Chiti, F., Stefani, M., Taddei, N.,
The yeast prion Ure2p retains its Ramponi, G., and Dobson, C. M.
native alpha-helical conformation (2003). Rationalization of the effects
upon assembly into protein fibrils in of mutations on peptide and protein
vitro. EMBO J 21, 2903–11. aggregation rates. Nature 424,
139 Booth, D. R., Sunde, M., Bellotti, 805–8.
V. et al. (1997). Instability, unfolding 142 Fernandez-Escamilla, A. M.,
and aggregation of human lysozyme Rousseau, F., Schymkowitz, J., and
variants underlying amyloid Serrano, L. (2004). Prediction of
fibrillogenesis. Nature 385, 787–93. sequence-dependent and mutational
140 Chiti, F., Taddei, N., Bucciantini, effects on the aggregation of peptides
M., White, P., Ramponi, G., and and proteins. Nat Biotechnol 22, 1302–
Dobson, C. M. (2000). Mutational 1306.
377
Part I
2 Dynamics and Mechanisms of
Protein Folding Reactions
379
12.1
Kinetic Mechanisms in Protein Folding
Annett Bachmann and Thomas Kiefhaber
12.1.1
Introduction
12.1.2
Analysis of Protein Folding Reactions using Simple Kinetic Models
The aim of kinetic studies is trying to find the minimal model capable of describ-
ing the experimental data by ruling out as many alternative models as possible. It
is thereby assumed that the mechanism involves a finite number of kinetic species
that are separated by energy barriers significantly larger than the thermal energy
(>5kB T ), but each kinetic species may consist of different conformations in rapid
equilibrium. Thus, the transitions between large ensembles of states can be treated
using classical reaction kinetics and the interconversion between two species, X i
and X j , can be described with microscopic rate constants for the forward (k ij ) and
reverse (k ji ) reactions
k ij
Xi Ð Xj ð1Þ
k ji
12.1.2.1
General Treatment of Kinetic Data
X
n
Pt Py ¼ A i eli t ð2Þ
i¼1
12.1.2.2
Two-state Protein Folding
Experimental investigations have revealed simple two-state folding with single ex-
ponential folding and unfolding kinetics for more than 30 small proteins (Figure
12.1.1; for an overview see Ref. [6]):
kf
UÐN ð3Þ
ku
where k f and k u are the microscopic rate constants for the folding and unfolding
reactions, respectively. The single observable macroscopic rate constant (l) for this
12.1.2 Analysis of Protein Folding Reactions using Simple Kinetic Models 381
‡
ku
Free energy (G 0)
kf
∆G 0‡
f
∆G 0‡ U
u
∆G 0
N
Reaction coordinate
Fig. 12.1.1. Free energy profile for a highest free energy. The activation free
hypothetical two-state protein folding reaction. energies for passing the barrier from U or N,
The ground states U and N are separated by a DGz z
f and DGu , respectively, are reflected in
free energy barrier with the transition state as the microscopic rate constants k f or k u (see
point along the reaction coordinate with the Eq. (8)).
l ¼ kf þ ku ð4Þ
The equilibrium of the reaction is determined by the flows from the native state N
to the unfolded state U and vice versa:
½Neq kf
K :¼ ¼ ð6Þ
½Ueq ku
Thus, protein stability can be measured in two different ways, either by equilib-
rium methods or by kinetic measurements. The agreement between the free ener-
gies for folding inferred from equilibrium and kinetic measurements is commonly
used as a test for two-state folding (see Figure 12.1.2).
382 12.1 Kinetic Mechanisms in Protein Folding
A
3000
1000
-1000
0 1 2 3 4 5 6 7 8
GdmCl (M)
RTln(K eq)=–∆G 0 (kJ mol –1)
2×104 B
1×104
-1×104
-2×104
0 1 2 3 4 5 6 7 8
GdmCl (M)
2
C
0
λ
(s–1)
-2
ln l
-4
ku
-6 kf
-8
0 1 2 3 4 5 6 7 8
GdmCl (M)
Fig. 12.1.2. Equilibrium and kinetic data for logarithm of the single observable folding rate
two-state tendamistat folding and unfolding at constant (l ¼ k f þ k u ) gives a V-shaped profile
pH 2.0. A) GdmCl-induced equilibrium transi- commonly termed a chevron plot. This reveals
tion of wild-type tendamistat at pH 2.0 measured a linear dependence between [GdmCl] and the
by far-UV circular dichroism. B) Plot of RT ln K eq lnðkf ; u Þ as indicated by the dashed lines. The
for the data in the transition region result in a values for K eq and meq derived from kinetic
linear dependency on the denaturant concen- (panel C) and equilibrium data (panel B) are
tration. The slope corresponds to meq (see identical, which is a strong support for a two-
Eq. (11)). C) GdmCl dependence of the state folding mechanism.
12.1.2 Analysis of Protein Folding Reactions using Simple Kinetic Models 383
The same formalism can be applied to obtain information on the free energy of
activation (DGz ) for a reaction using transition state theory
z
k ¼ k 0 eDG =RT
ð8Þ
where k 0 is the pre-exponential factor and represents the maximum rate constant
for a reaction in the absence of free energy barriers. Combination of Eqs (6) and (8)
allows us to gain information on the free energy changes along the reaction coor-
dinate (see Figure 12.1.1).
DG ¼ DGz z
f DGu ð9Þ
In the original Eyring theory the pre-exponential factor corresponds to the fre-
quency of a single bond vibration [7, 8]:
kB T
k0 ¼ k A 6 10 12 s1 at 25 C ð10Þ
h
qDG
meq ¼ ð11Þ
q½Denaturant
qDGz
f;u
mf ; u ¼ ð12Þ
q½Denaturant
Since the meq -values were shown to be proportional to the changes in accessible
surface area (ASA) upon unfolding of the protein [15], the kinetic m-values are be-
lieved to reflect the changes in ASA between the unfolded state and the transition
384 12.1 Kinetic Mechanisms in Protein Folding
state (m f ) and the native state and the transition state (m u ). This can be used to
characterize the solvent accessibility of a protein folding transition state (see Chap-
ter 12.2).
The m-values allow an extrapolation of the data to any given denaturant concen-
tration using
where (H2 O) denotes the values in the absence of denaturant and (D) those at a
given denaturant concentration [D]. Combining Eq. (9) with Eqs (13)–(15) shows
that the comparison of the kinetic and equilibrium free energies and m-values pro-
vides a tool to test for the validity of the two-state mechanism (see Figure 12.1.2):
and
meq ¼ m f m u ð17Þ
12.1.2.3
Complex Folding Kinetics
For simple two-state folding reactions the microscopic rate constants are readily ob-
tained from the chevron plot (cf. Figure 12.1.2) and the properties of the transition
barriers can be analyzed using rate equilibrium free energy relationships [1, 16–
19] (REFERs; see Chapter 12.2). However, folding of most proteins is more com-
plex with several observable rate constants. These complex folding kinetics can
have different origins. The transient population of partially folded intermediates,
which frequently occurs under native-like solvent conditions (i.e., at low denatur-
ant concentrations), will lead to the appearance of additional kinetic phases (see
Section 12.1.3.2) and to a downward curvature in the chevron plot (Figure
12.1.3A) [13, 20–23]. Kinetic coupling between two-state folding and slow equili-
bration processes in the unfolded state provides another source for multiphasic
folding kinetics [2, 24, 25] (Section 12.1.3.1). Since these possible origins for com-
plex folding kinetics have far-reaching effects on the molecular interpretation of
the kinetic data it is crucial to discriminate between them and to elucidate the ori-
gin of complex kinetics. With the knowledge of the correct folding mechanism the
microscopic rate constants (k ij ) connecting the individual states can be determined.
6
4
2 λ2
0
ln l (s–1)
-2
-4 λ1
-6
-8 A
-10
0 1 2 3 4 5 6 7 8
[GdmCl] (M)
-1
-2
ln l (s–1)
-3
-4 B
0 1 2 3 4 5 6 7
[GdmCl] (M)
Fig. 12.1.3. Different origins for curvatures in experiments [52] (interrupted refolding
chevron plots: A) Population of a transient experiments). B) Chevron plot for ribonuclease
intermediate during folding of hen egg white T1 W59Y. The curvature in the refolding limb at
lysozyme. At low denaturant concentrations low denaturant concentrations is caused by
two rate constants are observed in refolding kinetic coupling between prolyl isomerization
(l2 and l1 ; open triangles and open circles). and protein folding [24, 25]. Only the slowest
For unfolding of native lysozyme a single refolding phase is shown. The fraction of faster
kinetic phase is observed (filled circles). l2 at folding molecules with all prolyl peptide bonds
higher denaturant concentrations (filled in the native conformation could not be
triangles) corresponds to unfolding of the detected since the experiments were carried
intermediate and can only be detected when out by manual mixing. Data for lysozyme were
the intermediate is transiently populated by a taken from Ref. [52] and data for RNase T1
short refolding pulse and then unfolded at high were taken from Ref. [25].
denaturant concentrations in sequential mixing
species as long as their interconversion is faster than the kinetic reactions leading
to the native state. Studies on the time scale of intrachain diffusion in unfolded
polypeptide chains in water showed that these processes occur on the 10–100 ns
time scale [9, 26], which is significantly faster than protein folding reactions [27,
28]. On the other hand, slow interconversion reactions between the different un-
folded conformations lead to kinetic heterogeneity. This was first observed by Garel
386 12.1 Kinetic Mechanisms in Protein Folding
and Baldwin [29] who showed that both fast and slow refolding molecules exist in
unfolded ribonuclease A (RNase A). In 1975 Brandts and coworkers [30] showed
with their classical double jump experiments that slow and fast folding forms of
RNase A are caused by a slow equilibration process in the unfolded state and pro-
posed cis–trans isomerization at Xaa-Pro peptide bonds as the molecular origin (see
Chapter 1). This was confirmed later with proline mutations [31] and catalysis of
the slow reactions by peptidyl-prolyl cis-trans isomerases [32]. The fast folding mol-
ecules have all Xaa-Pro peptide bonds in their native isomerization states, whereas
slow-folding molecules contain nonnative prolyl isomers, which have to isomerize
to their native isomerization state during folding. This isomerization process is
slow with time constants (t ¼ 1=l) around 10–60 s at 25 C, which usually limits
the folding reaction of the unfolded molecules with nonnative prolyl isomers (see
Chapter 25). Since the equilibrium population of the cis isomer in Xaa-Pro peptide
bonds is between 7 and 36%, depending on the Xaa position, [33] all proteins with
prolyl residues should show a significant population of slow folding molecules.
Prolyl isomerization has indeed been observed in folding of many proteins [34].
Whenever folding and proline isomerization have similar rate constants or when
folding is slower than isomerization, a pronounced curvature is observed in the re-
folding limb of the chevron plot, which looks similar to the effect of transiently
populated intermediates (Figure 12.1.3B). The effect of prolyl isomerization on
folding kinetics and its identification as a rate-limiting step is discussed in detail
in Chapter 10 in Part II. The effect on chevron plots is treated quantitatively in
Refs [2, 35, 36].
In addition to prolyl isomerization also nonprolyl peptide bond isomerization
has recently been shown to cause slow parallel folding pathways [37, 38]. Cis–trans
isomerization of nonprolyl peptide bonds has long been speculated to cause slow
steps in protein folding, since it is an intrinsically slow process with a high ac-
tivation energy [30, 33]. The large number of peptide bonds in a protein leads to a
significant fraction of unfolded molecules with at least one nonnative cis isomer,
although the cis isomer is only populated to about 0.15% in equilibrium in the un-
folded state [39]. Studies on a slow folding reaction in a prolyl-free tendamistat
variant revealed that folding of about 5% of unfolded molecules is limited by the
cis–trans isomerization of nonprolyl peptide bonds [38] (Figure 12.1.4). The slow
equilibrium between cis and trans nonprolyl peptide bonds in the unfolded state
was revealed by double jump experiments [38] (see also Chapter 25). In these ex-
periments (Figure 12.1.4C) the protein is rapidly unfolded at high denaturant con-
centrations often in combination with low pH. After various times, unfolding
is stopped by dilution to refolding conditions. The resulting folding kinetics are
monitored by spectroscopic probes. If a slow isomerization reaction in the un-
folded state occurs, the slow folding molecules will be formed slowly after unfold-
ing. Thus, the amplitude of the slow reaction will increase with unfolding time as
observed for the proline free tendamistat variant shown in Figure 12.1.4C. Figure
12.1.4 further reveals that nonprolyl isomerization has a rate constant around
1 s1 , which is significantly faster than prolyl peptide bond isomerization. This is
in accordance with rate constants for nonprolyl isomerization measured in model
12.1.2 Analysis of Protein Folding Reactions using Simple Kinetic Models 387
101 A
100
λ (s–1 )
10-1
10-2
0 2 4 6 8
[GdmCl] (M)
0.8 B
Relative amplitude
0.4
0.0
0 2 4 6 8
[GdmCl] (M)
Slow refolding amplitude (mV) ( )
0.025
Fast refolding amplitude (mV) ( )
0.4
0.020
0.3 0.015
0.2 0.010
0.1 0.005
C
0.0 0.000
0.0 0.5 1.0 1.5 2.0 2.5
Unfolding time (s)
Fig. 12.1.4. Complex folding kinetics caused and the slow-refolding (filled circles) molecules
by nonprolyl cis–trans isomerization in the during unfolding. Native protein was unfolded
unfolded state of a proline-free variant of at high denaturant concentrations for the
tendamistat. GdmCl dependence of the indicated time. Under the applied conditions
apparent rate constants, l1 (open circles), l2 unfolding was complete after 150 ms. After
(filled circles) for folding and unfolding (A) and various unfolding times refolding was initiated
the respective refolding amplitudes (A1 (open by a second mixing step and the amplitudes of
circles), A2 (filled circles) are shown. The lines the two refolding reactions (open circles, filled
in panels A and B represent the global fit of circles) in the presence of 0.8 M GdmCl (cf.
both apparent rate constants and amplitudes panel A) were measured. The results show that
to the analytical solution of a linear three-state the slow-refolding phase (filled circles) is
model. The fit reveals a GdmCl-independent formed slowly in the unfolded state whereas
isomerization reaction and a GdmCl- the fast-refolding phase is formed with the
dependent folding reaction (see Ref. [38]). C) same time constant as that with which
Double jump experiments to detect the slow unfolding of tendamistat occurs. Data were
isomerization process in the unfolded state. taken from Ref. [38].
Appearance of the fast-refolding (open circles)
388 12.1 Kinetic Mechanisms in Protein Folding
peptides [39]. The results on the effect of nonprolyl isomerization on protein fold-
ing imply that isomerization at non prolyl peptide bonds will dramatically effect
folding of large proteins. For a protein with 500 amino acid residues more than
50% of the unfolded molecules have at least one nonnative peptide bond isomer.
Since the isomerization rate constant is around 1 s1 , it will mainly affect the early
stages of folding and folding of fast folding proteins [38].
Another cause of kinetic heterogeneity in the unfolded state was identified in cy-
tochrome c, where parallel pathways were shown to be due to the exchange of the
heme ligands in the unfolded state [40–42] (see Chapter 15).
0.4
0.04
0.2
0.0 0.00
0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00
Time (s) Time (s)
-2000 150
Θ225 (mdeg cm2 dmol–1)
-6000 50
-8000 0 U
-10000 -50
0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00
Time (s) Time (s)
500
U
E
400
R G2
300
are occasionally also observed for small proteins, [50], the concentration depen-
dence of the folding reaction should be tested.
The next step in determining a folding mechanism is to locate the intermediates
on the folding pathway. If a single folding intermediate is observed two apparent
rate constants will be observed for all possible three-state mechanisms shown in
Scheme 12.1.1.
k IN
I
kU
(A)
k NI
U
kI
k UN
U k NU N
k UI k IN
U k IU
I k NI
N (B)
k IU k UN
I k UI
U k NU
N (C)
Scheme 12.1.1
1) Unfolded protein
U
2) Start refolding
3) Mixture of U, I, N
U I N
kIU kNU 4) Interrupt refolding at t=ti by
transferring the solution to high
denaturant concentrations and
measure amplitudes of the unfolding
reactions of N and I as a function of ti
U
Fig. 12.1.6. Principle of interrupted refolding experiments to
measure the time course of the population of a folding
intermediate (I) and of native proteins (N) starting from
completely unfolded protein (U). The experiments are
described in detail in the text.
which selectively measure the population of the individual kinetic species during
folding, have been introduced by Schmid [51] and were initially used to detect
slow and fast folding pathways during RNase A folding. They are commonly
termed interrupted refolding experiments. These experiments make use of the
fact that each state of the protein has its characteristic stability and unfolding rate
constant. Interrupted refolding experiments consist of two consecutive mixing
steps (Figure 12.1.6). In a first step refolding is initiated from completely unfolded
protein. The folding reaction is allowed to proceed for a certain time (t i ) after
which the solution is transferred to a high concentration of denaturant. This re-
sults in the unfolding of all native molecules and of partially folded states which
have accumulated during the refolding step. Each state (N or I) is identified by its
specific unfolding rate constant at high concentrations of denaturant, where un-
folding is virtually irreversible, so that little kinetic coupling occurs. The ampli-
tudes of the observed unfolding reactions reflect the amounts of the respective spe-
cies present at time (t i ) when refolding was interrupted. Thus, varying the time (t i )
allowed for refolding gives the time course of the populations of native protein and
of the intermediate during the folding process. With the use of highly reproducible
sequential mixing stopped-flow instruments these experiments also became feasi-
ble on a faster time scale and allow the discrimination between sequential and tri-
angular folding mechanism on fast folding pathways [22, 52].
The discrimination between the off-pathway mechanism and the triangular
mechanism is usually difficult, since also in the off-pathway model some mole-
392 12.1 Kinetic Mechanisms in Protein Folding
cules may fold directly from U to N, depending on the ratio of the rate constants
kUN and kUI [52]. The discrimination between these mechanisms usually requires
analysis of the denaturant dependence of all folding and unfolding rate constants
in combination with the time course of the different species [52]. The data are then
fitted to the analytical solutions of the differential equations describing the differ-
ent models. For folding reactions of monomeric proteins all reactions are of first
order and the analytical solutions of the differential equations can be obtained for
mechanisms with four species or less. These solutions are derived in many kinetic
textbooks and rate constants and amplitudes for three-state and four-state mecha-
nisms are given in Refs [24] and [52]. The solutions for the three-state models in
Scheme 12.1.1 are given in the Protocols (Section 12.1.6.1–12.1.6.3). A particularly
useful source for the analytical solution of a vast number of different kinetic mech-
anisms is the article by Szabo [4]. For more complex mechanisms numerical meth-
ods have to be applied to analyze the data. As an example for the discrimination
between the different kinetic models shown in Scheme 12.1.1, the elucidation of
the folding mechanism of lysozyme will be discussed in detail in Section 12.1.3.
1 kUI
l1 ¼ kUI þ kIU ; l2 ¼ kIN þ kNI with KUI ¼ ð18Þ
1 þ 1=KUI kIU
where 1=ð1 þ ð1=KUI ÞÞ represents the fraction of intermediate ( f (I)) in the pre-
equilibrium. Since I is productive in mechanism B and nonproductive in mecha-
nism C, the rate of formation of N depends on f (I) and 1 f (I), respectively.
Due to the commonly observed strong denaturant dependence of the microscopic
rate constants, the simplifications made above might not be valid at some denatur-
ant concentrations. Therefore, the simplified treatment of the data should only be
performed if formation and unfolding of the intermediate is significantly faster
than the kIN and kNI under all experimental conditions. Otherwise, the exact solu-
tions of the three state model must be used to fit the data [52] (see Protocols, Sec-
tion 12.1.6).
Under conditions where the simplifications given in Eqs (18) and (19) hold and
the triangular mechanism has been ruled out, these equations can provide a sim-
ple tool for identifying off-pathway intermediates. Destabilization of the intermedi-
12.1.2 Analysis of Protein Folding Reactions using Simple Kinetic Models 393
A high [D] 4
B
3
free energy (G 0 )
TS1
TS2
2
ln λ(/s-1)
1
0
low [D] -1
-2
0 1 2 3 4 5 6 7
U TS1 I TS2 N
reaction coordinate [GdmCl] (M)
Fig. 12.1.7. Transition barrier for apparent which indicate the switch between the two
two-state folding through a high-energy alternative transition states [57]. The black line
intermediate and the consequences on the represents a fit of the data to a three-state
shape of the chevron plot. A) The reaction model as described in the Protocols (Eqs (35)–
coordinate of tendamistat comprises two (42)). The lines labeled TS1 and TS2
discrete transition states and a high-energy correspond to constructed chevron plots for
intermediate. B) A single folding rate constant the two transition states. Data were taken from
is observed which shows a kink in the Ref. [57].
denaturant dependence of the chevron-plot
kUI kIN
U Ð I Ð N ð20Þ
kIU kNI
394 12.1 Kinetic Mechanisms in Protein Folding
12.1.3
A Case Study : the Mechanism of Lysozyme Folding
In the following, we will use lysozyme as a case study to discuss how sequential
mixing experiments in combination with the denaturant dependence of the appar-
ent rate constants allow a quantitative analysis of complex folding kinetics and the
determination of all microscopic rate constants.
12.1.3.1
Lysozyme Folding at pH 5.2 and Low Salt Concentrations
1.0
B
N
Fraction of N or I N
0.8
0.6
0.4
0.2 IN
0.0
0 500 1000 1500
Refolding time (ms)
C
0.6
Fraction of N or IN
0.4
0.2
0.0
05 0 100 150 200
Refolding time (ms)
Fig. 12.1.8. Double jump (A) and interrupted of native lysozyme, N (filled circles) and of
refolding experiments (B, C) for lysozyme the kinetic intermediate, IN (open circles)
folding at pH 5.2. Double jump experiments measured in interrupted refolding experiments.
show that amplitude of the slow folding C) The early time region of the kinetics shows
reaction (A1 , open circles) is formed with the that the faster process (t ¼ 30 ms) produces
same time constant as that at which unfolding both native molecules and the partially folded
of native lysozyme occurs. For comparison the intermediate IN . The lag phase for formation of
unfolding kinetics directly monitored by the N, expected for a linear on-pathway model
change in intrinsic Trp fluorescence is shown shown in Scheme 12.1.1B (- - -), is not
(solid line). B) Time course of the population observed. Data were taken from Ref. [52].
396 12.1 Kinetic Mechanisms in Protein Folding
bonding network and was thus proposed to produce an intermediate with native-
like hydrogen bonding network [70]. However, this reaction was not observed
with any other spectral probe or in SAXS experiments (see Figure 12.1.5), which
obscured its interpretation.
The observation of two apparent rate constants (treating the burst-phase collapse
as a pre-equilibrium uncoupled from the slower reactions) cannot rule out any of
the mechanisms shown in Scheme 12.1.1. However, comparing mechanism A
with mechanisms B and C offers a simple way to distinguish obligatory from
nonobligatory folding intermediates. In the case of an obligatory intermediate, all
molecules have to fold through this state, resulting in a lag phase in the formation
of native molecules [5]. Direct spectroscopic measurements or measurements of
changes in R G are not able to monitor formation of native lysozyme directly, since
changes in spectroscopic and geometric properties occur in all folding steps (see
above). Further, the use of inhibitor binding to detect formation of active lysozyme
did not give any clear-cut results since binding is too slow under the applied exper-
imental conditions [22, 66]. Thus, interrupted refolding experiments were used to
monitor the time course of native protein and of IN . Two unfolding reactions were
observed in interrupted refolding experiments [22, 52]. One of them corresponds
to the well-characterized unfolding kinetics of native lysozyme, and a second
much faster one corresponds to unfolding of the intermediate (see Figure 12.1.6).
The faster reaction is only observed at short and intermediate refolding times but
is not observed when native lysozyme is unfolded. The collapsed state unfolds too
fast under all conditions to be measured by stopped-flow unfolding. The results
show that the intermediate is not obligatory for lysozyme folding, since no lag
phase in forming native lysozyme is observed [22] (Figure 12.1.8B,C). Rather, for-
mation of IN and formation of 20% of the native molecules occur with the faster
kinetic phase (t ¼ 30 ms). The slower process (t ¼ 400 ms) reflects the intercon-
version of the intermediate to the native state, resulting in the remaining 80% of
the molecules folding to the native state [22, 52]. As discussed above, identical ap-
parent rate constants for forming the intermediate and native molecules on a fast
pathway are expected, since any three-state model gives rise to only two observable
rate constants.
The lack of an initial lag phase during formation of native lysozyme ruled out a
linear on-pathway model, but it could not distinguish between the triangular
model and the off-pathway model [22]. Performing a least-squares fit of the dena-
turant dependence of the two apparent rate constants to the analytical solutions of
both models (see Protocols, Section 12.1.6) revealed that only the triangular model
was able to describe the data [52] (Figure 12.1.9). In the case of the off-pathway
model, unfolding of the intermediate should become the rate-limiting step for fold-
ing at very low denaturant concentrations. This would predict an increase in the
folding rate with increasing denaturant concentration, which was, however, not ob-
served (Figure 12.1.9). Fitting the data to the circular three-state model allowed the
determination of all six microscopic rate constants (Figure 12.1.10). For this analy-
sis it was crucial to know the unfolding rate constant of the intermediate at high
denaturant concentrations (l2 in Figures 12.1.3A and 12.1.9). Since unfolding of
12.1.3 A Case Study : the Mechanism of Lysozyme Folding 397
4 l2
2
0
lnl (s–1)
-2
-4
-6 l1
-8
-10
0 1 2 3 4 5 6 7
[GdmCl] (M)
Fig. 12.1.9. GdmCl dependence of the that the off-pathway model is not able to
apparent rate constants of lysozyme folding at describe the data. To increase the stability of
pH 5.2. l2 (open and filled triangles) and l1 the inter-mediate and thus allow a better
(open and filled circles) are fit to the analytical distinction between the on- and the off-
solutions of the triangular model (Scheme pathway model the experiments were carried
12.1.1A, —) and to the linear off-pathway out in the presence of 0.5 M Na2 SO4 . Data
model (Scheme 12.1.1C, - - - -). The fits show were taken from Ref. [52].
R G = 16.7Å
I
2.
3
s -1
9. s-
s -1
10 7 1
25
5
s -1
3.
U C N
R G = 23.5Å R G = 19.6Å R G = 15.3Å
Fig. 12.1.10. Folding mechanism of lysozyme l1 and l2 (Figure 12.1.3A) and the results from
at pH 5.2 and low salt concentrations obtained interrupted refolding experiments (Figure
by analyzing a large number of experimental 12.1.8B,C). Adapted from Refs [3], [43] and
data using various probes to detect refolding [52].
(see Figure 12.1.5), the GdmCl dependence of
398 12.1 Kinetic Mechanisms in Protein Folding
12.1.3.2
Lysozyme Folding at pH 9.2 or at High Salt Concentrations
The procedure described above to elucidate the three-state mechanism for lyso-
zyme folding at pH 5.2 can also be applied to elucidate the mechanism of more
12.1.3 A Case Study : the Mechanism of Lysozyme Folding 399
103
A
102 λ3
101
λ2
λ (s –1 )
100
10-1
10-2 λ1
10-3
10-4
0 1 2 3 4 5 6 7 8
[GdmCl] (M)
1.0 B
Fraction of I1, I2 , or N
0.8 I2
N
0.6 I1
0.4
0.2
0.0
10-2 10-1 100 101 102
Time (s)
Fig. 12.1.11. Folding of lysozyme at pH 9.2. A) during refolding at 0.6 M GdmCl. A global fit
Effect of GdmCl on the three apparent rate to the mechanism shown in Scheme 12.1.3 can
constants l1 ; l2 , and l3 measured by a describe the data (solid lines in all panels) and
combination of single and sequential mixing allowed all the microscopic rate constants to
experiments [73]. B) Time course of the two be determined (see Scheme 12.1.3).
kinetic intermediates and the native state
complex folding reactions with more than one intermediate. For lysozyme folding,
a third kinetic reaction is observed at high salt concentrations [75] and at high pH
(>8.5) [73], indicating the stabilization of an additional intermediate under these
conditions. Thus, a four-state model with an additional rapid pre-equilibrium is re-
quired to describe the folding kinetics under these conditions. Again interrupted
refolding experiments were able to monitor the time course of the population of
all intermediates and of native molecules (Figure 12.1.11B). Together with the de-
naturant dependence of all three rate constants obtained from single and sequen-
tial mixing stopped-flow refolding and unfolding experiments (Figure 12.1.11A)
this allowed the identification of the minimal kinetic mechanism for lysozyme
folding under these conditions. The results showed that the additional intermedi-
ate induced at high salt concentrations or at high pH is located on a third parallel
folding pathway. The four-state mechanisms shown in Schemes 12.1.2 and 12.1.3
400 12.1 Kinetic Mechanisms in Protein Folding
I1
1.35 s -1
0.
s -1
28
<1
7
s-
s -1
9.
1
–6
3
s-
4.
1
>2000 s -1 <10–6 s -1
U C 1.21 s -1
N
1.77 s -1
s -1
0.
87
s -1
24
s-
1
6.
0.
9
0 –6
s-
1
<1
I2
Scheme 12.1.2. Mechanism and rate constants for lysozyme
folding at pH 5.2 in the presence of 0.85 M NaCl. Adapted
from Ref. [75].
I1
s -1
0.
84
6.87 ±0.17 s -1
2
±0
s -1
.
±1
<1
.0
3
0
.3
–5
6
0.
46
s-
1
±
s-
1
0
4.
>2000 s -1 <10–5 s -1
U C 1.78 ±0.13 s -1
1.70 ±0.06 s -1 N
0.
s -1
1±
0.
13
s -1
03
.0
.4
<0
±0
0 –5
s-
1
.6
<1
6
s-
1
I2
Scheme 12.1.3. Mechanism and rate constants for lysozyme
folding at pH 9.2. Adapted from Ref. [73]
show the minimal models describing lysozyme folding at pH 9.2 and high salt con-
centrations, respectively. In addition the microscopic rate constants obtained from
global fits of all data are shown.
This case study demonstrates the need for methods capable of detecting the time
course of individual kinetic species during the folding process. Direct spectroscopic
measurements are usually not able to provide this information. In addition, the
folding studies on lysozyme show that for complex reactions it is usually impossi-
ble to assign experimentally observed rate constants directly to individual steps on
folding pathways, since they are complex functions of several microscopic rate con-
stants. In order to determine all microscopic rate constants for a mechanism it
12.1.5 Conclusions and Outlook 401
12.1.4
Non-exponential Kinetics
Non-exponential kinetics were first reported by Kohlrausch [76] for the relaxation
of glass fibers after stretching. He described a broad distribution of relaxation
times covering time scales of several orders of magnitude. Such non-exponential
kinetics can be treated with a stretched exponential term of the kind
b
A ¼ A 0 eðktÞ ð21Þ
The stretch factor (b) indicates a time-dependent change in the rate constant (k),
with b ¼ 1 corresponding to the special case of single exponential kinetics. The ki-
netics become increasingly stretched with decreasing b. Stretched exponential be-
havior, which has also been referred to as ‘‘strange’’ kinetics [77], has recently been
observed both in theoretical [78, 79] and in experimental work on biological sys-
tems [80]. The best-studied experimental model is the structural relaxation of my-
oglobin at low temperature [80] or at high solvent viscosity [81]. A prerequisite for
strange kinetics is a rough energy landscape with significant barriers between a
large number of local minima [78]. Thus, it was argued that protein folding reac-
tions might exhibit stretched behavior, under conditions where local minima on
the folding landscape become stabilized (e.g., at low temperature). Up to date there
are only a few experimental reports on strange kinetics in protein folding, which
describe the refolding of ubiquitin [82], phosphoglycerate kinase [82], and a WW
domain at low temperature [83] measured in laser temperature-jump experiments
on the microsecond time scale. The difficulty of unambiguously assigning the ob-
served kinetics to stretched behavior may be seen from the fact that the kinetics
only cover about 2.5 orders of magnitudes of life-times and can also be described
by the sum of two or three exponentials. Nevertheless, these first indications of
non-exponential behavior in protein folding on the very fast time scale provide an
interesting new aspect to the understanding of energy landscapes for protein fold-
ing (see Chapter 14 for a more detailed discussion of non-exponential kinetics in
protein folding).
12.1.5
Conclusions and Outlook
12.1.6
Protocols – Analytical Solutions of Three-state Protein Folding Models
12.1.6.1
Triangular Mechanism
I
*
*
k IN
I
kU
k NI
U
)
kI
)
kUN
U ) * NH ð22Þ
kNU
with
The apparent rate constants, l1 and l2 are the solution of Eq. (23) and are given by:
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
B G B 2 4C
l1; 2 ¼ ð25Þ
2
where UðtÞ; I(t), and N(t) represent the concentrations of the respective species at a
given time t. The pre-exponential factors give the amplitude of the observable rate
constant (see Eq. (2)) for monitoring the time course of the respective species and
the constant terms represent the equilibrium concentrations of each species. The
principle of microscopic reversibility provides an additional constraint for fitting
the experimental data to the triangular mechanism
The time-dependent behavior of the different species for starting from the native
state (N0 ¼ 1; U0 ¼ 0; I0 ¼ 0) or from the intermediate (I0 ¼ 1; U0 ¼ 0; N0 ¼ 0)
are obtained by substituting the respective microscopic rate constant in Eq. (26).
12.1.6.2
On-pathway Intermediate
kUI kIN
UÐIÐN ð28Þ
kIU kNI
f ðlÞ ¼ l 2 l ðkUI þ kIU þ kIN þ kNI Þ þ ðkUI kIN þ kUI kNI þ kIU kNI Þ ¼ 0 ð29Þ
The apparent rate constants, l1 and l2 are the solution of Eq. (29) (see Section
12.1.6.1). The time-dependent behavior of the different kinetic species for refolding
starting from the unfolded state (U0 ¼ 1; I0 ¼ 0; N0 ¼ 0) is given by
kUI ðl1 kIN kNI Þ l1 t kUI ðkNI þ kIN l2 Þ l2 t kNI kIU
UðtÞ ¼ U0 e þ e þ
l1 ðl1 l2 Þ l2 ðl1 l2 Þ l1 l2
kUI ðkNI l1 Þ l1 t kUI ðl2 kNI Þ l2 t kUI kNI
IðtÞ ¼ U0 e þ e þ ð30Þ
l1 ðl1 l2 Þ l2 ðl1 l2 Þ l1 l2
kUI kIN kUI kIN l2 t kUI kIN
NðtÞ ¼ U0 el1 t þ e þ
l1 ðl1 l2 Þ l2 ðl1 l2 Þ l1 l2
The solutions for starting from the native state (N0 ¼ 1; U0 ¼ 0; I0 ¼ 0) are the
same as for starting from the unfolded state with replacing the respective micro-
scopic rate constants in Eq. (30).
12.1.6.3
Off-pathway Mechanism
kIU kUN
IÐUÐN ð32Þ
kUI kNU
f ðlÞ ¼ l 2 lðkUI þ kIU þ kUN þ kNU Þ þ ðkIU kUN þ kIU kNU þ kUI kNU Þ ¼ 0 ð33Þ
The apparent rate constants, l1 and l2 are the solution of Eq. (33) (see Section
12.1.6.1). The time-dependent behavior of the different kinetic species for refolding
starting from the unfolded state (U0 ¼ 1; I0 ¼ 0; N0 ¼ 0) is given by
ðkIU l1 ÞðkNU l1 Þ l1 t ðl2 kNU ÞðkIU l2 Þ l2 t kIU kNU
UðtÞ ¼ U0 e þ e þ
l1 ðl1 l2 Þ l2 ðl1 l2 Þ l1 l2
kUI ðkNU l1 Þ l1 t kUI ðl2 kNU Þ l2 t kNU kUI
IðtÞ ¼ U0 e þ e þ ð34Þ
l1 ðl1 l2 Þ l2 ðl1 l2 Þ l1 l2
kUN ðkIU l1 Þ l1 t kUN ðl2 kIU Þ l2 t kIU kUN
NðtÞ ¼ U0 e þ e þ
l1 ðl1 l2 Þ l2 ðl1 l2 Þ l1 l2
The solutions for starting from the native state (N0 ¼ 1; U0 ¼ 0; I0 ¼ 0) or from
the intermediate (I0 ¼ 1; U0 ¼ 0; N0 ¼ 0) are obtained in the same way as dis-
cussed in Section 12.1.6.2.
12.1.6.4
Folding Through an On-pathway High-Energy Intermediate
kUI kIN
U Ð I Ð N ð35Þ
kIU kNI
with Eq. (35) allows the rate constants k f and k u to be expressed for apparent two-
state folding as a function of the microscopic rate constants k ij for the three-state
model for the different regimes shown in Figure 12.1.7A. At low denaturant
concentrations (TS1 limit) formation of the high-energy intermediate (kUI ) is rate-
limiting for folding (kUI f kIN ) and thus we can approximate k f in the TS1 limit
(k f (TS1)) as
The unfolding reaction in the TS1 limit requires going through I . Since I is
higher in free energy than N (kNI f kIN ) and TS1 represents the highest barrier,
the crossing of transition state 2 can be treated as a pre-equilibrium. Thus, k u in
the TS1 limit (k u (TS1)) is given by
1
kNI kIN
ku ðTS1Þ ¼ kI U ¼ kNI ð37Þ
kIN kIU
For folding and unfolding at high denaturant concentrations TS2 represents the
highest barrier (TS2 limit). Thus, in analogy to Eqs (36) and (37), k f and k u can
be expressed in the TS2 limit as:
kUI kIN
k f ðTS2Þ ¼ kIN ¼ kUI ð38Þ
kIU kIU
These considerations show that the presence of a kink in the chevron plot allows
the determination of the ratio kIU /kIN , which is equivalent to the equilibrium con-
stant between the two transition states (KTS1=TS2 ) and can be converted into the dif-
ference in free energy between the two transition states (DGTS1=TS2 )
z
ðTS2ÞDGz ðTS1ÞÞ DGTS2=TS1 =RT
kIN =kIU ¼ KTS1=TS2 ¼ eð1=RTÞðDG ¼e ð40Þ
with
406 12.1 Kinetic Mechanisms in Protein Folding
therefore allows the determination of kUI and kNI and their denaturant depend-
encies mUI and mNI , respectively. It further yields the parameters kIN /kIU and
mIN mIU . Since only the ratios kIN /kIU and mIN mIU are defined for folding
through a high-energy intermediate, these ratios have to be used for data fitting
[57]. Folding through a high-energy intermediate shows apparent two-state behav-
ior and thus only the smaller one of the two apparent rate constants (l1 ) should
have the complete folding and unfolding amplitude (A1 ¼ 1) whereas A2 ¼ 0 (see
Ref. [57]).
Acknowledgments
We thank Manuela Schätzle and Beat Fierz for comments on the manuscript and
all members of the Kiefhaber lab for help and discussion.
References
12.2
Characterization of Protein Folding Barriers
with Rate Equilibrium Free Energy
Relationships
Thomas Kiefhaber, Ignacio E. Sánchez, and Annett Bachmann
12.2.1
Introduction
12.2.2
Rate Equilibrium Free Energy Relationships
For many reactions in organic chemistry the changes in activation free energy
ðDGz Þ induced by changes in solvent conditions or by modifications in structure
are linearly related to the corresponding change in equilibrium free energy ðDG Þ
between reactant and product [3]. A proportionality constant, a x , was defined by
Leffler to quantify the energetic sensitivity of the transition state relative to the
ground states with respect to a perturbation, qx [3]:
qDGz =qx
ax ¼ ð1Þ
qDG =qx
where a x is a measure for the position of the transition state along the reaction co-
‡
ku
kf
Free energy
‡
∆G0f
0‡ U
∆G u
∆G 0
N
∂∆G /∂x
0‡
f ∂∆G 0‡
u /∂x
∂∆G 0 /∂x
Reaction coordinate
Fig. 12.2.1. Free energy profile for a two-state folding reaction.
The sensitivity of the reaction to changes in a parameter qx as
a test for the respective reaction coordinate according to the
Leffler postulate (see Eq. (1)) are indicated.
Free energy
Free energy
high stability
high stability
U TS region N U TS N
Reaction coordinate Reaction coordinate
low stability
C
Free energy
high stability
U TS1 I TS2 N
Reaction coordinate
Fig. 12.2.2. Schematic representation of the (B). An apparent movement of the position of
response of different types of free energy the transition state can also be due to a switch
barriers to the same perturbation. The position between consecutive transition states on a
of the transition state along the reaction linear pathway (C). The position of the highest
coordinate is more sensitive to the perturba- point along the barrier region is indicated by
tion if the free energy shows a broader maxi- an arrow.
mum (A) than if the maximum is narrow
the transition state along a broad barrier region (Hammond behavior; Figure
12.2.2A,B) [4, 5]; (2) a change in the rate-limiting step on a sequential pathway
(Figure 12.2.2C) [1]; (3) a change in the mechanism of the reaction due to a switch
between parallel pathways [1]; or (4) structural changes in the ground state(s)
(ground state effects) [6]. If the rate-limiting step (Figure 12.2C) or the mechanism
of a reaction changes, a discrete jump in the position of the transition state along
the reaction coordinate will be observed. A gradual transition state movement (Fig-
ure 12.2.2A) will result in a rather smooth structural shift of the position of transi-
tion state upon perturbation. Detailed analysis of nonlinear REFERs have been
widely used to characterize the shape of free energy barriers and to elucidate reac-
tion mechanisms in organic chemistry and of complex biochemical reactions like
the catalytic mechanisms of enzymes [2]. In the following, we will discuss how
these concepts can be applied to characterize the barriers for protein folding reac-
tions and results from linear and nonlinear REFERs will be presented.
414 12.2 Characterization of Protein Folding Barriers with Rate Equilibrium Free Energy Relationships
12.2.2.1
Linear Rate Equilibrium Free Energy Relationships in Protein Folding
REFERs for protein folding reactions can be derived using various perturbations.
Let us consider the Gibbs fundamental equation
X
dDG ¼ DV dp DS dT þ Dmi dn i ð2Þ
P
where DG ; DV ; DS , and Dmi are the differences in Gibbs free energy, volume,
entropy, and chemical potential, respectively, between the unfolded and the native
states. In protein folding the most common perturbation of chemical potential is
the addition of chemical denaturants (D) like urea and guanidinium chloride
(GdmCl), which were shown to have a linear effect on DG [7, 8] (see Chapters
3 and 12.1). Including the empirically observed effect of a chemical denaturant,
Eq. (2) can be written as
Assuming a free energy barrier between the unfolded and the native protein, the
Gibbs equation can be applied to the activation free energies of the folding ðDGz
f Þ
and unfolding ðDGz
u Þ reaction:
dDGz z z
f ; u ¼ DVf ; u dp DSf ; u dT þ m f ; u ½D ð4Þ
since DGz z
f and DGu were also shown to be linearly dependent on [D] [9] (see also
Chapter 12.1). Combining Eqs (1)–(3) shows that different properties of the transi-
tion state and thus different reaction coordinates can be probed by applying
REFERs to protein folding. a p and aT -values can be obtained by changes in pres-
sure at constant temperature or changes in temperature at constant pressure, re-
spectively [10, 11].
qDGz
f =qp DVfz
ap ¼ ¼ ð5Þ
qDG =qp DV
qDGz
f =qT DSz
f
aT ¼ ¼ ð6Þ
qDG =qT DS
This gives information on the volume and on the entropy of the transition state,
respectively. From Eq. (6) we can further calculate the change in enthalpy for for-
mation of the transition state using the Gibbs-Helmholtz equation. The value of aT
changes with temperature due to the change in molar heat capacity ðDCp Þ typically
associated with protein folding reactions [12]. This allows the definition of another
reaction coordinate [13]
12.2.2 Rate Equilibrium Free Energy Relationships 415
z
DCpðf Þ
aC ¼ ð7Þ
DCp
qDG
meq ¼ ð8Þ
q½Denaturant
qDGz
f;u
mf ; u ¼ ð9Þ
q½Denaturant
This allows us to use the m f - and meq -values to calculate a denaturant-induced RE-
FER to obtain aD [9] according to
qDGz
f =q½Denaturant mf
aD ¼ ¼ ð10Þ
qDG =q½Denaturant meq
-1.104
-2.104
0 1 2 3 4 5 6 7 8
GdmCl (M)
2
B
0
λ
(s–1)
-2
ln l
-4
ku
-6 kf
-8
0 1 2 3 4 5 6 7 8
GdmCl (M)
2
0
-2
ln k f
-4
-6
-8 C
-6 -4 -2 0 2 4 6
ln K eq
Fig. 12.2.3. Relationship between equilibrium GdmCl concentration on the equilibrium
ðDG Þ and activation free energies ðDGz Þ for constant ðK eq Þ and folding rate constant ðk f Þ
tendamistat folding at pH 2. A) Change in calculated from the data shown in panels A
DG with GdmCl concentration calculated and B. The slope corresponds to the aD -value
using the equilibrium constant determined and a linear fit (solid line) gives a value of
from the data shown in Chapter 12.1, Fig. 0.67. The folding rate constant ðk f Þ under
12.1.2A. B) Effect of GdmCl on the apparent unfolding conditions was calculated from the
folding rate constant ðl ¼ k f þ k u Þ. The unfolding rate constant ðk u Þ using k f ¼ K eq k u .
V-shaped plot is commonly termed ‘‘chevron Only data from the linear regions of the
plot’’ and allows calculation of k f and k u as chevron plot ðjln K eq j > 2Þ were used since
indicated (cf. Eqs (8) and (9) and Chapter near the minimum of the chevron plot both k f
12.1. C) Leffler plot comparing the effect of and k u significantly contribute to l.
12.2.2 Rate Equilibrium Free Energy Relationships 417
tor for folding. This will not influence the slope of the Leffler plot but it poses
some uncertainty on the absolute value of DGz f .
Other solvent additives such as alcohols (2,2,2-trifluoroethanol), polyols (glycerol,
sugars), salts (Na2 SO4 , NaCl), and D2 O as solvent can also lead to a change in pro-
tein stability and can therefore be used to define the corresponding a-values [10,
11] (see Table 12.2.1). Specific ligand binding is another source of changes in pro-
tein stability and thus allows characterization of the transition state [9]. Many pro-
teins specifically bind ions, substrates, or cofactors, and all bind hydrogen ions at
ionizable side chains, which can be used to compare the effect on kinetics and
thermodynamics. The changes in free energy upon binding are commonly propor-
tional to the logarithm of the ligand concentration and can be used to determine
an aL -value [18–23], which represents the ligand-binding ability of the transition
state.
Equations (5)–(7) and (10) can be considered as medium- or solvent-induced RE-
FERs. A detailed structural information on interactions formed in the transition
state can be obtained by analyzing structure-induced REFERs [16, 24, 25], which
compare the effect of amino acid replacements introduced by site-directed muta-
genesis on DGz f and DG .
qDGz
f =qStructure
aS ¼ ff ¼ ð11Þ
qDG =qStructure
ln k f 2
-1
3 4 5 6 7 8 9 10
ln K eq
Fig. 12.2.4. Structure-induced Leffler plot for indicating negligible effect of the mutations on
multiple mutations at position 24 of the fyn transition state structure even for highly
SH3 domain. The line corresponds to a linear destabilizing mutants. Data were taken from
fit of the data and gives a slope (ff -value) of Ref. [28] and the plot was adapted from Ref.
0:33 G 0:01. The relationship between k f and [26].
K eq is linear over the complete stability range
12.2.2.2
Properties of Protein Folding Transition States Derived from Linear REFERs
The most popular REFER in protein folding studies is the study of the effect of
denaturants on rate and equilibrium constants (see Figure 12.2.3). Experimentally
20
u (kJ mol )
15
–1
10
∆∆G 0‡
0 10 20
∆∆G0S (kJ mol–1)
Fig. 12.2.5. Structure-induced Leffler plot for all residues in
CI2 to determine the average ff -value ðhff i. The line
represents a linear fit of the data. Since DGz
u is plotted vs.
DGS the slope corresponds to 1 hff i. The plot was adapted
from Ref. [26] and data were taken from Ref. [30].
12.2.2 Rate Equilibrium Free Energy Relationships 419
determined aD -values are usually native-like with values between 0.6 and 1, indicat-
ing that protein folding transition states have a rather native-like ASA. Tempera-
ture and pressure have less frequently been used to determine REFERs in protein
folding and are more difficult to interpret since both have large and compensating
effects on the solvent, on the protein and on solvent–protein interactions. The
most straightforward parameter to interpret is the a C -values derived from heat ca-
pacity changes (Eq. (7)), which reflects the interactions of the solvent (water) with
the protein. All reported a C -values are also native-like but generally slightly lower
than aD -values. Analysis of the effect of temperature further revealed both entropic
and enthalpic contributions to protein folding barriers at room temperature. How-
z
ever, due to the large changes in DCpðf Þ these values are strongly temperature de-
pendent. Even fewer data are available for the effect of pressure on protein folding
transition states. High-pressure stopped-flow measurements on tendamistat re-
vealed a rather native-like volume of the transition state with an a p -value of 0.6,
which increases with increasing denaturant concentration (see Section 12.2.3.1).
Structural information on the properties of transition states from ff ð¼ aS Þ-value
analysis has been obtained for several proteins (see Chapter 13). A major problem
in these studies is the small data set at a single position with usually just a single
mutation and the wild-type protein determining a REFER. It was recently shown
that ff -values from such a two-point analysis are highly inaccurate if the stability
change of a mutation ðDDGS Þ is smaller than 6–7 kJ mol1 [26]. This uncertainty
is in part due to intrinsic statistical errors, which should, however, give reliable ff -
values for jDDGS j > 3 kJ mol1 for high-quality data sets [26]. The additional un-
certainty in ff -values probably arises from changes in the structure of the unfolded
state, which is observed in many mutants [11, 26] (see Section 12.2.4.1). This not
only changes the structure of the ground state but very likely also influences the
pre-exponential factor for folding by changing the dynamics of the unfolded state
[32, 33]. The contributions from these effects will be more pronounced for muta-
tions with small jDDGS j [26].
The results from reliable ff -values give a picture of transition states as distorted
native states for the major part of a protein (diffuse transition states) or for large
substructures (polarized transitions states). For diffuse transition states the ff -
values are around 0.1–0.5 throughout the structured regions of the protein and av-
erage ff -values are between 0.2 and 0.4, indicating partial formation of the native
set of interactions [26]. For these proteins a Leffler plot of all mutations throughout
the protein usually gives a linear REFER supporting a structurally homogeneous
transition state (Figure 12.2.5). In proteins with polarized transition states ff -
values are significantly higher (up to 0.8) in a large substructure of a protein
whereas they are 0 or close to 0 in other parts. These results suggest that the
formation of the native topology for the whole protein or for a large substructure
of the protein is a major part of the rate-limiting step in folding of small single
domain proteins and might explain the correlation between the contact order
and protein folding rate constants [34, 35]. The results contradict a nucleation-
condensation model with a small structural nucleus formed by a few specific inter-
actions in the transition state [36].
420 12.2 Characterization of Protein Folding Barriers with Rate Equilibrium Free Energy Relationships
12.2.3
Nonlinear Rate Equilibrium Free Energy Relationships in Protein Folding
As pointed out above, the characterization of nonlinear REFERs can give informa-
tion on various properties of the transition barriers. In the following we will first
describe methods to detect and to analyze nonlinearities in REFERs and will then
discuss the possible origins of nonlinear REFERs in more detail. Finally, results
from nonlinear REFERs will be discussed in terms of general properties of protein
folding barriers.
12.2.3.1
Self-Interaction and Cross-Interaction Parameters
qa x q 2 DGz
px ¼ ¼ ð12Þ
qDGx ðqDGx Þ 2
By definition, a shift in the position of the transition state towards the destabilized
state (e.g., as a result of Hammond behavior or due to sequential barriers) will give
a positive p x -value.
For a folding reaction perturbed by addition of denaturant or destabilized by mu-
tations, the corresponding self-interaction parameters p D and pS are [10, 11]:
qaD q 2 DGz
pD ¼ ¼ ð13Þ
qDGD ðqDGD Þ 2
qff q 2 DGz
pS ¼ ¼ ð14Þ
qDGS ðqDGS Þ 2
qa x q 2 DGz qay
p xy ¼ ¼ ¼ ¼ pyx ð15Þ
qDGy ðqDGx ÞðqDGy Þ qDGx
For nonlinear REFERs the value of a x changes with the amount of a second pertur-
bation qy, resulting in a nonzero p xy -value. A shift in the position of the transition
state towards the destabilized state will yield positive p xy -values.
For tendamistat, the denaturant dependence of the folding reaction was mea-
sured at different pressures [39]. This allowed the calculation of the pressure/
denaturant cross-interaction parameter ð p Dp Þ. Figure 12.2.6 shows equilibrium
transition curves and chevron plots at pressures between 20 and 1000 bar. Increas-
ing pressure destabilizes tendamistat leading to a shift of the GdmCl-induced un-
folding transition to lower GdmCl concentrations (Figure 12.2.6A) and indicating
that the native state has a larger volume than the unfolded state. Interestingly, the
volume of the transition state exceeds the volume of the native state at high dena-
turant concentrations as indicated by a decrease in the unfolding rate constant with
increasing pressure above 5 M GdmCl (Figure 12.2.6B). This scenario is not
treated in the original Leffler formalism, which assumes that the properties of
the transition state are between those of the reactants and the products. The
422 12.2 Characterization of Protein Folding Barriers with Rate Equilibrium Free Energy Relationships
1.0
A
0.8
Fraction of N
0.6
0.4
0.2
0.0
0 2 4 6 8
50bar
[GdmCl] (M)
200bar
400bar
1.5 600bar
1.0
B 800bar
1000bar
0.5
ln l (s–1)
0.0
-0.5
-1.0
-1.5
0 2 4 6 8
[GdmCl] (M)
Fig. 12.2.6.Effect of pressure on A) the pH 2. The lines represent global fits of the
GdmCl-induced unfolding transition and data with qm f =qp ¼ qDVfz =q½GdmCl ¼
B) the GdmCl dependence of the apparent 2:5 G 0:6 cm 3 mol1 M1 . Data and fits
rate constant for tendamistat folding at were taken from Ref. [39].
qaD q 2 DGz qa p
p Dp ¼ ¼ ¼ ¼ ppD ð16Þ
qDGp ðqDGp ÞðqDGD Þ qDGD
Figure 12.2.7 shows the effect of qDGp on the aD -value for the denaturant/
pressure-dependent folding data shown in Figure 12.2.5. The aD -value increases
significantly with decreasing stability indicating a positive p Dp -value (Figure
12.2.7A). Figure 12.2.7B shows that the effect of qDGp is only observed in the ki-
netic m-values while meq is unchanged due to compensating changes in m f and
m u . This indicates a true denaturant/pressure-induced transition state movement.
12.2.3 Nonlinear Rate Equilibrium Free Energy Relationships in Protein Folding 423
1.0 5
A B
3
0.6
2
0.4
1
-16 -14 -12 -16 -14 -12
0 0
DGp (kJ mol ) -1
∆Gp (kJ mol ) –1
1.4
C 60 D
1.2
DV 0 (cm3 mol–1)
40
1.0
ap
0.8 20
0.6
0
0.4
0.2 -20
-10 0 10 20 -10 0 10 20
0 0
∆G D (kJ mol–1) ∆G D (kJ mol–1)
Fig. 12.2.7. Analysis of the pressure- DVfz (diamonds), and DVuz (squares),
denaturant cross-interaction parameter ppD respectively. As expected for real Hammond
and p Dp for tendamistat folding at pH 2 behavior, the same effect is observed for
(see Figure 12.2.6). Panels A and B show the qaD =qDGp qa p =qDGD and for qa p =qDGD, i.e.,
effect of qDGp on aD (A) and on meq (open ppD ¼ p Dp ; V ¼ 0:025 G 0:006 mol/kJ. The
circles), m f (diamonds), and m u (squares), original data used to construct the cross-
respectively. Panels C and D show the effect interaction parameters are shown in Figure
of qDGD on a p (C) and on DV (circles), 12.2.6 and were taken from Ref. [39].
As postulated by Eq. (16) the same effect is observed when the effect of qDGD on
the a p -value is analyzed (Figure 12.2.7C,D). Here a change in a p is caused by com-
pensating changes in DVfz and DVuz with constant DV .
In mutational studies the changes in DGz and DG are usually determined as a
function of the denaturant concentration for each mutant. This directly allows the
calculation of the denaturant/structure cross-interaction parameter ð p DS Þ [10, 11].
424 12.2 Characterization of Protein Folding Barriers with Rate Equilibrium Free Energy Relationships
p DS tests whether changes in the stability of the native state caused by mutations
ðqDGS Þ have an effect on the relative solvent exposure of the transition state for
folding ðaD Þ.
If the effect of a change in structure by mutation is tested in the background of
the wild type ðqStructureÞ and of another variant ðqStructure 0 Þ, we can calculate a
structure/structure cross-interaction parameter.
qfSf
pSS 0 ¼ ð18Þ
qDGS 0
12.2.3.2
Hammond and Anti-Hammond Behavior
12.2.3.3
Sequential and Parallel Transition States
6 A
4
ln λ ( /s-1)
2
0
-2
-4 320
0 310
2 300
[Gdm 4 290 K)
Cl] (M 6 8 280 T(
)
4
B
3
TS1
TS2
2
ln λ(/s-1)
1
0
-1
-2
0 1 2 3 4 5 6 7
[GdmCl] (M)
0 C TS2
TS1
ln k f ( /s-1)
-5
-10
-15
-15 -10 -5 0 5
ln Keq
Fig. 12.2.9. Nonlinear rate equilibrium free adapted from Ref. [43]. C) Leffler plot of the
energy relationship for folding of the data shown in panel B. The kink in the chevron
tendamistat C45A/C73A variant caused by plots causes a downward kink in the Leffler
two consecutive transition states with a high- plot. The red and the blue lines represent
energy intermediate (cf. Figure 12.2.2C.). hypothetical Leffler plots for the early transition
A) Temperature dependence of the chevron state (TS1 with aD ¼ 0:45) and for the late
plots indicating a kink in the unfolding limb. transition state (TS2 with aD ¼ 0:88),
B) Chevron plot at 25 C. The black line respectively. The folding rate constants ðk f Þ
represents a fit to a linear three-state under unfolding conditions were calculated
mechanism with a high-energy intermediate from the unfolding rate constant ðk u Þ using
(see Chapter 12.1). The thin lines represent k f ¼ K eq k u . Data in the region jln K eq j < 2:5
hypothetical chevron plots for two-state folding were not used since here both k f and k u
limited either by the early transition state (TS1) significantly contribute to the apparent rate
of the late transition state (TS2). Plots were constant ðlÞ.
428 12.2 Characterization of Protein Folding Barriers with Rate Equilibrium Free Energy Relationships
energy intermediate (Figure 12.2.2B). Figure 12.2.9B shows the chevron plot at
25 C with the fit of the data to a three-state model with a high-energy inter-
mediate (see Chapter 12.1). This allows information to be gained on both transition
states. Figure 12.2.9C shows the same data displayed as a Leffler plot, also showing
a clear kink in the slope from 0.4 to 0.9 from going from low to high protein sta-
bility. Thus, a kink in the Leffler plot with a positive p x value provides a simple way
to show that an intermediate is on-pathway if the intermediate is higher in free
energy than the unfolded and the native state. As discussed in Chapter 12.1, an
on-pathway intermediate is more difficult to distinguish from an off-pathway inter-
mediate when it is more stable than the unfolded state and thus becomes popu-
lated to detectable amounts. Depending on the a-values of the consecutive transi-
tion states and on the rate constants it might be difficult to distinguish a kink from
a gradual shift in the position of the transition state caused by Hammond behavior.
If folding occurs through parallel pathways, the reaction with the most native-
like transition state will be most sensitive to a reduced stability of the native state
and an alternative parallel pathway with a more unfolded-like transition state may
become rate limiting with increasing denaturant concentrations. This will lead to a
decrease in a x with decreasing protein stability resulting in a negative p x -value.
Thus parallel pathways will lead to an upward kink in the chevron plot (Figures
12.2.10A,B) and consequently also in the Leffler plot (Figure 12.2.10C,D) [11].
This phenomenon is sometimes referred to as anti-Hammond behavior in the pro-
tein folding literature [30, 44] but should be clearly distinguished from genuine
anti-Hammond behavior discussed in Section 12.2.3.2, which is based on the prop-
erty of a single barrier (see Figure 12.2.8).
12.2.3.4
Ground State Effects
Changes in the length of the reaction coordinate due to structural changes in the
native or unfolded protein caused by the effects of a mutation or by a change in
solvent conditions (ground state effects) can easily be mistaken for genuine transi-
tion state movement [6, 10, 11]. Both phenomena have similar effects on the ex-
perimentally observable REFERs in Leffler plots but they are based on completely
different free energy barriers. Figure 12.2.11 illustrates the difference between
Hammond behavior and ground state effects, showing as an example the effect of
a change in protein stability caused by mutations ðqDGS Þ on the location of the
transition state in terms of its ASA measured in Chevron plots (aD -value). This cor-
responds to the determination of a structure/denaturant cross-interaction parameter
ð pSD Þ defined in Eq. (17). The reference conditions of the wild-type protein are
given in Figure 12.2.11D. In the case Figure 12.2.11B the mutation leads to a tran-
sition state movement along the reaction coordinate relative to both ground states.
The length of the reaction coordinate is unchanged (i.e., the structure of both
ground states is not affected by the mutation, or both states are affected to the
same extent). This scenario is in accordance with Hammond behavior if we as-
12.2.3 Nonlinear Rate Equilibrium Free Energy Relationships in Protein Folding 429
sume that the native state is destabilized by the mutation. Figure 12.2.11C and D
show apparent transition state movements caused by ground state effects. In both
cases the absolute position and the structure of the transition state remains un-
changed but the structure of either the unfolded state (Figure 12.2.11C) or of the
native state (Figure 12.2.11D) changes, which leads to a change in length of the
reaction coordinate. In this case an apparent transition state movement will be ob-
served if the position of the transition state is normalized against the length of the
reaction coordinate. This demonstrates that the characterization of transition state
movements requires the determination of the effects of a perturbation on the rate
constants for the forward and backward reaction, on the equilibrium constant and
on the length of the reaction coordinate. The effect on the length of the reaction
coordinate can easily be tested by determining the effect of qx on the respective
equilibrium properties. In the case discussed above, the length of the reaction co-
ordinate is reflected by meq.
The same considerations apply for other self- and cross-interaction parameters.
For pressure- and temperature-induced REFERs the length of the reaction coordi-
nate can be tested by determining the effect of DV , and DS =DCP , respectively.
Changes in these parameters with qx indicate ground state effects induced by the
change in conditions or by mutation. Structural changes in the unfolded state
should also be detectable by NMR spectroscopy. However, it would be very time
consuming to determine NMR structures of the unfolded state of each mutant
and would not directly allow a correlation with the observed ground state effects,
since they are defined by changes in free energy.
Figure 12.2.12 shows the results of a test for ground states effects and transition
state movement in mutants of the Sso7d SH3 domain (Figure 12.2.12 upper pan-
els) and for CI2 (Figure 12.2.12 lower panels). The effect of a change in DG in-
duced by mutations ðqDGS Þ on the aD -values and on the meq -, m f -, and m u -values
are shown. Both proteins show increasing aD -values with decreasing protein sta-
bility in accordance with Hammond behavior. However, the origins for the chang-
ing aD -values are different in the two proteins. In the SH3 domain the meq -value
changes significantly upon mutation, indicating structural changes it at least one
of the ground states. The value of m f changes by the same amount as meq while
m u is essentially independent of protein stability. This is seen more clearly in a
plot of m f and m u vs. meq , which shows that m f correlates with meq whereas m u
is virtually independent of meq (Figure 12.2.13A). This indicates that the structure
of the transition state does not move relative to the structure of the native state
(cf. Figure 12.2.11). The increase in meq - and m f -values point at the disruption of
interactions in the unfolded state upon mutation which leads to a less compact
unfolded state. These results show that the apparent shift in the position of the
transition state along the reaction coordinate seen for the SH3 domain is a ground
state effect on the unfolded state rather than Hammond behavior. Structural
changes in the unfolded state can be rationalized in the light of recent results on
residual structure of unfolded proteins (see Section 12.2.4.1). These interactions
may be disrupted by mutations and thus lead to changes in the meq -values [11].
430 12.2 Characterization of Protein Folding Barriers with Rate Equilibrium Free Energy Relationships
5 5
A B
0
0
ln (λ/s-1) -5
ln (λ/s-1)
TS1 TS2
-10
-5
TS2 -15
TS1
-10 -20
0 2 4 6 8 10 0 2 4 6 8 10
[Denaturant] (M) [Denaturant] (M)
5 5
C D
0 0
-5
-5
ln (kf/s-1)
TS2 -10
-15
-15
-20
-25 -20 TS1
TS1
-25
-30 -20 -10 0 10 -10 0 10 20 30
ln Keq ln Keq
Fig. 12.2.10. Simulated effect of denaturant m f 1 ¼ 4:95 kJ mol1 M1 , mu1 ¼ 4:95
concentration on the apparent rate constant kJ mol1 M1 ; m f 2 ¼ 8:92 kJ mol1 M1 ;
(filled circles and black line) for the folding of a mu2 ¼ 0:99 kJ mol1 M1 in both cases. This
protein through two parallel pathways with the corresponds to aD -values of 0.5 and 0.9 for
transition states TS1 and TS2. The individual TS1 and TS2, respectively. The plots show that
chevron plots corresponding to folding over significantly different aD -values for the two
TS1 and TS2 are shown. For two parallel parallel pathways are required to obtain a clear
pathways with the folding and unfolding rate nonlinearity. Panels C and D show Leffler plots
constants of kf 1 ; ku1 (for TS1) and kf 2 ; ku2 (for of the data displayed in panels A and B,
TS2) the single observable rate constant ðlÞ is respectively. Both Leffler plots show a clear
given by the sum of the two apparent rate upward kink indicating the shift between the
constants (l1 ¼ kf 1 þ ku1 and l2 ¼ kf 2 þ ku2 Þ two parallel pathways. The red and blue lines
for the parallel pathways: l ¼ l1 þ l2 . Data represent hypothetical Leffler plots for the two
were simulated with the rate constants for parallel pathways with aD -values of 0.5 (TS1)
folding and unfolding of A) kf 1 ¼ 0:0015 s1 , and 0.9 (TS2). Data in the region jln K eq j < 2:5
ku1 ¼ 3 107 s1 ; kf 2 ¼ 100 s1 ; ku2 ¼ 0:02 were not used since here both k f and k u
s1 and B) kf 1 ¼ 5 s1 , ku1 ¼ 3 107 s1 ; significantly contribute to the apparent rate
kf 2 ¼ 100 s1 ; ku2 ¼ 0:02 s1 . The mi -values constant ðlÞ.
ðmi ¼ RTðq ln k i =q½DenaturantÞÞ were
12.2.3 Nonlinear Rate Equilibrium Free Energy Relationships in Protein Folding 431
‡
ku
kf
Free energy
‡
∆G 0f
0‡ U
∆G u
∆G 0
N
mf mu
meq
(A)
U ‡ N
(B)
U ‡ N
(C)
U ‡ N
(D)
U ‡ N
Fig. 12.2.11. Schematic representation of the the unfolded (C) and native state (D). In all
possible effect of a perturbation on the three cases (B–D) the position of the
position of the ground states and on the transition state will change by the same
transition state along a reaction coordinate amount relative to the ground states (i.e.,
probed by the change in denaturant identical p DS ¼ qaD =qDGS values). However,
concentrations (q[Denaturant]). The reference for genuine Hammond behavior (B) meq will
conditions (A) are compared with real not be affected by mutation and m f and m u
Hammond behavior (B) and with apparent will show compensating changes. Changes in
transition state movements caused by ground the ASA of the ground states (C, D) will lead to
state effects due to changes in the structure of changes in meq .
0.8 10
Sso7d SH3 Sso7d SH3
8
0.6
4
0.5
2
0.4
-25 -20 -15 -28 -24 -20 -16 -12
DG 0S (kJ mol )
–1
DG 0 (kJ mol ) –1
S
1.0
10 CI2
0.6
6
0.4 4
CI2
0.2 2
-30 -20 -10 -30 -20 -10
DG 0 (kJ mol–1) DG 0 (kJ mol–1)
S S
Fig. 12.2.12. Effect of changes in DGS on aD as m f for CI2. This indicates that the effect of
(left panels) and on meq (open circles), m f mutations DGS on aD is caused by ground
(diamonds) and m u (squares; right panels) state effects on the unfolded state in Sso7d
for the Sso7d SH3 domain and for CI2. Linear SH3 and by major ground state effects in
fits of the data show that in both proteins addition to small Hammond behavior in CI2.
aD ; meq , and m f are sensitive to changes in Plots were adapted from Ref. [11], and
DGS , whereas m u is constant for Sso7d SH3 constructed based on data from Refs [29, 87].
and slightly changing in the opposite direction
12.2.4
Experimental Results on the Shape of Free Energy Barriers in Protein Folding
12.2.4.1
Broadness of Free Energy Barriers
From the observation of nonlinear chevron plots for apparent two-state folders [45,
46] and from changing aD -values in mutants of several proteins [45–48] it was
originally concluded that folding of many proteins shows Hammond behavior and
thus a single broad barrier region with a continuum of states with similar free en-
ergy was proposed (Figure 12.2.1A). As discussed above for tendamistat (Figure
12.2.9), the frequently observed curvature in chevron plots may also be caused by
a switch between a few consecutive narrow barriers on a sequential folding path-
12.2.4 Experimental Results on the Shape of Free Energy Barriers in Protein Folding 433
7 8
Sso7d SH3 7 CI2
mf,u (kJ mol–1 M–1)
way (Figure 12.2.2C) [43, 49–51]. Analysis of curved chevron plots reported for a
large number of different proteins showed that a sequential folding model with
consecutive transition states and at least one metastable high-energy intermediate
is in better accordance with experimental data than a gradual transition state move-
ment along a broad barrier [51]. This suggests that folding of many apparent two-
state folders proceeds along a few defined consecutive barriers and through high-
energy intermediates.
In ff -value analysis the folding and unfolding rate constants of mutants are typ-
ically obtained from chevron plots. This allowed a test for ground state effect and
Hammond behavior in a large number of variants in several proteins by investigat-
ing the effect of the mutation on aD , and the kinetic and equilibrium m-values as
shown in Figure 12.2.12 [10, 11]. The analysis revealed that in about half of the
proteins that have been extensively mutagenized the position of the transition state
changes with mutation in accordance with apparent Hammond behavior (in-
creased aD -value with decreased protein stability; cf. Figure 12.2.12). However, the
large majority of proteins with apparent transition state movement showed the
same behavior as the SH3 domain displayed in Figure 12.2.12, indicating that
the change in the position of the transition state is due to ground state effects on
the structure of the unfolded state and not caused by genuine Hammond behavior
[11]. In these proteins the structure of the transition state does not move relative to
the structure of the native state (constant m u -values), indicating that both the na-
tive state and the transition state are structurally well-defined and robust against
mutations. In contrast, both the meq - and the m f -values increase with decreasing
in protein stability, which argues for a disruption of interactions in the unfolded
state upon mutation leading to a less compact unfolded state.
Structural changes in the unfolded state can be rationalized in the light of the
increasing evidence for residual structure of unfolded proteins. Recently, several
434 12.2 Characterization of Protein Folding Barriers with Rate Equilibrium Free Energy Relationships
7.5
7.0 8.0
ln k f (s–1)
ln k f (s–1)
6.5 7.0
6.0
5.5 6.0
4.0
8.0
3.0
6.0
ln k f (s–1)
ln k f (s–1)
2.0
4.0
1.0
ctAcP 2.0 Im9
0.0
4.0 4.5 5.0 4.7 4.8 4.9 5.0
mf (kJ mol–1 M–1) mf (kJ mol–1 M–1)
Fig. 12.2.14. Effect of changes in m f on ln k f transition stats. The correlation coefficients are
for Sso7d SH3, ADA2h, ctAcP, and Im9 which 0.57, 0.59, 0.90, and 0.77, respectively.
exhibit ground state effects on the unfolded Plots were adapted from Ref. [11]. Original
state. The filled symbols indicate mutants that data used to calculate the correlations were
have significantly different m f -values and are taken from Refs [75, 87–89].
thus likely to have a different rate-limiting
high-resolution NMR studies have shown that both native-like and nonnative-like
interactions are present in unfolded states of several proteins [52–63] (see also
Chapter 21). These interactions may be disrupted by mutations and thus lead to
changes in the meq -values [32]. Only for a few proteins like CI2 weak Hammond
behavior is observed, which is usually accompanied by significantly stronger
ground state effects (Figures 12.2.12 and 12.2.13).
The presence of interactions in the unfolded state that are sensitive to mutation
raises the question of which way these interactions influence the rate of protein
folding. Figure 12.2.14 shows the correlation between the rate constant for folding
ðk f Þ and the folding m-value ðm f Þ for four proteins (Ss07d SH3, ADAh2, ctAcP,
and Im9) [11]. These proteins show significantly decreased folding rates with in-
creasing meq - and m f -values (significance of correlation larger than 90%). The
same effect was observed for CI2, protein G, and protein L [11]. None of the pro-
teins shows increased folding rates with increasing m f and meq values.
Obviously, the disruption of residual structure in the unfolded state on average
slows down folding, although the structure of the transition state is unchanged.
This has three important consequences. First, it shows that the majority of the
12.2.4 Experimental Results on the Shape of Free Energy Barriers in Protein Folding 435
residual interactions in the unfolded state of these proteins are also present in the
transition state. The presence of interactions in the unfolded state that are not
present in the transition state would result in an energetic cost for the folding re-
action, since the unfolded state would be stabilized relative to the transition state.
This would lead to increased folding rates when these interactions are destabilized,
which is not observed. The data do not allow, however, discrimination between na-
tive and nonnative interactions. Since f-value analysis showed that transition states
commonly contain a subset of the native interactions at reduced strength, our re-
sults indicate that the residual interactions in the unfolded state of these seven pro-
teins are predominantly native-like. The second consequence of the results is that
residual interactions in unfolded proteins can accelerate folding. It has been pro-
posed that the accumulation of native-like structure in the unfolded state should
stabilize the unfolded state relative to the transition state and therefore slow down
folding [64]. This reasoning assumes, however, that the free energy of the transi-
tion state is independent of the strength of a specific interaction in the unfolded
state, which can only be assumed for interactions that are not present in the tran-
sition state. Changes in stability of an interaction will affect the free energies of the
unfolded state and of the transition state, if the interaction is present in both states.
The destabilization and disruption of residual interactions obviously has on aver-
age a stronger effect on the free energy of the transition state compared with the
unfolded state. This indicates that the interactions are stronger in the transition
state than in the unfolded state and thus more sensitive to mutations. The third
conclusion from these results is that a subset of the interactions present in the
transition state for folding can be directly pointed out from structural studies on
the unfolded state.
As discussed in Section 12.2.3.2, transition state movements in protein folding
should be highly correlated since different reaction coordinates should be sensitive
to protein–protein and protein–solvent interactions in the transition state. Transi-
tion state movements should thus show up in different p x - and p xy -values. Analysis
of other self- and cross-interaction parameters in available folding data (Table
12.2.1) indicates the absence of genuine transition state movements for folding of
most of the proteins [11]. Out of 21 investigated proteins only CI2 showed a small
but coherent transition state movement. These results suggest that Hammond be-
havior is not common in protein folding reactions. Most protein folding transition
states seem to be robust, conformationally restricted maxima in the free energy
landscape and, similar to native states, they usually do not undergo extensive struc-
tural rearrangements upon mutation or changes in solvent conditions. These re-
sults argue for the generality of free energy landscapes with a limited number of
sequential but structurally well-defined transition states in apparent two-state fold-
ing. The clearest examples for significant transition states movement is the effect
of pressure and denaturant on the transition state for tendamistat folding (Figures
12.2.6 and 12.2.7). Due to the lack of other data on pressure-induced REFERs and
the scarcity of data on temperature-induced REFERs it is difficult to judge whether
transition state movements are frequently induced by perturbants other than dena-
turant or mutation.
436
Tab. 12.2.1. Proteins for which more than one self- or cross-interaction coefficient can be determined. A px or pxy -value is considered to be larger (> 0) or
smaller (< 0) than zero if it differs from zero in more than two standard deviations.
mutants mutants
12.2.4 Experimental Results on the Shape of Free Energy Barriers in Protein Folding 437
12.2.4.2
Parallel Pathways
Hammond behavior. However, a more detailed analysis of the effects revealed that
in both proteins the negative p x -values are compatible with parallel pathways [11, 68].
Up to date there is no clear example for anti-Hammond behavior for protein fold-
ing reactions supporting the picture of narrow and defined transition state regions.
12.2.5
Folding in the Absence of Enthalpy Barriers
12.2.6
Conclusions and Outlook
are most commonly caused by structural changes in the unfolded state. This sup-
ports the presence of residual structure in unfolded proteins. The analysis of the
denaturant-induced REFERs for a large number of proteins further supports se-
quential folding pathways with obligatory high-energy intermediates as a simple
and general explanation for curvatures in chevron plots. This suggests that appar-
ent two-state folding and folding through transiently populated intermediates
share similar free energy landscapes with consecutive barriers on sequential path-
ways. The major difference between two-state and multistate folding is the relative
stability of partially folded intermediates.
Up to date mainly mutational analysis in combination with denaturant depen-
dence of folding rate constants has been applied to characterize protein folding bar-
riers. It will be interesting to see whether protein folding transition states are also
robust along other reaction coordinates, which can be probed by changes in pres-
sure, temperature and by ligand binding or protonation of specific side chains. In
addition, more experimental results on backbone interactions in the transition
state and on the existence of parallel transition states will be required to get a
more complete picture of the structure of protein folding transition states and on
the properties of the free energy barriers.
Acknowledgments
We thank Andi Möglich for preparing Figure 12.2.8A, Manuela Schätzle and Beat
Fierz for comments on the manuscript and all members of the Kiefhaber lab for
help and discussion.
References
13
A Guide to Measuring
and Interpreting f-values
Nicholas R. Guydosh and Alan R. Fersht
13.1
Introduction
Arguably the most important state to characterize on a folding pathway, and in-
deed in any reaction, is the transition state. But, its ephemeral existence and com-
position of unstable interactions render conventional tools inadequate in determin-
ing an atomic level structure. But the structures of many transition states have
been solved using protein engineering techniques developed over the last two dec-
ades: f-value analysis [1, 2]. The folding pathways for numerous proteins have now
been experimentally characterized with f-values [1, 3–5]. These data have in turn
been used for benchmarking molecular dynamics simulations of protein folding
[6].
This chapter outlines the basic theory behind a f-value analysis and discusses
how structural information is derived from it. This includes a discussion on choos-
ing mutations for minimizing error in calculations. The techniques for measur-
ing the thermodynamic and kinetic parameters used to compute f-values are also
presented.
13.2
Basic Concept of f-Value Analysis
in a f-value analysis. Kinetic and thermodynamic data from these mutants are
then used to determine the extent of native-like interactions formed in a protein
folding transition state. For the wild-type protein and each mutant, two quantities
are required: the free energy stability of the native state with respect to the dena-
tured state (DG ND ) and rate of folding from the denatured state to the native state
(rate constant k f ).
The stability change for the folding reaction upon mutation is defined as
Here, DG ND typically ranges from 3 to 15 kcal mol1 for small, single-domain
proteins. The primed terms represent the quantities for the mutant.
Next, transition state theory can be applied to kinetic rate constants for folding to
obtain the energetic difference between denatured and transition states, in terms
of exponential prefactors. When taking the ratio of the rate constants for the mu-
tant and wild-type proteins (k 0 f and k f , respectively), the prefactors cancel, with the
assumption that they do not change upon mutation. This gives an expression for
DDG TSD as
k 0f
DDG TSD ¼ DG TSD DG TS 0 D 0 ¼ RT ln ð2Þ
kf
The notation is as before, but with TS symbolizing the transition state. The ratio of
these two values is then defined as f.
DDG TSD
f¼ ð3Þ
DDG ND
DDG TS-N
DG
DDG D-N
R R
R
R R R
D TS N D TS N
or introduce new interactions. Ideally, they should not rearrange the structure of
the protein around the site of mutation in either the denatured or native states,
though information can still be obtained if these latter two conditions are not met
(see below). Mutations found to have minimal effects on the pathway and structure
are small, hydrophobic deletions without change in stereochemistry (‘‘nondisrup-
tive deletions’’ [2]). By making mutations such as Ile to Val, Thr to Ser, Ala to Gly,
Leu to Ala, or Phe to Ala, cavities are created in the hydrophobic core, leaving the
folding pathway largely the same. Correlation between native state stability and
points of contact of the deleted atoms within a small, 6-Å radius supports the as-
sumption that the structural region probed is defined locally around the deleted
atoms [9].
More radical mutations, particularly those affecting electrostatic interactions, are
more difficult to interpret because new, unpredictable interactions are introduced,
which could be taking place more distant from the site of interest. In cases where
these sorts of mutations are not possible, the residue of interest can be first trun-
cated to Ala and then to Gly, to get some idea of the role of that particular position
to the transition state.
Mutation of Ala to Gly is a special case because the Gly amino acid is much
more conformationally flexible about its f and c bond angles. Such a mutation
not only probes the effect of the deleted steric bulk of the alanine methyl group,
but also backbone conformational flexibility at the site of mutation. Estimates for
448 13 A Guide to Measuring and Interpreting f-values
the conformational entropy contribution of Gly are @1 kcal mol1 , which can be a
significant percentage of a protein’s typical stability [10, 11]. By carefully making
Ala to Gly mutations, the role of backbone flexibility can be probed. Along with
other single methyl group deletions (Ile to Val and Thr to Ser), Ala to Gly muta-
tions can probe secondary structural interactions. The extent of formation of the
hydrophobic core can be probed by larger deletions from fully buried residues.
The stability of mutated proteins is an indicator of possible error in computation
of f-values. Mutants that have a stability change (DG ND ) of less than @0.6 kcal
mol1 run the risk of producing f-values with significant error since the ratio
used to compute f is of two small numbers. Similarly, mutations that substantially
destabilise the protein are of concern because larger energetic changes are more
likely to contain significant structural changes.
13.3
Further Interpretation of f
where the notation is as before. Here, the term DGD 0 D is simply the change in
denatured state relative free energy upon mutation. Since the denatured state is
unstructured for the most part, this term reflects the change in solvation and con-
formational entropy of the denatured state as a result of a mutation. This term is
often negligible for small hydrophobic deletions. However, in cases where polar or
D N D TS
DG N-D DG TS-D
DG N’-D’ DG TS’-D’
D’ N’ D’ TS’
Fig. 13.2. Thermodynamic cycles between the denatured state
(D) and transition state (TS), or native state (N) for a wild-type
(unprimed) and mutant (primed) protein.
13.3 Further Interpretation of f 449
charged groups are exchanged for hydrophobic groups, solvation energies are no
longer simply fractions of a kcal, as in the case of hydrophobic deletions [5]. Cases
where glycine is deleted or introduced also introduce conformational entropy
terms that are of similar magnitude [11].
There are two cases, as given earlier, of simple interpretation. When DG TS 0 TS ¼
DG N 0 N , f ¼ 1. Here, the native and transition states are equally perturbed.
Whether or not the denatured state energy, DGD 0 D , changes is not critical. Struc-
turally, the residues mutated have the same structural interactions in the transition
and native states.
In the second case, DG TS 0 TS ¼ DGD 0 D and f ¼ 0. In this situation, changes in
interactions in the native state are largely not important (so long as they do not
change the folding pathway). Since the energy DGD 0 D is often very small, the tran-
sition and denatured state mutated atoms have the same unstructured form. How-
ever, it could also indicate equal breakdown of residual structure in the denatured
and transition states. In such a case, however, the interpretation is the same: native
interactions do not form until a position past the transition state on the folding
pathway.
Fractional f-values are more difficult to interpret. In one simplifying case, some
speculation can be made. When the energy of solvation for a particular mutation
is zero (DGD 0 D ¼ 0), the expression for f simplifies to DG TS 0 TS /DG N 0 N . This is
the case when the particular mutation is buried within the protein and makes no
contact with solvent. This fractional f-value is interpreted as an indication of a par-
tially formed structure in the transition state. However, the energetics of the struc-
tural interactions is unlikely to be simply linear, so exact structural interpretation is
difficult. Some f-values are close to 0 or 1 in value, making interpretation easier.
Fractional values are more likely to suggest partial structure formation when they
appear in groups within a particular region of a protein structure, such as the hy-
drophobic buried core.
A large number of f-values that are fractional over a specific region of a protein
structure can help rule out the possibility of parallel folding pathways: one with
f-values around 1 and one with f-values around 0 [12]. This is most readily
accomplished by graphically plotting the f-value data. Data are plotted with DG TSD
on the vertical axis and DG ND on the horizontal axis (or D ln kDN against
D ln KDN ). Points that fall onto a linear plot likely fold with the mutated atoms
structured to a similar extent in the transition state. If more mutations are made
in the same element of structure, the expectation is that points continue to fall on
the same line. However, in the case of multiple pathways, additional mutations,
which stabilize the native state of the protein to a greater or lesser extent, may
favor one of the folding pathways and change the slope of the line to a new value.
Beyond multiple pathways, it is also important to rule out the possibility of the
addition of new interactions upon mutation. In such a situation, a double mutant
cycle can elucidate the problem. A known interaction between two residues is bro-
ken by mutating each residue in separate experiments and then both residues in a
double mutant experiment [13]. A rough estimate of the energy of interaction be-
450 13 A Guide to Measuring and Interpreting f-values
tween residues can be made, giving some indication of possible structural rear-
rangements upon mutation.
Occasionally, uncanonical f-values (greater than 1 or less than 0) are reported.
Such cases can be explained by an overpacked, and hence destabilized hydrophobic
core, which may be preferentially stabilized in comparison to the transition state
when a deletion mutant is made. Such problems are more common in engineered
sequences [14, 15]. Nonnative interactions in the transition state can also lead to
these sorts of unconventional f-values.
It should also be noted that f-values are not limited to characterization of the
transition state. It is also possible to determine f-values for intermediates along
the folding pathway, substituting DGID for DG TSD in Eq. (3), where I represents
the intermediate state. In the case of the barnase, f-value analysis of the inter-
mediate showed a structure consistent with the complete folding mechanism as-
sembled from NMR studies on the denatured state and f-values determined of
the transition state [16].
Thus far, discussion of f-values has focused on the forward, folding reaction. It
should be noted, however, that the reverse (unfolding) reaction also has an associ-
ated f-value, which can be understood with a similar analysis. For a simple two-
state reaction, this is simply 1 f for folding. Often, this unfolding f-value is sim-
pler to compute, because intermediate states are most likely to occur between the
denatured state and transition state, which affect the rate of folding and not unfold-
ing. The unfolding pathway from the folded state to the transition state is more
typically a first-order reaction that can be readily determined by fitting the unfold-
ing arm of a chevron plot (see below).
13.4
Techniques
DTm
DDGDN A DTm DSDN ¼ DHDN ð5Þ
Tm
13.4 Techniques 451
Similar to the thermal unfolding case, fits to denaturation curves will provide
two numbers: the m-value (mDN ) and midpoint of the denaturation ([D]50% ). The
m-value is usually taken to be a constant that scales with the change in solvent-
accessible surface area of the protein upon unfolding. Often, the mutated and wild-
type proteins have negligible differences in m-values, given that so few atoms are
deleted. However, significant errors can be introduced when the stability is extrapo-
lated to zero denaturant concentration. To minimize this error, the average m-value
is computed. This is then multiplied against the change in the denaturation mid-
point (a precisely determined value like the change in Tm ) to give the change in
stability upon mutation.
where m TSN and m TSD are the kinetic m-values. Summation of the kinetic m-
values should give the equilibrium m-value. A common explanation for any short-
fall is the existence of intermediate states on the folding pathway. The ratio of the
kinetic m-value to the equilibrium m-value gives a measure of the solvent exposure,
or compactness, of the transition state. This information, in the form of Tanford b,
defined below as b T , is complementary to the f-value, which is a structurally more
specific measure of transition state structure.
m TSN
bT ¼ ð9Þ
mDN
Application of transition state theory to Eq. (8) gives equations describing the de-
pendence of rate on denaturant concentration, [D]
452 13 A Guide to Measuring and Interpreting f-values
where kfH2 O and kuH2 O are the rate constants for folding and unfolding, respectively,
in the absence of denaturant. These equations describe the characteristic chevron
plot of kobs as a function of denaturant concentration. Deviations from a v-shaped
plot with two linear arms, representing the folding and unfolding parts of the reac-
tion, are strong indicators that the folding reaction is more complicated than a sim-
ple two-state process. Further, it should be understood that a v-shaped chevron plot
cannot alone be taken as evidence for a two-state transition without first fitting it to
a standard equation and recovering reasonable values for kinetic constants. Fitting
the entire plot also reduces error that may be present in single rate constant at a
particular denaturant concentration.
The particular denaturant concentration used for computation of f-values is not
critical in the case of a two-state folding protein, where the value of the Tanford b
does not change upon mutation and the folding pathway is unchanged. In general,
it is wisest to compute values of DDG TSD at a variety of denaturant concentrations
to observe whether the folding pathway changes with denaturant concentration
[18]. In cases where intermediate states are believed to exist between the denatured
and transition states, it may be best to use values exclusively at high denaturant
concentrations or to simply fit the unfolding arm of the chevron plot with a linear
fit, since such intermediates are contributing little to the observed rate constant,
kobs , at high denaturant concentration.
13.5
Conclusions
The method described here for carrying out a f-value analysis is the only experi-
mental method available for atomic level structural information of folding transi-
tion states. Extensive f-value analyses have been carried out on the proteins
barnase and chymotrypsin inhibitor 2 [1, 3–5]. The transition state structural in-
formation obtained in such studies, when combined with molecular dynamics
simulation and NMR structural information, can then be used to determine the
complete, atomistic picture of protein folding [19]. As computing power further
improves to bring simulated folding temperatures into agreement with experimen-
tal temperatures, a combination of f-value analysis and computational techniques
will be essential for increasing the repertoire of proteins for which the complete,
atomistic folding pathway is known.
References
14
Fast Relaxation Methods
Martin Gruebele
14.1
Introduction
of ca. 1 ms. It was found that many proteins have an unresolved burst phase, even
though the complete folding process may require from milliseconds to seconds [6].
More recently, proteins and peptides have been discovered that fold completely on
the submillisecond time scale, so none of the folding kinetics can be resolved by
conventional stopped-flow techniques [7]. Relaxation techniques were developed
to break the ‘‘ms-barrier,’’ and so processes in the nanosecond to microsecond
range will be considered ‘‘fast.’’ Relaxation and other fast techniques have proved
that none of the elementary events for folding take more than a millisecond, and
far more stringent limits on the time scales have been set in the meantime.
Fast relaxation studies of biologically relevant systems have a long history. Eigen
himself had studied imidazole protonation by 1960 [8]. Flynn and coworkers
studied the association of proflavine in 1972 by laser temperature jump [9]. Holz-
warth and coworkers studied lipid phase transitions [10] and dye–DNA binding in
the 1980s, and Turner’s group extensively investigated relaxation of nucleic acid
oligomers in the 1980s [11]. Submillisecond relaxation was first applied to peptide
and protein folding in the 1990s [12–16], including photolysis-induced, tempera-
ture, electrochemical [17], and later pressure jumps [3].
The relaxation experiments discussed here are not to be confused with dielectric
relaxation spectroscopy [18], which does not involve a sudden initiation step, or
with NMR relaxation experiments [7], which can study protein folding kinetics on
a microsecond time scale via analysis of the lineshapes. These, as well as stopped-
flow techniques and submillisecond continuous flow techniques are discussed in
Chapter 23.
14.2
Techniques
14.2.1
Fast Pressure-Jump Experiments
Pressure (or solvent density) is a useful variable for controlling protein folding.
Although folded proteins are more compact than unfolded proteins in terms of
radius of gyration, the cores are not perfectly packed and the unfolded state gener-
ally has a smaller partial volume of solvation than the folded state. According to
Le Châtelier’s principle, proteins thus unfold at higher pressure, and this gener-
ally occurs at pressures of 1–5 10 8 Pa. Increased pressure also raises the cold-
denaturation temperature of proteins and decreases the heat-denaturation temper-
ature. To a first approximation, the folding–unfolding phase diagram can be
derived from a quadratic expansion of the free energy in T and P. Supporting evi-
dence shows that in the low-pressure regime, the stability of some proteins is actu-
ally enhanced by pressure [19, 20] (see Chapter 5).
A number of kinetic studies using pressure relaxation with millisecond or slower
time resolution have been reported [21]. Submillisecond time resolution was
achieved by Schmid and coworkers [3], who used a piezoelectric driver design first
456 14 Fast Relaxation Methods
(a) (b)
k in 1/s
Time (ms) 1/T in 1000/K
Fig. 14.1. a) Refolding of 6 mM Bc-Csp after a pressure jump
from 100 to 3 bar at 75.3 C. b) Arrhenius plot of the micro-
scopic rate constant k of refolding of (open circle) wild-type
Bs-CspB and the (filled circle) Phe27Ala, (triangle) Phe17Ala,
and (square) Phe15Ala variants. From Ref. [3].
14.2.2
Fast Resistive Heating Experiments
14.2.3
Fast Laser-induced Relaxation Experiments
Lasers can be used to induce relaxation via a variety of photochemical and photo-
physical processes. A substrate can be removed by photolysis, thereby causing
rearrangements of the binding protein. Redox agents or proton donors can be
light-activated, changing the oxidation or protonation state of a protein. Disulfide
bridges or other linkages can be broken, releasing the strain of a nonnative confor-
mation. Other photochemical processes include pH jumps and dissociation of un-
natural amino acids. Among photophysical processes, chromophore relaxation has
been used to study the folding speed limit, and the most commonly used is heat-
ing of the solvent (temperature jump). We discuss these approaches in turn.
k contact (s–1)
Delay time
Fig. 14.5. Decay of thiyl radical absorbance in the 17-mer
peptide discussed in the text. The dynamics can be fitted by
powerlaws ta with a near 0.1. From Ref. [44].
picosecond pulse [44, 45]. Disulfide bridges cleave by a directly dissociative process
of <1 ps duration, providing an enormous dynamic range over which two point
correlations of the thiyl termini can be studied. Figure 14.5 shows the relaxation
of an alanine-rich 17-mer initially cyclized by a disulfide bridge. The relaxation pro-
cess is highly nonexponential and extends from picoseconds to microseconds.
Laser temperature-jumps can achieve dead times as short as 2 ps [13]. The lower
limit is imposed by the need for thermal equilibration of the solvent after a specific
solvent mode has been excited by the laser. (Backbone modes of the protein should
of course not equilibrate, since one wishes to observe them after the temperature
jump.) The achievable temperature jumps generally lie in the range of 5–30 K, suf-
ficient to cause large changes in protein K eq values. Most laser temperature-jump
experiments are carried out with nanosecond heating pulses in the 1.5–1.9 mm
range to pump directly H2 O or D2 O stretching vibrations, which thermalize within
the nanosecond pump profile. These wavelengths can be obtained from Nd:YAG
lasers Raman-shifted in hydrogen (1.9 mm) or methane (1.5 mm) [67], or by differ-
ence frequency generation in LiNbO3 crystals [53]. Fast temperature jumps can po-
tentially cause pressure and temperature fronts to propagate through the sample,
but such effects can be mitigated by using the proper counterpropagating pump-
ing geometry, initial temperature, pump spot size, and a noncollinear detection
method [67].
Protein unfolding, peptide relaxation, and protein refolding were initially ob-
served by laser temperature jumps in the mid-1990s [13, 15, 16]. In apomyoglobin,
formation of the hydrophobic AGH helical core happened 5 orders of magnitude
faster (<10 ms) than complete refolding of the protein, resolving the burst phase
seen by stopped-flow kinetics [6, 16]. Small peptides of well-defined secondary
structure exhibit even faster kinetics [15, 55, 68–72]. Figure 14.6 shows the relax-
ation kinetics of a b-hairpin and of a short helix, common motifs in protein sec-
ondary structure. Analysis of the hairpin data is compatible with formation of the
turn as a rate-limiting step [70], although models with early interstrand contacts
have similar time scales [73]. The helix data use 13 C isotopic labeling to dissect
local peptide kinetics, and confirms nonexponential dynamics previously observed
for nonnatural helical polymers and expected from helix formation models [71, 74,
75].
Two-state and more complex folding mechanisms of larger proteins have also
been observed by laser temperature jumps. Phi value and other mutation analyses
(see Chapter 13) have been reported [59], as well as solvent-tuning studies [76].
Like resistive temperature-jump experiments and temperature-dependent pressure
jumps, laser temperature-jump studies have shown strong curvatures in Arrhenius
plots for the folding rate of fast folders [59]. Figure 14.7 illustrates laser temper-
ature jumps of proteins with the relaxation kinetics of two very fast folding l-
repressor fragment mutants [65]. All mutants of this molecule with >50 ms folding
times exhibit single exponential kinetics, while all mutants with t < 50 ms, irre-
spective of the location of the mutations, exhibit an additional fast phase lasting
for a few microseconds.
14.2.4
Multichannel Detection Techniques for Relaxation Studies
Time (ns)
Fig. 14.6. Relaxation response of 16-mer beta hairpin (top)
and of unlabeled and 13 C-labeled helix peptides (bottom),
showing a significant change in the fast-phase amplitude. From
Refs [69] and [71].
tic spectroscopy [42], or transient gratings [43]. Here multichannel techniques will
be described in more detail. Multichannel techniques provide an array of data at
each kinetic measurement, such as a full infrared spectrum or a time-resolved flu-
orescence decay. They can extract additional information not available when only
rate constants from a single probe are used to characterize folding. These probes
address different aspects of protein structure.
(a) (b)
Scaled fluorescence
Scaled fluorescence
Time (µs)
[78], but they have not been combined with the relaxation experiments described
here.
secondary structure loss examined in detail for the folded and several partially
folded forms of apomyoglobin [5].
In the near-IR and visible region, the local environment of prosthetic groups
such as heme (Soret band at A420 nm) or side chains such as tyrosine [31] can
be probed. Parallel data collection is possible with a broadband excitation source,
such as a femtosecond Ti:Sapphire laser or fluorescence from a laser-pumped dye
(typically 50–100 nm bandwidth) dispersed onto an array detector. This again has
the advantage that the entire absorption spectrum is collected after a single relax-
ation event. The first submillisecond measurements of diffusional dynamics of a
protein (cytochrome c) were carried out using absorption measurements and laser
photolysis (see above) in 1993 [12]. Recently the method has been used to study the
dissociation dynamics of light-harvesting complexes into dimer and monomer sub-
units (see Chapter 14.5 for precautions in the analysis of such relaxation) [64].
Some relaxation experiments (e.g. temperature jumps and pressure jumps) can
induce significant variations in the solvent refractive index. In such cases, great
care must be taken when applying direct absorption to detect protein kinetics, as
the probe beam may wander or may be lensed as it passes through the solvent [64].
detection technique has been applied extensively to fast folding coupled with con-
tinuous mixers (Chapter 15) [89]. Compared with absorption techniques, Raman
and fluorescence emission have an important advantage: their relatively isotropic
light collection makes them less sensitive to refractive index fluctuations and other
variations in the sample induced by the initiation step. In addition, neither the
probe beam nor the scattered Raman light are absorbed by the aqueous solvent,
and Raman is an inherently parallel probe technique which does not require scan-
ning to obtain an entire spectrum. The main disadvantage is that dispersed Raman
spectra are quite weak. Intense pump lasers can help, but photodamage must then
be taken into account [88].
fer (FRET) theory, although some caution is required when attempting to extract
quantitative results because assumptions about orientational effects are often nec-
essary. FRET techniques and single molecule techniques are described in detail in
Chapter 17.
14.3
Protein Folding by Relaxation
14.3.1
Transition State Theory, Energy Landscapes, and Fast Folding
A general overview of the classical transition state theory and Kramer’s theory as
applied to protein folding is given in Chapters 12.1 and 22. Here certain aspects
of energy landscape theory addressed by fast folding experiments are described in
more detail.
Figure 14.8 illustrates the connection between folding funnels, folding free en-
ergy surfaces, and smoothed free energy surfaces for folding. Energy landscape
theory, based on random heteropolymer theory with an energetic bias towards the
native state [100, 101], posits that the energy of a protein gradually decreases as it
becomes more compact. This gives a plot of energy versus configurational entropy
per residue the funnel-like shape shown in Figure 14.8 (column 1).
Superimposed on the funnel shape are energy fluctuations caused by interac-
tions of the backbone, side chain, and solvent if some coordinates are not averaged
over, but kept as reaction coordinates ‘x’. Nonnative interactions over a wide en-
ergy scale are the main cause of landscape roughness. Among the largest of these
is the classic example of proline cis–trans barrier (other side chains can also create
conformational traps; see Chapter 25) [102, 103]. Although proline isomerization
is not a necessary feature of folding [104, 105], even proteins whose energy land-
scape has been smoothed by design as much as possible retain some roughness
[65]. This is so because not all nonnative interactions can be removed – in the lan-
guage of energy landscapes, a residual frustration persists [106]. A typical exam-
ples of such an unavoidable residual interaction would be the steric clashes of dif-
ferent side chains [107, 108].
Among the traps on the funnel (assumed to have a Gaussian depth-distribution
in random-energy theory) some can be rather deep [109], thus qualifying for spe-
cific treatment as ‘‘intermediates’’ rather than a statistical treatment. The best ex-
ample is the noncoincidence between chain collapse and core drying – the latter a
transition where water molecules are squeezed from the protein core. A larger en-
ergy fluctuation is shown at the entrance of the top funnel in Figure 14.8. Rapidly
formed folding intermediates are discussed further in a separate section below.
Free energies are related to energies and entropies by F ¼ E TS, and so the
funnel can be transformed into a picture more amenable for experimental work,
which is generally carried out at constant temperature, not at constant entropy. Of
14.3 Protein Folding by Relaxation 467
course the computation of the Gibbs free energy is somewhat more involved be-
cause it depends on enthalpy and total entropy (not energy and configurational en-
tropy alone). The middle column of Figure 14.8 shows an example with a reaction
coordinate linearly related to the configurational entropy.
The three free energy surfaces shown differ by the degree of compensation be-
tween energy and entropy. If entropy reduction during collapse is not fully com-
pensated by contacts, a barrier results (top middle in Figure 14.8). In addition the
free energy surface has a rough structure, and some of the local minima may be
sufficiently prominent to deserve classification as ‘‘intermediates’’ or ‘‘high-energy
intermediates’’ [110, 111]. The two are distinguished simply by whether the inter-
mediate comes within 2–3 kB T of the denatured or native states (is populated), or
not. The top middle case is very common in relaxation studies of small proteins [3,
14, 16, 26, 28, 31, 53, 57, 59].
468 14 Fast Relaxation Methods
It is not obvious that a free energy surface such as the one in the top middle
of Figure 14.8 should give rise to single exponential kinetics. As surmised by
Creighton and discussed by Zwanzig it does, if the interconversion among pairs
of states is fast compared to traversing the bottleneck to the folded state [112,
113]. In that case, the activated folding rate coefficient ka is much smaller than
the diffusional time scale km for moving among the local minima, allowing the
protein to equilibrate locally on the surface compared to the time scale required
for barrier crossing. The local diffusion events characterized by km occur among
the unavoidable minima left over when the protein is ‘‘minimally frustrated’’.
Computations using Go models have shown that the free energy fluctuations
caused by minimal frustration are generally <3 kB T ’’ [114–116]. The three arrows
in Figure 14.8 show part of a sequence of events occurring on the time scale km .
Because of the residual roughness of the free energy surface, km is slower [65]
than the fastest events the backbone can undergo (such as formation of a hairpin
or the collision of two residues in a random peptide of length N in a y-solvent
tuned to eliminate all barriers) [15, 48, 49, 51, 68, 117]. A lower limit on km is set
by such elementary-event peptide experiments, and ranges from ke ¼ (10–100
ns)1 , depending on peptide length and residue composition.
In the large barrier case (Figure 14.8 top) only ka can be observed, not km . This
leads to an ambiguity in using transition state-like theories, which contain two
time scales: the prefactor, and the smaller rate coefficient, which differs from the
prefactor by a Boltzmann factor (see Chapter 12.1 and equation 3 in Section
14.5.4.1). The choice of prefactor then depends on how we treat the free energy
surface: on the rough free energy surface in the top middle of Figure 14.8, we
would use (10–100 ns)1 , and treat the small barrier crossings caused by frustra-
tion explicitly as an additional free energy barrier. On the smoothed free energy
surface to the right, we would use km as our slower prefactor, which has been
shown to correspond to a renormalization of the diffusion constant to a smaller
value D [118]. Experimentally, this renormalization factor can be derived by com-
bining peptide relaxation studies [48, 68] and protein relaxation studies. For
lambda repressor, it turns out to be about 40, equivalent to residual free energy
fluctuations of 0.6 kB2 T 2 caused by frustration [65].
Can km be measured directly? The middle and lower rows of Figure 14.8 show
what happens when the energy and entropy decrease are better compensated in
the funnel: the free energy barrier disappears. In the completely downhill case on
a rough surface, Zwanzig has shown with well-defined assumptions (see below)
that the kinetics can be exponential, but with the renormalized slower diffusion
constant D [118]. In the intermediate case (center row of Figure 14.8), the as-
sumption that km g ka discussed earlier breaks down when the main barrier be-
comes comparable to the minimal roughness of the free energy surface in one
dimension [56, 65, 119]. Both time scales can then be observed simultaneously by
relaxation experiments with proteins engineered to fold downhill in a few micro-
seconds, yielding a value of km ¼ (2 ms)1 for the case of lambda repressor [65].
Explicit calculations for lambda repressor [120], as well as general theoretical calcu-
lations also yield speed limits in the 0.5–2 ms range [121].
14.3 Protein Folding by Relaxation 469
The speed limit km is not a universal quantity, but depends on the complexity of
the protein’s fold or ‘‘topology,’’ as it is often referred to in the folding literature.
The reason is that more complex proteins have to undergo more elementary events
to fold and are also more likely to be subject to residual frustration [114, 119]. For
example, the most optimized fast-folding sequence for a two-helix bundle should
fold faster (<1 ms) than the most optimized 8-helix myoglobin fold (>7 ms). This
has been expressed in a relationship between the contact order (average loop
length between to native contacts) and logarithm of the rate constant [122, 123]; a
rigorous analysis must also include chain-length effects [124]. ln k vs. contact order
is usually plotted for any two-state folders at hand, resulting in a large amount of
scatter (2–3 orders of magnitude) along the vertical axis. The analysis outlined
above shows that one should use instead the largest value of km , obtained with
the most optimized sequence for a given fold, i.e., the actual speed limit of a fold
subject to minimal frustration. Measurements of the fastest folding rates will allow
a more quantitative experimental analysis of the rate–contact order relationship by
revealing the degree of energetic frustration a given sequence imparts on a fold.
Changing the bias of the free energy by optimizing folding conditions has an ef-
fect even when a significant barrier persists. Consider the top and middle cases in
Figure 14.8, and imagine for the moment that the barriers in the middle case are
still significant (by making the denatured minimum a bit deeper). Then the top
transition state is ‘‘late,’’ and the middle transition state is ‘‘early.’’ Whether one
describes the motion of the transition state as a smooth shift [125], or as a switch
across a high-energy intermediate [111], depends on the degree of smoothing ap-
plied to the free energy surface. Thus it is necessary to define a rule for how much
the free energy surface should be smoothed in kinetic descriptions. A useful rule
is that the smoothing should leave intact minima deeper than 2–3 kB T, since this
is the crossover where fast relaxation experiments and Langevin simulations have
shown Kramer’s exponential rate theory to be applicable [65]. This degree of
smoothing removes the roughness caused by minimal frustration, but leaves
more pronounced intermediates intact. When smaller barriers have to be treated
(middle right in Figure 14.8), Langevin simulations are required to obtain the cor-
rect dynamics on the one-dimensional potential [65, 126].
One can choose to not apply any coarse-graining at all, and pick the classical
value kB T/h A 300 fs for the prefactor [28], along with an atomic scale motion as
the reaction coordinate. In such a description of the kinetics, the reaction coordi-
nate can change dramatically from realization to realization (e.g., in a series of
MD simulations, each successive folding trajectory has a different atomic-level re-
action coordinate at the free energy maximum). Thus, a structural justification for
smoothing of the free energy surface is to apply smoothing until a consistent set of
collective coordinates emerges.
The question then arises: how many coordinates should be used for a good
description – are the one-dimensional cartoons in Figure 14.8 sufficient? As dis-
cussed earlier, Zwanzig has shown that with certain assumptions, even diffu-
sion on a rough surface can be treated as an exponential process [118]. A one-
dimensional reaction coordinate (such as the plot in Figure 14.8 bottom) is an
470 14 Fast Relaxation Methods
14.3.2
Viscosity Dependence of Folding Motions
Viscosity effects on folding rates provide important clues to the mechanism. This
is discussed in detail in Chapter 22, but a few observations from relaxation studies
are worth reemphasizing here. A proper discussion requires at least Kramer’s
theory [131] because the same solvent–protein coupling both activates the protein
for reaction and slows down the reaction when the coupling (and hence the viscos-
ity) becomes large.
Viscogenic agents, such as sucrose or ethylene glycol, can affect both the prefac-
tor (@h1 in the strongly damped Kramer’s regime) and free energy barriers, as
well as the compactness of the denatured state. Early observations of submillisec-
ond refolding kinetics indicated a decrease of the rate with increasing viscosity, but
14.3 Protein Folding by Relaxation 471
did not compensate for barrier effects [16, 33, 132]. Recent work has carefully cali-
brated the free energy at different viscogen concentrations, finding that the viscos-
ity dependence of the prefactor is very close to h1 [133–135]. These authors also
make the argument that barriers are required to protect the native state from con-
tinuous unfolding, supported by the recent observation that very fast folding pro-
teins without a substantial free energy barrier are aggregation prone [65].
There has been an ongoing discussion of deviations from Kramer’s strongly
damped regime caused by self-friction of the protein molecule during folding [16,
33]. Such deviations have been modeled by phenomenological equations of the
type k @ ðh þ h0 Þ1 or k @ hd where d < 1. Small peptides indeed have a length-
dependent viscosity dependence, with powers deviating from 1 to as low as 0.8
[136]. In larger proteins, the evidence generally favors the simple Kramer’s result.
Some of the insights from simulation are discussed further below.
14.3.3
Resolving Burst Phases
As discussed in the section on transition states, some local minima in the free en-
ergy surface are deep enough to be amenable to individual analysis by relaxation
experiments. Even when they lie at high energy, they can have subtle effects on
the folding kinetics [137]. It has been postulated that a discrepancy between the
time scale for collapse and the time scale for desolvation of the protein core could
give rise to compact but solvated globular states [138]. Fast relaxation experiments
support this general idea. An intermediate with native-like quenching of the apo-
myoglobin AGH helices forms on a microsecond time scale, and amide exchange
experiments show that parts of this intermediate are protected, while others are
still undergoing rapid exchange [6, 16]. Several nonnative forms of apomyoglobin
have been investigated by Dyer and coworkers, and some have been shown to form
a compact core at the diffusion limit [76]. Resistive temperature-jump experiments
have been used to study the biphasic folding of barstar. Phi-value analysis shows
that two of the helices consolidate rapidly (A300 ms), but a fluorescence shift to-
wards a native-like structure requires nearly a second [25].
The question of the barrier height separating the unfolded state from the inter-
mediate has been discussed extensively in the literature. Stretched- or multiexpo-
nential formation of a folding intermediate of the two-domain protein phospho-
glycerate kinase indicates that for one of the two domains, the formation of the
intermediate is limited by a hopping process on a rough free energy surface, rather
than by a single barrier crossing [56, 63]. This is also supported by the observation
that cold denaturation, which should smooth out the free energy surface, brings
the kinetics back towards single-exponential folding. More work has been done
using flow techniques (see Chapter 15 for details): Using a prefactor of 50 ns (to in-
clude free energy landscape roughness in the activation barrier) yields values below
8 kB T for molecules such as cytochrome c or ubiquitin [138–141]. It has been
argued that in many cases, no barrier beyond residual roughness exists [56, 142,
143], or that these barriers come after the main transition state [144, 145]. In one
472 14 Fast Relaxation Methods
case, it has also been convincingly shown that the intermediate contains significant
nonnative structure, but nonetheless has direct access to the native state [146].
14.3.4
Fast Folding and Unfolded Proteins
It has long been known that unfolded states contain sequence-dependent residual
structure [147]. As discussed in Chapters 20 and 21, this has come to be much better
understood in recent years. Residual unfolded structure is particularly important
when interpreting fast relaxation experiments: the barriers are small, and changes
in the unfolded structural ensembles can have a major effect on folding kinetics
[31, 148]. For example, a variety of relaxation experiments have shown that unfold-
ing rates tend to have ‘‘normal’’ linear Arrhenius plots, while folding rates have
highly curved plots (see for example Figure 14.1). This has been taken as a sign
that transition states remain relatively native like upon temperature tuning [31,
32], while the unfolded state becomes more native-like at low temperature. The
effect of unfolded states on the rate constant can be reduced, by making them
equally rigid at all temperatures. In that case motions of the transition state make
a larger contribution to the rate turnover effect. Many natural proteins that are not
highly optimized for folding probably fall in this category.
When the contributions of the unfolded state to the rate are not too large, a con-
tinuous analog of F-value analysis (see Chapter 13) can be used to analyze protein
folding. This has the advantage that arbitrarily small perturbations can be applied,
while mutations apply a defined-size perturbation [59]. Denaturant, pressure, and
temperature have been tuned to this effect [57, 125, 137]. By itself, tuning thermo-
dynamic variables provides no structural information, but when combined with
mutations, structural information can be revealed. One useful application of con-
tinuous tuning is to test mutations for ‘‘conservativeness’’: a mutant with a very
different temperature dependence of the free energy from the wild type is less
likely to be a conservative mutation one wishes to use in F-value analysis [59].
The bottom line is this: whenever unfolded states are significantly altered by any
perturbation, whether it is a mutation or the change of a thermodynamic variable,
caution must be exercised when using free energy derivative analysis [32, 119]. An
example in the case of temperature has already been given above, and is discussed
in more detail in Ref. [148]. As an example of a mutation, truncation of a large side
chain to a smaller side chain may reduce a nonnative interaction in the unfolded
state, causing apparent shifts in F-values that are not related to changes in the
transition state [32]. The characterization of unfolded proteins has only scratched
the surface so far, but the number of experiments characterizing unfolded states
of proteins is steadily increasing.
14.3.5
Experiment and Simulation
simplest level, master equations are used, such as the n-state kinetic schemes dis-
cussed throughout this volume [4, 149]. Two common ones, prototypical of fast
folding and association reactions, are discussed in more detail in the Experimental
Protocols (Chapter 14.5). Their breakdown, outlined qualitatively earlier in the dis-
cussion, is also discussed more mathematically in terms of linear response theory
[150], which explains rigorously when the transition from activated kinetics to the
molecular time scale ðkm Þ1 occurs. More sophisticated kinetic schemes, which
provide a more microscopic treatment of the dynamics by including large numbers
of states, have yielded insights into the formation of small segments of secondary
structure, as well as the folding of proteins [70, 113, 123, 151]. These models trace
their ancestry to classic treatments of secondary structure such as the Zimm-Bragg
model [152]. An important concept in such models is how many groups of states
can independently undergo a transition. In ‘‘single sequence approximations,’’
native-like stucture must spread out from a single locus. In ‘‘n-sequence approxi-
mation’’ models, more sites are allowed where native structure can form [123].
The latter models yield lower barriers, and are generally in close accord with full
theoretical treatments and experimental data.
Motion on continuous folding free energy surfaces can be treated by Langevin
dynamics, revealing some interesting behavior. For example, it has been shown
that small minima in the barrier region can actually accelerate folding if they
increase the local curvature at the barriers [126]. Simulations on smoothed down-
hill surfaces such as Figure 14.8 (middle right) have verified nonexponential decays
with renormalized diffusion constants ranging as small as D A 0:00016 nm 2
ns1 , considerably smaller than the free diffusion constant for a small segment of
secondary structure, in accord with analytical models [65].
Lattice models [116, 153–155] have proved very fruitful in comparison with fast
folding experiments. Originally two-dimensional and allowing only for two types of
residues, they have been extended to include more residue types and different lat-
tice geometries [156, 157], for example to describe nonexponential helix relaxation
kinetics [74]. Lattice models were among the first to show that proteins could fold
via a rapid non-cooperative collapse, followed by crossing a relatively small barrier
[155].
Off-lattice models are useful because they still coarse-grain the folding problem,
thus making larger systems accessible, yet they do so with fewer restrictions than
lattice models [158]. The viscosity dependence of folding reactions has been inves-
tigated by Klimov and Thirumalai in a model coupling a coarse-grained model of
the protein with Langevin dynamics to represent the solvent. In agreement with
several experiments, the normal Kramer’s regime (k @ h1 ) provides a good fit for
a-helical and b-sheet peptides [159]. There are however certain conditions where a
fractional power smaller than 1 can occur [160]. Off-lattice models have also re-
vealed a rough free energy surface which can give rise to nonexponential kinetics
[114, 127].
Molecular dynamics (MD) simulation has become an increasingly important tool
for protein folding, as fast relaxation time scales and the capabilities of MD simu-
lation on large computers are beginning to merge on the microsecond time scale
[161]. A detailed overview is given in Chapters 32 and 33; here a few examples of
474 14 Fast Relaxation Methods
14.4
Summary
Fast relaxation experiments are testing predictions of the free energy landscape
theory [166], such as roughness of the free energy surface and downhill folding.
DG (kJ mol-1)
Fig. 14.9. Free energy surface for the hairpin trpzip2 with the
Ca RMS deviation from the native state as one-dimensional
reaction coordinate. The unfolded state is a plateau with
multiple shallow minima, like the middle case in Figure 14.8
and the fitted surface for lambda repressor in Ref. [4].
14.5 Experimental Protocols 475
They have directly measured the time scales of elementary folding processes, in-
cluding secondary structure formation and loop formation. The low barriers of
fast folding proteins allow their free energy surfaces to be explored beyond the na-
tive and unfolded basins. Experimental folding kinetics and molecular dynamics
simulations are meeting in the nanosecond to microsecond regime.
14.5
Experimental Protocols
14.5.1
Design Criteria for Laser Temperature Jumps
Water absorption in the 1.5 mm region is about 200 m1 , limiting pathlengths to
about 250 mm in single pass, or 500 mm in counterpropagating geometry, if a tem-
perature uniformity of 10% is to be maintained in a 10 C temperature jump.
About 100 mJ of near infrared light is required to achieve sufficiently large temper-
ature differentials. This output level can be obtained from a 1.5 m long Raman cell
at 20 atm (2 10 6 Pa) of methane, with a collimated input beam diameter of 0.5
cm and input power levels of 500 mJ. Currently, an unamplified single-oscillator
Nd:YAG laser is capable of meeting these specifications without requiring optical
isolation to prevent feedback from reflections into the laser cavity. Care needs to
be taken in the choice of optics: b3 mm sapphire or b6 mm fused silica windows
with 1.06 mm antireflection coating are sufficient to withstand the pressure and
laser intensity. In designing a Galilean (convex/concave lens) beam telescope, one
must keep in mind that at 500 mJ, even second-surface reflections on optical
curvatures can focus down to a damaging level. Upon output, a Pellin-Broca or
equilateral glass prism can be used to separate the YAG fundamental from the
Raman-shifted light. An IR image intensifier and Kodak Linagraph paper pro-
vide convenient means of tracing the Raman beam, which can be split in a
50:50 beam splitter for a counterpropagating pump geometry. For heating pur-
poses, the beam needs to be brought to ca. 1–2 mm diameter when interacting
with the cell. This can be achieved at the beamwaist of a long-focal length lens
(ca. 1 m), or by placing a shorter lens sufficiently close to the sample so the focus
lies beyond the sample. Because of the Fourier-transform property of focusing, the
resulting beamprofile is generally of much higher quality than the original beam-
profile, as verified by knife-edge measurements or a CCD camera.
The following two temperature-jump calibration methods are straightforward
to implement: A A 1:5 mm wavelength diode laser beam can be chopped, sent
through the sample, and demodulated by a lock-in amplifier. The change in water
absorbance can simply be calibrated by reading diode transmission changes over
476 14 Fast Relaxation Methods
14.5.2
Design Criteria for Fast Single-Shot Detection Systems
In a single-shot detection system, the light source either provides sufficiently short
pulses so that fluorescence decays can be directly resolved, enough bandwidth so a
full absorption spectrum can be measured, and sufficiently high pulse repetition
rates that kinetics can be fully resolved. Currently, the most convenient laser sys-
tems for excitation are mode-locked Ti:sapphire lasers pumped by intracavity-
doubled diode-pumped YAG lasers. Such systems easily achieve 1 W average out-
put power in <100 fs pulses at ca. 80 MHz pulse repetition rate. They are tunable
in the 750–950 nm wavelength range, as well as at half, third, and quarter wave-
lengths by nonlinear sum-frequency generation (with maximum output levels
typically 200 mW, 50 mW, and 10 mW, respectively). This provides at least 10 4
photon/pulse, or 1:10 2 shot noise. Systems with sufficiently high bandwidth can
also be difference-frequency generated in silverthiogallatre crystals to produce up
to 200 cm1 bandwidth mid-infrared pulses (1000–1800 cm1 ) [81]. Commercial
lasers and sum-frequency generators are available.
The most important distinction between detection by transmission or by fluores-
cence lies in the allowed cell pathlength. Backround absorption in mid-IR absorp-
tion, even in D2 O, limits pathlengths to 50–100 mm. In addition, a thermal lens
created by the variation of the refractive index about the ca. 1–2 mm diameter heat-
ing profile limits the pathlength in transmission experiments to <100 mm. The ef-
fect can be minimized by heating an at least 10 larger diameter of the sample
than is probed, but even this precaution is not sufficient when cell pathlengths sig-
nificantly >100 m are used.
Fluorescence detection is less sensitive. To avoid polarization artifacts, the pump
beam should be depolarized and fluorescence should be detected at the magic
angle. However, artifacts caused by cavitation (formation of vapor bubbles) during
large temperature jumps are generally a more severe limiting factor. Light guides
filled with a UV-transmitting fluid can be used in lieu of lens-based imaging
optics to collect the fluorescent output. Models with f -ratios as large as 0.8 and
diameters a4 mm are available, allowing efficient collection of the fluorescence
and piping to remote detection systems.
Details of the electronics of a detector which can detect a fluorescence decay ev-
ery 14 ns with a 500 ps risetime are given in Ref. [67], details of a detector which
can collect an entire fluorescence spectrum from 300 to 400 nm every 100 ns are
given in Ref. [98], and details of a detector that can detect a mid-IR spectrum of
14.5 Experimental Protocols 477
200 cm1 with 300 ns time resolution are given in Ref. [81]. Additional circuit dia-
grams and software are available from the author.
For experiments with cold-denatured proteins, the choice of cell is critical to
avoid freezing supercooled samples because of nucleation on cell defects. In ex-
truded fused silica cells, temperatures as low as 15 C are routinely possible in
water, and 20 C have been observed. Such cells are available in pathlengths
from 0.3 mm to 1 mm. The author has observed laser-induced temperature jumps
between two temperatures below freezing without nucleation. Cell temperaures are
most conveniently adjusted by thermoelectric coolers (for fast adjustment) in com-
bination with a recirculating bath.
14.5.3
Designing Proteins for Fast Relaxation Experiments
Three design criteria have proved particularly useful in increasing the refolding
rate of protein mutants:
1. Increase unfolded structure: this leads to a more structured unfolded state. For
example, Oas has replaced glycine residues in helical positions 46 and 48 of
lambda repressor fragment l6a85 by alanines, thereby greatly increasing the
folding rate [167].
2. Eliminate functional residues: a significant fraction of the energetic frustration
in proteins probably comes from functional residues that are not optimal for
folding. For example, replacing the distal histidine in apomyoglobin by a phe-
nylalanine increases the folding significantly [168], and grafting the shorter
FBP WW loop onto the Pin WW domain increases the folding rate by a factor
of 3 [169].
3. Decrease the contact order: simplifying the fold reduces residual roughness of
the free energy. As a rough guide, 2–3 helix bundles can be expected to reach a
speed limit around 0.5 ms (W. F. DeGrado, private comminication, 2003), 5 helix
bundles around 1–2 ms [65], and 8 helix bundles around 2–5 ms [52]. The time
scales observed thus far for beta-rich proteins are somewhat longer (e.g., b1 ms
for a simple double-stranded hairpin) [69].
14.5.4
Linear Kinetic, Nonlinear Kinetic, and Generalized Kinetic Analysis of Fast Relaxation
Many complicated kinetic schemes can arise in the discussion of relaxation experi-
ments [170] (see Chapter 22). Here we discuss in detail three cases that illustrate
the type of information that can be extracted, and the pitfalls that arise in the anal-
ysis. Other cases can be found in texts on reaction kinetics. Only kinetic schemes
are discussed here; the actual analysis of raw experimental data is discussed in the
next protocol.
d½D
¼ k f ½D þ kd ½F with ½D þ ½F ¼ Ptot : ð1Þ
dt
The solution to this equation, using the equilibrium constant K eq ¼ Feq /Deq ¼
k f /kd , is given by
Kramer’s version of transition state theory (which allows for recrossings of the bar-
rier induced by the solvent) yields a forward rate constant [131]
y
k f ¼ ny ½hðTÞeDG /kT ð3Þ
where the barrier crossing prefactor ny depends on the solvent viscosity h. (ny gen-
erally decreases with increasing solvent viscosity. It may level off at small solvent
viscosities because the protein self-friction becomes dominant. It may even de-
crease with decreasing solvent viscosity because the solvent–reactant collisions
no longer are strong enough to activate the reactant when the solvent viscosity is
very low; this ‘‘inverted Kramer’s’’ regime has not been observed during protein
folding.)
According to the rate law given above, the kinetics remain single exponential for
any size relaxation; the forwards and backwards rate coefficients can be extracted
independently if the equilibrium constant is known. However, under certain condi-
tions nonsingle exponential kinetics can be observed: if the relaxation significantly
changes the properties of the states D or F, and if their spectroscopic signatures
change, then the populations before the jump are no longer local equilibrium pop-
ulations. As a result, rapid relaxation within the states D and F may be observed
before the barrier crossing kinetics [56].
d½A
¼ ka ½A 2 þ 2kd ½A2 with ½A þ 2½A2 ¼ A tot ð4Þ
dt
whose solution, with the initial condition [A](t ¼ 0) ¼ A 0 and [A2 ](t ¼ 0) ¼ 0 can
be written in compact form as
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
k0 kb þ A 0kf kb
½AðtÞ ¼ tanh k 0 t þ tanh1 ; k0 ¼ k 2b þ 2k b k f A 0 ð5Þ
kf k0 kf
(Note that this evaluates to a real quantity even in cases where (k b þ A 0 k f Þ/k 0 > 1.)
Clearly the functional form of the kinetics is not single exponential, and the shape
14.5 Experimental Protocols 479
of the relaxation actually depends on the initial concentration A 0 . This differs from
the case above, where the relaxation remains exponential for any initial condition,
no matter how far from equilibrium.
If this system is near equilibrium however, then the relaxation becomes single
exponential. This can be seen most easily by writing the rate equation in terms of
½A ¼ dA þ A eq , where dA represents the small deviation of the concentration from
the equilibrium value. Neglecting quadratic terms in dA, the rate equation is ap-
proximated by
ddA
A ð2k f A eq þ k b ÞdA ð6Þ
dt
Thus the rate constant depends on the concentration, but the kinetics are single
exponential. By varying the concentration, the forwards and backwards rate coeffi-
cients can be extracted independently.
In this equation, RC is a reaction coordinate. RC0 is the position along the reaction
coordinate that separates the states D and F (the position of the transition state in
conventional transition state theory). nF is the folded population obtained by inte-
grating over the population from reaction coordinate RC0 A 0 towards the F side of
the free energy surface. d is the Dirac delta function so only trajectories at RC ¼ 0
are counted. v is the velocity of the protein trajectory along the reaction coordinate.
480 14 Fast Relaxation Methods
14.5.5
Relaxation Data Analysis by Linear Decomposition
14.5.5.2 w-Analysis
When the thermodynamic data for a folding reaction as a function of temperature
(or some other variable) can be represented by a cooperative two-state folding
model, the following question becomes relevant: Are the folding kinetics those of
an activated two-state system, and hence single exponential? In that case the reac-
tion proceeds as
a1 ðtÞ
w1 ðtÞ ¼ ð11Þ
a1 ðtÞ þ a2 ðtÞ
It describes how the shape of the fluorescence profile evolves from a more un-
folded signature towards a more folded signature, independent of the signal am-
plitude. This function has three important properties, which make it an excellent
choice for representing folding kinetics. It approaches a signal-to-noise ratio lim-
ited only by the Poisson statistics of photon numbers in each laser shot (like ideal
fluorescence intensity analysis); it is immune to laser intensity fluctuations (like
fluorescence lifetime analysis); it allows one to extract species populations for two-
state folders, and distinguishes two-state from multi-state folding (unlike ampli-
tude or lifetime analysis). Further details are discussed in Ref. [98].
Acknowledgments
The author was supported by NSF grant MCB-0316925 while this work was pre-
pared, and would like to thank the research groups who have generously allowed
482 14 Fast Relaxation Methods
figures from their publications to be used in this work. Additional material for this
review was taken from other primary literature publications by the author listed in
the references.
References
apomyoglobin: probing the upper for b-Hairpin Kinetics. Proc. Natl Acad.
reaches of the folding energy Sci. USA 95, 5872–9.
landscape. Biochemistry 40, 5137– 71 Huang, C. Y., Getahun, Z., Wang,
43. T., DeGrado, W. F. & Gai, F. (2001).
61 Snow, C., Nguyen, H., Pande, V. & Time-resolved infrared study of the
Gruebele, M. (2002). Absolute helix-coil transition using C-13-labeled
comparison of simulated and helical peptides. J. Am. Chem. Soc.
experimental protein folding 123, 12111–12.
dynamics. Nature 420, 102–6. 72 Getahun, Z., Huang, C. Y., Wang,
62 Osváth, S. & Gruebele, M. (2003). T., Leon, B. D., DeGrado, W. F. &
Proline can have opposite effects on Gai, F. (2003). Using nitrile-derivated
fast and slow protein folding phases. amino acids as infrared probes of local
Biophys. J. 85, 1215–22. environment. J. Am. Chem. Soc. 125.
63 Osváth, S., Sabelko, J. & Gruebele, 73 Klimov, D. K. & Thirumalai, D.
M. (2003). Tuning the heterogeneous (2000). Mechanisms and kinetics of
early folding dynamics of phospho- beta-hairpin formation. Proc. Natl
glycerate kinase. J. Mol. Biol. 333, Acad. Sci. USA 97, 2544–9.
187–99. 74 Yang, W., Prince, R., Sabelko, J.,
64 Pandit, A., Ma, H., Stokkum, I. v., Moore, J. S. & Gruebele, M. (2000).
Gruebele, M. & Grondelle, R. v. Transition from exponential to
(2003). The time resolved dissociation nonexponential kinetics during
reaction of the light-harvesting 1 formation of an artificial helix. J. Am.
complex of Rhodospirillum rubrum, Chem. Soc. 122, 3248–9.
studied with an infrared laser-pulse 75 Hummer, G., Garcia, A. E. & Garde,
temperature jump. Biochemistry 41, S. (2000). Conformational diffusion
15115–20. and helix formation kinetics. Phys.
65 Yang, W. & Gruebele, M. (2003). Rev. Lett. 85, 2637–40.
Folding at the speed limit. Nature 423, 76 Gilmanshin, R., Callender, R. H. &
193–7. Dyer, R. B. (1998). The core of
66 Nguyen, H., Jäger, M., Gruebele, M. apomyoglobin E-form folds at the
& Kelly, J. (2003). Tuning the free- diffusion limit. Nat. Struct. Biol. 5,
energy landscape of a WW domain by 363–5.
temperature, mutation and truncation. 77 Low, D. W., Winkler, J. R. & Gray,
Proc. Natl Acad. Sci. USA 100, 3948– H. B. (1996). Photoinduced oxidation
53. of microperoxidase-9: generation of
67 Ballew, R. M., Sabelko, J., Reiner, ferryl and cation-radical prophyrins.
C. & Gruebele, M. (1996). A single- J. Am. Chem. Soc. 118, 117–20.
sweep, nanosecond time resolution 78 Pollack, L., Tate, M. W., Darnton,
laser temperature-jump apparatus. N. C., Knight, J. B., Gruner, S. M.,
Rev. Sci. Instrum. 67, 3694–9. Eaton, W. A. & Austin, R. H. (1999).
68 Eaton, W. A., Muñoz, V., Thompson, Compactness of the denatured state of
P. A., Henry, E. R. & Hofrichter, J. a fast-folding protein measured by
(1998). Kinetics and dynamics of submillisecond small-angle X-ray
loops, a-helices, b-hairpins, and fast- scattering. Proc. Natl Acad. Sci. USA
folding proteins. Acc. Chem. Res. 31, 96, 10115.
745–53. 79 Gilmanshin, R., Callender, R. H.,
69 Muñoz, V., Thompson, P. A., Dyer, R. B. & Woodruff, W. H.
Hofrichter, J. & Eaton, W. A. (1998). Fast event in protein folding:
(1997). Folding dynamics and the time evolution of primary
mechanism of b-hairpin formation. processes. Annu. Rev. Phys. Chem. 49,
Nature 390, 196–9. 173–202.
70 Muñoz, V., Henry, E. R., 80 Wang, J. & El-Sayed, M. A. (1999).
Hofrichter, J. & Eaton, W. A. Temperature jump-induced secondary
(1998). A Statistical Mechanical Model structural change of the membrane
486 14 Fast Relaxation Methods
A simple model for calculating the protein folding reaction. Proc. Natl
kinetics of protein folding from three- Acad. Sci. USA 94, 5622–7.
dimenstional structures. Proc. Natl 135 Bhattacharyya, R. P. & Sosnick,
Acad. Sci. USA 96, 11311–16. T. R. (1999). Viscosity dependence of
124 Koga, N. & Takada, S. (2001). Roles the folding kinetics of a dimeric and
of native topology and chain-length monomeric coiled coil. Biochemistry
scaling in protein folding: a 38, 2601–9.
simulation study with a Go-like 136 Bieri, O., Wirz, J., Hellrung, B.,
model. J. Mol. Biol. 313, 171–80. Schutkowski, M., Drewello, M. &
125 Silow, M. & Oliveberg, M. (1997). Kiefhaber, T. (1999). The speed limit
High-energy channeling in protein of protein folding measure by triplet-
folding. Biochemistry 36, 7633–7. triplet energy transfer. Proc. Natl Acad.
126 Wagner, C. & Kiefhaber, T. (1999). Sci. USA 96, 9597–601.
Intermediates can accelerate protein 137 Pappenberger, G., Saudan, C.,
folding. Proc. Natl Acad. Sci. USA 96, Becker, M., Merbach, A. E. &
6716–21. Kiefhaber, T. (2000). Denaturant-
127 Skorobogatiy, M., Guo, H. & induced movement of the transition
Zuckerman, M. (1998). Non- state of protein folding revealed by
Arrhenius modes in the relaxation of high-pressure stopped-flow measure-
model proteins. J. Chem. Phys. 109, ments. Proc. Natl Acad. Sci. USA 97,
2528–35. 17–22.
128 Metzler, R., Klafter, J. & Jortner, J. 138 Khorasanizadeh, S., Peters, I. &
(1999). Hierarchies and logarithmic Roder, H. (1996). Evidence for a
oscillations in the temporal relaxation three-state model of protein folding
patterns of proteins and other complex from kinetic analysis of ubiquitin
systems. Proc. Natl Acad. Sci. USA 96, variants with altered core residues.
11085–9. Nat. Struct. Biol. 3, 193–205.
129 Becker, O. M. & Karplus, M. (1997). 139 Shastry, M. C. R. & Roder, H. (1998).
The topology of multidimensional Evidence for barrier-limited protein
potential energy surfaces: theory and folding kinetics on the microsecond
application to peptide structure and time scale. Nat. Struct. Biol. 5, 385–92.
kinetics. J. Chem. Phys. 22, 1495– 140 Hagen, S. J. & Eaton, W. A. (2000).
517. Two-state expansion and collapse of a
130 Becker, O. M. (1998). Principal polypeptide. J. Mol. Biol. 297, 781–9.
coordinate maps of molecular 141 Qin, Z., Ervin, J., Larios, E.,
potential energy surfaces. J. Comput. Gruebele, M. & Kihara, H. (2002).
Chem. 19, 1255–67. Formation of a compact structured
131 Kramers, H. A. (1940). Brownian ensemble without fluorescence signa-
motion in a field of force and the ture early during ubiquitin folding.
diffusion model of chemical reactions. J. Phys. Chem. B 106, 13040–6.
Physica 7, 284. 142 Finkelstein, A. V. & Shakhnovich,
132 Plaxco, K. W. & Baker, D. (1998). E. I. (1989). Theory of cooperative
Limited internal friction in the rate- transitions in protein molecules. ii.
limiting step of a two-state protein phase diagram for a protein molecule
folding reaction. Proc. Natl Acad. Sci. in solution. Biopolymers 28, 1681–94.
USA 95, 13591–6. 143 Parker, M. J. & Marqusee, S. (1999).
133 Jacob, M., Schindler, T., Balbach, J. The cooperativity of burst phase
& Schmid, F. X. (1997). Diffusion reactions explored. J. Mol. Biol. 293,
control in an elementary protein 1195–210.
folding reaction. Proc. Natl Acad. Sci. 144 Englander, S. W., Sosnick, T. R.,
USA 94, 5622–7. Mayne, L. C., Shtilerman, M., Qi,
134 Jacob, M., Geeves, M., Holtermann, P. X. & Bai, Y. (1998). Fast and slow
G. & Schmid, F. X. (1999). Diffu- folding in cytochrome c. Acc. Chem.
sional barrier crossing in a two-state Res. 31, 767–74.
References 489
15
Early Events in Protein Folding Explored
by Rapid Mixing Methods
Heinrich Roder, Kosuke Maki, Ramil F. Latypov, Hong Cheng,
and M. C. Ramachandra Shastry
15.1
Importance of Kinetics for Understanding Protein Folding
As with any complex reaction, time-resolved data are essential for elucidating the
mechanism of protein folding. Even in cases where the whole process of folding
occurs in a single step, which is the case for many small proteins [1], the kinetics
of folding and unfolding provide valuable information on the rate-limiting barrier.
The effects of temperature and denaturant concentration give insight into activa-
tion energies and solvent-accessibility of the transition state ensemble, and by
measuring the kinetic effects of mutations, one can gain more detailed structural
insight [2–4]. If the protein folding process occurs in stages, i.e., if partially struc-
tured intermediate states accumulate, kinetic studies can potentially offer much
additional insight into the structural and thermodynamic properties of intermedi-
ate states and intervening barriers [5–9]. Rapid mixing techniques have played
a prominent role in kinetic studies of protein folding [5–7, 10–12]. The combina-
tion of quenched-flow techniques with hydrogen exchange labeling and NMR has
proven to be particularly fruitful for the structural characterization of transient
folding intermediates [13–15].
Theoretical models and computer simulations describe the process of protein
folding in terms of a diffusive motion of a particle on a high-dimensional free en-
ergy surface [16–18]. This ‘‘landscape’’ description of protein folding predicts that
a protein can choose among a large number of alternative pathways, which eventu-
ally converge toward a common free energy minimum corresponding to the native
structure. In contrast, the time course of protein folding monitored by optical and
other experimental probes generally shows relaxation kinetics with one or a few
exponential phases, which are adequately described in terms of a simple kinetic
scheme with a limited number of populated states (the chemical kinetics descrip-
tion). These apparently conflicting models can be consolidated if the free energy
surface is divided into several regions (basins) separated by substantial free energy
barriers due to unfavorable enthalpic interaction or entropic factors (conformational
bottlenecks). The protein can rapidly explore conformational space within each
basin corresponding to a broad ensemble of unfolded or partially folded states,
but has to traverse substantial kinetic barriers before entering another basin. This
type of free energy surface can thus give rise to multi-exponential folding kinetics.
Dissecting the sequence of structural events associated with the folding of a pro-
tein poses formidable technical challenges. Important structural events occur on
the microsecond time scale, which cannot be accessed by conventional kinetic tech-
niques, such as stopped-flow mixing. Temperature-jump and other rapid perturba-
tion methods have shown that isolated helices and b-hairpins form and decay over
a time window ranging from about 50 ns to several microseconds [19–23], which
is short compared with the time it takes to complete the process of folding, even
for the most rapidly folding proteins (reviewed in Ref. [24]). While these results
demonstrate that secondary structure formation is not the rate-determining step
in folding, they do not rule out the possibility that these elementary structural
events affect the overall rate of folding if they represent energetically unfavorable,
but obligatory, early steps. Recent advances in rapid mixing techniques combined
with structurally informative spectroscopic probes made it possible to resolve con-
formational events on the submillisecond time scale preceding the rate-limiting
step in the folding of globular proteins [25–27]. Historically, some of the earliest
rapid kinetic measurements with millisecond time resolution used a continuous-
flow arrangement combined with absorbance measurements of the reaction prog-
ress at different points downstream [28]. However, continuous-flow experiments
were later replaced by the more versatile and economic stopped-flow experiment,
which can be coupled with a wide range of spectroscopic probes to monitor reac-
tions with millisecond time resolution [29, 30]. Continuous-flow techniques have
experienced a renaissance in recent years due to advances in mixer design and
detection methods, which made it possible to push the time resolution into the mi-
crosecond time range [25, 26, 31, 32]. By coupling an efficient capillary mixer with
a digital camera, we can routinely extend fluorescence- or absorbance-based kinetic
measurements to times as short as 50 ms, which has yielded a wealth of new in-
sight into early stages of protein folding [33–40]. Other techniques make use of
two or more consecutive mixing steps in order to prepare the system in a particular
initial state (double-jump stopped-flow), or to execute multiple reaction steps in
sequence (quenched-flow). If a reaction can be quenched by manipulating solution
conditions (e.g., pH) or lowering temperature, quenched-flow or freeze-quench
protocols can be used in combination with slower analytical techniques, such as
NMR, electron paramagnetic resonance (EPR), or mass spectrometry [41]. To
achieve efficient turbulent mixing conditions requires high flow rates and relatively
large channel dimensions, which can consume substantial amounts of material. A
promising alternative to turbulent mixing with improved sample economy uses hy-
drodynamic focusing to mix solutions under laminar flow conditions [42, 43].
15.2
Burst-phase Signals in Stopped-flow Experiments
0.8
S (U)pred a
‘burst phase’
0.6
Fluorescence
S(0)
0.4
0.2
S( )
8
0.0
0.00 0.02 0.04 0.06 0.08 0.10
Time (s)
folded unfolded
1.0
S (U)pred
0.8
Fluorescence
0.6
S(0)
0.4
0.2
S( ) b
8
0.0
0 1 2 3 4 5 6
[GdmCl] (M)
Fig. 15.1.Stopped-flow fluorescence evidence the unfolded state under refolding conditions,
for an unresolved rapid process (burst phase) Spred (U), obtained by linear extrapolation of the
during folding of cyt c (pH 5, 10 C). a) unfolded-state baseline (see dashed line in b).
Tryptophan fluorescence changes during b) Effect of the denaturant concentration on
refolding of acid-unfolded cytochrome c (pH 2, the initial (squares) and final (circles)
@15 mM HCl) at a final GdmCl concentration fluorescence signal, Sð0Þ and SðyÞ, measured
of 0.7 M. The initial signal Sð0Þ at t ¼ 0 in a series of stopped-flow refolding
(determined on the basis of a separate dead- experiments at different final GdmCl
time measurement) falls short of the signal for concentration.
time (e.g., Refs [44, 45–49]). This is illustrated by Figure 15.1, which shows the ki-
netics of refolding of horse cytochrome c (cyt c) measured by stopped-flow fluores-
cence along with equilibrium fluorescence data vs. denaturant concentration [50].
The protein was unfolded by addition of 4.5 M guanidine hydrochloride (GdmCl),
which lies in the baseline region above the cooperative unfolding transition (Figure
15.1b). The refolding reaction was triggered by sixfold dilution with buffer (0.1 M
494 15 Early Events in Protein Folding Explored by Rapid Mixing Methods
15.3
Turbulent Mixing
Most rapid mixing schemes rely on turbulent mixing to achieve complete mixing
of two (or more) solutions. Mixers of various design are in use ranging from a sim-
ple T-arrangements to more elaborate geometries, such as the Berger ball mixer
[52]. The goal is to achieve highly turbulent flow conditions in a small volume.
The turbulent eddies thus generated can intersperse the two components down to
the micrometer distance scale. However, the ultimate step in any mixing process
relies on diffusion in order to achieve a homogeneous mixture at the molecular
level. Given that the diffusion time t varies as the square of the distance r over
which molecules have to diffuse, it takes a molecule with a diffusion constant
D ¼ 105 cm 2 s1 about 1 ms to diffuse over a distance of 1 mm (t ¼ r 2 /D). Thus,
the mechanical mixing step has to intersperse the two components on a length
scale of less than 1 mm in order to achieve submillisecond mixing times. The onset
of turbulence is governed by the Reynolds number, Re, defined as
Re ¼ rnd/h ð1Þ
where r is the density (g cm3 ), n is the flow velocity (cm s1 ), d describes the char-
acteristic dimensions of the channel (cm), and h is the viscosity of the fluid (e.g.,
15.4 Detection Methods 495
0.01 poise for water at 20 C). To maintain turbulent flow conditions in a cylindri-
cal tube, Re has to exceed values of about 2000.
Turbulence is important not only for achieving efficient mixing, but also for
maintaining favorable flow conditions during observation. In stopped-flow and
quenched-flow experiments, turbulent flow insures efficient purging of the flow
lines. In continuous-flow measurements, turbulent flow conditions in the observa-
tion channel lead to an approximate ‘‘plug flow’’ profile, which greatly simplifies
data analysis compared to the parabolic profile obtained under laminar flow condi-
tions. The time resolution of a rapid mixing experiment is governed not only by
the mixing time, which in practice is difficult to quantify, but also the delay be-
tween mixing and observation. The effective delay between initiation of the reac-
tion and the first reliably measurable data point is defined as the dead time,
Dtd . In both stopped- and continuous-flow experiments, any unobservable volume
(dead volume), DV, between the point where mixing is complete and the point of
observation contributes an increment Dt ¼ DV/ðdV=dtÞ to the dead time (dV/dt is
the flow rate). Additional contributions to the effective dead time include the time
delay to stop the flow and any artifacts that can obscure early parts of the kinetic
trace (see Appendix).
Increasing the flow rate promotes more efficient mixing by generating smaller
turbulent eddies, and yields shorter time delays Dt, and thus should lead to shorter
dead times. However this trend does not continue indefinitely. Aside from practical
problems due to back pressure and, in the case of stopped-flow measurements, var-
ious stopping artifacts, the time resolution of a rapid mixing experiment is ulti-
mately limited by cavitation phenomena [53]. Under extreme conditions, the pres-
sure gradients across turbulent eddies can become so large that the solvent begins
to evaporate, forming small vapor bubbles that can take a long time to dissolve.
The result is an intensely scattering plume that makes meaningful detection of
the kinetic signal virtually impossible.
15.4
Detection Methods
The design principles and performance tests of common rapid mixing instru-
ments, including a typical commercially available stopped-flow apparatus and the
continuous-flow instrument developed in our laboratory [26], are described in the
Appendix. A major strength of rapid mixing methods is that they can be combined
with a wide range of detection methods. Table 15.1 lists common detection meth-
ods used in rapid mixing studies of the kinetics of folding, which are illustrated
here with selected examples.
15.4.1
Tryptophan Fluorescence
Tab. 15.1. Common detection methods used in kinetic studies of protein folding.
a
Relative fluorescence Native
3
Intermediate
1
Unfolded
0
300 320 340 360 380 400 420 440
Wavelength (nm)
b
3
Relative fluorescence
2 Trp76
Trp140
1
-5 -4 -3 -2 -1 0 1
10 10 10 10 10 10 10
Time (s)
Fig. 15.2. Folding mechanism of SNase by global analysis of the fluorescence spectra
probed by tryptophan fluorescence. a) as a function of urea concentration (pH 5.2,
Fluorescence emission spectra of the Trp76 15 C). b) Time course of folding (triggered by
variant of SNase under native and denaturing a pH jump from 2 to 5.2) for WT* SNase
conditions (solid) and a folding intermediate (Trp140) and a single-tryptophan variant
populated at equilibrium (dashed). The (Trp76) measured by continuous-flow (< 103 s)
spectrum of the intermediate was determined and stopped-flow (> 10 3 s) fluorescence.
native state was detected in equilibrium unfolding experiments (dashed line in Fig-
ure 15.2a). In contrast to WT* SNase (P47G, P117G and H124L background),
which shows no changes in tryptophan fluorescence prior to the rate-limiting fold-
ing step (@100 ms), the F76W/W140H variant shows additional changes (enhance-
ment) during an early folding phase with a time constant of about 80 ms (Figure
15.2b). The fact that both variants exhibit the same number of kinetic phases with
very similar rates confirms that the folding mechanism is not perturbed by the
F76W/W140H mutations. However, the Trp at position 76 reports on the rapid for-
mation of a hydrophobic cluster in the N-terminal b-sheet region while the wild-
type Trp140 is silent during this early stage of folding.
498 15 Early Events in Protein Folding Explored by Rapid Mixing Methods
15.4.2
ANS Fluorescence
0.8
Relative fluorescence (ANS)
a U A
0.6
U N
0.2
native control
0.0
0 200 400 600 800
Time (µs)
30
b
25 A+ANS A·ANS
Rate (103 s–1)
20 U A
15 U N
10
0
0 50 100 150 200 250
[ANS] (µM)
Fig. 15.3. An early folding event in Trp76 ANS binding kinetics in the presence of 1 M
SNase detected by ANS fluorescence. a) ANS KCl at pH 2.0; native control: ANS binding
fluorescence changes during ANS binding/ kinetics under the native condition (pH 5.2). b)
folding of Trp76 SNase measured by ANS concentration dependence of the rates for
continuous-flow experiments at 15 C in the the major (fast) kinetic phases observed
presence of 160 mM ANS. U ! A: Salt during the U ! A (triangles), U ! N (filled
concentration jump from 0 M to 1 M KCl at circles), and A þ ANS ! A ANS (filled
pH 2.0; U ! N: refolding induced by a pH- squares) reactions.
jump from 2.0 to 5.2; A þ ANS ! A ANS:
15.4 Detection Methods 499
and during formation of the A-state, the compact acid-denatured state of SNase
(U ! A). Also shown is the kinetics of ANS binding to the preformed A-state.
While the rate of ANS binding to the A-state shows the linear dependence on
ANS concentration characteristic of a second-order binding process (panel b), the
rates observed under refolding conditions (both to the native state and compact
A-state) level off at @120 mM ANS. The limiting ANS-independent rate at higher
concentrations thus is due to an intramolecular conformational event that precedes
ANS binding. The rate of this process closely matches that of the earliest phase
detected by intrinsic fluorescence of Trp76 (Figure 15.3b), confirming that both
processes reflect a common early folding step. The results are consistent with the
rapid accumulation of an ensemble of states containing a loosely packed hydropho-
bic core involving primarily the b-barrel domain. In contrast, the specific interac-
tions in the a-helical domain involving Trp140 are formed only during the final
stages of folding.
15.4.3
FRET
Fluorescence resonant energy transfer (FRET) can potentially give more specific
information on the changes in average distance between fluorescence donors and
acceptors. For example, cyt c contains an intrinsic fluorescence donor–acceptor
pair, Trp59 and the covalently attached heme group, which quenches tryptophan
fluorescence via excited-state energy transfer [64]. We have made extensive use of
this property to characterize the folding mechanism of cyt c [14, 50, 65], including
the initial collapse of the chain on the microsecond time scale [33, 34].
We recently combined ultrafast mixing experiments with FRET in order to mon-
itor large-scale structure changes during early stages of folding of acyl-CoA-binding
protein (ACBP), a small (86-residue) four-helix bundle protein [38]. ACBP contains
two tryptophan residues on adjacent turns of helix 3, which served as fluorescence
donors, and an AEDANS fluorophore covalently attached to a C-terminal cystein
residue was used as an acceptor (Figure 15.4a). Earlier equilibrium and kinetic
studies, using intrinsic tryptophan fluorescence, showed a cooperative unfolding
transition and single-exponential (un)folding kinetics consistent with an apparent
two-state transition. Even when using continuous-flow mixing to measure intrinsic
tryptophan fluorescence changes on the submillisecond time scale (Figure 15.4b),
we found only minor deviations from two-state folding behavior. However, when
we monitored the fluorescence of the C-terminal AEDANS group while exciting
the tryptophans, we observed a large increase in fluorescence during a fast kinetic
phase with a time constant of 80 ms, followed by a decaying phase with a time con-
stant ranging from about 10 ms to 500 ms, depending on denaturant concentra-
tion (Figure 15.4c). The large enhancement in FRET efficiency is attributed to a
major decrease in the average distance between helix 3 and C-terminus of ACBP.
The fact that the early changes are exponential in character suggests that the initial
compaction of the polypeptide is limited by an energy barrier rather than chain dif-
fusion. The subsequent decrease in AEDANS fluorescence during the final stages
500 15 Early Events in Protein Folding Explored by Rapid Mixing Methods
a Ile86 / C-terminus
Trp55
Trp58
N-terminus
b
1
Relative fluorescence
0.8
0.6
0.4
0.2
c
2
0.34 M
Relative fluorescence
1.8
1.6
0.73 M
1.4
1.06 M
1.2 1.42 M
1
0 0.2 0.4 0.6 0.8 1 20 40 60 80 100
Time (ms)
Fig. 15.4. FRET-detection of an early folding refolding kinetics of unmodified ACBP (b) and
intermediate in a helix-bundle protein, ACBP. AEDANS-labeled ACBP,I86C (c) in pH 5.3
a) Ribbon diagram of ACBP, based on an NMR buffer containing 0.34 M GdmCl at 26 C. In
structure. The two tryptophan residues and the both panels data from continuous-flow (open
mutated C-terminal isoleucine are shown in circles) and stopped-flow (open triangles)
ball and stick. The two lower panels show experiments were matched and combined.
folding event. These observations indicate that the early (80 ms) folding phase
marks the formation of a collapsed, but loosely packed and highly dynamic
ensemble of states with overall dimensions (in terms of fluorescence donor–
acceptor distance) similar to that of the native state. Accumulation of partially
structured states with some native-like features may facilitate the search for the
native conformation. Because of their short lifetime and low stability, such inter-
mediates can easily be missed by conventional kinetic techniques, whereas the con-
tinuous-flow FRET technique offers the temporal resolution and structural sensi-
tivity to detect even marginally stable intermediates populated during early stages
of folding.
15.4.4
Continuous-flow Absorbance
15.4.5
Other Detection Methods used in Ultrafast Folding Studies
15.5
A Quenched-Flow Method for H-D Exchange Labeling Studies on the Microsecond
Time Scale
H-D exchange labeling experiments coupled with NMR detection [13–15, 78, 79]
are important sources of structural information on protein folding intermediates.
These experiments generally rely on commercial quenched-flow equipment to
carry out two or three sequential mixing steps, which limits the time resolution to
a few milliseconds or longer. In order to push the dead time of quenched-flow
measurements into the microsecond time range, we made use of our highly effi-
cient capillary mixers [26]. The device (illustrated in Figure 15.6a) uses a quartz
capillary mixer similar to that used for optical measurements (see Appendix), but
without observation cell, in order to generate a homogeneous mixture of solutions
15.5 A Quenched-Flow Method for H-D Exchange Labeling Studies on the Microsecond Time Scale 503
0.10
405 nm
0.05 410 nm
Absorbance change (OD)
0.00
-0.05
400 nm
-0.10
385 nm
-0.15
395 nm
-0.20
390 nm
-0.25
0.0 0.2 0.4 0.6 0.8 1.0 1.2
Time (ms)
Fig. 15.5. Initial stages of refolding of acid-denatured oxidized
cyt c at pH 5 monitored by continuous-flow absorbance
measurements at different wavelengths spanning the Soret
heme absorbance band. The lines represent a global fit of a
four-state folding mechanism to the family of kinetic traces.
A and B. The mixture emerges from the capillary as a fine (200 mm diameter) jet
with a linear velocity of up to 40 m s1 at the highest flow rate used (1.25 mL s1 ).
A second mixing event can be achieved simply by injecting the jet into a test tube
containing a third solution C; the extremely high flow velocity ensures very effi-
cient mixing.
To determine the dead time (i.e., the shortest delay between the two mixing
events), we carried out a series of H-D exchange experiments on a pentapeptide
(YGGFL). Rapid exchange of the backbone amide protons with solvent deuterons
was achieved by mixing an H2 O solution of the peptide with fivefold excess of
D2 O buffered at pH* 9.7 (uncorrected pH meter reading). The exchange reaction
was quenched by injecting the mixture into ice-cooled acetate buffer at pH* 3.
Under these quench conditions, the rate of exchange for some of the peptide NH
groups (Gly3, Phe4, and Leu5) is sufficiently slow (10, 45, and 70 min, respectively)
to determine their residual NH intensity by recording one-dimensional 1 H NMR
spectra. Figure 15.6b shows the results of a series of experiments in which the
capillary was raised from direct contact with the quench solutions to a distance
of about 40 mm corresponding to a ‘‘time-of-flight’’ of @1 ms. For Gly3 and Phe4,
exponential fits of the decay in residual NH intensity with the incremented time
delay yields exchange rates of 5600 and 4400 s1 , respectively, in agreement with
published intrinsic exchange rates [80] (the rate of the C-terminal amide group is
too slow to be measurable over a 1-ms time window, and the Gly2 NH continues to
504 15 Early Events in Protein Folding Explored by Rapid Mixing Methods
a
Unfolded Refolding /
protein in D2O pulse buffer
(H2O, pH 8-11)
Outer
quartz Inner quartz
capillary capillary
Ridges
Pt sphere
t d = 60 µs
Quenching
buffer, pH 3
b t d = 60 µs
0.6
0.5
0.4 G3
I(NH) /I(C H)
F4
ε
0.3 L5
G2
0.2
0.1
0.0
0 200 400 600 800 1000
Time (µs)
Fig. 15.6. A capillary mixing device for quench buffer solution. b) Quenched-flow
quenched-flow measurements on the NMR measurement of the H-D exchange
microsecond time scale. a) A capillary mixer reaction of backbone NH groups of a model
similar to that described in the Appendix peptide (YGLFG). Extrapolation of the
(Figure 15.16), but without flow cell, is used to exponential decay in normalized NH resonance
generate a fast free-flow jet. A second mixing intensity (solid curves) yields an estimated
event occurs upon impact of the jet with a dead time of 60 ms.
15.6
Evidence for Accumulation of Early Folding Intermediates in Small Proteins
15.6.1
B1 Domain of Protein G
Continuous-flow Stopped-flow
a b
1.0 [GdmCl]/M N
0.45
Relative fluorescence
0.8 0.73
0.91
0.6 1.12
1.31
0.4
2.00
U
k ui k in
U I N
k iu k ni
Scheme 15.1
ln k ij ¼ ln k ij 0 þ ðm ij z /RTÞ c ð2Þ
a
kIU
Rate (s )
103 kIN
102
–1
kUI
101
kNI
100
b
1.0
Rel. amplitude
0.5
0.0
0 1 2 3 4 5 6
[GdmCl] (M)
c TS1
TS2
Fre e ene rgy
[GdmCl]/M
0.0
I
2.5
U
N
a = 0 0.29 0.85 0.85 1
Reaction coordinate
Fig. 15.8. GdmCl-dependence of the rate conditions where the intermediate, I is well
constants (a) and kinetic amplitudes (b) of the populated (0 M) and unstable (2.5 M GdmCl).
fast (squares) and slow (circles) kinetic phases a represents the change in solvent-accessible
observed during folding of GB1. (c) Free surface area relative to the unfolded state U.
energy diagrams for folding of GB1 under
Three-state analysis of a complete set of kinetic data, such as that of GB1 re-
ported by Park et al. [35], provides a comprehensive description of the folding
mechanism in terms of the size of the free-energy barriers separating the interme-
diate from the unfolded and native states (labeled TS1 and TS2 in Figure 15.8c) and
the stability of the intermediate (DG ¼ RT lnðkUI /kIU ). In addition, the denatur-
ant dependence of the four elementary rate constants provides valuable insight
into the changes in solvent-accessible surface area associated with each transition.
In Figure 15.8c, each state and intervening transition state is labeled with the cor-
508 15 Early Events in Protein Folding Explored by Rapid Mixing Methods
15.6.2
Ubiquitin
The 76-residue a/b protein ubiquitin is another well-studied small protein for
which three-state folding behavior has been reported under some conditions. Khor-
15.6 Evidence for Accumulation of Early Folding Intermediates in Small Proteins 509
Continuous-flow Stopped-flow
Residuals 0.04
0.00
-0.04
1.0 N
Relative fluorescence
0.8
0.6
0.4
U
0.2
10 -4 10 -3 10 -2 10 -1
Time (s)
Fig. 15.9. Folding kinetics of GB1 at 1.12 M GdmCl (pH 5.0,
20 C, 0.4 M sodium sulfate) monitored by continuous-flow
(circles) and stopped-flow (diamonds) fluorescence. Single-
and double-exponential fits and residuals are shown with solid
and dashed lines, respectively.
asanizadeh et al. [48, 58] found deviations from two-state behavior in the folding
kinetics of a tryptophan-containing ubiquitin variant (F45W mutant), including
a downward curvature in the rate profile (log(rate) vs. [denaturant] plot) and a con-
comitant drop in the relative amplitude of the main folding phase at low denatur-
ant concentration. They were able to account for both phenomena (rollover and
burst phase) in terms of a three-state mechanism (U $ I $ N) with an obligatory
on-path intermediate. As detailed above for GB1, this simple kinetic mechanism
explains the leveling-off of the rate constant and diminishing amplitude of the
principal (rate-limiting) folding phase at GdmCl concentrations below 1 M. How-
ever, without direct observation of the inferred fast folding phase, it is not possible
to rule out alternative mechanisms with off-path or non-obligatory intermediates
(further discussed in Section 15.7.3). Moreover, the uncertainty in the burst-phase
amplitude is substantial if the main observable folding phase approaches the dead
time of the kinetic experiment, which is the case for ubiquitin under stabilizing
conditions. Krantz and Sosnick [90] remeasured the folding kinetics of F45W ubiq-
uitin, using a stopped-flow instrument with a dead time of @1 ms. They found a
linear rate profile for the main folding phase and no indications of a burst phase,
and concluded that our earlier evidence for a folding intermediate was based on
fitting artifacts.
510 15 Early Events in Protein Folding Explored by Rapid Mixing Methods
a b 20
20 10
-3
-3
10 0
x10
x10
0
-10 -10
-20 -20
20 20
10 10
-3
-3
0 0
x10
x10
-10 -10
-20 -20
1.0 M GdmCl
1.2 0.5 M GdmCl 1.2 1.0 M GdmCl
0.5 M GdmCl
1.0 1.0
150 µs 130 µs
fl(t)/fl(U)
fl(t)/fl(U)
0.8 0.8
1.8 ms
0.6 0.6 2.1 ms
7.1 ms
0.4 0.4
4-exp 68 ms 3-exp 30 ms
0.2 0.2
0.0001 0.001 0.01 0.1 1 0.0001 0.001 0.01 0.1 1
Time (s) Time (s)
Fig. 15.10. Comparison of quadruple (a) and with respect to the unfolded protein in 6 M
triple (b) exponential fitting of the kinetics of GdmCl. The residuals (top two traces in each
refolding of F45W ubiquitin at final GdmCl panel) indicate that four exponentials are
concentrations of 0.5 and 1.0 M (pH 5, 25 C). required to obtain a satisfactory fit of the data
Fluorescence traces measured in continuous- over the time window shown.
and stopped-flow experiments were normalized
1000
l1
kIU
kIN
Rate (s )
l2
–1
100
kUI
l3
10
residuals with amplitudes much larger than the scatter of the data points (Figure
15.10b). Under the most stabilizing conditions studied (0.5 M GdmCl), a satisfac-
tory fit of the main fluorescence decay between 300 ms to 10 ms requires two dis-
tinct phases, l1 and l2 , with time constants t1 ¼ 1:8 and t2 ¼ 7:1 ms, respectively.
A fourth phase (t3 @ 70 ms) is necessary to account for a minor decay observed
between 20 and 200 ms. The fact that both the rate and amplitude of l3 are essen-
tially independent of denaturant concentration points toward a heterogeneous pro-
cess, perhaps involving a minor population of molecules containing nonnative (cis)
proline isomers.
A plot of the rate constants for the three decaying phases vs. GdmCl concentra-
tion (Figure 15.11) shows a pronounced rollover for l2 (squares) with a denaturant-
independent regime below 0.75 M followed by a linear decrease above 1 M GdmCl,
whereas l1 shows a shallow upward curvature. As in the case of GB1 (Figure 15.8),
this behavior is fully consistent with a three-state mechanism (Scheme 15.1). Fig-
ure 15.11 also shows the apparent rates for the main decaying phase obtained by
fitting only three exponentials (symbol x). Since they represent an average of l1
and l2 weighted according to the relative amplitudes, they continue to increase
with decreasing GdmCl concentration below 1 M. Thus, the controversy whether
an intermediate accumulates during folding of ubiquitin boils down to a rather
subtle curve-fitting problem.
In our initial study, we detected only the slower process, l2 , which exhibits the
rollover and missing-amplitude effects indicative of a burst-phase intermediate
512 15 Early Events in Protein Folding Explored by Rapid Mixing Methods
15.6.3
Cytochrome c
Cyt c has played a central role in the development of new kinetic approaches
for exploring early events in protein folding. The presence of a covalently attached
heme group in this 104-residue protein (Figure 15.12) serves as a useful optical
marker, and its redox and ligand binding properties provide unique experimental
opportunities for rapid initiation and observation of folding [19, 25, 93]. Its sole
tryptophan residue, Trp59, is located within 10 Å of the heme iron (Figure 15.12),
resulting in efficient fluorescence quenching through a Förster-type energy trans-
fer mechanism. Strongly denaturing conditions (e.g., 4.5 M GdmCl, or acidic pH
M80 C
60’s H18
H33
W59
H26
Fig. 15.12. Ribbon diagram of horse cytochrome c, based on the crystal structure [146].
15.6 Evidence for Accumulation of Early Folding Intermediates in Small Proteins 513
at low ionic strength) result in a large increase in Trp59 fluorescence (up to @60%
of that of free tryptophan in water) indicative of an expanded chain conforma-
tion with an average tryptophan–heme distance greater than 35 Å [32, 64]. While
numerous studies have shown that folding of oxidized cyt c is accompanied by
changes in coordination of the heme iron [14, 25, 45, 50, 73, 74, 94–96], these com-
plications can be largely avoided by working at mildly acidic pH (4.5–5) where the
protein is still stable, but histidine residues are protonated and no longer can bind
to the heme iron [73, 97].
Our capillary mixing apparatus [26] enabled us to resolve the entire fluorescence-
detected folding kinetics of cyt c, including the elusive initial collapse of the chain
[33]. Figure 15.13a shows the decay in Trp59 fluorescence observed during refold-
ing of acid-unfolded cyt c (pH 2, 10 mM HCl) induced by a pH jump to native con-
ditions (pH 4.5, 22 C). The continuous-flow data covering the time range from 45
ms to @1 ms are accurately described by a biexponential decay with a major rapid
phase (time constant 59 G 6 ms) and a minor process in the 500 ms range. This fit
extrapolates to an initial fluorescence of 1:0 G 0:05 (relative to the acid-unfolded
protein) at t ¼ 0, indicating the absence of additional, more rapid fluorescence
changes that remain unresolved in the 45 ms dead time. The same data are plotted
in Figure 15.13b on a logarithmic time scale, along with a stopped-flow trace mea-
sured under matching conditions, which accounts for the final 10% of the total flu-
orescence change.
The combined kinetic trace covers six orders of magnitude in time and accounts
for the complete change in Trp59 fluorescence associated with refolding of cyt c.
An Arrhenius plot of the rate of the initial phase (Figure 15.13a, inset) yields an
apparent activation enthalpy of 30 kJ mol1 , which is significantly larger than that
expected for a diffusion-limited process [98]. In subsequent laser T-jump studies,
Hagen, Eaton and colleagues [77, 99] detected a relaxation process with similar
rates and activation energy, confirming the presence of a free energy barrier be-
tween unfolded and collapsed conformations of cyt c.
In other continuous-flow experiments, we measured the kinetics of folding of
cyt c starting from either the acid-unfolded (pH 2, 10 mM HCl) or the GdmCl-
unfolded state (4.5 M GdmCl, pH 4.5 or pH 7), and ending under various final
conditions (pH 4.5 or pH 7 and GdmCl concentrations from 0.4 to 2.2 M) [33]. In
each case, we observed a prominent initial decay in fluorescence with a time con-
stant ranging from 25 to 65 ms. In particular, the rate of the initial phase, measured
under the same final conditions (pH 4.5, 0.4 M GdmCl), was found to be indepen-
dent of the initial state (acid- or GdmCl-unfolded). These observations clearly indi-
cate that a common rate-limiting step is encountered during the initial stages of cyt
c folding. The large amplitude of the initial phase (as much as 70% of the total
change in fluorescence under strongly native conditions) is consistent with the for-
mation of an ensemble of compact states, which was confirmed in subsequent
small-angle X-ray (SAXS) studies [71, 72].
Using their continuous-flow CD instrument to follow refolding of acid-
denatured cyt c after a dead time of @400 ms, Akiyama et al. [67] observed signifi-
cant changes in the far-UV region during the 500 ms and 2 ms folding phases of cyt
514 15 Early Events in Protein Folding Explored by Rapid Mixing Methods
a 1.0
11
0.8 10
Relative fluorescence
ln(rate/s)
9
0.6
8
7
0.4
3.2 3.3 3.4 3.5 3.6
1000/T (K-1)
0.2
b
0.6
Relative fluorescence
t1 = 60 ms (a1=0.6)
0.4
t2 = 430 ms (a2=0.3)
0.2
t3 = 4.8 ms (a3=0.1)
0.0
-2-4 -1 -3 0 1
Log (time/s)
Fig. 15.13. Refolding kinetics of acid-unfolded submillisecond phases. b) Combined plot of
cyt c at pH 4.5, 22 C [33]. a) Submillisecond continuous-flow and stopped-flow kinetic
kinetics measured by continuous-flow mixing, traces measured under identical conditions.
indicating heme-induced quenching of Trp59 The time constants and relative amplitudes (in
fluorescence associated with chain collapse. parenthesis) are indicated for the three major
Inset: Arrhenius plot for the rates of the phases.
major (circles) and the minor (squares)
c, indicating that most of the native helical secondary structure is acquired during
the second and third (final) stages of folding. Extrapolation of the CD signal at
222 nm to shorter times indicated that the product of the 60 ms collapse phase is
about 20% more helical than the GdmCl-unfolded state, but has a similar helix
content as the acid-unfolded state. In their subsequent continuous-flow SAXS
15.7 Significance of Early Folding Events 515
measurements, Akiyama et al. [72] found that a partially collapsed cyt c interme-
diate (R g ¼ 20:5 Å) accumulates within the 160 ms dead time of their experiment,
which corresponds to a significant compaction relative to the acid-denatured state
(R g ¼ 24 Å). In a subsequent phase on the millisecond time regime, the protein
passes through a second, more compact intermediate (R g ¼ 18 Å) before reaching
the native state (R g ¼ 13:9 Å). These findings confirm that cyt c undergoes a major
chain collapse during the initial folding phase on the 10–100 ms time scale detected
in previous continuous-flow and T-jump measurements of the heme-induced
quenching of Trp59 fluorescence [33, 77, 99].
15.7
Significance of Early Folding Events
15.7.1
Barrier-limited Folding vs. Chain Diffusion
In Sections 15.4 and 15.6, we presented continuous-flow results on the folding ki-
netics of five proteins (GB1, cyt c, ubiquitin, ACBP, and SNase) all of which show
significant fluorescence changes in one or more rapid phases on the submillisec-
ond time scale that precede the rate-limiting folding step [33, 35, 36, 38, 60]. Other
examples include the bacterial immunity protein Im7 [36], which will be dis-
cussed further in Section 15.7.3, cytochrome c551 from Pseudomonas aeruginosa
[39], b-lactoglobulin [37] and the Y92W mutant of ribonuclease A (Welker, Maki,
Shastry, Juminaga, Bhat, Roder & Scheraga, submitted). For most proteins
(GB1, ubiquitin, Im7, b-lactoglobulin, SNase, and RNase), we observed an increase
in Trp fluorescence during the fast phase, which we attribute to the burial of the
tryptophan fluorophores within an apolar environment. For three proteins (cyt c,
cytochrome c551 , and ACBP), we measured a large increase in energy transfer effi-
ciency during the fast phase, confirming that these early conformational events in-
clude a large-scale collapse of the polypeptide chain. Thus, we conclude that for all
of these structurally diverse proteins, major conformational changes, including a
large-scale collapse of the chain, precede the rate-limiting step in folding. In every
case the initial phase follows an exponential time course, indicating that a discrete
free energy barrier separates the ensemble of compact states from the more ex-
panded ensemble of unfolded states. In contrast, if the collapse of the polypeptide
chain were a continuous process governed by diffusive dynamic, as predicted for
homopolymers [100], it would be characterized by a broad distribution of time con-
stants likely to give rise to nonexponential (‘distributed’) kinetics [16, 101].
In fact, Sabelko et al. [102] observed such distributed folding kinetics in
temperature-jump experiments under conditions where folding can proceed down-
hill from a saddle point in the free energy surface. While the observation of expo-
nential kinetics may not always be a sufficient condition for the presence of an
activation barrier [103], the case for a barrier-limited process is strengthened con-
siderably if the process also exhibits a significant activation enthalpy, as observed
516 15 Early Events in Protein Folding Explored by Rapid Mixing Methods
for cyt c [33]. Moreover, comparison of the pH-induced folding traces of cyt c moni-
tored by Trp59 fluorescence (Figure 15.13) and heme absorbance (Figure 15.5) in-
dicates that the time constant of the initial phase is independent of the probe used
(59 G 6 ms by fluorescence compared to 65 G 4 ms by absorbance), which strongly
supports the presence of a barrier (a diffusive collapse process is likely to lead to
probe-dependent kinetics).
The existence of a prominent free energy barrier between expanded and compact
states justifies the description of the initial folding events in terms of a model in-
volving an early folding intermediate. The first large-scale conformational change
during folding can, thus, be described as a reversible two-state transition. The find-
ing that the rate of collapse in cyt c is insensitive to initial conditions [33] suggests
that the barrier may represent a general entropic bottleneck encountered during
the compaction of the polypeptide chain, which is consistent with the moderate
activation enthalpy associated with the initial transition (Figure 15.13). Bryngelson
et al. [16] classified this type of folding behavior as a ‘‘type I’’ scenario where a free
energy barrier arises because of the unfavorable reduction in conformational en-
tropy, which can be compensated by favorable enthalpic and solvent interactions
only during the later stages of collapse.
15.7.2
Chain Compaction: Random Collapse vs. Specific Folding
Although denatured proteins often retain far more structure than expected for a
random coil polymer [104], they are substantially more expanded than the native
state [105]. The folding process therefore must be accompanied by a net decrease
in chain dimensions. However, there are persistent controversies surrounding the
question whether chain contraction occurs prior to or concurrent with the rate-
limiting folding step, and whether compact states are the result of random hydro-
phobic collapse or a more specific structural events. Our earlier conclusion that the
optical changes on the submillisecond time scale (burst-phase events) seen for cyt
c and many other proteins [7, 9] reflect the formation of productive folding inter-
mediates has been challenged by Englander and coworkers [106, 107] on the basis
of fluorescence and CD measurements on two heme-containing peptide fragments
of cyt c (residues 1–65 and 1–80, respectively). Although the fragments are unable
to assume a folded structure, they nevertheless show rapid (<ms) denaturant-
dependent fluorescence and far-UV CD changes resembling the burst-phase be-
havior of the intact polypeptide chain (cf. Figure 15.1).
Assuming that the fragments remain fully unfolded under all conditions, Sos-
nick et al. [106, 107] concluded that both the optical properties of the fragments
and the burst-phase changes seen during refolding of intact cyt c reflect a rapid
solvent-dependent readjustment of the unfolded polypeptide chain rather than for-
mation of partially folded states. However, this conclusion is inconsistent with our
observation that refolding of both GdmCl- and acid-denatured cyt c is accompanied
by an exponential fluorescence decay [33], which indicates that the ensemble of
states formed on the submillisecond time scale are separated by a free energy bar-
rier from the unfolded states found immediately after adjustment of the solvent
15.7 Significance of Early Folding Events 517
conditions. That these states are not just part of a broad distribution of more or less
expanded denatured conformations is further supported by the observation of non-
random amide protection patterns [79]. Apparently, both the full-length and trun-
cated forms of cyt c can assume a compact ensemble of states upon lowering of the
denaturant concentration. This leads to the prediction that the fragments will also
exhibit an exponential fluorescence change on a time scale similar to the intact
protein (40–60 ms at 22 C and GdmCl concentrations of 0–0.4 M [33]). This pre-
diction was recently confirmed by Qiu et al. [99], who observed an exponential re-
laxation process with a time constant of 20–30 ms at room temperature and a sub-
stantial activation energy (60–70 kJ mol1 ) for the 1–65 and 1–80 fragments of cyt
c, using a laser T-jump apparatus.
In a subsequent study on ribonuclease A (RNase A), Englander and col-
leagues [108] reported that the GdmCl-dependence of the CD signal for the fully
disulfide-reduced form closely matches the initial kinetic amplitude observed in
stopped-flow refolding experiments on the protein with all four disulfide bonds
intact. As in the case of the cyt c fragments, Qi et al. [108] assumed that reduced
RNase remains in a random denatured state even under nondenaturing solvent
conditions, which needs to be explored further. As an alternative interpretation,
we again suggest that both in the presence and absence of disulfide bonds, RNase
A rapidly forms a nonrandom ensemble of states, which is consistent with the
shallow, but distinctly sigmoidal GdmCl dependence of the far-UV CD signal
[108]. Since the native RNase A structure relies on disulfide bonds for its stability,
folding of the reduced form cannot proceed beyond this early intermediate while
the oxidized protein continues to fold to the native state.
15.7.3
Kinetic Role of Early Folding Intermediates
Several small proteins exhibit two-state behavior under certain conditions (e.g.,
elevated denaturant concentration, destabilizing mutations), but show evidence
for populated intermediates (burst phase and/or nonlinear rate profiles) under
other, more stabilizing, conditions [32, 38, 48, 58, 93, 109–111]. Thus, apparent
two-state behavior can be considered a limiting cases of a multistate folding mech-
anism with unstable (high-energy) intermediates [48]. On the other hand, there are
many well-documented examples of small proteins that show two-state folding
behavior even under stabilizing conditions (reviewed in Refs [1, 112]). In several
cases, the rate of folding at low denaturant concentrations was found to extend
into the submillisecond time range and could be resolved only by methods such
as NMR lineshape analysis [113, 114], electron transfer triggering [115], pressure-
jump [116] or temperature-jump techniques [24].
While these findings clearly indicate that many small proteins can fold rapidly
without going through intermediates, there is little support for the notion that
early intermediates slow down the search for the native conformation [74, 117,
118], except in cases where they contain features inconsistent with the native fold,
such as nonnative proline isomers, metal ligands [73, 74, 119], or intermolecular
interactions [120]. In fact, solution conditions that favor accumulation of inter-
518 15 Early Events in Protein Folding Explored by Rapid Mixing Methods
k iu k in
I U N
k ui k ni
Scheme 15.2
I
kui kin
kiu kni
kun
U N
knu
Scheme 15.3
a b Continuous-flow Stopped-flow
1.2
0.5 M urea
Relative fluorescence
1.0
0.8
0.6
0.4
10000
1000 kui
kui
Rate (s-1)
100
1 knu kun
kni
0.1
0 2 4 6 8 0 2 4 6 8
[Urea] (M) [Urea] (M)
Fig. 15.14. Kinetic mechanism of Im7 folding. intermediates fail to reproduce the data
a) Ribbon diagram of Im7. b) Representative (dashed line). c) Observed (symbols) and
kinetic trace measured by continuous-flow predicted (solid lines) rates of folding and
(f) and stopped-flow (b) fluorescence. unfolding, based on mechanisms with
The kinetics at this and all other urea on-pathway (left) and off-pathway (right)
concentrations measured is accurately intermediates. Dashed lines indicate the
predicted by on on-pathway mechanism corresponding elementary rate constants.
(solid line) while schemes with off-pathway
the 2.5-ms dead time of their stopped-flow instrument. However, the data could be
fitted equally well to models involving either on- or off-pathway intermediates. Ca-
paldi et al. extended these measurements into the microsecond time range [36], us-
ing our continuous-flow mixing instrument to detect the changes in fluorescence
of a single tryptophan (Trp75) associated with folding under various conditions
(Figure 15.14b). The initial increase in fluorescence above the level of the dena-
tured state in 6 M urea over the 100 ms to 1 ms time range is consistent with the
rapid formation of a compact state resulting in a solvent-shielded environment for
520 15 Early Events in Protein Folding Explored by Rapid Mixing Methods
Trp75. The native structure is formed within a few milliseconds or longer (depend-
ing on the final concentration of urea), which is accompanied by partial quenching
of Trp75 fluorescence due to close contact with a histidine. The tight kinetic cou-
pling of the two folding events, together with the distinct fluorescence properties
of the three conformational states involved, allowed us to discriminate among var-
ious possible kinetic mechanisms. The time course of folding predicted by solving
the kinetic equations corresponding to the on-path mechanism (Scheme 15.1) ac-
curately describes the observed behavior (Figure 15.14b, solid line). By contrast, the
off-path mechanism (Scheme 15.2) failed to reproduce the data, especially at low
urea concentrations where the intermediate is well populated (Figure 15.14b,
dashed line). Likewise, the quality of the fits does not improve when a second, par-
allel pathway is introduced (Scheme 15.3) and deteriorates when more than 25% of
the flux bypassed the intermediate.
These findings were further confirmed by a more detailed analysis of the depen-
dence of the two observable rate constants on urea concentration (Figure 15.14c).
While Scheme 15.1 reproduces all of the kinetic data, there are major discrepancies
in the rate of the fast phase predicted by Schemes 15.2 and 15.3, which can thus be
ruled out.
By eliminating alternative mechanisms, we have thus been able to show that
accumulation of a compact intermediate on the submillisecond time scale is a pro-
ductive and obligatory event in folding of Im7 [36]. This supports the notion that
rapid formation of compact states can facilitate the search for the native conforma-
tion. Interestingly, a recent mutation study revealed that the Im7 folding interme-
diate contains non-native tertiary interactions among the three major a-helices, A,
B and D, while the short helix C forms only during the final stages of folding [128].
Apparently, intermediates can be kinetically productive even if they contain some
structural features not found in the final native state.
15.7.4
Broader Implications
Table 15.2 lists the time constants of the earliest folding phase measured by
continuous-flow fluorescence for some of the proteins discussed in this chapter
along with their size and structural type. The initial folding times vary by an order
of magnitude, but show no apparent correlation with protein size. This argues
against the notion that the early stages of folding are dominated by a nonspecific
hydrophobic polymer collapse, in which case the rate of the initial phase would
depend primarily on the size of the hydrophobic core and would be insensitive to
structural details.
There appears to be no simple relationship between the time scale of early fold-
ing events and secondary structure content. For example, the b-sheet containing
proteins, GB1 and SNase, are near opposite ends of the time range, and Im7 has
a @fivefold slower initial rate compared to the structurally similar ACBP. Thus, the
individual structural and topological features have a profound effect already during
early stages of folding.
The finding that the earliest detectable folding event in GB1 domain is an order
Appendix 521
Tab. 15.2. Comparison of the time constants of the initial folding phase with protein size and
secondary structure type for proteins with multi-state folding kinetics.
of magnitude slower than that of cyt c may well be related to the fact that GB1
contains b-sheet structure (Figure 15.7a) while cyt c is primarily a-helical (Figure
15.12). With its parallel pairing of b-strands from opposite ends of the chain, GB1
has a relatively high contact order, a metric of fold complexity that was found to
correlate with the logarithm of the folding time of proteins with two-state mecha-
nisms [129]. Such a trend is not expected for proteins with multistate folding
mechanisms, since the rate-limiting step in structurally and kinetically more com-
plex proteins is likely to involve structural events unrelated to the overall fold, such
as side chain packing and specific tertiary interactions. However, if compact early
intermediates have a native-like chain topology, one might expect a correlation be-
tween the time constant of the initial phase and contact order. Some of the trends
seen in Table 15.2 are consistent with this idea. For instance, cyt c and ACBP have
low contact order and collapse much faster than GB1, which has the highest con-
tact order among the proteins studied. The surprisingly fast initial phase of SNase
(75 ms) in comparison to the much smaller GB1 (600 ms) may be related to the fact
that the folding step in SNase detected by Trp76 and ANS fluorescence reflects
a relatively local structural event within its antiparallel b-barrel domain, whereas
the fluorescence of Trp43 in protein G may report on a more global conformational
change involving the parallel pairing of N- and C-terminal b-strands.
Finally, we note that several small proteins (or isolated domains of larger pro-
teins) can complete the process of folding on a comparable time scale as the early
stages of folding for some of the multistate proteins discussed here (see, for exam-
ple, Refs [130–133]; see Ref. [24] for a review). The underlying conformational
events may be similar, but the fast-folding two-state proteins can directly reach the
native state in a concerted collapse/folding event while most proteins encounter
additional steps after crossing the initial barrier.
Appendix
motors. Total flow rates in the range of 5–10 mL s1 with channel diameters of the
order of 1 mm insure turbulent flow conditions (Re > 5000). After delivering a vol-
ume sufficient to purge and fill the observation cell with freshly mixed solution,
the flow is stopped abruptly when a third syringe hits a stopping block, or by clos-
ing a valve. Commercial instruments can routinely reach dead times of a few milli-
seconds. Recent improvements in mixer and flow-cell design by several manufac-
turers of stopped-flow instruments resulted in dead times well under 1 ms.
The upper end of the time scale that can be reliably measured in a stopped-flow
experiment is determined by the stability of the mixture in the flow cell, which is
limited by convective flow or diffusion of reagents in and out of the observation
volume. For slow reactions with time constants longer than a few minutes, manual
mixing experiments are generally more reliable. Stopped-flow mixing is usually
coupled with real-time optical observation using absorbance (UV through IR), flu-
orescence emission or circular dichroism spectroscopy, but other biophysical tech-
niques, including fluorescence lifetime measurements [134, 135], NMR [136, 137],
and small-angle X-ray scattering (SAXS; [138]), have also been implemented.
The interpretation of stopped-flow data requires a careful calibration of the
instrumental dead time by measuring a pseudo-first-order reaction tuned to the
time scale of interest (i.e., a single-exponential process with a rate-constant ap-
proaching the expected dead time) and an optical signal matching the application.
Common test reactions for absorbance measurements include the reduction of 2,6-
dichlorophenolindophenol (DCIP) or ferricyanide by ascorbic acid [66].
A convenient test reaction for tryptophan fluorescence measurements is the irre-
versible quenching of N-acetyltryptophanamide (NATA) by N-bromosuccinimide
(NBS). For fluorescence studies in or near the visible range, one can follow the
pH-dependent association of the Mg 2þ ion with 8-hydroxyquinoline, which results
in a fluorescent chelate [139], or the binding of the hydrophobic dye 1-anilino-8-
naphthalene-sulfonic acid (ANS) to bovine serum albumin (BSA), which is associ-
ated with a large increase in fluorescence yield [140].
As a practical example, Figure 15.15 shows a series of DCIP absorbance meas-
urements at several ascorbic acid concentrations used to estimate the dead time of
our Bio-Logic SFM-4 stopped-flow instrument equipped with a FC-08 microcuvette
accessory (Molecular Kinetics, Indianapolis, IN, USA). The reaction was started by
mixing equal parts of DCIP (0.75 mM in water at neutral pH, where the dye is sta-
ble) with l-ascorbic acid at pH 2 at final concentrations ranging from 6 to 25 mM.
Absorbance changes measured at 525 nm, near the isosbestic point between the
acidic and basic forms of DCIP, are plotted relative to the absorbance of the reac-
tant in water, measured in a separate control (Figure 15.15, inset). After the flow
comes to a full stop, the absorbance decays exponentially with a rate constants of
about 290, 780, and 1400 s1 at ascorbate concentrations of 6, 14, and 25 mM,
respectively, which is consistent with a pseudo-first-order reduction process with a
second-order rate constant k 00 A 5:7 10 5 s1 M1 .
At shorter times, the exponential fits intersect at an absorbance DA ¼ 0, which
corresponds to the level of the oxidized DCIP measured in the control. The delay
between this intercept and the first data point that joins the fitted exponential pro-
Appendix 523
td
0.00
0.00
-0.02
-0.04
∆A 525 (OD)
-0.04
-0.08
-0.06
∆A 525 (OD)
-0.12
-0.08
-0.16
-0.12
6 mM
-0.14 14 mM
25 mM
-0.16
0 2 4 6 8 10
Time (ms)
Fig. 15.15. Estimation of the dead time of were mixed at a total flow rate of 12 mL s1 .
stopped-flow absorbance measurement on a The inset shows an expanded plot at early
Biologic SFM-4 instrument with FC-08 times with dashed lines indicating the dead
microcuvette accessory, using the reduction of time (@0.5 ms). Absorbance changes at 525
DCIP by ascorbic acid as a test reaction. Equal nm are plotted relative to the absorbance of a
parts of DCIP in water (pH 7) and sodium control (diamonds) measured by mixing DCIP
ascorbate (pH 2) at final concentrations of 6 with 10 mM HCl.
(circles), 14 (squares) and 25 mM (triangles)
vides an estimate of the instrumental dead time, td ¼ 0:55 G 0:1 ms. Alternatively,
the dead time can be estimated from the signal level of the continuous-flow regime
prior to closure of the stop valve, Icf , using the following equation:
where Iy is the baseline at long times, I0 is the signal of the reactant (in this case,
DCIP mixed with an equal volume of water), and k is the first-order rate constant
obtained by exponential fitting. Note that tcf does not account for the finite stop
time and other stop-related artifacts, and is thus always shorter than td , which ex-
plains why some manufacturers of stopped-flow instruments prefer to cite tcf over
the more realistic operational dead time, td . In the present example, Eq. (3) yields
tcf in the range of 0.25 and 0.4 ms, compared with td A 0:55 ms obtained by the
extrapolation (Figure 15.15).
524 15 Early Events in Protein Folding Explored by Rapid Mixing Methods
Syringes
a In-line filters
Syringe drive
Mixer
Filter
Monochromator Cylindrical Lens
Arc lamp
lens
Cell holder and
flow-cell assembly 1317x1035
CCD detector
b
50 µm Pt wire
140 µm inner
capillary
20 µm posts
200 µm
Pt sphere
Flow-cell
250 µm channel
Fig. 15.16. Continuous-flow capillary mixing apparatus in
fluorescence mode. a) Schematic diagram of the solution
delivery system, mixer, observation cell and optical
arrangement. b) Expanded view of the mixer/flow cell assembly.
The sphere is formed by melting the end of 50 mm diameter platinum wire. Thin
glass rods fused to the inner wall of the outer capillary (tapering down to diameter
of @20 mm) prevent the sphere from plugging the outlet. The reagents are forced
through the narrow gap between the sphere and the outer wall where mixing
occurs under highly turbulent flow conditions. The fully homogeneous mixture
emerging from the mixer is injected into the 250 mm 250 mm flow channel of a
fused-silica observation cell (Hellma) joined to the outer capillary by means of a
hemispherical ground-glass joint. Typical flow rates are 0.6–1.0 mL s1 resulting
in a linear flow velocities of 10–16 m s1 through the 0:25 0:25 mm 2 channel
of the observation cell.
The reaction progress in a continuous-flow mixing experiment is measured by
recording the fluorescence profile vs. distance downstream from the mixer (Figure
526 15 Early Events in Protein Folding Explored by Rapid Mixing Methods
15.16). A conventional light source consisting of an arc lamp (we currently use a
350 W Hg arc lamp in a lamp housing from Oriel, Stratford, CT, USA), collimating
optics and monochromator, is used for fluorescence excitation. Relatively uniform
illumination of the flow channel over a length of 10–15 mm is achieved by means
of a cylindrical lens. A complete fluorescence vs. distance profile is obtained by
imaging the fluorescent light emitted at a 90 angle onto the CCD detector of a
digital camera system (Micromax, Roper Scientific, Princeton, NJ, USA) containing
a UV-coated Kodak CCD chip with an array of 1317 1035 pixels. The camera is
equipped with a fused silica magnifying lens and a high-pass glass filter or a
band-pass interference filter to suppress scattered incident light.
Figure 15.17 illustrates a typical continuous-flow experiment, using the quench-
ing of NATA by NBS as a test reaction. The upper panels show raw data obtained
by averaging the intensity across the flow channel vs. the distance d downstream
from the mixer. Panel a shows the intensity profile for the quenching reaction,
Ie ðdÞ, at final NATA and NBS concentrations of 40 mM and 4 mM, respectively
(mixing 1 part of 440 mM NATA in water with 10 parts of 4.4 mM NBS in water).
Panel b shows the distribution of incident light intensity, Ic ðdÞ, measured by mix-
ing the same NATA solution with water. Panel c shows the scattering background,
Ib ðdÞ, measured by passing water through both capillaries. Panel d shows the cor-
rected kinetic trace, fl rel ðtÞ, obtained according to
Distance was converted into time on the basis of the known flow rate (0.8 mL s1
in this example), the cross-sectional area of the flow channel (0.0625 mm 2 ) and
the length of the channel being imaged (12.5 mm, or 9.5 mm per pixel). The signal
measured at points below the entrance to the flow channel are well fitted by a
single-exponential decay. The trace at fl rel ¼ 1 in panel d of Figure 15.17 represents
a control for the mixing efficiency measured as follows. The 440 mM NATA stock
solution used in the experiment above was diluted 11-fold with water and filled
into both syringes. The continuous-flow trace recorded at the same flow rate was
then background-corrected and normalized according to Eq. (3), using the Ic trace
recorded previously in which the same NATA solution was diluted 11-fold in the
capillary mixer. Immediately after entering the flow channel, the ratio approaches
unity, indicating that NATA is completely mixed with water. Alternatively, mixing
efficiency could be assessed using a very fast reaction completed within the dead
time, such as the quenching of tryptophan (or NATA) by sodium iodide.
To estimate the dead time of the continuous-flow experiment, we measured the
pseudo-first-order NATA-NBS quenching reaction at a final NATA concentration
of 40 mM and several NBS concentrations in the range 2–32 mM. Figure 15.18a
shows a semilogarithmic plot of the relative fluorescence, fl rel , calculated according
to Eq. (3), along with exponential fits (solid lines). Since fl rel ¼ 1 corresponds to the
unquenched NATA signal expected at t ¼ 0, the intercept of the fits with fl rel ¼ 1
indicates the time point t ¼ 0 where the mixing reaction begins. The delay between
this point and the first data point that falls onto the exponential fit corresponds to
Appendix 527
a b c
100
Intensity
50
0 2 4 6 8 10 12 0 2 4 6 8 10 12 0 2 4 6 8 10 12
Distance (mm) Distance (mm) Distance (mm)
1.2
d
Relative fluorescence 1.0
0.8
0.6
0.4
0.2
0.0
0 200 400 600 800 1000
Time (µs)
Fig. 15.17. A typical continuous-flow mixing with water). c) Scattering background (water
experiment. The upper panels show raw deliverd from both syringes). Panel d shows
intensity profiles vs. distance from the mixing the corrected kinetic trace calculated according
region. a) Raw kinetic trace (NATA mixed with to Eq. (3).
NBS). b) Fluorescence control (NATA mixed
a 45 µs [NBS] (mM) b
1
2 25
Relative fluorescence
4 20
Rate (103 s-1 )
0.1 15
8
10
5
0.01 32 16
0
the dead time of the experiment, Dtd , which in this example is 45 G 5 ms. In Figure
15.18b, the rate constants obtained by exponential fitting are plotted as a function
of NBS concentration. The slope of a linear fit (solid line) yields a second-order rate
constant for the NBS-induced chemical quenching of NATA of 7:9 10 5 M1 s1 ,
which is consistent with data obtained by stopped-flow measurements at lower
NBS concentration [51]. This agreement, together with the linearity of the second-
order rate plot (Figure 15.18b), demonstrates the accuracy of the kinetic data.
Acknowledgments
This work was supported by grants GM56250 and CA06927 from the National In-
stitutes of Health, grant MCB-079148 from the National Science Foundation, and
an Appropriation from the Commonwealth of Pennsylvania.
References
intermediates. Proc. Natl Acad. Sci. & Hagen, S. J. (2002). Smaller and
USA 94, 6084–6086. faster: the 20-residue Trp-cage protein
121 Matouschek, A., Serrano, L. & folds in 4 micros. J. Am. Chem. Soc.
Fersht, A. R. (1994). Analysis of 124, 12952–12953.
protein folding by protein engineer- 132 Mayor, U., Guydosh, N. R.,
ing. In Mechanisms of Protein Folding: Johnson, C. M. et al. (2003). The
Frontiers in Molecular Biology (Pain, complete folding pathway of a protein
R. H., Ed.), pp. 137–159. Oxford from nanoseconds to microseconds.
University Press, Oxford. Nature 421, 863–867.
122 Raschke, T. M., Kho, J. & Marqusee, 133 Zhu, Y., Alonso, D. O., Maki, K.
S. (1999). Confirmation of the et al. (2003). Ultrafast folding of
hierarchical folding of RNase H: a alpha3D: a de novo designed three-
protein engineering study. Nature helix bundle protein. Proc. Natl Acad.
Struct. Biol. 6, 825–831. Sci. USA 100, 15486–15491.
123 Walkenhorst, W. F., Green, S. M. & 134 Beechem, J. M., James, L. & Brand,
Roder, H. (1997). Kinetic evidence for L. (1990). Time-resolved fluorescence
folding and unfolding intermediates studies of the protein folding process:
in Staphylococcal nuclease. new instrumentation, analysis and
Biochemistry 63, 5795–5805. experimental approaches in time-
124 Kiefhaber, T. (1995). Kinetic traps in resolved laser spectroscopy in
lysozyme folding. Proc. Natl Acad. Sci. biochemistry. SPIE Proc. 1204, 686–
USA 92, 9029–9033. 698.
125 Heidary, D. K., Gross, L. A., Roy, M. 135 Jones, B. E., Beechem, J. M. &
& Jennings, P. A. (1997). Evidence for Matthews, C. R. (1995). Local and
an obligatory intermediate in the global dynamics during the folding of
folding of interleukin-1b. Nature Escherichia coli dihydrofolate reductase
Struct. Biol. 4, 725–731. by time-resolved fluorescence
126 Jennings, P., Roy, M., Heidary, D. & spectroscopy. Biochemistry 34, 1867–
Gross, L. (1998). Folding pathway of 1877.
interleukin-1 beta. Nature Struct. Biol. 136 Balbach, J., Forge, V., van Nuland,
5, 11. N. A. J., Winder, S. L., Hore, P. J. &
127 Tsui, V., Garcia, C., Cavagnero, S., Dobson, C. M. (1995). Following
Siuzdak, G., Dyson, H. J. & Wright, protein folding in real time using
P. E. (1999). Quench-flow experiments NMR spectroscopy. Nature Struct. Biol.
combined with mass spectrometry 2, 865–870.
show apomyoglobin folds through and 137 Frieden, C. (2003). The kinetics of
obligatory intermediate. Protein Sci. 8, side chain stabilization during protein
45–49. folding. Biochemistry 42, 12439–12446.
128 Capaldi, A. P., Kleanthous, C. & 138 Segel, D. J., Bachmann, A.,
Radford, S. E. (2002). Im7 folding Hofrichter, J., Hodgson, K. O.,
mechanism: misfolding on a path to Doniach, S. & Kiefhaber, T. (1999).
the native state. Nature Struct. Biol. 9, Characterization of transient
209–216. intermediates in Lysozyme folding
129 Plaxco, K. W., Simons, K. T. & with time resolved small-angle X-ray
Baker, D. (1998). Contact order, scattering. J. Mol. Biol. 288, 489–499.
transition state placement and the 139 Brissette, P., Ballou, D. P. &
refolding rates of single domain Massey, V. (1989). Determination of
proteins. J. Mol. Biol. 277, 985–994. the dead time of a stopped-flow
130 Myers, J. K. & Oas, T. G. (2001). fluorometer. Anal. Biochem. 181, 234–
Preorganized secondary structure as 238.
an important determinant of fast 140 Gibson, Q. H. & Antonini, E. (1966).
protein folding. Nature Struct. Biol. 8, Kinetics of heme–protein intractions.
552–558. In Hemes and Hemeproteins (Chance,
131 Qiu, L., Pabit, S. A., Roitberg, A. E. B., Estabrook, R. W. & Yonetani, T.,
References 535
16
Kinetic Protein Folding Studies using NMR
Spectroscopy
Markus Zeeb and Jochen Balbach
16.1
Introduction
-12 -9 -6 -3 0 3 (s)
R1, R2, het. NOE R1ρ, Rex, line shapes real time NMR
N
dipolar couplings H -exchange
Fig. 16.1. Logarithmic NMR time scale. the rotating frame; Rex , chemical exchange
Illustration of accessible protein motions and contribution to R 2 ; line shapes, line shape
reactions by the respective NMR technique: analysis of proton resonances; dipolar
R1 , spin-lattice relaxation rate; R 2 , spin-spin couplings, residual dipolar couplings in
relaxation rate; het. NOE, heteronuclear different alignment media; HN -exchange,
Overhauser enhancement; R1r , relaxation in proton–deuterium exchange of amide protons.
folding rates are accessible by line shape analyses of NMR resonances [11, 12],
from 15 N R1r relaxation rates [13], from 15 N transverse cross-correlation relaxation
rates [14], or from 15 N R 2 relaxation dispersion experiments [15]. These fast fold-
ing rates can be well predicted by the diffusion-collision model for several small
proteins [16] or derived experimentally by extrapolating the denaturant dependence
of the un- and refolding rates from stopped-flow experiments to 0 M denaturant
[12, 14]. Recently, the analysis of residual dipolar couplings measured in different
alignment media have allowed the determination of the dynamics between nano-
seconds and milliseconds and filled the gap between very fast and slower motions
of proteins [17, 18]. All these methods have in common that the populations of un-
folded and folded polypeptide chains are at equilibrium but differ in their chemical
shifts of the NMR active nuclei. During the detection or delay times in the pulse
sequence (between about 1 ms and 1 s), fluctuations between the populations
cause additional contributions to the natural relaxation of the different states from
Brownian motions. These contributions can be converted into un- and refolding
rates.
Nonequilibrium reactions can be directly followed in the NMR probe, where
rates above 5 min1 require a stopped-flow device to be observable. Since very early
NMR studies of protein reactions in ‘‘real time’’ several of these devices have been
designed [19–21]. Unfolding and refolding of proteins can be initiated by rapid
mixing of solutions inside the NMR spectrometer with short experimental dead
times [22–25]. One very powerful approach has been to use 19 F NMR to study pro-
teins with 19 F-labeled tryptophan residues and a data collection started within
100 ms of the initiation of the mixing. Distinct steps of the folding reaction of dihy-
drofolate reductase could be resolved by these pioneering studies by Frieden and
coworkers [23, 26, 27]. Recently, this method was successfully applied to resolve
538 16 Kinetic Protein Folding Studies using NMR Spectroscopy
the consecutive steps of folding and assembly of the two-domain chaperone PapD
using a 5-mm 19 F cryoprobe [28].
Very slow protein folding reactions, such as those limited by isomerization of
peptidyl-prolyl bonds [29] can be initiated by manual mixing [30–33] and followed
by the sequential accumulation of one-dimensional spectra. These studies revealed
important information about the properties of intermediates during un- and refold-
ing. An extension to two-dimensional techniques has been possible in a few cases,
to increase the number of specific spin labels in well-resolved NMR spectra [34].
Most recently, Mizuguchi et al. succeeded in recording a series of 3D NMR spectra
during the refolding reaction of apoplastocyanin [35] and were able to resolve
changes of distances between protons during refolding.
For several proteins denaturant-induced unfolding transitions monitored by
NMR uncovered equilibrium intermediates, which were silent in circular dichro-
ism (CD)- or fluorescence-detected transitions [36–38]. This demonstrates the
unique sensitivity of the chemical shifts of NMR resonances towards the local en-
vironment even in poorly structured protein states. Many features of the molten
globule state, the archetypal protein folding intermediate, have been characterized
by NMR spectroscopy [39] and especially the close correspondence between equi-
librium and kinetic molten globules [40].
The use of NMR spectroscopy to study protein folding has been extensively re-
viewed [9, 23, 41–49]. The first part of this chapter concentrates on the develop-
ments of real-time NMR spectroscopy during the last 5 years in following very
slow protein folding reactions. Earlier achievements have been already reviewed
[24, 50, 51]. The very rapid development of the interpretation of NMR relaxation
data in terms of micro- and millisecond motions during the last few years has
been excellently reviewed [10]. Based on this theoretical background the second
part of this contribution will focus on the derivation of millisecond protein folding
rates from NMR relaxation experiments.
16.2
Following Slow Protein Folding Reactions in Real Time
PTFE tube
Y-connector
Tube cap
Optical fiber
Capillary tube
PTFE collar
Solution to be injected
Micropipette
Buffer solution
NMR coil
Shigemi tube
Fig. 16.2. Schematic representation of the tube by an air bubble to avoid mixing of the
rapid mixing device used for real-time NMR solutions before injection. The transfer line to
experiments. The NMR tube in the probe of the syringe, which is outside the magnet, has
the spectrometer contains the refolding or to be filled with solvent and the piston is
unfolding buffer, which is separated from pneumatically triggered by 10 bar for a rapid
the denatured or native protein solution, transfer of the protein solution into the NMR
respectively, in the micropipette and capillary tube.
within the active volume of the detection coil of the probe head during the dead
time of the device (for further details see Section 16.6.1). Most presented examples
in this report are multiscan NMR experiments with proteins, which require a relax-
ation period of at least 300 ms between scans for a sufficient build-up of polariza-
tion. Therefore, the present dead time of this simple stopped-flow mixing device is
satisfactory.
The first proteins studied by real-time NMR employing such a device were bo-
vine a-lactalbumin (a-LA) and hen lysozyme [24]. a-LA was particularly suitable
for setting up the device [40], because it is a well-defined system for many aspects
540 16 Kinetic Protein Folding Studies using NMR Spectroscopy
of protein folding and its folding kinetics can be varied over several orders of mag-
nitude by control of the Ca 2þ concentration of the solution [52, 53]. The folding
reaction is initiated by changing the solvent conditions such as the pH [54, 55] or
the concentration of denaturants [40, 56] or stabilizing agents [57, 58]. Subse-
quently, various NMR techniques can be applied to monitor the change of different
properties of the polypeptide chain during the folding reaction. In the simplest
case, a series of 1D spectra follows the rapid mixing step, allowing kinetic studies
for processes with rates below 0.1 s1 . These 1D real-time NMR experiments have
been successfully applied to more than a dozen proteins: bovine a-LA [40, 59, 60],
ribonuclease T1 of Aspergillus oryzae [61], human CDK inhibitor P19INK4d [38],
phosphocarrier HPr of Escherichia coli [62], SH3 domain of phosphatidylinositol
3 0 -kinase [63], human muscle acylphosphatase [64], French bean apoplastocyanin
[32], thioredoxin of E. coli [33], rat intestinal fatty acid binding protein [23], dihy-
drofolate reductase of E. coli [27], bovine pancreatic ribonuclease A [31], hen lyso-
zyme [25, 65, 66], chymotrypsin inhibitor 2 [67], barnase [67], barstar [68, 69],
staphylococcal nuclease [70], yeast prion Sup35 [71], HIV I protease [72, 73] as
well as peptides [74]. All studies make use of distinct chemical shifts for the differ-
ent states changing with time, to determine the kinetics of the respective protein
folding reaction. The dispersion of the resonances and the line width can also be
used to distinguish possible folding intermediates from the unfolded and folded
state. The appearance of 1D spectra of folding intermediates range between mainly
unfolded, molten-globule-like, and native-like. Intermediate populations below 5%
of the protein molecules are difficult to detect.
Figures 16.3 and 16.4 illustrate one example for the use of 1D real-time NMR
spectroscopy to study the refolding of P19INK4d , human cyclin D-dependent kinase
inhibitor [38]. After dilution from 6 M to 2 M urea two fast refolding phases pro-
duce 83% of native molecules during the dead time of the NMR experiment. These
two fast phases have been revealed by stopped-flow far-UV circular dichroism spec-
troscopy during the formation of the five native ankyrin repeats comprising ten a-
helices. Refolding of the remaining 17% of unfolded molecules is retarded by a
slow cis/trans isomerization of prolyl residues resulting in a refolding rate of 0.017
s1 . This slow phase can be accelerated by the prolyl isomerase cyclophilin 18 from
the cytosol of E. coli. The native population of P19INK4d can be directly detected by
the high-field shifted resonance at 0.7 ppm. To analyze the decay of the unfolded
population at 0.9 ppm or 1.45 ppm, the native contribution to the 1D spectra has to
be subtracted [38, 61]. First, this subtraction revealed identical un- and refolding
rates. Secondly, each 1D spectrum can be represented by a linear combination of
the spectrum of the unfolded and the folded population. This suggests that no fur-
ther intermediate gets significantly populated during this final folding event of
P19INK4d and that the polypeptide chain cannot form a native-like ordered structure
prior to the prolyl isomerization.
A similar analysis was possible for the S54G/P55N variant of ribonuclease T1
from Aspergillus oryzae [61, 75]. About 15% of the unfolded S54G/P55N-RNase T1
molecules contain the native cis prolyl peptide bond Tyr38-Pro39 and fold to the
native state within the dead time of the NMR experiment. They give rise to a burst
16.2 Following Slow Protein Folding Reactions in Real Time 541
unfolded state
native state
300
200
100 Time (s)
0
1.6 1.0 0.4 0.2
1
H (ppm)
Fig. 16.3. Stacked plot of the high field region 2 M. For this representation two 1D experi-
of the 1D 1 H NMR spectra at different times ments with four transients were averaged to
of refolding. Human CDK inhibitor P19INK4d in yield a resolution of 11 seconds per spectrum.
50 mM Na-phosphate pH 7.4 (10% 2 H2 O) at Resonances of the unfolded and native state
15 C was refolded by a tenfold dilution from are indicated and the integral of the latter is
6 M urea to yield a final urea concentration of plotted in Figure 16.4a.
of all resonances of the native state in the very first spectrum recorded after initia-
tion of the refolding reaction. The refolding of the remaining 85% unfolded mole-
cules is very slow and limited in rate by the trans ! cis isomerization of Pro39. In
contrast to the above-mentioned example of P19INK4d , where the molecules were
largely unfolded before the very slow rate-limiting step, S54G/P55N-RNase T1 rap-
idly forms a folding intermediate arrested at this stage of folding because of the
nonnative prolyl peptide bond. The major fraction of their intensities decays with
a rate of 1:4 104 s1 at 10 C, which is very close to the formation of the native
state with a rate constant of 1:25 104 s1 . A minor fraction (about 7%) reveals
50 times higher rates and belongs to a second shorter-lived intermediate. The en-
tire dataset could be decomposed according to the time dependence of a basis set
of 1D spectra of the native state, the major intermediate I 39t and the minor inter-
mediate (for a detailed description see Ref. [61]). The dispersion and line widths of
the resonances indicated that the long-lived intermediate I 39t has a defined tertiary
structure already close to the native state, but several resonances of I 39t revealed
not-yet-native chemical shifts. The shorter lived transient species is mainly un-
structured.
These examples of S54G/P55N-RNase T1 and P19INK4d substantiate that non-
native prolyl peptide bonds can either prevent the formation of native protein or can
arrest the protein in an already well-structured but not-yet-native conformation.
542 16 Kinetic Protein Folding Studies using NMR Spectroscopy
(a) (b)
0.6
0.20
kf=0.017 s-1
NMR intensity (a.u.)
0.15
83% 0.4
0.10
ku=0.018 s-1
0.18 0.2
0.05 0.16
0 200 400 0
Time (s)
0
0 200 400 0 200 400
Time (s) Time (s)
Fig. 16.4. Folding kinetics of P19INK4d data with enlarged ordinate. b) Decay of the
extracted from the experiment depicted in contribution of the unfolded state to the NMR
Figure 16.3. a) The build-up of the native state spectra. A single exponential fit was applied to
is represented by the integral of the NMR the integral of the NMR signals between 0.78
intensities between 0.65 ppm and 0.70 ppm. ppm and 0.90 ppm and reveals a rate constant
The continuous line corresponds to a single of 0.018 s1 , which is shown as a continuous
exponential fit with a rate constant of 0.017 s1 line. Prior to the integration, the native
and a burst phase in the dead time of the component was subtracted from every 1D
experiment of 83%. The inset shows the same spectrum (see Section 16.6.1).
Therefore, for a protein function in the cell several levels of regulation based on
prolyl peptide bonds are possible. Ubiquitous peptidyl prolyl isomerases with and
without substrate specificity have been identified and proposed as key for this kind
of regulation [29, 76]. Most recently that kind of a proline-driven conformational
switch has been structurally characterized [77, 78].
A particularly important approach to the characterization of transient protein
folding intermediates is to monitor nuclear Overhauser effects (NOEs) between
protons of these states and how they evolve during the folding process. NOEs can,
in principle, be converted into interproton distances, which is the basis for struc-
ture determination of proteins by NMR spectroscopy. The molten globule state
formed rapidly after initiation of refolding of a-LA can be studied by real-time 1D
NOE experiments [57]. Although the resolution of the spectrum is limited, the
overall NOE on the aliphatic proton resonances as a consequence of saturating all
aromatic resonances can be measured. It had been shown previously that this
approach works for the A-state of a-LA, which is populated at equilibrium at pH 2
[55]. The intensity of the steady-state NOE of the kinetic molten globule observed
transiently during refolding at pH 7.0 is already 80% of the steady-state NOE inten-
sity of the native state. This is even larger than the value of 70% measured previ-
ously for the equivalent NOE effect in the A-state at equilibrium. The average ra-
dius of the species formed after initiation of folding can be estimated from the
16.2 Following Slow Protein Folding Reactions in Real Time 543
steady-state NOE to be within 4% of that of the native state [57]. This experiment
shows that the folding of a-LA involves the rapid formation of a highly collapsed
but partially disorganized molten globule, which subsequently reorganizes slowly
to form the fully native state mainly by packing of the side chains.
Additional evidence for a native-like compactness of the molten globule of a-LA
came from 1D real-time NMR diffusion experiments [56]. This method allows the
determination of the hydrodynamic radius of kinetic folding intermediates. The
experiment is based on a pulsed field gradient spin echo pulse sequence to de-
termine the diffusion coefficient of molecules in liquids [79, 80]. The latter can
subsequently be converted into the effective hydrodynamic radius for nonspherical
molecules (R h ) such as proteins (for further details see Section 16.6.1 and Refs
[81, 82]). The application of these spin echoes during refolding of a-LA from 6 M
GdmHCl revealed R h of the kinetic molten globule, which is only 8% larger than
the R h of apo a-LA [56]. The equivalent experiment was performed during refold-
ing of S54G/P55N-RNase T1 and showed that R h of the I 39t intermediate is only
about 5% larger than the radius of the native state [75]. Consequently, the major
compaction of the extended unfolded polypeptide chain occurs before the rate-
limiting folding step regardless whether the side chains are already in a well-
defined conformation (S54G/P55N-RNase T1) or not (a-LA).
In parallel to the successful formation of the native state, polypeptide chains can
follow misfolding pathways. This failure of proteins to fold correctly or to remain
folded is implicated in the onset of a range of diseases [83, 84]. To characterize
these misfolding reactions, it is necessary to use a variety of complementary spec-
troscopic methods. Real-time NMR spectroscopy can be used to study the asso-
ciation and aggregation of proteins in solution by monitoring the aggregation-
competent but still soluble states during these processes. Carver and coworkers
used a-LA as a model system to study protein aggregation, which occurs during
unfolding of the protein by reducing its four disulfide bonds. They initiated the ag-
gregation reaction by rapid mixing of apo a-LA or holo a-LA with a large excess of
the reducing agent dithiothreitol (DTT) [85–87]. Reduction of the disulfide bonds
promotes unfolding of a-LA through several well-characterized intermediate states,
which expose hydrophobic surfaces prone to aggregation. The well-dispersed NMR
spectrum of the native apo protein gets lost during the dead time of the experi-
ment (1.2 s) after the addition of DTT. The first obtained spectrum shows broad
resonances with significantly reduced signal intensity and resembles the molten
globule or A-state of a-LA. After several minutes the resonance intensity disappears
since aggregation occurs leading to substantial precipitation out of the solution
with an apparent rate of 1:5 103 s1 at 37 C. Aggregation can be significantly
decelerated and precipitation totally prevented by adding chaperones such as a-
crystallin [85, 87] or clusterin [86] to the refolding buffer of the real-time NMR ex-
periment. Both chaperones bind to the aggregation competent monomeric molten
globule state of apo a-LA and therefore stabilize the protein. Reduction and con-
comitant aggregation of holo a-LA is less efficiently prevented by a-crystallin be-
cause it is only partially reduced and more structured compared with the fully re-
duced and disordered form of apo a-LA. Additionally, holo a-LA precipitates more
544 16 Kinetic Protein Folding Studies using NMR Spectroscopy
rapidly, which is important for the chaperone action of a-crystallin, since it is much
more efficient with slowly precipitating species [87].
Solution aggregation of acid-denatured cold shock protein A (CspA) from E. coli
is also a very slow process, which results in formation of insoluble fibrils [88]. Sev-
eral 2D 1 Ha 15 N heteronuclear single-quantum coherence (HSQC) spectra, 15 N re-
laxation experiments and even four 3D HSQC-NOESY spectra can be recorded dur-
ing this self-assembly process. A combined analysis of these data reveal that mainly
residues from the b-strands of CspA monitor a fast exchange on the NMR chemi-
cal shift time scale between acid-denatured monomers and soluble aggregates dur-
ing the lag phase of aggregation. During the exponential phase of aggregation the
first three b-strands are predominantly involved in the association process.
The aggregation of the prion-determining region of the NM domain of Sup35 in
yeast, Sup35 5a26 , has been followed recently by 1D real-time NMR spectroscopy
[71]. As expected, only signals originating from soluble peptide molecules are visi-
ble in the spectra and after 80 min 50% of the peptide became insoluble. Adding
catalytic amounts of Hsp104 prevents aggregation for about 80% of the peptide
molecules by binding to the hexameric/tetrameric species of Sup35 5a26 and releas-
ing intermediate and monomeric species. These conclusions could be drawn be-
cause binding of Hsp104 causes significant changes of chemical shifts of several
tyrosine resonances. Based on these resonances, the molecular weight of the
Sup35 5a26 oligomers could be estimated by NMR diffusion experiments.
In photo-CIDNP (chemically induced dynamic nuclear polarization) NMR ex-
periments, the polarization of solvent-exposed protons is induced more rapidly
than the natural spin-lattice relaxation via a dye excited by a laser flash [89]. This
method reduces the time required to polarize spins transferred into the NMR
probe from a lower field region of the magnet [66]. Together with the very small
portion of the sample illuminated by the laser beam, the dead time of the device
shown in Figure 16.2 can be further reduced [25]. The strength of the photo-
CIDNP approach, however, is not only in its time resolution but also in its spectral
resolution. The latter is significantly improved because the induced polarization for
example by flavin mononucleotide as dye is only generated in a limited number of
residues (tyrosine, tryptophan, and, under some conditions, histidine). Moreover,
polarization occurs only when these residues are accessible to the photoexcited
dye molecules in the solution, enabling the environment of individual residues to
be probed directly in real time during folding. Hen lysozyme for example forms
initially (after 30 ms experimental dead time) a relatively disordered collapsed state
with largely buried tryptophan residues [25, 66]. The structurally closely related
bovine a-LA also forms very early a collapsed state during refolding, which very
closely resembles the photo-CIDNP spectrum of the pH 2 molten globule state of
this protein [59, 60]. Subsequently, a reorganization of these aromatic residues oc-
curs during the course towards the fully native proteins in both cases.
Histidine residues only get excited by the photo-CIDNP process when the pro-
tein contains no tryptophan residues or when all tryptophan residues are buried
and the histidine residues are exposed. These very rare conditions are met by the
histidine-containing phosphocarrier protein HPr from E. coli. The wild-type pro-
16.3 Two-dimensional Real-time NMR Spectroscopy 545
tein contains no tryptophan and tyrosine residues, giving rise to several well-
resolved histidine resonances in the photo-CIDNP spectrum [90]. In several
tryptophan-containing variants of HPr no histidine resonances of the native and
unfolded state were detectable because of the exposed Trp. Directly after refolding
has been initiated, in most variants the Trp side chain was buried and some His
residues could be transiently excited by the dye until the native conformation was
formed.
16.3
Two-dimensional Real-time NMR Spectroscopy
A major limitation of 1D NMR experiments with proteins is, of course, the low res-
olution due to severe signal overlap. Therefore, only a few well-resolved resonances
allow a detailed analysis with molecular resolution. In many cases the general ap-
pearance of the 1D spectrum rather than single resonances is analyzed. Thus, cur-
rent work focuses on the extension of real-time NMR experiments to utilize the
power of multidimensional NMR in kinetic experiments. Very slow folding reac-
tions can be directly followed by sequentially recorded 2D spectra such as 2D
1
Ha 15 N HSQC spectra [34]. An equivalent proton–nitrogen correlation spectrum
can be recorded in much less time (in about 200 s per 2D spectrum) if a heteronu-
clear multiple-quantum coherence (HMQC) pulse sequence is used in combina-
tion with Ernst angle pulses [56, 91]. During the refolding of S54G/P55N-RNase
T1 from 6 M GdmHCl at 1 C, for example, a series of 128 of these fast-HMQC
spectra could be recorded [75]. After correction for the dead time events, the com-
plete 2D spectrum of the kinetic folding intermediate I 39t could be obtained. The
spectrum exhibits the same dispersion of the resonances and the same line widths
as the spectrum of the native state, indicating a well-defined tertiary structure of
I 39t . The backbone resonances of 22 (out of 104) residues of I 39t show native chem-
ical shifts and identify therefore regions in which a native environment is already
present in the intermediate state. For 66 backbone amide probes, which differ in
chemical shifts between the native state and I 39t , single exponential refolding ki-
netics could be extracted from the 2D real-time NMR experiment. All showed the
same time constant within experimental error indicating that the trans ! cis iso-
merization of Pro39 fully synchronizes the rate-limiting step of refolding, which
is the transition of not-yet-native regions in I 39t towards the native conformation.
Interestingly, amide protons in not-yet-native environments are not only located
close to the nonnative trans Tyr38-Pro39 peptide bond, but are spread throughout
the entire protein.
The slowest step during the refolding of barstar, the intracellular inhibitor of bar-
nase, is also dominated by the formation of a cis peptidyl prolyl bond of Tyr47-
Pro48 in the native state [92, 93]. A set of 16 2D 1 Ha 15 N HSQC spectra recorded
in real time revealed that the intermediate with the trans Tyr47-Pro48 bond has a
predominantly native-like conformation [68]. Only the trans prolyl peptide bond
and three residues in close proximity to Pro48 rearrange upon complete refolding.
546 16 Kinetic Protein Folding Studies using NMR Spectroscopy
2D (I N)
I N I&N
I&N
0 Time
2D (N)
I N I&N
2D (N) 2D (I N)
I&N
Fig. 16.5. Schematic line shapes and reference spectrum of the native state after
intensities in the indirect dimension F1 for refolding, and c) the result of subtracting the
different cross-peaks in a) the kinetic spectrum kinetic spectrum from the reference spectrum.
recorded during the refolding reaction, b) the
action occurs during the accumulation of data in the experiment, it influences the
amplitude of the different signals recorded at various incrementation times. This
modulation determines both the line shape and intensity of the cross-peaks in the
resulting 2D spectrum, namely in the indirect dimension (Figure 16.5). The prin-
ciple and effectiveness of this strategy have been demonstrated for a-LA using a
1
Ha 15 N HSQC experiment [54]. The resonances of the native state of a-LA showed
in the kinetic spectrum the expected lower intensities and negative components in
548 16 Kinetic Protein Folding Studies using NMR Spectroscopy
119
D66 D66 D66
121
N (ppm)
A22 A22 A22
123
15
A19 A19 A19
A95 A95 A95
125
(a) (b) (c)
7.8 7.6 7.4 7.2 7.0 7.8 7.6 7.4 7.2 7.0 7.8 7.6 7.4 7.2 7.0
1 1 1
H (ppm) H (ppm) H (ppm)
Fig. 16.6. Two-dimensional real-time 1 H- 15 N the native state. Therefore, the cross-peaks are
HSQC spectra of S54G/P55N-RNase T1 present in (a) and (b) and cancel out in the
recorded during and after the refolding difference spectrum (c). A19, A22, K25, I61,
reaction from 6 M GdmHCl at 15 C. Section and A95 are in a nonnative environment in the
of a) the kinetic HSQC, b) the reference HSQC intermediate and depict different chemical
of the native protein, and c) the difference shifts as well as line broadening. These cross-
spectrum between the kinetic HSQC and the peaks of the intermediate state are absent in
reference HSQC. Positive NMR intensities in (b) and show negative intensities in (c).
(a) and (c) are indicated with black lines, Correspondingly, the emerging cross-peaks of
negative signals by red lines. D66 and F50 are the native state can be identified by the two
in a native region of the intermediate. The flanking negative components along the 15 N
cross-peaks of the amide protons depict dimension in (a) and reveal positive cross-
equivalent chemical shifts, sharp lines and peaks in (c).
equivalent intensities in the intermediate and
the wings of the peaks compared to the reference spectrum (see the line shapes
and intensities of N in panel (a) in the schematic illustration in Figure 16.5). In
the special case of a-LA, only very few signals of the molten globule (represented
by I in Figure 16.5) are observable, because most resonances are widened.
All three line shapes shown in panel (a) of Figure 16.5 could be experimentally
observed in a single 2D real-time 1 Ha 15 N-HSQC NMR experiment during the re-
folding of S54G/P55N-RNase T1 from 6 M GdmHCl at 15 C [98]. The resonances
of F50 and D66 show native chemical shifts in the folding intermediate I 39t in
both, the 1 H and 15 N dimension of the spectrum. Thus, sharp lines have been ob-
served in the 2D real-time NMR experiment (Figure 16.6a) along the 15 N dimen-
sion. For A19, A22, K25, I61, and A95, the resonances of I 39t differ in their chemi-
cal shifts from the final native state and therefore depict line broadening along the
indirect F1 dimension. The corresponding resonances of the native state can be
identified by the two flanking negative components along the 15 N dimension (neg-
ative intensities are shown in red in Figure 16.6a).
Beside identification and assignment of the resonances of transient species, real-
time 2D NMR can be used to determine folding kinetics from these described line
shapes (see Section 16.6.3) [54, 61]. Here, the key aspect is that folding kinetics can
16.3 Two-dimensional Real-time NMR Spectroscopy 549
the native protein to facilitate the classification of NOE effects. This approach, in
principle, allows the 3D structure of a protein folding intermediate to be deter-
mined. About 200 NOEs have been analyzed for a detailed characterization of the
I 39t intermediate of S54G/P55N-RNase T1 [61]. They revealed, for example, that
the b-strands involved in disulfide bonds already have the native conformation in
I 39t as well as the N-terminal half of the a-helix. The C-terminal half of the a-helix
has a not-yet-native conformation together with many other regions interspersed
among the native-like sections of this protein folding intermediate. This very de-
tailed structural information about I 39t considerably extended the qualitative pic-
ture we got by just comparing its chemical shifts with the native state as discussed
above.
16.4
Dynamic and Spin Relaxation NMR for Quantifying Microsecond-to-Millisecond
Folding Rates
Most folding reactions are too fast to be monitored directly by real-time NMR spec-
troscopy, because the time required to record one 1D 1 H spectrum of a protein is at
least 100 ms. For signal-to-noise reasons, usually more than one scan is needed
with an interscan delay for sufficient relaxation of excitable magnetization of at
least 300 ms. Therefore, not more than three 1D scans are possible within one
second after a sufficient fast mixing to initiate refolding. Reactions with time con-
stants below 500 ms cause line broadening of the proton resonances in 1D spectra,
but even with a fast stopped-flow mixing device only single-scan experiments are
possible, which is not very practicable for protein NMR [21]. Interleaved methods
with different delay times after mixing to record the first 1D spectrum as well as
continuous-flow experiments require very large amounts of protein, because NMR
with biological samples works at millimolar concentrations, and are therefore rare
[55, 66, 100].
Because of these technical problems of nonequilibrium experiments, fast protein
folding reactions have been measured under equilibrium conditions. Fluctuations
on a 10 ms to 10 ms time scale between two or more states, where the NMR active
nuclei experience different chemical environments, have profound effects on the
shape and position of their resonances [101]. For proteins, the analysis of line
shapes has been used to study ligand binding, local fluctuations as well as global
fluctuations such as complete un- and refolding of the polypeptide chain (a very
good overview and references are given in Ref. [10]). One major drawback during
the quantification of NMR line shapes in terms of intrinsic unfolding and refold-
ing rates is that a folding model has to be assumed. In most cases simple two-state
protein folding systems have been studied or an apparent two-state model such as
free and complexed protein has been applied to analyze the line shapes. The first
protein folding system that could be quantitatively studied by dynamic NMR spec-
troscopy on the above-mentioned time scale was the N-terminal domain of phage l
repressor, l 6a85 [11]. The urea dependence of the intrinsic un- and refolding rate
16.4 Dynamic and Spin Relaxation NMR for Quantifying Microsecond-to-Millisecond Folding Rates 551
constants could be determined between 1.3 M and 3.2 M urea with a maximum
value for k f ¼ 1200 s1 and a minimum value for k u ¼ 100 s1 at 1.3 M urea.
From a linear extrapolation of the chevron plot to 0 M urea, k f 0 ¼ 3600 G 400 s1
and k u 0 ¼ 27 G 6 s1 have been obtained. Later, Oas et al. verified these extrapo-
lated rates by 1 H transverse relaxation experiments (see below) [12]. Typically, the
resonances of aromatic protons are analyzed in 2 H2 O samples with completely ex-
changed amide protons, because only then can the former resonances between 6
and 11 ppm be assigned for both folded and unfolded state. These chemical shifts
are needed at any urea concentration for an accurate line shape analysis (see Sec-
tion 16.6.2). According to a two-state folding mechanism, only aromatic resonances
of the folded and unfolded state are present within this region. In the case of l 6a85 ,
which contains only two tyrosine and two phenylalanine residues, the entire aro-
matic region of the 1D spectra could be simulated at any urea concentration (for
example by using the program ALASKA [102]), which significantly increases the
reliability of the extracted folding rates [12].
The cold shock protein CspB from Bacillus subtilis also follows a two-state folding
mechanism with k f 0 ¼ 1070 G 20 s1 and k u 0 ¼ 12 G 7 s1 at 25 C and 0 M urea
[103]. From a urea transition monitored by 1D NMR in 2 H2 O, the resonance of
His29 e1 could be used to determine unfolding and refolding rates between 2.9 M
and 5.6 M urea (Figures 16.7 and 16.8). The simulation of the line shape and the
position of this resonance according to the extracted parameters closely resem-
bles the experimental data and is depicted in the right panel of Figure 16.7. The
extrapolated folding rates from the dynamic NMR experiment (k f 0 ¼ 1490 G
370 s1 and k u 0 ¼ 16 G 3 s1 ) correspond nicely to the rates derived from the
stopped-flow fluorescence experiments. The Tanford factors b T ¼ m f /ðm f þ m u Þ
obtained from the linear slopes of log k f and log k u versus urea concentration (Fig-
e e
His29-H 1 (experiment) His29-H 1 (simulation)
8.8 M
[urea]
0M
8.3 8.2 8.1 8.0 8.3 8.2 8.1 8.0
1 1
H (ppm) H (ppm)
1
Fig. 16.7. Stacked plot of 1D H-NMR spectra the left panel. In the right panel, simulations of
of a complete urea-induced unfolding the 1D spectra using the extracted and
transition of CspB from B. subtilis in 2 H2 O at extrapolated folding rates from the line shape
25 C. Experimental 1D spectra of the His29 e 1 analysis described in Section 16.6.2 at the
resonance in the low field region is shown in respective urea concentration are depicted.
552 16 Kinetic Protein Folding Studies using NMR Spectroscopy
100
10 ku
kf
1
2.5 3 3.5 4 4.5 5 5.5 6
Urea (M)
Fig. 16.8. Urea dependence of the folding of the continuous lines represents the m-value
rates of CspB determined by the line shape of refolding and unfolding (mf ; mu ) from which
analysis approach applied to the 1D 1 H-NMR the Tanford value b T is derived with 0.9.
spectra of His29 e1 in 2 H2 O (Figure 16.7). The Extrapolation to the absence of denaturant
logarithm of k f and ku in the transition region reveals folding rates of k f 0 ¼ 1490 G 370 s1
(between 2.9 ppm and 5.6 ppm) is plotted and ku 0 ¼ 16 G 3 s1 .
versus the urea concentration. The linear slope
ure 16.8) are 0.9 from both the NMR and fluorescence experiment and indicate
that the activated state of unfolding of CspB still resembles the native state in the
accessibility to the solvent [103].
The folding and unfolding kinetics of the N-terminal domain of the ribosomal
protein L9 have been measured at temperatures between 7 C and 85 C and be-
tween 0 M and 6 M guanidine deuterium chloride. The joint analysis of stopped-
flow fluorescence (between 7 C and 55 C) and dynamic NMR data (between 55 C
and 85 C) revealed thermodynamic parameters of the activated state of folding
[104, 105]. The dynamic NMR data were required because the apparent folding
rate constant for the latter temperature range was 700–3000 s1 too high for tradi-
tional stopped-flow experiments. Following the same lines a small all-helix protein
psbd41 could be analyzed by the line shape approach revealing maximum folding
rates of 21 600 s1 and maximum unfolding rates of 24 600 s1 [106].
The upper time limit for dynamic NMR to study protein folding reactions are
tens of microseconds if the difference in chemical shift of an NMR nuclei between
the unfolded and native state is sufficient (at least 500 Hz). Raleigh and coworkers
found refolding rates between 3 10 4 s1 and 2 10 5 s1 in a thermal unfolding
transition of the villin headpiece, which confirmed experimentally recent all-atoms
molecular dynamics calculations to simulate protein folding reactions (see Ref.
[107] and references therein). A similar link between theory and experiment was
possible for the three-helix bundle-forming B-domain of staphylococcal protein A.
16.4 Dynamic and Spin Relaxation NMR for Quantifying Microsecond-to-Millisecond Folding Rates 553
Dynamic NMR analyses revealed folding rate constants of 120 000 s1 , which are
in good agreement with predictions from diffusion–collision theory [108].
One limitation of the use of dynamic NMR to study fast protein folding rates is
that reliable rates can only be obtained from the central region of denaturant- or
temperature-induced unfolding transitions, where at least 15% of both states are
populated. The wide linear extrapolation of log k f to conditions with a maximal
population of the native state might miss nonlinear effects such as a ‘‘roll-over’’
under strong native conditions. One way to bypass this problem is the use of
NMR relaxation data to determine protein folding rates. If the rates of interconver-
sion are on the millisecond-to-microsecond time scale and large chemical shift dif-
ferences between the states are present, transverse relaxation rates can be sensitive
to the presence of the minor conformation with populations as low as 1% [10].
One early application was the direct determination of folding rates of the l re-
pressor head piece l 6a85 under strong native conditions by 1 H transverse relaxation
using a 1D Carr-Purcell-Meiboom-Gill (CPMG)-based spin echo pulse sequence
(see Section 16.6.4) [12]. The chemical exchange contribution (R ex ) to the trans-
verse relaxation rate (R 2 ) leads to an apparent increase of the latter from which
the folding rates can be extracted. At 0 M urea the authors found for l 6a85 k f ¼
4000 G 340 s1 and k u ¼ 32 G 3:2 s1 , which is in good agreement with the values
obtained by extrapolation from the line shape analysis (k f ¼ 4900 G 600 s1 and
k u ¼ 30 G 4:6 s1 ).
Most current applications facilitate CPMG-based 15 N-relaxation measurements
to determine the chemical exchange contribution R ex to R 2 . The advantage of this
approach is that 2D 1 Ha 15 N correlation spectra are used to extract the relaxation
rates. Therefore, R ex can be determined on a residue-by-residue basis to discrimi-
nate for example local and global unfolding of the peptide chain. The calculation of
k f and k u from R ex requires some further parameters to be determined beforehand
(see Section 16.6.4). Among several approaches to determine the R ex contributions
to R 2 (such as an extended Lipari-Szabo approach, R1r measurements, or the inter-
ference of dipolar 1 Ha 15 N and 15 N chemical shift anisotropy relaxation), R 2 disper-
sion experiments are the most popular ones (for an overview and further literature
see Ref. [10]). The basic idea is that the R ex contributions to R 2 can be modulated
in a CPMG-based sequence by varying the delay time t cp between consecutive 180
pulses on the 15 N nuclei (see Section 16.6.4). The so-called dispersion curve is a
plot of R 2 over 1/t cp (Figure 16.9). At very small 1/t cp values, the transverse 15 N
relaxation rate R 2 contains the full contribution of R ex and at high 1/t cp values no
R ex contributions are present. Therefore, the difference between R 2 at the lowest
and highest 1/t cp value provides an estimate for R ex. Fitting of the entire disper-
sion curve reveals the apparent folding rate k ex , which is the sum of k f and k u . If
the two-state model is valid and the equilibrium constant under the respective con-
ditions is known explicit values for k f and k u can be calculated [10, 12, 13]. R 2 dis-
persion curves depend on the magnetic field strength [109] and therefore a joint fit
of data obtained at various field strengths decreases the errors of the derived dy-
namic parameters.
Figure 16.9 shows the 15 N R 2 dispersion curves for A32 of the cold shock protein
554 16 Kinetic Protein Folding Studies using NMR Spectroscopy
20
10
CspB at 1.2 M urea and 2.0 M urea with native populations of 0.96 and 0.89, re-
spectively. From a urea transition under NMR detection and from ZZ-exchange
spectra [110] the difference in chemical shifts of the A32 backbone amide reso-
nances between the native and unfolded state at the two urea concentrations are
known (119 Hz and 122 Hz, respectively). The analysis of the dispersion curves
according to the protocol described below (see Section 16.6.4) reveals k ex (227 s1
at 1.2 M urea and 139 s1 at 2.0 M urea) and subsequently k f (217 s1 at 1.2 M
urea and 124 s1 at 2.0 M urea) and k u (9.8 s1 at 1.2 M urea and 15 s1 at 2.0 M
urea) in good agreement with the rates from conventional stopped-flow folding
experiments under fluorescence detection [103]. In principle, this kind of NMR
relaxation experiment can be used to measure chevron plots of proteins on a
residue-by-residue basis, which should be very useful to map the transition state
of folding in terms of homogeneity or heterogeneity and to determine the coopera-
tivity within the folding peptide chain.
Several applications of 15 N and 13 C R 2 dispersion curves have been reported to
determine rates of local motions under strong native conditions [109, 111–115]. In
this case, two conformations have to be assumed in this region of the protein to
gain apparent rates of interconversion. Additionally, the populations of the two
conformations and the difference in chemical shifts of the respective nuclei have
to be extracted entirely from the NMR relaxation data, which makes this kind of
analysis very difficult and might limit the interpretation of the data.
Studies of global two-state folding–unfolding reactions allow an accurate deter-
mination of folding rates from R 2 dispersion experiments, because the populations
of the native and unfolded state under the respective conditions can be determined
16.5 Conclusions and Future Directions 555
experimentally [13, 15, 116]. Additionally, the differences of their chemical shifts
are accessible from independent NMR experiments or can be at least extrapolated
(see also Section 16.6.4). For global unfolding reactions all resonances of the pro-
tein should show R 2 dispersion curves as long as these differences are above 2 Hz,
which makes joint fittings possible. In the case of the N-terminal SH3 domain of
the Drosophila protein drk, on top of the global unfolding reaction, a local confor-
mational exchange process of the unfolded state could be revealed by a systematic
analysis of 15 N R 2 dispersion data [15]. For the de novo designed dimeric four-helix
bundle protein a2 D, refolding rates of ð4:7 G 0:9Þ 10 6 M1 s1 and unfolding
rates of 15 G 3 s1 could be determined for this bimolecular reaction from 13 C R 2
dispersion curves [116]. The 13 C a chemical shifts for unfolded and folded forms of
a2 D indicate that the ensemble of unfolded states includes transiently structured
helical conformations.
16.5
Conclusions and Future Directions
the radius of gyration [119]. From real-time NOESY experiments we can count the
number of native and nonnative NOEs for I 39t resulting in a Q value of 0.6. The
hydrodynamic radius of I 39t was calculated from real-time NMR diffusion experi-
ments to be only 5% above R h of the native state. The relative Gibbs free energy
of I 39t was determined from the protection factors of 45 amide protons derived
from a competition between refolding and H/D exchange under NMR detection
[75]. The intermediate has already gained 40% of the total Gibbs free energy
change during refolding of 36.6 kJ mol1 , which is the global minimum of the na-
tive state with Q ¼ 1. I 39t has a well-defined tertiary structure, implying that only a
very narrow ensemble of states is present at this very deep local minimum of the
funnel. As predicted [45, 120], NMR can provide very detailed contributions com-
bining experiment and theory within the new view of protein folding.
Recent developments have gone one step further in this direction. On the one
hand, extremely long molecular dynamics (MD) simulations became possible by
using clusters of thousands of computers [121, 122]. On the other hand, dynamic
NMR spectroscopy can reveal experimentally folding rates of proteins up to 200 000
s1 [107]. At such high rates, several folding–unfolding interconversions occur
during the now accessible length of one MD simulation. Therefore the gap be-
tween experiments and simulations has been closed and many exciting new in-
sights into the elementary steps of protein folding can be expected in the near
future.
16.6
Experimental Protocols
16.6.1
How to Record and Analyze 1D Real-time NMR Spectra
16.6.1.1 Acquisition
One-dimensional real-time NMR experiments start with the initiation of the pro-
tein folding reaction inside the NMR spectrometer. If reactions are followed with
time constants of seconds, fast mixing devices are required with dead times below
100 ms [25, 123]. A dead time around 1 s can be achieved with a much simpler
device, which needs a pneumatically driven gas-tight syringe (Hamilton, Bonaduz,
16.6 Experimental Protocols 557
Switzerland) working between 8 bar and 10 bar [24]. Dead times between 2 s and
5 s are possible by the same device but manual pulling of the syringe. In the latter
case, after the first shot, 50 mL should be sucked back in the syringe and pulled
again to gain a homogeneous mixture in the NMR tube. For 2D real-time NMR,
this has to be performed during the dummy scans before recording the spectrum.
For 1D applications, the shots can occur during the first 1D spectra, which allows
the first spectrum to be selected without mixing artifacts. For a fast recovery of
the shim system, one shot should be performed in advance with the same buf-
fers and volumes but with a test protein such as commercially available bovine
a-lactalbumin or hen lysozyme. After this shot with the NMR tube in the magnet, a
proper matching and tuning of the probe head is possible, as well as shimming of
the field. For the actual experiment a tube with exactly the same geometry has to
be used. Due to the induced field inhomogeneity during the mixing process the
lock signal should be electronically increased by using a very high ‘‘lock gain’’ to
avoid unlocking at the beginning of the experiment. For denaturant-induced re-
folding experiments, urea should be used in preference to GdmCl if the stability
of the protein allows a choice, because the salt properties of GdmCl reduce the sen-
sitivity of NMR coils significantly. For experiments in 2 H2 O, no suppression of the
residual resonances of the denaturant (between 6 ppm und 6.7 ppm) is needed if
the protons of urea and GdmCl have been properly exchanged against deuterons
by several cycles of solving in 2 H2 O and freeze drying. A suppression of these sig-
nals can even be avoided for experiments in H2 O, if temperatures above 20 C
are used. The fast chemical exchange between the protons of water and of the de-
naturant causes an effective saturation transfer and suppression of water is suffi-
cient. At lower temperatures, the water resonance can be suppressed for example
by a WATERGATE sequence [124] and the resonance of the denaturant by an off-
resonance presaturation during the relaxation delay between scans [38].
16.6.1.2 Processing
Conventional 1D NMR data processing can be applied to each of the series of 1D
experiments. It is most convenient to run a pseudo 2D experiment by storing the
1D spectra in one serial file and to apply the Fourier transformation of this file only
in the F2 dimension. The phase correction should be the same for all 1D spectra
and the values for zero order and first order should be determined for the last 1D
dataset of the real-time NMR experiment. If possible, only a zero order polyno-
mial function should be used for baseline correction and pivot baseline points
should be selected manually in regions where no signals occur during the entire
NMR experiment.
16.6.1.3 Analysis
After the mixing dead time constant buffer and temperature conditions are
reached and the chemical shift position of the resonances of the different protein
states are fixed because we are usually working in the slow exchanging regime.
Thus, well-resolved resonances can be integrated within identical limits in every
1D spectrum to obtain input data for the kinetic analyses. Each 1D spectrum is a
558 16 Kinetic Protein Folding Studies using NMR Spectroscopy
sum of spectra of the respective protein populations present at the time point of
recording. The kinetic information obtained from well-resolved resonances of these
states allows a reconstruction of the full 1D spectrum of every state [38, 61]. Usu-
ally the final native state has the largest deviations from random coil chemical
shifts and well-resolved resonances of aromatic residues or high field shifted CH2
and CH3 groups are unique for this state. Secondly, a 1D spectrum with a very
good signal-to-noise ratio can be recorded after the reaction has finished. This spec-
trum has to be scaled according to the refolding kinetics of the native state, which
has been determined from the integrals of its well-resolved resonances. This scaled
1D spectrum has to be subtracted from every 1D spectrum of the real-time NMR
experiment. This step eliminates all contributions of the native state to the 1D
spectra. If a two-state reaction is analyzed, the remaining NMR intensity repre-
sents entirely the 1D spectrum of the population from which the reaction has
started. Now, these 1D spectra can be averaged to get a decent 1D spectrum of
this state. Secondly, the decay kinetics should reveal the same rate constant as the
build-up of the native state. Thirdly, this approach allows determination of the re-
sult of fast folding events during the dead time of the NMR experiment, which
cause a burst of one of these two states [38, 61]. Finally, the 1D contributions of
the decaying population can be eliminated in a second subtraction step by scaling
the above-mentioned averaged 1D spectrum according to the decay kinetics. If the
two-state approximation is valid, no residual NMR intensity should be obtained
[38]. It is a sensitive test to add all 1D spectra after the second subtraction step. In
the case of RNase T1, this test showed a third protein species populated only by 7%
at the beginning of the refolding reaction [61]. Its full 1D spectrum and time
course could be determined by analyzing the remaining NMR intensities after the
second subtraction step.
The diffusion delay (for example 30 ms) and the variation of the gradient strength
(a constant gradient pulse length between 5 ms and 10 ms are common) have to be
adjusted before the real-time NMR experiment to make sure that the intensities of
the protein signals decay by more than 50% at the highest gradient strength. If di-
oxan is used as internal standard at 3.6 ppm, a modified version of Eq. (1) has to be
used [14] and a dioxan signal should remain for at least half of the varied gradient
strengths. A fit of Eq. (1) to the Gaussian decay curves (gradient strength plotted
against NMR intensity) reveals d, which is proportional to the diffusion coefficient
of the protein. d can be converted into the hydrodynamic radius R h of the protein
by assuming R h of dioxan of 2.12 Å [14, 82]. From the empiric correlations R h ¼
16.6 Experimental Protocols 559
4:75 N 0:29 and R h ¼ 2:21 N 0:57 the hydrodynamic radius determined by NMR
diffusion experiments can be estimated from the number of residues N of a glob-
ular folded protein and an unfolded peptide chain, respectively [82].
For the determination of the hydrodynamic radius of a transient protein folding
intermediate in 1D real-time NMR diffusion experiments, a set of diffusion experi-
ments have to be recorded during the refolding reaction [56, 75]. The gradient
strength has to be varied continuously between low and high values typically in
10 steps. For a two-state model, where the intermediate forms in the dead time of
the time-resolved NMR experiment, the total intensity of all protein resonances
Iðg; tÞ in Eq. (2) depend on the known gradient strength g at every time point t.
The relative populations of the intermediate and the native state can be calculated
from the known reaction rate constant k. The differences in the relaxation rates be-
tween the intermediate and the native state are reflected in the respective ampli-
tudes A I and A N .
16.6.2
How to Extract Folding Rates from 1D Spectra by Line Shape Analysis
For an accurate line shape analysis the system under investigation has to fulfill sev-
eral requirements. Most important is the existence of a two-state folding process
where only folded (N) and unfolded (U) protein molecules are present at equilib-
rium and in the kinetics at all denaturant concentrations (Eq. (3)). Additionally it
is important that the analyzed resonance signals are well resolved throughout the
entire equilibrium transition and do not overlap with other resonances. In rare
cases, where only few resonances are in a narrow region of the 1D spectra under
all conditions, the entire 1D spectrum can be simulated to overcome signal overlap
problems [12, 102]. Denaturation of the protein can be achieved either by succes-
sively adding chaotropic agents (for example urea, guanidinium chloride) or by in-
creasing the temperature or by a combination of both. A small difference between
the resonance frequency of the native and unfolded state (below 100 Hz) allows a
determination of folding time constants of tens of milliseconds, whereas a differ-
ence above 1500 Hz can cover time constants in a submillisecond range. This dif-
ference can be varied for the same system by using different field strengths [107].
Since the exchange between the two states is in the fast exchange regime in respect
to the NMR chemical shift time scale (ðk f þ k u Þ > Dn) only one resonance is de-
tectable over the entire transition with a line shape and resonance frequency that
represents the relative populations.
ku
N8U (3)
kf
560 16 Kinetic Protein Folding Studies using NMR Spectroscopy
16.6.2.1 Acquisition
NMR spectra of high quality at reasonably high protein concentrations (ensure the
absence of aggregation) with a very good signal-to-noise ratio should be obtained
with at least 256 transients and 4096 complex points. Small receiver gains should
be used to avoid disadvantageous baseline effects. To prevent saturation of 1 H with
long spin-lattice relaxation times (t1 ) the recycle delay between individual transi-
ents should be at least 3 s. This ensures that the magnetization reaches the Boltz-
mann equilibrium. Using 2 H2 O as solvent for the protein and deuterated buffers
should be preferred because a selective suppression of individual resonances (for
example from water or denaturants) reduces the quality of the fit by producing res-
onance intensities, which are not proportional to the protein populations as well as
baseline problems. For example the WATERGATE sequence can effectively sup-
press the solvent signal but has a nonlinear excitation profile at the edges of the
spectral window. Therefore, resonances that undergo a change in chemical shift
during the unfolding transition are not excited equally throughout the entire tran-
sition. If chemically induced denaturation of the protein is achieved by adding urea
or guanidinium chloride the exchangeable 1 H are substituted by NMR nonactive
2
H, which reduces the solvent suppression scheme to residual water. Therefore, a
long presaturation pulse with a duration of 1–1.5 s on the water signal is sufficient.
To minimize the residual water one should also perform several deuteration steps
of the denaturation agent and the protein before mixing the sample. Another ad-
vantage of 2 H2 O is that the exchangeable 1 H of the protein, namely the amide pro-
tons, are not detectable, which greatly reduces spectral overlap in the low field re-
gion of the 1 H spectrum (6–11 ppm). Thus, well-resolved resonances of 1 H of the
aromatic amino acids Phe, Tyr, Trp, and His can be used to perform the line shape
analysis approach. High-field shifted resonances in the native state have the disad-
vantage that these protons cluster at their random coil positions in the unfolded
state.
16.6.2.2 Processing
The NMR data are converted to Felix format or to equivalent processing software
such as NMR pipe, XEASY, etc. The data must be dc-offset corrected and zero-filled
once in respect to the acquired complex points. No window function should be ap-
plied to prevent distortion of the original line shape of the resonance. If a window
function is required, only a single-exponential function can be used, which in-
creases the adjustable T2N and T2U . Individual spectra are separately phase cor-
rected because a correct phasing is essential for the extraction of reliable rates in
the further analysis. Baseline correction of each spectrum in the region around
the resonance of interest is also performed manually. The order of the polynomial
function depends on the quality of the baseline but should be invariant for all spec-
tra. The referencing of the 1 H spectrum to an internal standard like dioxane, DSS,
or TSP is recommended. The choice of the standard should depend on the denatu-
ration method (chaotropes, pH, temperature) due to different properties of the re-
spective compound and on the distance between the resonance frequency of the
standard and the resonance of interest to avoid unwanted disturbance. To reduce
16.6 Experimental Protocols 561
the amount of data the spectra are truncated to the region of the analyzed signal
and the real data points are saved in an ASCII format.
16.6.2.3 Analysis
The line shape analysis can be performed with any standard processing software
like Sigmaplot (SPSS Inc.), Kaleidagraph (Synergy software), or Grafit (Erithacus
software), which is capable of nonlinear least square fitting using the Levenberg-
Marquardt algorithm. Especially for this purpose Oas and coworkers designed the
ALASKA package, which is based on Mathematica [102]. This package is also very
useful for simulating complete aromatic 1D 1 H spectra. The formalism to deter-
mining folding rates from line shape analysis (Eq. (4)) was introduced by Oas and
coworkers [11], and is only applicable when the two-state approximation (Eq. (3)) is
fulfilled. In Eq. (4), IðvÞ represents the intensity as a function of the frequency v
and C0 is a normalization constant, which is proportional to the protein concentra-
tion. IðvÞ also depends on the population of the native and unfolded state ( p N ; p U ),
the apparent spin–spin relaxation times of the resonance in the respective state un-
der nonexchanging conditions (T2N ; T2U ), the resonance frequency of both states
(v N ; v U ) and the refolding and unfolding rate constant (k f ; k u ).
C0 pN pU
IðvÞ ¼ P 1 þ t þ þ Q R
P 2 þ R2 T2U T2N
1 2 2 2 2 pN pU
P ¼t T 4p Dv þ p ðdvÞ þ þ
T2N 2U T2N T2U
Q ¼ t ð2p Dv p dv ð p N p U ÞÞ
1 1 1 1
R ¼ 2p Dv 1 þ t þ þ p dv t
T2N T2U T2U T2N
þ p dv ð p N p U Þ
Dv ¼ ðv N þ v U Þ/2 dv ¼ ðv N v U Þ t ¼ ðk f k u Þ1
pN ¼ kf t pU ¼ ku t ð4Þ
In the following, the line shape analysis procedure is described step by step.
1. The apparent transverse relaxation rates of the resonance in the native and
unfolded state (T2N ; T2U ) have to be determined. Therefore, the signal of the
fully native or unfolded protein, respectively, is simulated with a function cor-
responding to an absorptive Lorentzian signal (IðvÞ ¼ C0 R 2 /fR22 þ ðv i vÞg)
where R 2 ¼ 1/T2 and v i represents the chemical shift of the signal.
2. The dependence of the resonance frequency of the signal of the native and the
unfolded state from the denaturant concentration or the temperature is also re-
quired. Thus, the resonance frequency of the respective signal in the spectra
under nonexchanging conditions (that is in the beginning of the baselines of the
transition under strong native and strong nonnative conditions) is measured.
562 16 Kinetic Protein Folding Studies using NMR Spectroscopy
16.6.3
How to Extract Folding Rates from 2D Real-time NMR Spectra
In principle, there are two ways to determine folding rates by 2D real-time NMR.
First, a series of individual 2D NMR experiments can be recorded during the fold-
ing reaction, which reveals cross-peaks with increasing or decreasing intensities
depending on the respective state. This method is useful if the folding reaction is
slow enough to record a set of reasonably good 2D spectra. The processing of these
spectra follows standard protocols. Individual cross-peaks can be integrated in
every 2D spectrum or its maximum can be used to determine the folding rates
[75]. The second method to obtain folding rates from 2D real-time NMR spectra
is, in terms of acquisition, even more straightforward but processing and analysis
are more complex. Technically, only a single 2D NMR spectrum is recorded during
the entire refolding reaction, which has been initiated by a rapid mixing device (see
Section 16.6.1). This has the advantage that significantly more transients and t1 -
increments can be recorded compared with the first method and therefore the
signal-to-noise ratio and spectral resolution in the indirect dimension is much
higher. As discussed in Section 16.3, the folding reaction leads to different line
shapes and intensities of cross-peaks assigned to the respective species (Figure
16.5 and Figure 16.6). Comparing the kinetic experiment with a reference experi-
ment under identical conditions after the reaction is finished reveals three different
read-outs. Cross-peaks of the starting conformation (for example the intermediate
or unfolded state) lose intensity during the reaction and exhibit line broadening.
Cross-peaks of the emerging final state (for example the native state) show increas-
ing intensities accompanied by characteristic line shape in the indirect dimension.
Finally, cross-peaks with invariant chemical shifts in the starting and the final con-
formation depict constant intensity and narrow line widths. The assignment of in-
dividual cross-peaks to the respective species can be achieved by using the different
line shape, which works nicely for 2D 1 Ha 15 N HSQC spectra (Figure 16.6). In the
case of 2D NOESY or 2D TOCSY spectra this approach becomes very time consum-
16.6 Experimental Protocols 563
ing because hundreds of cross-peaks have to be analyzed and signal overlap can
alter the expected theoretical line shapes. A more straightforward procedure is to
subtract the kinetic experiment from the reference experiment, revealing positive
or negative intensities for the final or starting state, respectively. Cross-peaks with
invariant chemical shifts show the same intensity in both experiments and cancel
out. A mathematical description is provided below. The power of this method
was shown for 2D 1 H- 1 H NOESY [61] and 2D 1 H- 15 N HSQC experiments [54]
and can theoretically be extended to any other two- or three-dimensional NMR
experiment.
16.6.3.1 Acquisition
Basically, one single 2D or 3D NMR experiment is recorded after initiation of the
refolding reaction depending on the time constant of the folding reaction. Since
the folding kinetics modulates the FIDs of the indirect dimension a reasonable
number of t1 -increments should be recorded to achieve high spectral resolution.
The dead time of the experiment can be greatly reduced by using a mixing device
(see Section 16.6.1) because tuning and matching of the probe and shimming as
well as temperature calibration can be performed before starting the reaction. No
modifications of standard pulse programs are needed.
16.6.3.2 Processing
The NMR data can be converted and analyzed by any standard processing software.
The FIDs in both dimensions should be zero-filled once and an apodization in the
time domain applied. Only single-exponential window functions can be used if a
quantitative line shape analysis is to be performed to extract protein folding rates.
16.6.3.3 Analysis
The assignment of each cross-peak to the respective state can be achieved by ana-
lyzing the different line shapes. For a more straightforward assignment of the
cross-peaks a simple subtraction method can be used. The complex FID for a sig-
nal with a frequency offset v 0 and an apparent transversal relaxation rate constant
R2 is given by Eq. (5).
As discussed before, three cases of modulation of the FID in the indirect dimen-
sion are possible and are defined by Eq. (6) with the assumption that k is the rate
constant of a two-state transition.
G 1 ðtÞ ¼ expðktg
G 2 ðtÞ ¼ 1 expðktg
G 3 ðtÞ ¼ 1 ð6Þ
Subtracting the three equations (Eq. (7)) from the reference experiment of the fully
native protein reveals Eq. (8), which corresponds to the three different read-outs
discussed above.
" #
1 R2 þ k
RefFT½S ðtÞg ¼ A
ðR2 þ kÞ 2 þ 4p 2 ðn n 0 Þ 2
" #
R2 þ k
RefFT½S 2 ðtÞg ¼ A
ðR2 þ kÞ 2 þ 4p 2 ðn n 0 Þ 2
The refolding rate is obtained from the 2D NMR spectra (for example 1 H- 1 H
NOESY, 1 H- 15 N HSQC) by a two-step line shape analysis procedure. For this
approach the 1D spectrum in the indirect dimension has to be extracted from the
two-dimensional matrix of the kinetic and reference experiment, respectively. In
the first step, the resonance signal of the reference spectrum is fitted with the real
part of the Fourier transformed function S 3 ðtÞ of Eq. (7), which yields the ampli-
tude, R2 and n 0 of the native signal under equilibrium conditions (Figure 16.10a).
In the second step, the resonance signal of the native state in the kinetic experi-
ment is fitted by RefFT½S 2 ðtÞg of Eq. (7) using the determined R2 and n 0 as fixed
parameters (Figure 16.10b). The thus derived apparent folding rate k has to be fur-
ther scaled because the time scale of the acquisition of the indirect dimension
(t dwell ¼ 1/2SW indirect with SW indirect as the spectral width in the incremented di-
mension) and the ‘‘real’’ time resolution between two points in the FID of the in-
direct dimension (that is the explicit time between two sequential t1 -increments,
t1D , which is the total length of the refolding experiment divided by the number
t1 -increments) differ. The actual refolding rate k f can therefore be calculated by
Eq. (9):
k dwell
kf ¼ k ð9Þ
k1D
16.6 Experimental Protocols 565
20
(a) (b)
15
Rel. NMR intensity
10
0
650 700 750 800 650 700 750 800
Frequency (Hz) Frequency (Hz)
Fig. 16.10. Line shape analysis of A19 from 725 Hz, respectively. b) The equivalent 15 N 1D
S54G/P55N-RNase T1. The folding reaction at NMR spectrum of A19 from the kinetic 1 H-
15 C was followed by 2D 1 H- 15 N NMR 15
N HSQC experiment (Figure 16.6a). The
correlation spectroscopy. a) The 15 N 1D NMR resonance shows the characteristic line shape
spectrum of A19 in the native state (trace with the two flanking negative components.
along in the indirect F1 dimension from the Fitting the data to RefFT½S 2 ðtÞg of Eq. (7)
reference spectrum (Figure 16.6b)). The with constant R2 and v 0 from (a) reveals the
continuous line represents the fit of the data to continuous line and a rate k of 130.6 s1 . The
RefFT½S 3 ðtÞg of Eq. (7) (see Section 16.6.3), actual folding rate k f is then calculated by Eq.
which reveals R2 and v 0 with 23.4 s1 and (9) and yields 4:3 104 s1 .
16.6.4
How to Analyze Heteronuclear NMR Relaxation and Exchange Data
As described in the preceding sections, the determination of explicit un- and re-
folding rates requires a protein that follows a two-state folding model. One of the
first applications of R 2 relaxation dispersion curves to study fast folding reactions
was performed by homonuclear 1D 1 H NMR spectroscopy [12]. To overcome the
limitations of 1D NMR like severe spectral overlap, more recently heteronuclear
2D relaxation methods have been established [10], which provide many site-
specific probes. Therefore, a further requirement is to enrich the protein under in-
vestigation with the NMR active heteronuclei 15 N and/or 13 C.
One common approach is the measurement of so-called 15 N R 2 dispersion
curves. Despite the high spectral resolution of the 2D 1 H- 15 N correlation spectra
used to determine the R 2 relaxation rates there are some further advantages of
this method in respect to the above-mentioned 1D experiments (for example line
shape analysis, see Section 16.6.2). First, the determination of the R 2 rates is
566 16 Kinetic Protein Folding Studies using NMR Spectroscopy
y y f2 y y y
t t1
1
D D D D D+ 21 2 D D D D D d d
H
f1 f1 y y f3 f4 f4 y y
15
t t t t t t t t
N GARP
2n 2n 2n 2n
Grd.
G1 G2 G2 G3 G4 G4 G5 G6 G6 G6 G6 G7
Fig. 16.11. Pulse sequence of a relaxation- The wide white bars represent 180 15 N pulses
compensated 15 N CPMG experiment for the in the spin echo sequence. All pulses are
determination of conformational and chemical x-phase unless indicated otherwise and
exchange. Variation of t ðt ¼ tcp /2Þ and a gradients are shown as half-ellipsoids. Spectra
successive determination of R 2 for each tcp for initial intensities are obtained by omitting
reveals the so called R 2 relaxation dispersion the sequences in brackets. For more experi-
curves (Figure 16.9). Narrow and wide black mental details and theoretical description see
bars depict 90 and 180 pulses, respectively. Ref. [132].
straightforward and robust, because instrument effects such as magnetic field in-
homogeneities might introduce significant errors in 1D line shape analyses. More
important, it is possible to extract folding rates down to a population of one state of
about 1%. The major drawback of this method is the time-consuming acquisition.
R 2 rates are usually measured with pulse sequences based on the Carr-Purcell-
Meiboom-Gill (CPMG) spin echo element (½t 180 t n ) with t cp ¼ 2 t [126,
127]. An example of a useful pulse sequence and further description is provided
in Figure 16.11. Principally, the spin echo element contains a single 180 pulse
on the nuclei of interest, which is separated by two equivalent delays t ¼ t cp /2.
Contributions from chemical exchange (R ex ) to R 2 can be effectively eliminated by
the CPMG sequence if the exchange process is on a slower time scale in respect
to the delay time t cp . For very short t cp values (that is the fast pulsing limit) only
very fast processes on a picosecond to nanosecond time scale contribute to the
apparent R 2 value, which is R20 . However, if t cp is made long enough that multiple
folding/unfolding events occur during t cp , chemical exchange R ex contributes to
R 2 : R 2 ¼ R20 þ R ex . Therefore, a dispersion of R 2 regarding R ex can be generated
by a variation of t cp .
16.6.4.1 Acquisition
High-quality heteronuclear 2D NMR spectra can be recorded with pulse sequences
as shown in Figure 16.11. The recycle delay between individual transients should
be as long as possible and at least 1 s. For each 2D 1 H- 15 N spectrum at least 128 t1 -
increments in the indirect dimension should be acquired with 16–32 transients for
each t1 -increment depending on the protein concentration. The relaxation delay
using a fixed spin echo delay t cp has to be varied by multiple repetition of the
spin echo sequence. This relaxation delay (calculated by 4 ð2n ð2t þ t180ðNÞ ÞÞ þ
2 D þ t180ðNÞ þ t90ðHÞ according to Figure 16.11, where t180ðNÞ and t90ðHÞ denote
16.6 Experimental Protocols 567
the length of the 180 pulse on 15 N and the 90 pulse on 1 H, respectively) varies
typically between 10 ms and 250 ms. The decay of the intensity for an accurate de-
termination of R 2 should be defined by at least six different relaxation delays with a
minimal loss of 80% of the initial intensity.
16.6.4.2 Processing
The 2D NMR data can be processed with any available processing software such as
Felix, NMR pipe, XEASY, etc. and follows closely standard protocols. The FID is
zero-filled once in both dimensions and any window function can be applied be-
fore Fourier transformation. Normally phase correction in the 1 H dimension re-
mains constant over sequentially acquired data sets if the pulse program is not
changed. Phase correction in the indirect dimension should be avoided by properly
adjusting the first t1 -increment. The spectral resolution in the indirect dimension
can be enhanced by linear prediction methods especially if only 128 t1 -increments
are recorded. A baseline correction can also be performed. However, most impor-
tant is that all spectra are processed identically to minimize uncertainties of the
extracted peak intensities.
16.6.4.3 Analysis
For the determination of the R 2 rates for every t cp value the fitting of the inten-
sity decay curve is applicable with a single exponential function without offset
(I ¼ A expðR 2 tÞ) and can be performed using any standard software. However,
the programs of Palmer and coworkers such as CURVEFIT and for further analysis
CPMGfit (http://cpmcnet.columbia.edu/dept/gsas/biochem/labs/palmer/software/
modelfree.html) are strongly recommended because they are specially designed for
this purposes and due to the extensive error analysis by various methods such as
Monte-Carlo or jack-knife simulations [128, 129]. The derived R 2 values are plotted
on the ordinate of the R 2 dispersion curve (see Figure 16.9). The exchange rate k ex
and the chemical exchange contribution R ex to R 2 can be derived by fitting Eq. (10)
to the R 2 dispersion curves [10, 130]. This formalism represents the general phe-
nomenological description of R 2 of site N in Eq. (3), which is independent of the
exchange regime (slow, intermediate or fast in respect of the NMR chemical shift
time scale) as well as the present population ratio.
1 1
R 2 ð1/t cp Þ ¼ R2N þ R2U þ k ex cosh1 ½Dþ coshðhþ Þ D coshðh Þ
2 t cp
ð10Þ
with
" # rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1 c þ 2Do 2 t cp
DG ¼ G1 þ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2ffi ; hG ¼ pffiffiffi Gc þ c2 þ z2
2 c2 þ z 2
where Do is the difference of the 15 N chemical shift of the cross-peak between the
native and the unfolded state (Do ¼ 2pDn with Dn given in Hz), p N and p U the
population of the respective state and t cp ¼ 2 t as well as k ex ¼ k f þ k u . Do de-
fines the exchange regime for a given k ex (k ex < Do: slow exchange; k ex A Do: in-
termediate exchange; k ex > Do: fast exchange). A less complex approximation was
proposed by Ishima and Torchia, which is valid for all time scales if p N g p U (Eq.
(11)).
Acknowledgments
References
1 Wagner, G., Wüthrich, K. J. Mol. Kay, L. E. J. Am. Chem. Soc. 2001, 123,
Biol. 1982, 160, 343–361. 11341–11352.
2 Wand, A. J., Roder, H., Englander, 16 Myers, J. K., Oas, T. G. Annu. Rev.
S. W. Biochemistry 1986, 25, 1107– Biochem. 2002, 71, 783–815.
1114. 17 Meiler, J., Peti, W., Griesinger, C.
3 Englander, S. W., Sosnick, T. R., J. Am. Chem. Soc. 2003, 125, 8072–
Englander, J. J., Mayne, L. Curr. 8073.
Opin. Struct. Biol. 1996, 6, 18–23. 18 Meiler, J., Prompers, J. J., Peti, W.,
4 Bai, Y. W., Sosnick, T. R., Mayne, L., Griesinger, C., Brüschweiler, R.
Englander, S. W. Science 1995, 269, J. Am. Chem. Soc. 2001, 123, 6098–
192–197. 6107.
5 Radford, S. E., Dobson, C. M., 19 Grimaldi, J., Baldo, J., McMurray,
Evans, P. A. Nature 1992, 358, 302– C., Sykes, B. D. J. Am. Chem. Soc.
307. 1972, 94, 7641–7645.
6 Udgaonkar, J. B., Baldwin, R. L. 20 Grimaldi, J. J., Sykes, B. D. J. Am.
Nature 1988, 335, 694–699. Chem. Soc. 1975, 97, 273–276.
7 Roder, H., Elöve, G. A., Englander, 21 Kühne, R. O., Schaffhauser, T.,
S. W. Nature 1988, 335, 700–704. Wokaun, A., Ernst, R. R. J. Magn.
8 Dobson, C. M., Evans, P. A. Reson. 1979, 35, 39–67.
Biochemistry 1984, 23, 4267–4270. 22 McGee, W. A., Parkhurst, L. J. Anal.
9 Roder, H. Methods Enzymol. 1989, Biochem. 1990, 189, 267–273.
176, 446–473. 23 Frieden, C., Hoeltzli, S. D., Ropson,
10 Palmer, A. G., III, Kroenke, C. D., I. J. Protein Sci. 1993, 2, 2007–2014.
Loria, J. P. Methods Enzymol. 2001, 24 van Nuland, N. A. J., Forge, V.,
339, 204–238. Balbach, J., Dobson, C. M. Acc.
11 Huang, G. S., Oas, T. G. Proc. Natl Chem. Res. 1998, 31, 773–780.
Acad. Sci. U.S.A. 1995, 92, 6878–6882. 25 Mok, K. H., Nagashima, T., Day, I. J.,
12 Burton, R. E., Huang, G. S., Jones, J. A., Jones, C. J., Dobson,
Daugherty, M. A., Fullbright, C. M., Hore, P. J. J. Am. Chem. Soc.
P. W., Oas, T. G. J. Mol. Biol. 1996, 2003, 125, 12484–12492.
263, 311–322. 26 Hoeltzli, S. D., Frieden, C.
13 Vugmeyster, L., Kroenke, C. D., Biochemistry 1996, 35, 16843–16851.
Picart, F., Palmer, A. G., III, 27 Hoeltzli, S. D., Frieden, C.
Raleigh, D. P. J. Am. Chem. Soc. Biochemistry 1998, 37, 387–398.
2000, 122, 5387–5388. 28 Bann, J. G., Pinkner, J., Hultgren,
14 Zeeb, M., Jacob, M. H., Schindler, S. J., Frieden, C. Proc. Natl Acad. Sci.
T., Balbach, J. J. Biomol. NMR 2003, USA 2002, 99, 709–714.
27, 221–234. 29 Balbach, J., Schmid, F. X. In
15 Tollinger, M., Skrynnikov, N. R., Mechanisms of Protein Folding; 2nd
Mulder, F. A., Forman-Kay, J. D., edn; Pain, R. H., Ed.; Oxford
570 16 Kinetic Protein Folding Studies using NMR Spectroscopy
105 Kuhlman, B., Luisi, D. L., Evans, Boyer, M., Saldana, J. L., Akasaka,
P. A., Raleigh, D. P. J. Mol. Biol. K., Roumestand, C. J. Mol. Biol. 2002,
1998, 284, 1661–1670. 320, 609–628.
106 Spector, S., Raleigh, D. P. J. Mol. 119 Brooks, C. L. Acc. Chem. Res. 2002,
Biol. 1999, 293, 763–768. 35, 447–454.
107 Wang, M., Tang, Y., Sato, S., 120 Dobson, C. M., Sali, A., Karplus, M.
Vugmeyster, L., McKnight, C. J., Angew. Chem. Int. Ed. Engl. 1998, 37,
Raleigh, D. P. J. Am. Chem. Soc. 868–893.
2003, 125, 6032–6033. 121 Snow, C. D., Nguyen, H., Pande,
108 Myers, J. K., Oas, T. G. Nat. Struct. V. S., Gruebele, M. Nature 2002, 420,
Biol. 2001, 8, 552–558. 102–106.
109 Millet, O., Loria, J. P., Kroenke, 122 Zagrovic, B., Snow, C. D., Shirts,
C. D., Pons, M., Palmer, A. G. J. Am. M. R., Pande, V. S. J. Mol. Biol. 2002,
Chem. Soc. 2000, 122, 2867–2877. 323, 927–937.
110 Farrow, N. A., Zhang, O., Forman- 123 Hoeltzli, S. D., Ropson, I. J.,
Kay, J. D., Kay, L. E. J. Biomol. NMR Frieden, C. Tech. Prot. Chem. 1994, V,
1994, 4, 727–734. 455–465.
111 Akke, M., Liu, J., Cavanagh, J., 124 Piotto, M., Saudek, V., Sklenar, V.
Erickson, H. P., Palmer, A. G., III. J. Biomol. NMR 1992, 2, 661–665.
Nat. Struct. Biol. 1998, 5, 55–59. 125 Helgstrand, M., Härd, T., Allard,
112 Ishima, R., Wingfield, P. T., Stahl, P. J. Biomol. NMR 2000, 18, 49–63.
S. J., Kaufman, J. D., Torchia, D. A. 126 Carr, H. Y., Purcell, E. M. Phys. Rev.
J. Am. Chem. Soc. 1998, 120, 10534– 1954, 94, 630–638.
10542. 127 Meiboom, S., Gill, D. Rev. Sci.
113 Mulder, F. A., van Tilborg, P. J., Instrum. 1958, 29, 688–691.
Kaptein, R., Boelens, R. J. Biomol. 128 Mandel, A. M., Akke, M., Palmer,
NMR 1999, 13, 275–288. A. G., III J. Mol. Biol. 1995, 246, 144–
114 Mulder, F. A., Mittermaier, A., 163.
Hon, B., Dahlquist, F. W., Kay, L. E. 129 Palmer, A. G., III, Rance, M.,
Nat. Struct. Biol. 2001, 8, 932–935. Wright, P. E. J. Am. Chem. Soc. 1991,
115 Mulder, F. A., Hon, B., Mittermaier, 113, 4371–4380.
A., Dahlquist, F. W., Kay, L. E. J. 130 Carver, J. P., Richards, R. E.
Am. Chem. Soc. 2002, 124, 1443–1451. J. Magn. Reson. 1972, 6, 89–105.
116 Hill, R. B., Bracken, C., DeGrado, 131 Yao, J., Chung, J., Eliezer, D.,
W. F., Palmer, A. G. J. Am. Chem. Wright, P. E., Dyson, H. J.
Soc. 2000, 122, 11610–11619. Biochemistry 2001, 40, 3561–3571.
117 Kim, S., Szyperski, T. J. Am. Chem. 132 Loria, J. P., Rance, M., Palmer, A. G.
Soc. 2003, 125, 1385–1393. J. Am. Chem. Soc. 1999, 121, 2331–
118 Kitahara, R., Royer, C., Yamada, H., 2332.
573
17
Fluorescence Resonance Energy Transfer
(FRET) and Single Molecule Fluorescence
Detection Studies of the Mechanism of
Protein Folding and Unfolding
Elisha Haas
Abbreviations
17.1
Introduction
17.2
What are the Main Aspects of the Protein Folding Problem that can be Addressed
by Methods Based on FRET Measurements?
17.2.1
The Three Protein Folding Problems
The term ‘‘protein folding’’ covers three different general phenomena: (a) the tran-
sition of the backbone to its native fold, known as the chain entropy problem; (b) the
stabilization of the native fold, that is the stability problem; and (c) the refolding of
the protein molecule to different conformations in response to small changes of
the solution conditions, ligand binding, membrane insertion, or protein–protein
interactions. This is known as the function problem.
hqE 2 i ¼ kB T 2 CV ð1Þ
Where hqE 2 i is the mean squared energy fluctuation, related to the second mo-
ment of energy distribution, kB is the Boltzmann constant, T is the absolute tem-
perature, and CV is the heat capacity at constant volume. A typical heat capacity for
proteins is approximately 0.3 cal K1 g1 [11], and for a 25 000 Da protein this
leads to a root-mean-squared fluctuation of 35 kcal mol1 . This number should be
compared to the common average free energy of stabilization of the folded states of
proteins, @10–15 kcal mol1 [12].
The dynamics of protein molecules do not have one typical time scale, and there
is a hierarchy of time scales for motions of different structural elements in a pro-
tein. The fastest motions are those of side chains that occur on the picosecond time
scale, while slow, large scale conformational changes may take microseconds to
be completed [13]. The investigation of the functional properties of many proteins,
especially enzymes, still relies on relatively slow kinetic techniques [14]. Ensemble
trFRET and single molecule FRET spectroscopy are best posed to detect and char-
acterize such fluctuations.
In short, the sensitivity, the time and spatial resolution, the specificity of mea-
surements of structural elements, the option of measurements of distributions of
intramolecular distances and their rapid fluctuations by the ensemble trFRET and
single molecule FRET experiments, makes FRET a powerful tool in protein folding
research. The currently limited extent of applications of this approach is probably
a result of the complex procedure involved in the biochemistry and chemistry of
site specific labeling of the protein samples. The following sections contain the the-
oretical background of the FRET experiments, the principles of their applications
in folding research, and the essential control experiments that should support the
data analyses of these experiments.
17.3
Theoretical Background
17.3.1
Nonradiative Excitation Energy Transfer
vibronic levels in the donor have the same energy as the corresponding transitions
in the acceptor. Energy transfer can result from different interaction mechanisms.
The interactions may be Coulombic and/or due to intermolecular orbital overlap.
The Coulombic interactions consist of long-range dipole–dipole interactions (För-
ster’s mechanism) and short-range multipolar interactions. In the energy transfer
process, the initially excited electron on the donor, D, returns to ground state or-
bital on D, while simultaneously an electron on the acceptor, A, is promoted to
an excited state. For permitted transitions on D and A, the Coulombic interac-
tion is predominant, even at short distances. This is the case of the singlet-singlet
transfer:
1
D þ 1 A ! 1 D þ 1 A
which is effective over a long interaction range (up to 80–100 Å). For forbidden
transitions on D or A, the Coulombic interaction is negligible and the exchange
mechanism is dominant. Here, the interaction is effective only at short distances
(< 10 Å) because the interaction requires the overlap of the molecular orbitals.
This is the case of triplet–triplet transfer
3
D þ 1 A ! 1 D þ 3 A and 3
D þ 1 A ! 1 D þ 1 A
17.3.2
What is FRET? The Singlet–Singlet Excitation Transfer
1) The acronym FRET denoting fluorescence The correct term should be EET, representing
resonance energy transfer is incorrect because ‘‘electronic energy transfer’’ or ‘‘excitation
the transfer does not involve any fluorescence energy transfer’’ or RET for ‘‘resonance
but the electronic excitation energy of the energy transfer.’’ However, since the literature
donor. This mechanism does not depend on is saturated with the term FRET, we will use
the fluorescence properties of the acceptor. it.
578 17 Fluorescence Resonance Energy Transfer (FRET)
DE
Fig. 17.1. Energy level diagram demonstrating transitions are shown in dashed lines and the
the mechanism of resonance coupling of the radiative transitions are shown as continuous
nonradiative transitions in the donor and vertical lines. The upper inset shows a scheme
acceptor probes under conditions where the of the corresponding absorption and emission
vibrational relaxation is faster than the energy spectra of the probes and the overlap integral.
transfer (very weak coupling). The nonradiative
be affected by the components of the field that do not contribute to the radiation of
energy.
17.3.3
Rate of Nonradiative Excitation Energy Transfer within a Donor–Acceptor Pair
A molecule in an electronically excited state (donor) can transfer its excitation en-
ergy to another molecule (acceptor) [1] provided that the pair fulfills several condi-
tions [20]: (a) The energy donor must be luminescent; (b) the emission spectrum
17.3 Theoretical Background 579
of the donor should have some overlap with the absorption spectrum of the ac-
ceptor; and (c) the distance between the two probes should not exceed an upper
limit (usually within the range of up to 80 Å). The transfer is readily observed
when two different probes are involved, but transfer within a homogeneous popu-
lation of chromophores can occur. This transfer was detected already in 1927 by
Perrin by the loss of polarization of the emitted light [16]. The theory for the mech-
anism of long-range nonradiative energy transfer was developed by Förster with
both classical and quantum mechanical approaches. In both approaches, the inter-
action is assumed to be of the dipole–dipole type between the energy donor and
acceptor [1]. Förster considered two neighboring, electrically charged classical me-
chanical oscillators coupled through weak electrostatic interactions. This coupling
is analogous to two resonating pendulums that are weakly coupled mechanically
and exchange oscillation energy back and forth. Förster assumed that only the do-
nor oscillator is initially vibrating. Classically, the donor oscillator will lose energy
via radiation and nonradiative exchange with the acceptor provided that they are
close enough. The two dipoles, of dipole moment m (an approximation) interact
over distance R with interaction energy
Eint A km 2 =n 2 R 3 ð2Þ
where k is the orientation factor (see below) and n is the refractive index of the me-
dium. Förster calculated the distance between the two oscillators where the time
for radiative emission of a classical oscillator trad A 3hc 2 =m 2 o 2 and the time for ex-
change of energy between the two oscillators tint A h=Eint ¼ hn 2 R 3 =km 2 are equal.
Here R 1 R o when trad ¼ tint . R o , also known as the Förster critical distance, is the
distance between the donor and the acceptor, where half of the energy that would
have otherwise been emitted by the donor is transferred to the acceptor. Spectral
broadening by very rapid dephasing of the molecular excitation energies lengthens
the time considerably for the exchange of energy between the weakly coupled oscil-
lators, and this leads to smaller R o values. Förster calculated the probability of full
overlap of the spectral regions of the two oscillators and w, the probability that the
energies of both oscillators are simultaneously nearly the same within the small
interaction energy Eint. This probability is w ¼ ðW 0 =WÞðEint =hWÞ, where W and W 0
are respectively the spectral width (assumed to be the same for both probes) and
the overlap spectral region (the second term is the probability that frequency
within the broadband frequencies, W, will fall within the relatively narrow band-
width of Eint =h). The rate of transfer calculated without this dephasing correction
(1=tint A km 2 =hn 2 R 3 ) is then multiplied by w to give the rate of transfer with real
systems:
Here, we see that the correction for the spectral broadening in real systems gives
the famous R 6 dependence of the rate of excitation energy transfer on the inter-
chromophoric distance.
580 17 Fluorescence Resonance Energy Transfer (FRET)
By solving the distance R o where trad ¼ t 0 int , Förster calculated the dependence
of R o on the characteristics of the two probes:
where trad ¼ 3hc 3 =m 2 o 3 replaces m 2 and c ¼ lo=2p. The most relevant feature of
this derivation in the context of application of FRET methods in folding research
is the 1=R 6 dependence of the FRET efficiency. This strong distance depen-
dence derives from the distance dependence of the interaction energy of a dipole
on 1=R 3 which is multiplied by the correction for the dephasing of the two spectra.
The strict requirement for exact resonance between the two interacting dipoles
means that overlapping narrower (line) spectra can exchange energy over larger
distances. Förster emphasized that although the transfer mechanism depends on
exchange of excitation energy quanta, it is essentially a resonance mechanism and
hence the classical approximation gives satisfactory results. Förster published his
rigorous quantum mechanical derivation in 1948 and showed that the original clas-
sical derivation included all the essentials of process [1].
An alternative classical derivation by Steinberg et al. [17] also assumes an inter-
action between two idealized oscillating dipoles. Let us denote the donor dipole by
f ¼ fo sin 2pnt where fo is the amplitude, t is time, and n is the frequency of oscil-
lations. The fields surrounding such an oscillating dipole are given in polar coordi-
nates as follows:
fo 2pn r 1 r
Er ¼ 2 cos y cos 2pn t þ sin 2pn t
De r 2 c=n c=n r3 c=n
"
f0 ð2pnÞ 2 r 2pn r
Ey ¼ sin y sin 2pn t þ cos 2pn t
De rðc=nÞ 2 c=n r 2 c=n c=n
#
1 r
þ 3 sin 2pn t
r c=n
" #" #
f0 ð2pnÞ 2 r 2pn r
Hf ¼ sin y sin 2pn t þ 2 cos 2pn t
ðDe mÞ 1=2 rðc=nÞ 2 c=n r c=n c=n
Ef ¼ Hr ¼ Hy ¼ 0 ð5Þ
where E and H are the electric and magnetic fields, respectively. The subscripts r; y
and f denote components along the r; y and f spherical coordinates, respectively;
De and m are respectively, the dielectric constants and magnetic permeability of the
medium; n is the refractive index of the medium, and c is the velocity of light
in vacuum. Only components whose Poynting’s vector ½ðc=4pÞðE HÞ is nonzero
carry radiative energy flux. Only the first terms of Ey and Hf , which show 1=r de-
pendence contribute to the energy flux radiated from the oscillating dipole over
complete cycle of oscillation. The other terms carry energy from the dipole to the
field and back in an oscillatory way, with zero net radiative flux. At distances
17.3 Theoretical Background 581
shorter than the wavelength of the emitted radiation, the 1=r 3 terms in Ey and Er
are the dominant ones in magnitude, but they do not carry a net flux of energy. An
acceptor molecule positioned at such short distances from the oscillating dipole is
affected predominantly by the field components which do not contribute to the
radiation field. Therefore the excitation transfer has no effect on the shape of the
emission spectrum of the donor. The transferred energy is drawn from the field
components that exchange energy with the oscillating dipole.
The dipole–dipole interaction that leads to the transfer of excitation energy is
very weak, usually of the order of @2–4 cm1 , while the spectroscopic energies
that are transferred are much higher, @15 000–40 000 cm1 .
Based on the above discussion, which identifies the short-range components of
the dipole field, the rate of energy transfer can be calculated. Several treatments
lead to the same results. Förster’s theory describe the rate of energy transfer for
an isolated pair of chromophores which fulfill the requirements for energy transfer
by the dipole–dipole interaction to be
ðy
9ðln 10Þk 2 FDo
kT ¼ f ðlÞeðlÞl 4 dl ð6Þ
128p 5 NA0 r 6 tDo 0
1 Ro 6
kT ¼ o ð6aÞ
tD r
where k ¼ cos yDA 3 cos yD cos yA ¼ sin yD sin yA cos j 2 cos yD cos yA (yDA
is the angle between the donor and the acceptor dipoles; yD and yA are respectively
the angles between the donor and the acceptor dipoles and the line joining their
centers and j is the angle between the projections of the transition moments on
the plane perpendicular to the line through the centers (Figure 17.2). k 2 can in
principle assume values from 0 (perpendicular transition moments) to 4 (collinear
Fig. 17.2. The angles involved in the definition of the orientation factor k 2.
582 17 Fluorescence Resonance Energy Transfer (FRET)
transition moments), or 1 when the transition moments are parallel)); FD0 and tD0
are respectively the quantum yield and the fluorescence lifetime of the donor in the
absence of an acceptor; r is the distance between the centers of the two dipoles (do-
nor and acceptor); NA0 is Avogadro’s number per mmole, f ðlÞ is the Ð yfluorescence
intensity of the donor in the range l to l þ dl normalized so that 0 f ðlÞ dl ¼ 1
and eðlÞ is the absorption coefficient of the acceptor at the wavelength l. In Eq. (6),
r and l are in centimeters, eA ðlÞ in cm 2 mol1 , and JðlÞ in cm 6 mol1 . When
those units are used R o is defined by Eq. (7) and is given by:
where J is the overlap integral in Eq. (6). The energy transfer process competes
with the spontaneous decay of the excited state of the donor, characterized by
the rate constant koD ¼ 1=tDo . If eA ðlÞ is given in cm1 mol1 L units, then Ro6 ¼
8:8 1025 FDo k 2 n4 J. Thus the probability r for the donor to retain its excitation
energy during the time t after excitation is given by:
1 dr 1 1 Ro 6
¼ þ ð8Þ
r dt tDo tDo r
Ro6
E¼ ð9Þ
Ro6 þ r 6
R o is thus the inter-dipole distance at which the transfer efficiency (and the donor
lifetime) is reduced to 50% and has the strongest dependence on changes of that
distance (Figure 17.3).
There is strong experimental support for Förster’s equations (Eqs (6)–(9)) (re-
viewed by Steinberg [4]. Weber and Teal [18], and Latt et al. [19] showed the depen-
dence on the overlap integral. The R6 dependence was demonstrated by Stryer
and Haugland [20].
Equation (9) shows that the distance between a donor and an acceptor can be de-
termined by measuring the efficiency of transfer, provided that r is not too different
from R o and that all molecules in the sampled ensemble share the same intra-
molecular distance. If this is not the case, an average distance rav that would be
extracted from the measured transfer efficiency does not correspond to any sim-
ple average distance.
The efficiency of energy transfer is independent of the value of the lifetime of
the excited state of the donor. Efficient energy transfer can occur even for ‘‘long
life’’ excited states (e.g., for phosphorescence emission) provided that the quantum
yield of emission is reasonably high. Such transfer phenomena has been observed.
Equations (8) and (9) thus define the characteristic time and distance ranges (‘‘win-
dows’’) in which the transfer efficiency, E, is most sensitive to conformational tran-
sitions and their rates (Figure 17.3). Typical values available for probes that are
used in protein chemistry are in the range of 10–80 Å and down to picosecond
17.3 Theoretical Background 583
r/Ro
Fig. 17.3. Variation of the transfer efficiency, E, as a function of
the distance between the donor and the acceptor probes. The
shadowed range represents the limits of the distance range
where reliable measurements of distances are possible.
17.3.4
The Orientation Factor
assume. In cases where the distance between the donor and the acceptor does not
change appreciably during the time that the transfer takes place, it is possible to
derive a hk 2 imin and a hk 2 imax from anisotropy measurements. This procedure
is based on assumptions that the two probes can be represented by single oscillat-
ing dipoles whose orientations fluctuate (e.g., wobble) within a cone and that the
orientation of the axis of the cone also fluctuates. k 2 values can be averaged over
the corresponding range of orientations. Rapid reorientation during the lifetime
of the excited state of the donor is reflected in the anisotropy measurements and
therefore can be estimated experimentally for each preparation. The average k 2
will depend on ‘‘depolarization factors’’ that are (averages of ) second-rank Legen-
dre polynomials of a cosine of a polar angle y between 0 and p. The depolarization
factors have the form:
1 1 1
dy ¼ cos 2 y ) a dy a 1 ð10Þ
2 2 2
or
1 1 1
hdy i ¼ cos 2 y ) a hdy i a 1 ð10aÞ
2 2 2
where the brackets denote a weighted average over the range of angles covered by
y during times that are brief as compared to the average transfer time. The depo-
larization factors are derived from the anisotropy values under each experimental
setup and solution conditions (for details of all the angles taken into account see
Dale et al. [22, 23] and Van der Meer [6], pp. 55–83).
An ideal case, where full averaging is an intrinsic characteristic of the system,
is achieved with atom probes, such as lanthanides and chelated transitions metal
ions [5, 24–26]. An alternative approach to this problem is based on the fact that
many chromophores show mixed polarizations in their spectral transitions (i.e.,
their absorption and emission across the relevant spectral range of overlap involves
a combination of two or more incoherent dipole moments). The physical basis for
this phenomenon has been discussed by Albrecht [27]. Its manifestation is well
known in the observation that the limiting anisotropy of many organic probes
used in FRET experiments is lower than the theoretically predicted upper limit.
Haas et al. [26] showed that the occurrence of mixed polarizations in the energy
donor and acceptor may markedly limit the range of possible values that k 2 can as-
sume and thus in favorable conditions alleviate the problem of the orientation fac-
tor in FRET experiments. The fact that any one of the two probes has two or more
transition dipole moments involved in the transfer mechanism eliminates the ex-
treme values that k 2 can assume. It is equivalent to ultrafast rotation of an ideal
dipole over the very wide range of orientations and hence to fast averaging. This
effect, when considered in planning FRET experiments by using probes that show
a high degree of internal depolarization, can reduce the k 2 -related uncertainties to
the range of the other common experimental uncertainties.
17.3 Theoretical Background 585
17.3.5
How to Determine and Control the Value of R o ?
The distance range for the statistically significant determination of distances for
each pair is defined by the characteristics of the probes and is given by the R o
value. Equation (9) shows that the limits of that range are r ¼ R o G 0:5R o
(Figure 17.3). Maximal sensitivity of the FRET effect to small changes in r is
observed when r ¼ R o . Therefore optimal design of FRET experiments includes a
selection of a pair of probes that have an R o value close to the expected r value.
R o depends on four parameters which are characteristic of each pair of probes
(Eq. (6)). These parameters were discussed critically by Eisinger, Feuer, and Lamola
[28]. Modulation of the value of R o of any specific pair in order to adjust it to the
expected r value, is limited to a narrow range due to the sixth power of relation of
R o and the spectroscopic constants of the probes. Equation (6) shows that the do-
nor quantum yield FDo and the overlap integral parameters are the most readily
available for manipulation to modulate the R o value.
17.3.6
Index of Refraction n
The values used for n to calculate R o for a pair of probes attached to a protein mol-
ecule vary widely, from 1.33 to 1.6. Eisinger, Feuer, and Lamola [28] recommended
586 17 Fluorescence Resonance Energy Transfer (FRET)
using the value n ¼ 1:5 in the region of spectral overlap of the aromatic amino
acids in proteins.
17.3.7
The Donor Quantum Yield FDo
FDo should be determined for each probe when attached to the protein molecule
at the same site used for the FRET experiment. Many experiments showed that
the FDo is affected by both the site of attachment of the probes and by modifications
at other sites in the protein molecule; by attachment of an acceptor at second sites;
by changes of solvent conditions; and by ligand binding or folding/unfolding tran-
sitions. Therefore R o should be determined repeatedly after changes in the condi-
tions to avoid misinterpretation of changes of R o as changes of r. The R o depends
on the sixth root of FDo and is therefore not very sensitive to uncertainties in FDo .
Nevertheless, this same dependence limits the range that R o can be manipulated.
For instance, a twofold decrease FDo would result in @9% decrease of R o . Increas-
ing FDo without a change of probes usually requires change of solvent and there-
fore is not very useful in folding studies; but FDo can be decreased by adding
quenching solute without a change of solvent.
Frequently bound donor probes may have few modes of interactions with the
macromolecular environment in the ground or in the electronically excited state.
In such a case, each subpopulation which has a different lifetime also has a differ-
ent R o , as indicated in Eqs (6) and (7).
17.3.8
The Spectral Overlap Integral J
17.4
Determination of Intramolecular Distances in Protein Molecules using FRET
Measurements
Clear presentation of theoretical background for the methods described below may
be found in the monographs by Valeur [30], Lakowicz [31], Van Der Mear [6], and
numerous reviews such as Refs [4, 32–35] published in the past 40 years.
17.4 Determination of Intramolecular Distances in Protein Molecules using FRET Measurements 587
17.4.1
Single Distance between Donor and Acceptor
The Förster resonance energy transfer can be used as a spectroscopic ruler in the
range 10–100 Å. Measurements can be conducted under equilibrium conditions or
in the kinetics mode during conformational transitions. Distances shorter than 8–
10 Å should be avoided since short-range multipoles contribute interactions which
have different distance dependence.
E can be defined analogously to the quantum efficiency in terms of the rates of
elementary competing processes of de-excitation of the donor excited states:
X
E ¼ k ret = k ret þ k fl þ ki ð11Þ
P
where k ret ; k fl , and k i are respectively the rate constants of FRET, fluorescence,
and all other competing modes of de-excitation of the donor’s first singlet excited
state. E can be effectively monitored by observing the effect of the acceptor on
either the lifetime of the donor excited state or on emission quantum efficiency.
To evaluate E one must compare tDo or FDo (in the absence of an acceptor) and the
donor lifetime tD or quantum efficiency FD in the presence of an acceptor.
Determination of distances via determination of FRET efficiency, E, is possible
via a determination of the decrease of donor emission or an enhancement of the
acceptor emission by steady state and time-resolved methods.
1=6
1
r¼ 1 Ro ð12Þ
E
FD
E ¼1 ð13Þ
FDo
since only relative quantum yields are to be determined, Eq. (13) can be written
directly in terms of single wavelength donor emission intensities at wavelengths
where the acceptor emission intensity is negligible:
The factor A=AD corrects for the absorption by the acceptor. This method can be
readily applied at the single molecule level.
IA ðlD ; lem em
A Þ Iref 0 ðlD ; lA Þ
E¼ ð15Þ
Iref 1 ðlD ; lA Þ Iref 0 ðlD ; lem
em
A Þ
17.4.2
Time-resolved Methods
17.4.3
Determination of from Donor Fluorescence Decay Rates
tD
E ¼1 ð17Þ
tDo
and
Ro
r¼ ð18Þ
ðtD =tDo 1Þ 1=6
when a nonexponential decay of the donor alone (tDo ) is observed, a common situ-
ation due to heterogeneity of microenvironments, average lifetime values can be
used to calculate E, provided that the deviation from monoexponentiality is moder-
ate:
X X
htD i
E ¼1 where hti ¼ a i ti ai ð19Þ
htDo i i i
when the heterogeneity of the decay rates is enhanced by the FRET effect, then the
use of Eq. (18) might yield only an approximation of an average intramolecular dis-
tance. In such a case, distribution analysis accompanied by appropriate control ex-
periments should be applied.
17.4.4
Determination of Acceptor Fluorescence Lifetime
The concentration of excited acceptor probes, A ðtÞ, after a d-pulse excitation at the
donor excitation wavelength, obeys the following rate law:
dA ðtÞ
¼ k ret D ðtÞ ð1=tAo ÞA ðtÞ ð20Þ
dt
where D ðtÞ is the time dependent concentration of excited donor probes. Under
ideal conditions, where there is no direct excitation of the acceptor, and the donor
decay is single-exponential, the time dependent excited acceptor concentration,
A ðtÞ, shows a ‘‘growing in’’ component (marked by the negative pre-exponent).
Under the initial condition that D ðt ¼ 0Þ ¼ D ð0Þ, the time dependence of the
590 17 Fluorescence Resonance Energy Transfer (FRET)
D ð0Þk ret o
A ðtÞ ¼ ½et=tA et=tD ð21Þ
1=tD 1=tAo
This expression is typical of an excited state reaction where the second exponent
is equal to the decay of the donor excited state that produces the acceptor excited
states. The pre-exponential factor of the donor excited state is the same as for the
first exponent, but is negative. Therefore the measurement of the acceptor decay
under excitation of the donor is an important control experiment which confirms
that any reduction of the donor excited state lifetime is due to the FRET mecha-
nism, and that the distance determination is valid.
In practice, direct excitation of the acceptor to create a subpopulation, A ð0Þ ¼
A ðt ¼ 0Þ, of directly excited acceptor probes at t ¼ 0, cannot be avoided in most
cases. Consequently, Eq. (21) should be rewritten to add the decay of this subpopu-
lation:
D ð0Þk ret t=tAo D ð0Þk ret t=tD
A ðtÞ ¼ o þ A ð0Þ e e ð22Þ
1=tD 1=tA 1=tD 1=tAo
17.4.5
Determination of Intramolecular Distance Distributions
ðy
Ro6
hEi ¼ No ðrÞ dr ð23Þ
0 Ro6
þ r6
where No ðrÞ dr is the fraction of molecules with D–A pairs at distance r to r þ dr.
It is not possible to evaluate No ðrÞ from single steady state measurement. Cantor
and Pechukas [36] suggested the use of a series of measurements where the same
ensemble would be labeled by different pairs of probes, characterized by different
R o values for each experiment. A single time-resolved experiment contains all the
information needed for the determination of No ðrÞ.
Consider the time dependence of a sample of the labeled molecules in which the
donor has been excited at time t ¼ 0 by a very short pulse of excitation light. At the
moment of excitation, the intramolecular distance probability distribution (IDD) of
17.4 Determination of Intramolecular Distances in Protein Molecules using FRET Measurements 591
t=0
Probability
Distance
Fig. 17.4. Demonstration of the perturbation starting at time t ¼ 0 with subsequent time
of the shape of the intramolecular distance intervals every 0.4 ns (R o ¼ 27 Å, t ¼ 6:0 ns)
distribution by the energy transfer effect. without diffusion. The shift of the mean of the
Simulation of the time dependence of an distribution to longer distances and the
intramolecular distance distribution of the depletion of the short distance fractions as a
excited state donor subpopulation, N ðr; tÞ, function of time is demonstrated.
the sample of molecules with excited state donor probe, which is selected at ran-
dom by the excitation pulse, N ðr; 0Þ, is congruent with the equilibrium distance
distribution of the whole population, No ðrÞ (within a proportionality factor). The
rate of decay of the donor excited state in the fraction of labeled protein molecules
which are characterized by a short D–A distance is nonlinearly accelerated relative
to the corresponding rate observed for labeled molecules with larger D–A distan-
ces. This is due to the 1=r 6 dependence of the dipole–dipole interactions. Figure
17.4 shows the rapid depletion of the lower end fractions of N ðr; tÞ.
The net effect is (a) the generation of a virtual concentration gradient (within the
original Gaussian boundaries); (b) a time-dependent shift of the mean of the time-
dependent distance distribution of the sample of protein molecules with excited
donor probe, N ðr; tÞ, to larger intramolecular distances; and (c) an enhanced rate
of decay of the excited state of the donor probe. This effect can be illustrated as an
equivalent of optical ‘‘hole burning’’ in the distance probability distribution. When
the intramolecular distances are unchanged during the lifetime of the excited state
of the donor, the decay kinetics of the donor emission of the entire population of
molecules with excited donor is multiexponential,
ðy " 6 !#
t Ro
iðtÞ ¼ k No ðrÞ exp o 1þ 6 dr ð24Þ
0 tD r
17.4.6
Evaluation of the Effect of Fast Conformational Fluctuations and Determination of
Intramolecular Diffusion Coefficients
t=0
Probability
Distance
Fig. 17.5. Demonstration of the restoration of the equilibrium
intramolecular distance distribution by the diffusion effect. The
simulation shown in Figure 17.4 was repeated with nonzero
value of the diffusion coefficient, D ¼ 20 107 cm 2 s1 . The
replenishment of the short distance fractions is demonstrated.
17.4 Determination of Intramolecular Distances in Protein Molecules using FRET Measurements 593
where i denotes the ith species of the donor fluorescence decay (in the absence of
an acceptor), N i ðr; oÞ ¼ N0 ðrÞ (the equilibrium distance distribution); N i ðr; tÞ dr
is the probability for molecules with an excited state species i of the donor to have
an intramolecular distance in the range r to r þ dr and N i ðr; tÞ ¼ N i ðr; tÞ=N0 ðrÞ.
D is defined as the intramolecular diffusion coefficient of the segments carrying
the two probes; ti and ai are respectively the lifetime and the normalized pre-
exponential factor of the excited state of the ith species of the donor in the absence
of an acceptor. k i ðrÞ, the reaction term, includes the spontaneous emission rate and
the Förster energy transfer rate:
1 1 8:79 1025 n4 k 2 J
k i ðrÞ ¼ þ ð26Þ
ti t r r6
tr is the radiative lifetime of the donor, and the other terms were defined in Eqs (6)
and (6a). Equation (26) considers the variation of R o due to variations of the donor
quantum yield FD0 . This is represented by the ratio of the donor lifetime compo-
nents and the radiative lifetime, which is assumed to be independent of local envi-
ronmental effects.
The solution of Eq. (25) is obtained by numerical methods. Since the boundaries
are chosen in such a way that the distribution function is very small at both the
upper and lower extreme distances, reflective boundary conditions are arbitrarily
used [39]:
iD
c ðtÞ, the calculated fluorescence decay curve of the donor is readily obtained from
Nðr; tÞ by the relationship [37]
ð rmax " #
X
D
ic ¼ m N0 ðrÞ a i N i ðr; tÞ dr ð28Þ
rmin i
where m is a constant and a i is the pre-exponential factor of the ith excited state
species of the donor in the absence of an acceptor.
The ideal case is such that the fluorescence decay of the donor in the absence of
an acceptor is monoexponential. In such a case, all molecules share one value of
R o and the sum in Eq. (28) reduces to one term and the reactive term, k i ðrÞ (Eq.
(26)), can be replaced by a simpler expression:
In a case where the diffusion rate is known to be negligible on the time scale of the
lifetime of the excited state of the donor, the Fick term in Eq. (25) is zero and No ðrÞ
can be derived directly from an analysis of the donor decay curve. In such a case,
Eq. (25) can be replaced by [17]:
ðy
iD
c ðtÞ ¼ k No ðrÞðexp kðrÞÞ dr ð30Þ
0
594 17 Fluorescence Resonance Energy Transfer (FRET)
17.5
Experimental Challenges in the Implementation of FRET Folding Experiments
17.5.1
Optimized Design and Preparation of Labeled Protein Samples for FRET Folding
Experiments
A minimum of at least two, and more often three labeled protein species should be
prepared for each well controlled, folding FRET experiment. These are (a) a double-
labeled (by both donor and acceptor) mutant; (b) a single donor-labeled mutant (at
the same site as in (a)); and (c) a single acceptor-labeled mutant (at the same site as
in the double-labeled mutant) (optional). An ideal folding FRET experiment would
employ three homogeneously (> 97%) labeled samples of the model protein where
the donor and the acceptor probes would be selectively attached to selected residues.
In this ideal experiment, the following requirements should be satisfied: (a) the dis-
tance between the two selected sites should be sensitive to changes most relevant
to the working hypothesis to be tested by the planned experiments; (b) the structure
and function of the labeled protein derivatives should be unperturbed or minimally
perturbed; (c) the mechanism of folding (kinetics and equilibrium transitions)
should be undisturbed; (d) the probes should be of small size and preferably close
to the main chain; and (e) the probes should have ideal spectroscopic characteristics
that would make the detection and data analysis free of corrections factors.
Like everything in life, ideal experimental systems are good for dreams only, pro-
teins are rarely totally unaffected by chemical or mutational modifications; the sites
that are most desirable for insertion of the probes are in many cases also critical for
folding mechanism and the spectroscopically ideal probes are usually also very
large. Therefore, the design process is a matter of optimization, not maximization.
The compositional and structural homogeneity as well as the extent of perturba-
tion of the folding mechanism; the structure; the function (activity) and stability of
each labeled preparation should be monitored by standard methods. The extent of
perturbation is an important factor in the evaluation of the FRET results. Occasion-
ally modified protein molecules are found to be unsuitable for planned experi-
ments due to very strong perturbations. Consequently, an alternative modification
design should be attempted. The concern for purity cannot be overemphasized
since background or impurity fluorescence can drastically distort the results of dis-
tance determinations.
17.5 Experimental Challenges in the Implementation of FRET Folding Experiments 595
17.5.2
Strategies for Site-specific Double Labeling of Proteins
1. Use of natural probes, the tyrosine or tryptophan residues and a few prosthetic
groups. Site-directed mutagenesis can be applied for the preparation of single
tryptophan mutants and generation of sites for attachment of the acceptor by
insertion of cysteine or other residues [33, 40, 41].
2. Nonselective chemical modification of reactive side chains and chain terminal
reactive groups (amine, carboxyl, and mainly sulfhydryl groups) combined with
high resolution or affinity chromatography separation methods.
3. Solid phase synthesis of protein fragments and conformationally assisted liga-
tion of two synthetic peptide fragments, or of a genetically truncated protein
fragment and a synthetic complementing fragment can be applied for efficient
site-specific labeling of proteins. Total synthesis of labeled protein molecules is
applicable for medium-size proteins (e.g., the total synthesis of chymotrypsin
inhibitor 2 (CI2) [42]).
4. Production of edited mutants with no more than one pair of exposed reactive
side chains, preferably cysteine residues, at the desired sites, followed by chem-
ical modification and chromatographic separation.
596 17 Fluorescence Resonance Energy Transfer (FRET)
5. Cell-free protein synthesis and the use of edited tRNAs and edited genetic code for
the incorporation of nonnatural amino acids into selected sites in the chain [43].
6. Enzyme-catalyzed insertion of fluorescent substrates taking advantage of the
specificity and mild conditions of the enzymatic reaction [44].
Strategy 1 avoids the potential artifacts caused by labeling procedures and uncer-
tainties such as the labeling ratio. On the other hand, the main limitations of using
the natural probes are the limited arsenal of available probes; their less than ideal
spectral characteristics; the limited range of R o values that can be achieved using
them, and the fact that most proteins contain several Trp and Tyr residues. Appli-
cations of strategy 2 are limited to small proteins or proteins where a limited num-
ber of the reactive side chains can be found [45]. Nitration of tyrosine residues was
also successfully applied in few cases [49, 50]. Strategy 5 is the ultimate one, which
achieves full freedom for the designer of protein derivatives best suited for test-
ing well-defined working hypotheses. This strategy will most probably become the
method of choice in the field of spectroscopic research of protein folding [43, 51].
The limited scope of application of this approach at the present time is mainly
due to the very sophisticated and multistep operations involved and the low yields.
These yields are limiting only in the case of measurements of folding kinetics,
where many samples are employed (e.g., stopped-flow double kinetics experi-
ments). In most FRET folding experiments, the sensitivity of the fluorometric de-
tection systems makes this method satisfactory and promising. Strategy 6 is not of
general use due to the limited number of known suitable enzymatic reactions and
the multiplicity of the reactive side chains. The well-documented catalysis of pep-
tide bond synthesis by proteolytic enzymes was applied (for C-terminal labeling of
RNase A [44]) but it is not easy to apply as a general method. Transglutaminase
catalysis of insertion of primary amino groups attached to fluorescent probes was
also attempted [46–48] but the development of strategy 4 offers a simpler route.
Strategy 4 is currently the most practical approach for several reasons: first,
exposed cysteine residues are rare, and in many proteins these residues can be re-
placed by Ser or Ala residues without major perturbations of structure and func-
tion. Thus, most proteins can be engineered to have two exposed sulfhydryl side
chains. Second, the arsenal of sulfhydryl reactive pairs of probes covering a wide
range of spectroscopic characteristics and R o values is abundant, and the experi-
mental design can be optimized to meet a wide variety of distances and labeling
conditions. Third, these reagents can be selectively targeted at the cysteine residues
and huge amounts of-labeled mutants suitable for kinetic experiments can be pro-
duced [40, 52–57].
17.5.3
Preparation of Double-labeled Mutants Using Engineered Cysteine Residues
(strategy 4)
Although reagents for selective alkylation of sulfhydryl residues are available, the
challenge here is to develop methods for selective modification of each one of the
17.5 Experimental Challenges in the Implementation of FRET Folding Experiments 597
Fig. 17.6. The first step in preparation of a set The column was eluted with a gradient of
of labeled AK mutants for trFRET experiments decreasing salt concentration. The four
(labeling strategy 4). An engineered AK mutant fractions that were well resolved were:
with two engineered cysteine residues at sites unlabeled protein (U), the two single labeled
188 and 203 was prepared and reacted with 1- proteins (188P-203SH where only cysteine 188
iodo-acetamido-methyl-pyrene. The reaction was labeled, 188SH-203P where only cysteine
was allowed to proceed for 24 hours. The 203 was labeled) and a double-labeled fraction
excess reagent was removed by dialysis and (D). The main product, 188SH-203P, was later
the solution was equilibrated with 0.5 M on reacted with an acceptor reagent to
(NH4 )2 SO4 in 20 mM Tris pH 7.5. The reaction produce the double-labeled protein. The
mixture was then applied to a phenyl- differential reactivity of the two engineered
sepharose column (1 cm diameter, height 7 cysteine residues is demonstrated.
cm) equilibrated with the same salt solution.
two reactive sulfhydryl groups in the same protein molecule (Figures 17.6 and
17.7).
Submilligram amounts of labeled proteins can suffice for a full set of FRET
experiments under equilibrium conditions. Such minute amounts of site-specific
double-labeled protein samples can be prepared, based on high-resolution chroma-
tography, i.e. chromatography of a mixture of the possible products, following a
nonselective reaction of two-cysteine mutants with a fluorescent reagent. Homoge-
neous double-labeled products can be obtained in a second cycle of reaction and
separation. Hundred-fold larger preparations of labeled proteins are needed for
stopped-flow double-kinetic experiments. In this case, it is desirable to develop
methods in which the selectivity is achieved by differential reactivity of selected
sites on the protein molecule. A general systematic procedure was developed for
high-yield selective labeling of each one of two cysteine residues in mutant protein
molecules [57, 58]. The determination of individual reaction rate constants of the
engineered SH groups and the optimization of reaction conditions enabled high
yields (70–90%) of the preparation of pure, site-specific double-labeled AK mutant.
598 17 Fluorescence Resonance Energy Transfer (FRET)
Fig. 17.7. Preparation of site-specific double- 10-fold excess of the donor reagent (Sa) and
labeled AK with two engineered cysteine concentrated GdmCl (to a final concentration
residues (142 and 203) in a single reaction of 2 M) were added. After 2.5 h at room
step (strategy 4). The 10-fold reactivity of temperature. The reaction was stopped
cysteine 203 relative to the reactivity of by addition of 50-fold excess DTT and
cysteine 142 enabled the preferred labeling of dialyzed into 20 mM Tris pH 8.0 before the
residue 203. The reaction mixture was chromatography. B) The same procedure was
separated on a Mono-Q ion exchange column repeated, except that the probes reagents were
(0:5 5 cm). Two preparations are shown: A) added in the reversed order. Single separation
the acceptor (Fl) was reacted first (equimolar step gave the pure double-labeled products in
ratio of dye to protein, 2.5 h) and then a both cases.
These high yields make some otherwise difficult experiments feasible, such as the
double kinetics time-resolved FRET measurements [59, 60].
Jacob et al. [58] improved the prediction of the rates of reactions of engineered
cysteine residues at selected sites in a protein. They based this on the observed cor-
relation between reactivity of engineered sulfhydryl groups and the effect of neigh-
boring charged groups (due to dependence of its pK a ), and the freedom of rotation
of the side chain. The selectivity of the labeling reaction can be further enhanced
by local increase or reduction of reactivity of selected side chains by means of con-
formational effects. Solvent composition (organic solvents, salt components, and
pH) and temperature are the most common variables that can differentially affect
the reaction rate constants of selected side chains. Drastic effects can be obtained
by ligand binding (substrates, inhibitors, and protein–protein interactions). In this
work, the ratio of the reaction rate constants was examined only under a range of
temperatures and salt concentrations.
The level of homogeneity required for spectroscopic measurements makes the
chromatographic separation steps indispensable. Separation of the reaction prod-
ucts can be enhanced when one of the labels is either charged or very hydrophobic.
Yet, either hydrophobicity or extra electrostatic charges can affect the folding path-
17.5 Experimental Challenges in the Implementation of FRET Folding Experiments 599
way of proteins and can be a disadvantage when the modifications are made for
structural studies. Affinity chromatography was also used for separating multiple-
labeled protein derivatives [45, 61]. It is also possible to use a mixed disulfide resin
for separation of fractions with free cysteine residues in the process of labeling
two-cysteine mutants. The single-labeled mutants that are used for the reference
measurements can be prepared by straightforward labeling of single cysteine mu-
tants. Caution must be taken since in several cases it was shown that the spectro-
scopic properties of a probe attached at one site on a protein are significantly modi-
fied when the second cysteine was inserted at the second site for the double
labeling. In this case, it is desirable to use the corresponding single-labeled protein
sample of the two-cysteine mutant in which the second cysteine residue is blocked
by a nonfluorescent alkylation reagent such as iodoacetamide.
17.5.4
Possible Pitfalls Associated with the Preparation of Labeled Protein Samples for
FRET Folding Experiments
Important issues associated with the sample preparations include: (a) the specific-
ity and the extent of labeling; (b) the purity of the labeled samples; and (c) possible
effects of the labeling procedures and modifications on the stability, activity, solu-
bility, and folding mechanisms.
(a) Incomplete specificity of the labeling might prohibit meaningful interpreta-
tion of trFRET data, since the reference measurements might not be faithful to
the double-labeled samples. Consequently, distributions of labeled products might
be interpreted as distributions of distances. In principle, under conditions where
the extent of distribution of labeling sites is known, it should be possible to adjust
the model in the analysis algorithm and resolve subpopulations of distance distri-
butions. This, however, might be possible only under very limited circumstances,
and since it requires the addition of parameters in the analysis, the uncertainties
might become too high.
(b) Two types of impurities are of concern here: the first is the presence of fluo-
rescent impurities and background emission, and the second is incomplete label-
ing. As long as the background emission is not too high (ca. < 20% of the signal),
and the emission can be determined independently (control experiments), reliable
distance determination is possible by careful subtraction of the background emis-
sion. Incomplete labeling of the single-labeled samples should not interfere with
analysis of trFRET data. Analysis of data collected with double-labeled samples
that contain fractions of single-labeled molecules can be achieved as long as an
independent analytical procedure is employed to determine the size of such frac-
tions and that these fractions do not exceed about 25% of the molecules. Consider
the results of a FRET measurement using a preparation where a donor–acceptor
labeled protein sample contains 20% molecules labeled with donor only, without
acceptor. A long lifetime component would be observed in the decay of the donor
fluorescence, and this can erroneously be interpreted as a fraction of molecules
with long intramolecular distance (‘‘unfolded subpopulation’’). When the size of
600 17 Fluorescence Resonance Energy Transfer (FRET)
such a fraction is known this decay component can be considered and the correct
distance distribution can be recovered. Less strict requirements apply in prepara-
tions made for single molecule FRET measurements. Data collected from mole-
cules which lack one of the probes in a double-labeled preparation can be easily
recognized and discarded without bias of the computed distributions.
(c) By definition, chemical or mutational modifications of any protein molecule
are a perturbation, which should be minimized. Naturally, the modified proteins
are different from the native ones, which is not a matter of concern unless the ex-
periments are aimed at studying specific aspects of the folding of a specific protein.
In many folding studies, the more important concern regarding the modification
is that in a series of modified mutants of any model protein the perturbations
should be similar in all mutants. Such common perturbation enables reliable mul-
tiprobe test of the cooperativity and steps of the folding mechanism where each
pair probes the folding of the same model protein at a selected section of the chain
(e.g., secondary structure elements, loops, and nonlocal interactions). Structural
perturbations can be minimized by the proper choice of labeling sites and probes.
Rigorous control experiments for determination of the extent of perturbation due
to the modifications are mandatory. Control experiments can include the determi-
nation of biological activities; the detection of structural properties such as CD, IR,
and other spectral characteristics and rates of refolding.
17.6
Experimental Aspects of Folding Studies by Distance Determination Based on
FRET Measurements
17.6.1
Steady State Determination of Transfer Efficiency
the accuracy of transfer efficiency calculations when the donor and acceptor emis-
sions overlap [34, 64].
ex
IDA IAex
E¼ ð31Þ
Eref Iref IAex
ex
17.6.2
Time-resolved Measurements
Equations (17)–(19) are used for the determination of the average FRET efficiency
by time-resolved measurements. The main technical advantage of this measure-
ment is that there is no need to know the concentrations of the proteins and the
accuracy of the lifetime measurements surpasses that of the steady state intensity
measurements. Intramolecular distances can be obtained from measurements of
the average transfer efficiency only for very uniform structures.
The shape of fluorescence decay curves of the probes in time-resolved FRET ex-
periments contain detailed information on heterogeneity of the ensembles of pro-
tein molecules and are therefore much more effective in the folding research. The
scheme shown in Figure 17.8 demonstrates the application of trFRET in folding
research. The fluorescence decay of the donor (and the acceptor as well) contain
the information on the distribution of the distance between the two labeled sites
and that can be used for characterization of the conformational state of the labeled
protein molecule at equilibrium and during the folding/unfolding transitions.
The technology of time-resolved fluorescence measurements reached a level of
maturity during the last quarter of the twentieth century. During this period, the
main technological developments that promoted the art of time-resolved measure-
ments to a stage of stable and routine analytical method were (a) the implementa-
tion of tunable picosecond laser sources (at present mainly the titanium sapphire
solid state lasers); (b) the development of very fast and sophisticated single photon
counting devices and the phase modulation instruments; and (c) the development
of efficient algorithms for global data analysis. Fluorescence decay can be detected
either by the single-photon counting method or by the phase modulation method,
and in principle, both methods are equivalent. While phase experiments are rela-
602 17 Fluorescence Resonance Energy Transfer (FRET)
Folding
ID(t)
Time (ns)
NO(r)
Distance (A)
Fig. 17.8. A scheme of the application of the the scheme. The two corresponding
time-resolved energy transfer experiments for intramolecular distance distributions derived
determination of conformational changes in from the time domain data, are shown in the
globular protein. The scheme describes a lower section. This scheme is a simulation of
hypothetical two states folding transition in a the detection of a formation of a nonlocal
globular protein. The corresponding time interaction, indicated by a transition from a
domain signals, the decay curves of the donor broad to a narrow distribution with a smaller
in the two states, are shown in the center of mean distance.
17.7
Data Analysis
The basic procedure used for analysis of experiments follows the common practice
of analysis of time-resolved experiments. The basic procedure is based on a search
for a set of free parameters. These are the parameters of the distribution func-
tion; the diffusion coefficient; the fluorescence lifetimes and their associated pre-
exponents. A calculated donor emission impulse response, iD c ðtÞ is obtained via
numerical solution of Eqs (25)–(28) or (30). The acceptor impulse response, iAc ðtÞ,
is obtained by the convolution of, NA ðtÞ, the calculated population of molecules.
The acceptor probes are excited at time t either via the FRET mechanism or via
direct excitation, with the acceptor fluorescence lifetime (in the absence of FRET),
iA ðtÞ,
FD ðtÞ ¼ kD ðl em Þ½Iðt; l ex ; l em Þ n iD
c ðtÞ ð33Þ
where kD ðl em Þ represents the emission spectral contour of the donor relative to the
acceptor and Iðt; l ex ; l em Þ represents the experimentally measured instrument
response function at this particular optical setup (excitation and emission wave-
lengths setting, monochromator setting, detector spectral response, etc.). A calcu-
lated acceptor decay curve, FA ðtÞ, is similarly obtained by:
Several computational methods were developed for the search of the set of free
parameters of No ðrÞ and D. This yields the theoretical impulse response, which
when convoluted with the system response, yields Fcalc ðt; l ex ; l em Þ best fitted to
the experimentally recorded fluorescence decay curves Fexp ðt; l ex ; l em Þ [65, 66].
This procedure is inherently difficult since the analysis of any multiexponential
604 17 Fluorescence Resonance Energy Transfer (FRET)
decay curve is a mathematically ‘‘ill defined’’ problem [67, 68]. The quality of fit
of the experimental and calculated curves is judged by the minimization to the w 2
value of the fit and the randomness of the deviations [68, 69].
The differential Eq. (25) does not have a simple general analytical solution (ex-
cept for specific cases). Consequently, a numerical solution based on the finite dif-
ference (FA) method [70, 71] was applied. Therefore the currently most common
practice is the use of model distribution functions for No ðrÞ with a small number
of free parameters. The model used in the experiments described below is a Gaus-
sian distribution ðNo ðrÞ ¼ C4pr 2 exp aðr mÞ 2 where a and m are the free pa-
rameters which determine the width and the mean of the distribution respectively
and C is a normalization factor). Correlations between the free parameters, mainly
a; m and D, searched in the curve-fitting procedure leave a range of uncertainties
in these values. This uncertainty is being routinely reduced by global analysis
of multiple independent measurements. In particular, coupled analysis of the
fluorescence decay of both the donor and the acceptor probes contributes over-
determination of the free parameters. This over-determination is sufficient to allow
simultaneous determination of both the parameters of the equilibrium intramolec-
ular distance distribution and the diffusion coefficients, despite the strong correla-
tion between those parameters [38].
The information content of the decay curves of the two probes is not redundant.
Due to the r 6 dependence of the transfer probability, the fluorescence decay of the
donor is insensitive to contributions from the conformers with intramolecular dis-
tance r f R o (fast decay). The contribution of these conformers is well detected
in the fluorescence decay of the acceptor. On the other hand, the contributions of
the fractions of the distance distributions characterized by intramolecular distance
r > R o , are well detected by the donor decay and are less weighted in the acceptor
fluorescence decay. Both simulations and experiments have shown the following:
with the present level of experimental noise of the time-resolved experiments a
ratio of fluorescence lifetimes of the donors and the acceptors, tD =tA b 3 is a nec-
essary condition for reliable simultaneous determination of the parameters of the
intramolecular distribution functions and the associated intramolecular diffusion
coefficients. Several curve-fitting procedures are available, the most widely used
are based on the well-known Marquardt-Levenberg algorithm [38].
Maximum entropy method (MEM) is also applied to the analysis of time-resolved
fluorescence [72] and FRET experiments [73]. This analysis is independent of any
physical model or mathematical equation describing the distribution of life-
times. Whatever methods are used for data analysis, the best fit must be tested by
the statistical tests. Fine details of the distance distributions can be extracted
from each experiment depending on the level of systematic and random noise of
the system.
The routine global analysis procedure includes joint analysis of at least one set of
four fluorescence decay curves of the probes attached to the protein. Two measure-
ments are conducted on a double-labeled molecule, containing one donor and one
acceptor: (a) The ‘‘DD-experiment,’’ determination of tD the fluorescence decay of
the donor in the presence of FRET (excitation at the donor excitation wavelength
17.7 Data Analysis 605
Fig. 17.9. Representative pair of donor (. . .) the system response to excitation pulse,
fluorescence decay curves used for (---) the experimental trace of the tryptophan
determination of intramolecular distance fluorescence pulse, and (—) the best fit
distributions in a labeled RNase A mutant. A) theoretical curves obtained by the global
Fluorescence decay curves for the tryptophan analysis. The ‘‘DD’’ experiment was fit to a
residue in W 76 -RNase A (the ‘‘D’’-experiment) theoretical curve based on the distribution of
and B) the fluorescence decay of the distances [31]. The residuals of the fits (Res)
tryptophan residue in (76–124)-RNase A in the are shown in the lower blocks, and the
reduced under folding conditions (RN state) in autocorrelation functions of the residuals [CðtÞ]
40 mM phosphate buffer, pH 7.0, 20 mM DTT are presented for each curve in the upper right
(the ‘‘DD’’ experiment) at room temperature. inset. The w 2 obtained in this experiment was
Excitation wavelength 297 nm and emission at 2.02 (reproduced with permission from ref. 40).
360 nm (bandwidth for emission was 2 nm).
and detection at the donor emission wavelength); and (b) The ‘‘DA-experiment,’’
determination of tA , the acceptor emission in the presence of FRET, with excitation
at the donor excitation range, and detection of the acceptor emission at the long
wavelength range (Figure 17.9). Two internal reference measurements for determi-
nation of the probes lifetime when attached to the same corresponding sites as in
the double-labeled mutant but in the absence of FRET. (c) The ‘‘D-experiment’’ is a
measurement of tDo , the emission of the donor in the absence of an acceptor, using
protein molecule labeled by the donor alone; and (d) the ‘‘A-experiment’’ is a mea-
surement of tAo , the decay of the acceptor emission in the double-labeled protein,
excited directly at a wavelength where the donor absorption is negligible or using
a protein preparation labeled by the acceptor only.
When collected under the same laboratory conditions (optical geometry, state
of the instrument, time calibration, linearity of the response, and solution parame-
ters) used for the first two experiments, the D- and A-experiments serve as ‘‘inter-
nal standards’’ for the time calibration of this experiment and the dependence of
the spectroscopic characteristics of the probes applicable for the specific labeled
sites and experimental conditions of the FRET experiments. These on-site obtained
data contribute to the elimination of most possible sources of experimental uncer-
tainty due to variations between independent measurements and conditions.
606 17 Fluorescence Resonance Energy Transfer (FRET)
Why monitor the FRET effect twice, both at the donor and at the acceptor end?
Brand and his students [66] showed that this over-determination significantly re-
duces the uncertainty in the values of the free parameters, searched by curve-fitting
procedures. The Brand group introduced the concept of global analysis in the field
of time-resolved fluorescence measurements and showed that every additional in-
dependent data set that is globally analyzed, further reduces the statistical uncer-
tainty in the most probable values of the searched parameters. Beechem and Haas
[38] showed that the global analysis of sets of donor and acceptor decay curves can
reduce the uncertainty in the values of the parameters of the distance distributions
and their associated diffusion coefficients. Both a shift of the mean and reduction
of the width of a distance distribution as well as an enhanced rate of the intramo-
lecular diffusion can cause similar reduction of the donor fluorescence lifetime.
The fine differences between the two mechanisms appear in the shape of the decay
curves and hence the combined analysis of the donor and the acceptor decay
curves can resolve the two mechanisms.
17.7.1
Rigorous Error Analysis
The statistical significance of the value of any one of the free parameters that is
determined by any curve-fitting method can be evaluated by a rigorous analysis
procedure. In this procedure the iterative curve-fitting procedure is repeated while
the select parameter is fixed and the other free parameters adopt values that yield
the best possible fit under the limitation of that fixed value of the selected parame-
ter. This procedure can be repeated for a series of values of the selected parameter,
and the limits of acceptable values can be determined by the statistical tests and
the selected thresholds, in particular those of the w 2 -values (Figure 17.10). This pro-
cedure, if repeated for every free parameter, actually generates projections of the w 2
hypersurface on specific axes in the parameter space. Error limits at any required
significance level can then be determined using an F-test [38].
This procedure considers all correlations between the parameters. The evalua-
tion of each analysis and of the significance of the parameters is based on four
indicators: (1) The global w 2 -values; (2) the distributions of the residuals; (3) the
autocorrelation of the residuals; and (4) the error intervals of the calculated pa-
rameters obtained by the rigorous analysis procedure. In addition, each ‘‘DD-
experiment’’ is analyzed by multiexponential (three or four exponentials) decay
model. The w 2 -values obtained in these analyses serve for evaluation of the signifi-
cance of the w 2 -values obtained for each set of experiments in the global analysis,
based on the solution of Eqs (25)–(28) and (32)–(35).
17.7.2
Elimination of Systematic Errors
The linearity of the systems and possible effects of systematic errors are routinely
checked by measurements of the fluorescence decay of well-defined standard ref-
17.8 Applications of trFRET 607
1.7 1.7
1.6
χ2
1.6
1.58
1.532
1.5 1.5
33 34 35
Distance (A)
Fig. 17.10. w 2 surface obtained in rigorous and all other parameters were allowed to
analysis of a typical trFRET experiment for change. The ranges of one and two standard
determination of intramolecular distance deviations at w 2 values of 1.53 and 1.58
distribution in double-labeled adenylate kinase respectively are shown and can be used for
mutant labeled at residues 154 and 203. determination of the uncertainty range for the
Global analysis was repeated for several fixed mean of the distance distribution.
values of the intramolecular mean distance
17.8
Applications of trFRET for Characterization of Unfolded and Partially Folded
Conformations of Globular Proteins under Equilibrium Conditions
17.8.1
Bovine Pancreatic Trypsin Inhibitor
Bovine pancreatic trypsin inhibitor (BPTI) (Figure 17.11) is a small, stable globular
protein of 58 amino-acyl residues, including four lysyl residues and three disulfide
bonds [74, 75]. Four double-labeled BPTI samples were prepared for measure-
608 17 Fluorescence Resonance Energy Transfer (FRET)
17.8.2
The Loop Hypothesis
Reduced BPTI in 6 M GdmCl is fully denatured [75, 76], but under these condi-
tions the mean of the four intramolecular distance distributions did not show mo-
notonous increase as a function of labeled chain length [77]. The mean distances
were reduced by raising the temperature from 4 C to 60 C. These results are an
indication of conformational bias of the distribution even under drastic denaturing
conditions (i.e., the protein was partially unfolded). The strong temperature depen-
dence of the mean distances is cold unfolding transition [78] and an indication that
the structures that exist in the denatured state are probably stabilized by hydropho-
bic interactions. Moreover, at least two, one native-like and one unfolded, subpopu-
lations were distinguished in the distributions of end-to-end distances of each
segment. The sizes of the two subpopulations were temperature- and denaturant
concentration-dependent [79] (i.e., transition to more folding conditions increased
the native like subpopulation and reduced the unfolded one).
These observations led to the loop hypothesis, which states that loop formation
by specific nonlocal interactions (NLIs) prior to the formation of secondary struc-
17.8 Applications of trFRET 609
(2)
(0)
(3)
(4)
(5)
Fig. 17.12. The loop hypothesis. A scheme of after formation of four nonlocal interactions
a protein molecule with several groups of the chain fold is native-like and the last step of
residues that form nonlocal interactions is packing all the local interactions completes the
shown. A hypothetical pathway is shown where folding transition.
tures can be a key factor both in the solution of the Levinthal paradox and in
determining the steps which direct the folding pathway [79] (Figure 17.12). The
loops, generated by loose cross-links between specific clusters of residues separated
by long chain stretches, may bias the intramolecular distance distributions and re-
duce the chain entropy and the volume of the searched conformational space. The
observation of the cold unfolding of the intramolecular loops [77] led to the sug-
gestion that the main interaction between these clusters should be hydrophobic.
These long loops define the basic folding unit and thus BPTI is composed of
two folding units [79]. A similar mechanism was recently suggested by Klein-
Seetharaman et al. [80] based on the NMR study of the denatured state of lyso-
zyme. This specificity of the interactions is in agreement with Lattman and Rose
[81], who suggested that a specific stereochemical code directs the folding. FRET
experiments can be designed to locate those interactions, and site-directed muta-
genesis can be used to test the hypotheses of the role of specific residues in the
formation of the NLIs. Berezovsky et al. [82, 83] analyzed the crystal structures of
302 proteins in search for native loop structures. They showed that protein struc-
ture can be viewed as a compact linear array of closed loops. Furthermore, they
proposed that protein folding is a consecutive looping of the chain with the loops
ending primarily at hydrophobic nuclei.
17.8.3
RNase A
Navon et al. [40] applied this same experimental approach to the study of the un-
folded and partially folded states of reduced ribonuclease A (RNase A). RNase A
represents a higher level of complexity in the folding problem. The chain length
is twice that of BPTI (124 residues). Buckler et al. labeled the C-terminus of RNase
610 17 Fluorescence Resonance Energy Transfer (FRET)
A with a donor and attached the acceptor to residues 104, 61, or 1, one at a time.
This set of pairs of labeling sites spans the full length, one half of the length of
the chain and the C-terminal segment [44, 84]. Navon prepared 11 double-labeled
RNase A mutants to probe the folding transitions in the C-terminal subdomain of
the protein.
The pattern of the dependence of the means of the end-to-end distance distribu-
tions of the labeled segments in the RNase A mutants on the number of residues in
each segment resembled the pattern of the segment length dependence of the cor-
responding Ca-Ca distances in the crystal structure of the molecule. But the width of
the distributions was large, an indication for very weak bias by nonlocal interactions.
The distance distribution between the C-terminal residue (residue 124) and resi-
due 76 (a 49-residue chain segment) was another example whereby two subpopula-
tions were resolved. The distributions represent an equilibrium between native-like
and unfolded conformers (Figure 17.13). The same equilibrium was found for the
chain segment between residues 76 and 115.
The intramolecular segmental diffusion coefficients obtained by Buckler et al.
[84] correlated well with the conformational states of the chain. In the native state,
only the distance between the two chain termini showed rapid nanosecond fluctua-
tions; in the presence of 6 M GdmCl, the shorter segments also underwent rapid
intramolecular distance fluctuations which were further enhanced by the reduction
of the disulfide bonds. This demonstrates that the segmental intramolecular diffu-
sion coefficients can be determined and used as another parameter for monitoring
specific changes of the states of folding intermediates.
0.12 0.12
76-124 B 76-115 A
No GdmCl No GdmCl
1M 1M
2M 2M
Probability
0.08 3M 0.08 3M
5M 5M
0.04 0.04
0.00 0.00
0 20 40 60 80 0 20 40 60 80
Distance (Å)
Fig. 17.13. Two subpopulations of GdmCl. The mean of the first subpopulation
intramolecular distance distribution between was close to that of the native state; the mean
residues 76 (tryptophan residue, the donor) of second subpopulation was much larger, but
and 124 (engineered cysteine residue labeled could not be determined with a high degree of
by an aminocoumarin derivative, the acceptor) accuracy due to the smaller R o. The ratio of the
in the C-terminal subdomain of reduced RNase unfolded subpopulations increased from 30%
A were resolved by the trFRET experiment (in in HEPES buffer to 95% at 6 M GdmCl
40 mM HEPES buffer pH 7.0, 20 mM DTT, and (Reproduced with permission from ref. 40).
5 mM EDTA) with increasing concentrations of
17.10 The Third Folding Problem 611
17.8.4
Staphylococcal Nuclease
Brand and coworkers [85, 86] used a similar approach in a search for structural
subpopulations and chain dynamics in the compact thermally denatured state of
staphylococcal nuclease mutant. Two subpopulations were found in the distribu-
tion of distances between residues 78 and 140 in the heat-denatured state. At the
highest temperatures measured, both the apo- and holoenzyme showed compact
and heterogeneous populations in which the nonnative subpopulation was domi-
nant. The asymmetry found in the distribution of distances between Trp140 and a
labeled Cys45 in the destabilized mutant K45C at room temperature (native state)
was attributed to static disorder. Partial cold unfolding or chemical denaturation
reduced the asymmetry of the distribution and its width. This effect was analyzed
in terms of the nanosecond dynamic flexibility of the protein molecule.
17.9
Unfolding Transition via Continuum of Native-like Forms
Lakshmikanth et al. [73] and their collaborators employed the strength of the
multiple pairs trFRET experiments to reveal multiple transitions in the unfolding
of a small protein. Lakshmikanth et al. [73] used a maximum entropy method for
determination of the distributions of intramolecular distances between Trp53 and
the trinitrobenzoic acid moiety attached to Cys82 in the bacterial RNase inhibitor
barstar. They found that the equilibrium unfolding transition of native barstar
occurs through a continuum of native-like conformers. They also concluded that
the swelling of the molecule as a function of denaturant concentration to form a
molten globule state is separated from the unfolded state by a free energy barrier.
This is another demonstration of the effectiveness of FRET-detected folding experi-
ments in the multiple probe test of the cooperativity of folding/unfolding transi-
tions.
17.10
The Third Folding Problem: Domain Motions and Conformational Fluctuations of
Enzyme Molecules
Lakowicz and coworkers [41, 89–93] applied frequency domain FRET methods in
experiments that focused on the third folding problem, the functional refolding of
various proteins. Residues Met25 and Cys98 in rabbit skeletal troponin C located at
the N- and C-terminal domains respectively were labeled by a donor and acceptor.
At pH 7.5 and in the presence of Mg 2þ , the mean distance was close to 15 Å and
the FWHM was 15 Å, a flexible structure. The addition of calcium caused an in-
crease of the average distance to 22 Å and a decrease of the width by 4 Å, a transi-
tion to an extended and more uniform ensemble of conformers. At pH 5.2, the
conformation of troponin C is further extended with an average distance of 32 Å,
612 17 Fluorescence Resonance Energy Transfer (FRET)
and a narrower FWHM of only 7 Å. Similar results were obtained in the complex
of troponin C and troponin I. This finding is a good example of the application of
the FRET methods in studies of conformational changes involved in functional re-
folding of proteins.
Vogel et al. [94, 95] developed applications of FRET measurements in studying
conformational transitions of membrane peptides and proteins.
The unique feature of proteins is that they have probably evolved to couple be-
tween conformational fluctuations and the specific tasks they perform [96, 97]. A
prominent example of this principle is the enzymes that can be viewed as ma-
chines designed to accelerate chemical transformations by dynamic adaptation of
specific binding sites to metastable transition states [98–100]. This can be achieved
by a combination of structural flexibility (fluctuations) and a design that allows
a limited number of modes of motion [101]. This is manifested in enzymes called
transferases which catalyze the transfer of charged groups. Upon substrate bind-
ing, these molecules undergo structural changes to screen the active center from
water to avoid abortive hydrolysis [102, 103] and catalyze the transfer reaction.
X-ray analysis of enzyme crystals revealed large-scale structural changes associ-
ated with substrate binding. These and diffuse X-ray scattering data suggest that
domain displacement induced by substrate binding represents a unique mecha-
nism for the formation of the enzyme active center [102, 104].
The dynamic trFRET and the single molecule FRET experiments go beyond ex-
isting methods. These experiments are usually able to describe the full confor-
mational space for a specific mode of motion. Moreover, the FRET experiments
provide dynamic information which is essential for understanding how the intra-
molecular interactions (which are programmed in the sequence information of
the protein) control the modes of motion that compose the directed conformational
change.
Two phosphoryl transfer enzymes were studied as model systems: yeast phos-
phoglycerate kinase (PGK) [105], which catalyzes the phosphoryl transfer from
(1,3)-diphosphoglyceric acid to an ADP molecule, and E. coli adenylate kinase
(AK) which catalyzes the phosphoryl transfer reaction (MgATP þ AMP $
MgADP þ ADP) [106].
The PGK molecule is made of two distinct domains and a domain closure was
postulated as the simple mechanism that brings the two substrates into contact
(closure of a 10 Å distance) [107]. A mutant PGK was prepared in which one probe
was attached to residue 135 (the N-terminal domain) and the second to residue 290
(C-terminal domain). A very wide distribution was obtained from trFRET mea-
surements of the apo form of the labeled PGK molecule. This is an indication of
the multiplicity of conformations. Substrates binding induced an increase in the
mean of the distributions (increased interdomain distance) and as expected, re-
duced the FWHM.
The crystal structure of E. coli AK shows three distinct movable domains. A com-
parison of the crystal structures (Figure 17.14) representing the enzyme in differ-
ent ligand forms revealed large structural changes [108–110]. Very wide distance
distributions were obtained for the distances between both minor domains and
17.11 Single Molecule FRET-detected Folding Experiments 613
the CORE domain in the apo-AK. This an indication for multiple conformers
separated by small energy barriers. Furthermore, the crystal structure is only very
poorly populated in the apo-AK. Moreover, the wide distributions also include the
partial population of the closed conformers, where the interdomain distance is
already similar to that of the holoenzyme. This is in support of a select-fit mecha-
nism of the coupling of the domain closure conformational change and the bind-
ing of either substrates. Binding of each one of the substrates is associated with
substantial reduction of the mean and width of the distributions, as expected for
the domain closure mechanism. The interdomain distance distributions observed
for the complexes of the two enzymes with their substrates were narrower, i.e.,
the flexibility is restricted towards a more uniform structure in solution. The fold-
ing of the holoenzyme differs from that of the apoenzyme due to the additional
interactions between the pairs of the domains and substrate. Electrostatic interac-
tions (see chapter 7) probably contribute a major part of the extra stabilization of
the structures of holo-phosphoryl transferases studied here.
17.11
Single Molecule FRET-detected Folding Experiments
The long-range goal of protein folding research is the elucidation of the principles
of the solution of the chain entropy problem, i.e., the guided construction of the
native structure. Imagine a dream experiment that would create a movie that fol-
614 17 Fluorescence Resonance Energy Transfer (FRET)
lows a polypeptide chain during its transition from an unordered state to the native
state. Single molecule FRET spectroscopy, based on site-specific labeling and the
recent advances in laser technology and pinhole optics, makes this dream a reality.
It is already possible to continuously monitor trajectories of changes of intra-
molecular distances in single protein molecules over periods of few seconds. This
ultra high sensitivity helps us arrive as closely as possible to the ultimate analytical
resolution. Trajectories collected for series of double-labeled protein mutants that
monitor various distances in the protein molecule can report the order of forma-
tion of the various structures in the protein and their transitions. This type of ex-
periment can add important insights relevant to the central folding questions, but
the folding transition is inherently a stochastic process. Therefore, we cannot de-
duce any general principle from observations of single molecules at any resolution.
Consequently, the principles of the mechanism of the folding transition must be
investigated in the behavior of the ensembles, not in the individual molecules. So
how do the amazing modern achievements of single molecule FRET spectroscopy
serve the goals of protein folding research beyond the knowledge gained by the en-
semble studies?
Individual trajectories of single molecule FRET spectroscopy can be grouped
to generate both distributions of intramolecular distances and rates of their tran-
sitions. So why bother with the tedious, noisy single molecule detection, when
trFRET can produce the ensemble distributions with much higher signal-to-noise
ratio? The extra value of the distributions constructed from individual trajectories
of single molecules can be described as follows: (a) these model-free distributions
represent the true statistics of the sample trajectories; (b) subpopulations of pro-
tein molecules which may differ in the extent of the conformation space that they
populate can be resolved; and (c) slow (longer than the span of a lifetime of a
donor-excited state) conformational transitions and fluctuations within an equilib-
rium conformational space can be recorded and analyzed. This subject is of much
interest for studies of conformational dynamics of unfolded and partially folded
protein molecules.
The dynamic aspects of single molecule FRET really distinguish it from the
ensemble trFRET experiments. So, why not quit the ensemble trFRET experiments
and concentrate efforts on the promotion of single molecule FRET folding experi-
ments? In general, the folding problem is multidimensional and therefore, a com-
bination of complementary methods is essential. At present, the main limitations
of single molecule FRET folding experiments are: (a) limited time resolution; (b)
difficulties in performing perturbation initiated fast kinetics experiments; (c) the
contribution of shot noise which affects the observed width of the distributions;
(d) the problem of immobilization; and (e) quantitative analyses. Quantitative anal-
yses of single molecule FRET experiments are inherently difficult due to the com-
parative nature of the measurements. Reference trajectories for the determination
of variables that are essential input for the analyses of FRET experiments (e.g., de-
termination of ID ; FD0 , or tDo ) and control experiments for the non-FRET contribu-
tions, such as molecular orientations or non-FRET dye quenching, are difficult to
match with individual FRET measurements. Therefore, it seems that a combina-
17.12 Principles of Applications of Single Molecule FRET Spectroscopy in Folding Studies 615
tion of ensemble and single molecule experiments are and will be the appropriate
way to proceed.
17.12
Principles of Applications of Single Molecule FRET Spectroscopy in Folding Studies
17.12.1
Design and Analysis of Single Molecule FRET Experiments
Time (s)
Fig. 17.15. Time-dependent signals from in green. B) FRET efficiency trajectory
single AK molecules labeled at residues 73 and calculated from the signals in A. C) The
203 showing slow folding or unfolding interprobe distance trajectory showing that the
transitions. A) Signals showing a slow folding slow transition involves a chain compaction by
transition starting at @0.5 s and ending at only 20%. The distance was computed from
@2 s. The same signals display a fast unfold- the data presented in trace B using a F€
orster
ing transition as well (at @3 s). The acceptor distance R o ¼ 49 Å.
signal is shown in red, and the donor is shown
through the excitation volume) are summed to obtain ID and IA ; (b) immobilized
[114], or confined single molecules [113] in which the photon flux is continuously
recorded at time resolution determined by the bining intervals, and ID ðtÞ IA ðtÞ are
recorded. Following various corrections and filtering techniques (distinguish real
molecular signal from noise, correction for leakage of signals due to overlap of
spectra) the transfer efficiency, E, is calculated by
where b is a correction factor that depends on the quantum yields of the donor and
the acceptor and the detection efficiencies of the two channels.
17.12 Principles of Applications of Single Molecule FRET Spectroscopy in Folding Studies 617
surface, the proteins remain localized within the illuminating laser beam. There-
fore, it is possible to record single molecule conformational transitions trajectories
whose duration are limited by the photostability of the probes.
17.12.2
Distance and Time Resolution of the Single Molecule FRET Folding Experiments
17.13
Folding Kinetics
17.13.1
Steady State and trFRET-detected Folding Kinetics Experiments
A dream folding kinetics experiment (see Chapters 12–15) would be one that would
produce a time-dependent series of 3D structures. These structures constitute the
transition of an ensemble of molecules from an unfolded state under folding
conditions to the folded state. The experiment would probably initiate at very low-
resolution blurred 3D images and gradually the profiles would become more re-
solved and the structures become visible. Would this change to more resolved
structures be uniform for the whole molecule or would different folding nuclei or
focal points become visible at different time points? FRET-detected rapid kinetic
experiments were designed in a search for folding intermediates, the order of for-
mation of structural elements, and the interactions that stabilize them. An analysis
of such a dream experiment would hopefully reveal the order of structural transi-
tions and formation of structural elements along very broad and gradually narrow-
ing pathways to the native state. These might hopefully enable inference of the basic
principles of the master design of the folding transitions. If such a dream experiment
could be combined with site-directed perturbation mutagenesis, it might also be pos-
sible to search for the ‘‘sequence signals,’’ i.e., inter-residue interactions that stabi-
lize and lock structural elements, either in parallel, simultaneously, or sequentially.
Steady state and trFRET-detected kinetics experiments are far from such dream
experiments, but these FRET experiments display some unique qualities that jus-
tify the preparative efforts of production of site-specifically labeled protein samples.
The goal of these experiments is production of time series of distributions of se-
lected key intramolecular distances (e.g., distances between structural elements).
Such series can enable characterization of the transient structures of folding inter-
mediates. The range of detected distances; the time resolution; the direct structural
interpretation of the spectroscopic signals; the real time detection and the ability
to determine transient distributions of distances add to the kinetic FRET experi-
ment’s unique strength.
Both steady state and time-resolved detection were applied together with any
method of fast initiation of the folding or unfolding transition, e.g., stopped flow,
continuous flow, and T-jump.
17.13.2
Steady State Detection
The ensemble mean transfer efficiency can be detected continuously by rapid re-
cording of the donor emission, or the acceptor emission, or both. As in the case
of equilibrium experiments, at least two traces should be recorded: (a) the donor
emission (IDD ðtÞ) or the acceptor emission (IDA ðtÞ) of the double-labeled protein;
and (b) the donor or the acceptor emission in the absence of FRET, (ID ðtÞ) or
620 17 Fluorescence Resonance Energy Transfer (FRET)
strates the potential structural details that can be obtained by FRET determination
of multiple intramolecular distances during the unfolding/refolding transitions of
protein molecules.
Udgaonkar and Krishnamoorthy and their coworkers used multisite trFRET in a
study of the unfolding intermediates in the unfolding transition of barstar [87, 88].
Four different single cysteine-containing mutants of barstar with cysteine residues
at positions 25, 40, 62, and 82 were studied. Four different intramolecular dis-
tances were measured by steady state detection of the donor emission intensity
and multiple, coexisting conformers could be detected on the basis of the time depen-
dence of the apparent mean distances. The authors concluded that during unfold-
ing the protein surface expands faster than, and independently of, water intrusion
into the core. Like the works reported previously for the larger proteins, the multi-
site FRET study of Barstar showed that noncooperative folding, which is not ob-
served by routinely used spectroscopic methods, can be found in unfolding of
small proteins as well.
Another application of a multiple-distance probe study of folding of a small two-
state protein was recently reported by Magg and Schmid [126]. Bacillus subtilis cold
shock protein (Bc-CspB) folds very rapidly in a simple two-state mechanism. Magg
and Schmid measured the shortening of six intramolecular distances during
stopped-flow-initiated refolding by means of steady state FRET. Six mutants labeled
by the pair Trp-1,5-AEDANS were prepared by means of labeling strategy 1.
The calculated R o ¼ 22 Å fit well with the native dimensions of this small protein.
Two pairs of sites were found to have the same inter-residue distance in both the
native and the unfolded states. For four donor/acceptor pairs, the probed apparent
mean intramolecular distances shorten with almost identical very rapid rates and
thus, the two-state folding of this protein was confirmed. At the same time, more
than 50% of the total increase in energy transfer upon folding occurred prior to the
rate-limiting step. This finding reveals a very rapid collapse before the fast two-
state folding reaction of Bc-Csp, and suggests that almost half of the shortening
of the intramolecular distances upon folding of Bc-Csp has occurred before the
rate-limiting step.
Fink and coworkers [126] used a similar labeling strategy and studied both the
equilibrium and transient stopped-flow refolding intermediates of Staphylococcus
nuclease by means of steady state FRET. The results indicate that there is an initial
collapse of the protein in the deadtime of the stopped-flow instrument (regain 60%
of the native FRET signal) which precedes the formation of the substantial second-
ary structure. The distance determination shows similar structures in the equilib-
rium and transient intermediates.
17.13.3
Time-resolved FRET Detection of Rapid Folding Kinetics: the ‘‘Double Kinetics’’
Experiment
The transient transfer efficiencies determined by steady state detection of the fluo-
rescence intensities of the donor or the acceptor probes report rapid changes of dis-
622 17 Fluorescence Resonance Energy Transfer (FRET)
tances. But since the conformations found in ensembles of partially folded protein
molecules are inherently heterogeneous, the mean transfer efficiency (Eq. (12))
cannot be used for the determination of any meaningful mean distances. The
mean and width of the distributions of distances in these rapid changing ensem-
bles of partially folded protein molecules can be determined by rapid recording
of time-resolved fluorescence decay curves of the probes. This was achieved by the
‘‘double kinetics’’ experiment.
The ‘‘double kinetics’’ [59, 127, 129] folding/unfolding experiments combine
fast initiation of folding/unfolding transitions by rapid change of the solvent
synchronized with the very rapid determination of fluorescence decay curves [129].
Single pulse detection enables the determination of fluorescence decay curves in
less than a microsecond per curve, with a satisfactory signal-to-noise ratio. This en-
ables us to determine time-dependent transient intramolecular distance distribu-
tions, IDD(t). Two time regimes are involved in this experimental approach: first
is the duration of the conformational transition, the ‘‘chemical time regime’’ (tc ),
(microseconds to seconds) and the second the ‘‘spectroscopic time regime’’, (t s ), the
nanosecond fluorescence decay of the probes. This experimental approach is de-
signed to reveal the course of the development of IDD(t)s with millisecond or sub-
millisecond time resolution (depending on the time resolution of the folding initi-
ation technique). Combining this instrumental approach with the production of a
series of protein samples, site-specifically labeled with donor and acceptor pairs,
enables the characterization of the backbone fold and flexibility in transient inter-
mediate states, during the protein folding transitions.
The challenge here is twofold: first, to collect fluorescence decay curves with a
sufficiently high signal-to-noise ratio to enable determination of statistically signif-
icant parameters of the IDD(t); and second, to synchronize the refolding initiation
device with the probe pulsed laser source. A stopped-flow double kinetics device
based on a single photon counting method was developed by Beechem and co-
workers [127] and applied for the determination of transient anisotropy decays.
This approach utilized only one photon in several pulses and hence required both
a very large number of stopped-flow experiments, and an amount of labeled pro-
tein. A system developed by Ratner et al. [59] based on low-frequency (10 MHz)
laser pulses and fast digitizer oscilloscope, overcomes this difficulty (Figure
17.17). The current time resolution of the spectroscopic time scale, t s , in this
mode of double kinetics experiment is 250 ps. Up to 20 fluorescence decay curves
can be measured with an acceptable signal-to-noise ratio within a single stopped-
flow run. The single pulse detection of fluorescence decay curves can be synchron-
ized with several methods of rapid initiation of refolding or unfolding, e.g.,
stopped-flow and laser-induced temperature jump.
17.13.4
Multiple Probes Analysis of the Folding Transition
The multiple probe test, in which a variety of probes of secondary and tertiary
structures are applied to monitor the unfolding/refolding kinetics (see Chapters 2
17.13 Folding Kinetics 623
Laser
Stopped
8 flow 8 6
1 2 3 7
unit
4
5
PC 6
Scope
7
Fig. 17.17. Scheme of double kinetics device (Hewlett-Packard, CA, USA), 1 and 5: lenses;
in the stopped flow configuration. The laser is 2: fast photodiode (Hamamatsu, Japan);
a Quata-Ray high finesse OPO YAG Laser 3: shutter; 4: bandpass filter; 6: biplanar
(Spectra-Physics, CA, USA), the stopped-flow phototubes (R1193U-03 Hamamatsu, Japan);
unit is DX17MV (Applied photophysics, UK), 7: preamplifiers C 5594 (Hamamatsu, Japan);
the scope is Infinium 1.5 GHz oscilloscope 8: attenuating filters.
and 12), has proved to be useful in distinguishing between parallel and sequential
mechanisms. The determination of distributions of selected intramolecular distan-
ces by means of a series of double-labeled mutants of the model protein enables
the determination of the time dependence of the formation of secondary structure
elements; the formation of loops; changes of subdomain structures, and overall
compaction of the chain at high time resolution. Changes of the mean and the
width of distributions of intramolecular distances in protein molecules during the
initial phases of the folding transitions can reveal weak energy bias (see Chapter 6).
This favors conformational trends that affect the direction of the folding process.
Very few and very weak interactions between chain elements may affect the shape
of the distributions at the earliest phases of the folding. From such findings we can
ask: Are such interactions effective during the initial collapse making it a biased
change of conformation, rather than a random compaction driven only by solvent
exclusion mechanism? Are those interactions local or nonlocal?
Ratner et al. [128] applied the double kinetics experiment in studying the early
folding transitions of the CORE domain of E. coli adenylate kinase (AK) (Figure
17.14). Using the double-mixing stopped-flow mode of the double kinetics experi-
ment the intramolecular distance distributions of a series of double-labeled AK
mutants were determined in the unfolded state (in 1.8 M GdmCl). This occurred
immediately after the change to refolding conditions (2–5 ms after dilution of de-
naturant), and upon completion of the refolding transition. This series of experi-
ments was motivated by two questions. First, what is the extent of structure forma-
tion in different parts of the chain at the earliest detected intermediate state? And
second, do secondary structure elements appear during folding prior to the forma-
tion of tertiary folds of the chain? What is the role of local and nonlocal interactions
in the earliest phases of folding? This work is in progress, but the results obtained
624 17 Fluorescence Resonance Energy Transfer (FRET)
to date with several double-labeled mutants indicate the following: 5 ms after being
transferred from unfolded to refolded states in a double-mixing unfolding/refold-
ing cycle, the distributions of distances between two pairs of sites which measure
the full length of secondary structure elements in the protein (pairs 169–188, an
a-helix and pair 188–203, a b-sheet) maintain the same shape as found in the un-
folded state (Figure 17.18). Native intramolecular distance distributions of these
two segments are found only after 3 s. This finding shows that at least these sec-
ondary structure elements are formed only very late in the folding transition. Do
tertiary structure elements appear in the CORE domain at an earlier time? Prelim-
inary experiments in which the distances between residues 18 and 203, indicate
that this may be the case, and this question remains under investigation.
Data obtained through multiple FRET experiments can reflect the complexity of
the folding transition, but simultaneously help resolve some common principles of
the resolution of this complexity. Those are the principles that enable the transition
of ensembles of protein molecules from states characterized by multiple conforma-
tions to the native state characterized by narrow distributions of conformations.
0.016
0.012
Probability
0.008
0.004
0.000
0 20 40 60
Distance (A)
Fig. 17.18. Sets of transient distance are shown in order to demonstrate the range
distribution between residues 169 and 188 in of experimental uncertainty in the parameters
AK in the denatured state (blue traces), in the of the distributions. This experiment shows
5 ms refolding intermediate state (red traces) that in the early folding steps the helical
and in the native state (after 3 s, green trace). segment between residues 169 and 188 (Figure
The protein was denatured in 1.8 M GdmCl for 17.15) is unfolded, very similar to its state in
100 s and then the denaturant was diluted to the high denaturant concentration.
final concentration of 0.3 M. Repeated traces
17.14 Concluding Remarks 625
17.14
Concluding Remarks
Acknowledgments
The author was privileged to work for a long time with a group of devoted and en-
thusiastic scientists, collaborators and students, who contributed to many aspects
of the projects cited in this chapter. Their contributions are deeply acknowledged.
The approach described here was influenced by the initiative which was started
References 627
References
61 Amir, D., Levy, D. P., Levin, Y. & entropy method of data analysis in
Haas, E. (1986). Selective fluorescent time-resolved spectroscopy. Methods
labeling of amino groups of bovine Enzymol 240, 262–311.
pancreatic trypsin inhibitor by 73 Lakshmikanth, G. S., Sridevi, K.,
reductive alkylation. Biopolymers 25, Krishnamoorthy, G. & Udgaonkar,
1645–1658. J. B. (2001). Structure is lost
62 Epe, B., Woolley, P., Steinhauser, incrementally during the unfolding of
K. G. & Littlechild, J. (1982). barstar. Nat Struct Biol 8, 799–804.
Distance measurement by energy 74 Deisenhofer, J. & Steigmann, W.
transfer: the 3 0 end of 16-S RNA and (1975). Crystallographic refinement
proteins S4 and S17 of the ribosome of the structure of bovine pancreatic
of Escherichia coli. Eur J Biochem 129, trypsin inhibitor at 1.5 A resolution.
211–219. Acta Crystallogr B31, 238.
63 Flamion, P. J., Cachia, C. & 75 Creighton, T. E. (1978).
Schreiber, J. P. (1992). Non-linear Experimental studies of protein
least-squares methods applied to the folding and unfolding. Prog Biophys
analysis of fluorescence energy Mol Biol 33, 231–297.
transfer measurements. J Biochem 76 Gussakovsky, E. E. & Haas, E. (1992).
Biophys Methods 24, 1–13. The compact state of reduced bovine
64 Brand, L. (1992). Methods in pancreatic trypsin inhibitor is not the
Enzymology (Brand, L. & Johnson, compact molten globule. FEBS Lett
L. M., eds), Vol. 210. 308, 146–148.
65 Beechem, J. M. & Brand, L. (1985). 77 Gottfried, D. S. & Haas, E. (1992).
Time-resolved fluorescence of Nonlocal interactions stabilize
proteins. Annu Rev Biochem 54, compact folding intermediates in
43–71. reduced unfolded bovine pancreatic
66 Ameloot, M., Beechem, J. M. & trypsin inhibitor. Biochemistry 31,
Brand, L. (1986). Simultaneous 12353–12362.
analysis of multiple fluorescence decay 78 Griko, Y. V., Privalov, P. L.,
curves by Laplace transforms. Sturtevant, J. M. & Venyaminov, S.
Deconvolution with reference or (1988). Cold denaturation of
excitation profiles. Biophys Chem 23, staphylococcal nuclease. Proc Natl
155–171. Acad Sci USA 85, 3343–3347.
67 Lanczos, C. (1956). Applied Analysis, 79 Ittah, V. & Haas, E. (1995). Nonlocal
pp. 272–304. Prentice Hall, interactions stabilize long range loops
Engelwood Cliffs. in the initial folding intermediates of
68 Bevington, P. R. (1969). Data reduced bovine pancreatic trypsin
Reduction and Error Analysis for the inhibitor. Biochemistry 34, 4493–4506.
Physical Sciences. McGraw-Hill, New 80 Klein-Seetharaman, J., Oikawa, M.,
York. Grimshaw, S. B. et al. (2002). Long-
69 Grinvald, A. & Steinberg, I. Z. range interactions within a nonnative
(1974). On the analysis of fluorescence protein. Science 295, 1719–1722.
decay kinetics by the method of least- 81 Lattman, E. E. & Rose, G. D. (1993).
squares. Anal Biochem 59, 583–598. Protein folding – what’s the question?
70 Ames, W. F. (1977). Ames, William F. Proc Natl Acad Sci USA 90, 439–441.
Numerical Methods for Partial 82 Berezovsky, I. N., Grosberg, A. Y. &
Differential Equations, 2nd edn. Trifonov, E. N. (2000). Closed loops
Academic Press, New York. of nearly standard size: common basic
71 Press, W. H., Flannery, B. P., element of protein structure. FEBS
Teukolski, S. A. & Vetterling, W. F. Lett 466, 283–286.
(1989). Numerical Recipes: The Art of 83 Berezovsky, I. N., Kirzhner, V. M.,
Scientific Computing. Cambridge Kirzhner, A. & Trifonov, E. N. (2001).
University Press, Cambridge. Protein folding: looping from hydro-
72 Brochon, J. C. (1994). Maximum phobic nuclei. Proteins 45, 346–350.
References 631
models of kinase clefts and conforma- 114 Talaga, D. S., Lau, W. L., Roder, H.
tion changes. Science 204, 375–380. et al. (2000). Dynamics and folding
104 Pavlov, M., Sinev, M. A., of single two-stranded coiled-coil
Timchenko, A. A. & Ptitsyn, O. B. peptides studied by fluorescent energy
(1986). A study of apo- and holo-forms transfer confocal microscopy. Proc Natl
of horse liver alcohol dehydrogenase Acad Sci USA 97, 13021–13026.
in solution by diffuse x-ray scattering. 115 Dickson, R. M., Cubitt, A. B., Tsien,
Biopolymers 25, 1385–1397. R. Y. & Moerner, W. E. (1997). On/
105 Watson, H. C., Walker, N. P., Shaw, off blinking and switching behaviour
P. J. et al. (1982). Sequence and of single molecules of green fluores-
structure of yeast phosphoglycerate cent protein. Nature 388, 355–358.
kinase. EMBO J 1, 1635–1640. 116 Boukobza, R. S. & Haran, G. (2001).
106 Noda, L. (1973). Adenylate kinase. In Immobilization in surface-tethered
The Enzymes (Boyer, P. D., ed.), Vol. lipid vesicles as a new tool for single
8, pp. 279–305. Academic Press, New biomolecule spectroscopy. J Phys Chem
York. B 105, 12165–12170.
107 Banks, R. D., Blake, C. C., Evans, P. 117 Deniz, A. A., Dahan, M., Grunwell,
R. et al. (1979). Sequence, structure J. R. et al. (1999). Single-pair
and activity of phosphoglycerate fluorescence resonance energy
kinase: a possible hinge-bending transfer on freely diffusing molecules:
enzyme. Nature 279, 773–777. observation of Forster distance
108 Muller, C. W., Schlauderer, G. J., dependence and subpopulations. Proc
Reinstein, J. & Schulz, G. E. (1996). Natl Acad Sci USA 96, 3670–3675.
Adenylate kinase motions during 118 Ha, T., Enderle, T., Chemla, D. S.,
catalysis: an energetic counterweight Selvin, P. R. & Weiss, S. (1997).
balancing substrate binding. Structure Quantum jumps of single molecules
4, 147–156. at room temperature. Chem Phys Lett
109 Muller, Y. A., Schumacher, G., 271, 1–5.
Rudolph, R. & Schulz, G. E. (1994). 119 Veerman, J. A., Garcia-Parajo, M. F.,
The refined structures of a stabilized Kuipers, L. & van Hulst, N. F.
mutant and of wild-type pyruvate (1999). Time-varying triplet state
oxidase from Lactobacillus plantarum. lifetimes of single molecules. Phys Rev
J Mol Biol 237, 315–335. Lett 83, 2155–2158.
110 Muller, C. W. & Schulz, G. E. 120 Chan, C. K., Hu, Y., Takahashi, S.,
(1992). Structure of the complex Rousseau, D. L., Eaton, W. A. &
between adenylate kinase from Hofrichter, J. (1997).
Escherichia coli and the inhibitor Ap5A Submillisecond protein folding
refined at 1.9 A resolution. A model kinetics studied by ultrarapid mixing.
for a catalytic transition state. J Mol Proc Natl Acad Sci USA 94, 1779–
Biol 224, 159–177. 1784.
111 Schuler, B., Lipman, E. A. & Eaton, 121 Eaton, W. A., Munoz, V., Thompson,
W. A. (2002). Probing the free-energy P. A., Chan, C. K. & Hofrichter, J.
surface for protein folding with single- (1997). Submillisecond kinetics of
molecule fluorescence spectroscopy. protein folding. Curr Opin Struct Biol
Nature 419, 743–747. 7, 10–14.
112 Lipman, E. A., Schuler, B., Bakajin, 122 Teilum, K., Kragelund, B. B. &
O. & Eaton, W. A. (2003). Single- Poulsen, F. M. (2002). Transient
molecule measurement of protein structure formation in unfolded acyl-
folding kinetics. Science 301, 1233– coenzyme A-binding protein observed
1235. by site-directed spin labelling. J Mol
113 Rhoades, E., Gussakovsky, E. & Biol 324, 349–357.
Haran, G. (2003). Watching proteins 123 Teilum, K., Maki, K., Kragelund,
fold one molecule at a time. Proc Natl B. B., Poulsen, F. M. & Roder, H.
Acad Sci USA 100, 3197–3202. (2002). Early kinetic intermediate in
References 633
18
Application of Hydrogen Exchange Kinetics
to Studies of Protein Folding
Kaare Teilum, Birthe B. Kragelund, and Flemming M. Poulsen
18.1
Introduction
Amide hydrogen exchange is one of the most useful methods for obtaining site-
specific information about hydrogen bond formation and breaking in the folding
protein. In particular, the method allows study of the individual amides in the pep-
tide chain, and for this reason the method has very wide applications in studies of
the protein folding mechanism.
Hydrogen exchange is a chemical exchange reaction between labile hydrogen
atoms chemically bound to either nitrogen, oxygen, or sulfur atoms. The exchange
reaction occurs most commonly between these labile hydrogen atoms in molecules
dissolved in a solvent that itself contains labile hydrogen atoms like water. Obvi-
ously this type of exchange reaction can take place for many atoms in a protein
molecule dissolved in water, where several types of labile hydrogen can exchange
with the water hydrogen atoms. The hydrogen atoms that potentially engage in
hydrogen exchange typically also have the potential for hydrogen bond formation.
The chemistry of the hydrogen exchange reaction is a proton transfer reaction,
which requires the release of the hydrogen atom. For this reason the hydrogen ex-
change reaction can be used to monitor hydrogen bond formation and breaking. In
proteins hydrogen bonds involving the peptide backbone amides and carbonyls are
the most important structural element in secondary structure formation. Therefore
by studying the hydrogen exchange reaction of the individual amides in the folding
peptide information becomes available regarding the hydrogen bond formation
of the individual segments in the protein folding process. Similarly information
about the stability and the kinetics of the hydrogen bonding segments in the pep-
tide chain can be obtained by measuring the hydrogen exchange reaction in native
conditions.
The kinetics of hydrogen exchange in proteins can be studied directly by nu-
clear magnetic resonance (NMR) spectroscopy using solvent saturation transfer or
the magnetic relaxation rate to study very fast reactions. Slower reactions can be
studied by letting the exchange reaction replace a hydrogen atom in the dissolved
equilibrium, this equilibrium will for most proteins be strongly shifted towards the
folded form in native conditions. However, since both states have to be populated
in all conditions, even in native conditions a small fraction of molecules will be of
the unfolded form. If the mechanism of hydrogen exchange for a specific amide
group is by global unfolding, the kinetics of the exchange reaction will carry infor-
mation about the rate constants of folding and unfolding in native conditions. This
consideration has stimulated much interest in the use of hydrogen exchange in na-
tive conditions to exploit the nature of the protein folding/unfolding equilibrium.
The group around Linderstøm-Lang suggested a simple, two-step mechanism for
hydrogen exchange [1, 4]. The first step is the equilibrium between the closed state
and the open state and the second step is the chemical exchange reaction occurring
from the open state. Therefore, when the mechanism of exchange for a given
amide group is by protein folding/unfolding, the rate constants for opening and
for closing would be equivalent to those of folding and unfolding, provided the
folding is a simple two-state process. Hence there would be an obvious interest in
determining these rate constants. A paragraph in this chapter will address some of
the methods used to extract the rate constants of the opening and closing reactions
in particular conditions favoring such measurements. The simplest condition is to
measure the amide hydrogen exchange kinetics when the rate-determining step in
the reaction mechanism is the opening rate, in which case the amide hydrogen ex-
change rate is simply the unfolding rate [13, 14]. Another method applies the mea-
surement of the hydrogen exchange rates at a set of different pH values at which
the chemical exchange rate from the open form is known. The measurement of
the amide hydrogen exchange kinetics in an array of pH values provide a sample
of data that subjected to nonlinear fitting can provide the two rate constants for the
opening and closing reactions [10, 15]. A paragraph in this chapter will describe
examples of the application of this type of analysis to several model proteins and
present the results of these analyses.
Another approach which is being used to bridge the gap between amide hydro-
gen exchange in native-like conditions and in denaturing conditions is to study the
rate of exchange as a function of a denaturing agent [9, 15–17]. As the concentra-
tion of denaturant increases, the unfolding equilibrium constant will typically in-
crease, bringing a larger and larger proportion of the molecules of the unfolded
form, resulting in faster amide hydrogen exchange. The study of individual amides
has for several proteins shown that the kinetics of the exchange of groups of
amides has typical patterns that reflect the mechanism of exchange and allow the
distinction between exchange by local, subglobal, or cooperative global exchange
mechanisms.
Amide hydrogen exchange has been a very useful tool for measuring the for-
mation of hydrogen bonds during protein folding. The quenched-flow and pulse-
labeling techniques, which combine rapid mixing techniques with analysis by
NMR spectroscopy or mass spectrometry, have had a great impact on outlining
the routes of protein folding [18–26]. These techniques allow determination of
the kinetics of the formation of a protected state for individual amides through-
18.1 Introduction 637
out the peptide chain. These methods are extremely powerful because the informa-
tion they provide has both a structural and kinetic implication.
For an understanding of the folding of proteins, the unfolded state of the peptide
chain has been subject to several studies because many studies have indicated the
existence of structure in the otherwise unfolded peptide chain. Such structures are
of interest because they may be labile precursors of the formation of native struc-
tures. Again, amide hydrogen exchange is one method of probing the presence of
residual structure in the unfolded state, structures in the molten globule and other
folding intermediates by comparison of the exchange kinetics with the exchange
rates predicted for the random coil conformation of the peptide [9, 12, 15, 27–43].
The study of protein folding is focused on understanding the thermodynamic
and structural routes of a peptide chain transforming from the unfolded state to
the folded state, identifying secondary and tertiary interactions in the transition
states and intermediates. Amide hydrogen exchange in combination with NMR
spectroscopy has played a considerable role in monitoring and describing these
states of protein folding. It should be kept in mind, however, that the method of
amide hydrogen exchange is an indirect way of detecting the progress of protection
of an amide hydrogen atom towards exchange with solvent. The method has no
way of providing information regarding the chemical and structural nature of the
protection. Therefore, careful evaluation is recommended when structural interpre-
tations of transition states and intermediates based on the results of amide ex-
change are presented in their own right. The literature has several examples of op-
posite interpretations of amide hydrogen exchange results. For instance it has been
proposed that the slow exchanging core of amides in the interior of a protein is the
folding nucleus of proteins [44, 45]. This view has been opposed on the grounds of
thermodynamic considerations and by examples of experimental evidence of pro-
teins for which the folding nucleus and the core of slowly exchanging amides did
not coincide [46, 47]. It has been suggested that pathways of protein folding may
be determined by detection of on-pathway intermediates [48]. Many of the inter-
mediates of protein folding have been identified by ultra-fast kinetic measure-
ments or indirectly detected in the dead time of rapid mixing experiments and re-
corded by spectroscopic techniques, which do not provide information about the
structure of the intermediate. The coincidence of the measures of thermodynamic
parameters describing the intermediate and those describing native state hydrogen
exchange kinetics have been used to couple such observations [12]. However, even
the presence or absence of an on-pathway intermediate in barnase, as determined
by amide hydrogen exchange studies in combination with a large number of very
carefully conducted folding studies, has turned out to be controversial [11, 49–52].
Nevertheless, amide hydrogen exchange is a powerful tool in studies of protein
folding, and used with care the combined results of kinetic and native state hydro-
gen exchange measurements with other spectroscopic measurements can provide
important information about protein folding.
The field of amide hydrogen exchange in protein folding has been reviewed
many times, most recently in Refs [53–57].
638 18 Application of Hydrogen Exchange Kinetics to Studies of Protein Folding
18.2
The Hydrogen Exchange Reaction
The exchange of amide hydrogen with hydrogen atoms from water is a catalyzed
reaction [58]. The catalyst can be the basic hydroxide ion, the neutral water mole-
cule, or the acid hydroxonium ion. The initial step in the reaction is the formation
of a transient complex between the reactive group and the catalyst through a hydro-
gen bond. Subsequently the labile proton is transferred to the catalyst in the com-
plex and the complex is dissociated. The replacement of the hydrogen atom with a
hydrogen atom from the solvent molecule represents the hydrogen solvent ex-
change reactions. This implies that amide hydrogen exchange is pH dependent
(Figure 18.1).
18.2.1
Calculating the Intrinsic Hydrogen Exchange Rate Constant, k int
1e+08
1e+06
kex (s )
–1
10000
100
1
0 2 4 6 8 10 12
pH
Fig. 18.1. Variation with pH of the amide hydrogen exchange
rate, k ex , of poly-d,l-alanine. At pH below 3 acid catalysis
dominates. Above pH 3 base catalysis dominates.
18.2 The Hydrogen Exchange Reaction 639
where kA ; kB , and k W are second-order rate constants for acid, base, and water cata-
lyzed hydrogen exchange in poly-d,l-alanine (PDLA). These rate constants have
been measured under different conditions and are listed in Table 18.1. When cal-
culating k int , a PDLA reference close to the actual experimental conditions should
be chosen. In Eq. (1), AL ; AR ; BL , and BR are factors correcting the PDLA rates for
effects of neighboring side chains other than Ala (Table 18.2) [59]. Different factors
for acid and base catalysis should be used. Thus, AL and AR are used to correct kA ,
whereas BL and BR are used to correct kB and k W . The L and R subscripts corre-
spond to whether the amide is to the left or to the right of the side chain (see Fig-
ure 18.2 for definition of left and right in this situation). Note that the correction
factors for left and right in Table 18.2 are different. For calculation of [OH ] from
a pH meter reading, the pKW values of Covington et al. [60] in H2 O and D2 O at
278 K and 293 K are listed in Table 18.3.
The temperature difference between the temperature under the reference condi-
tions, Tref , and the actual temperature of the experiment, T, is corrected from the
Arrhenius equation for each of the reference rate constants kA ; kB , and k W accord-
ing to:
where E a is the activation energy and R is the gas constant. E a is 14, 17, and 19
kcal mol1 for kA ; kB, and k W respectively. These E a values correct the second-order
rate constants kA ; kB , and k W and the autoprotolysis constants of H2 O or D2 O.
For example, calculate k int in H2 O for the amide hydrogen in the sequence -Phe-
NH-Gly- at low salt, pH 7, and 298 K.
From Table 18.1 the PDLA reference for D-exchange in H2 O with no salt at 293 K
is chosen. Thus, kA ¼ 10 1:40 M1 min1 , kB ¼ 10 9:87 M1 min1 , and k W ¼ 101:6
640 18 Application of Hydrogen Exchange Kinetics to Studies of Protein Folding
Tab. 18.2. Effect of amino acid side chains on hydrogen exchange rates of neighboring amide
hydrogens.
O RL
RR NH
Fig. 18.2. Left and right in Table 18.2 are defined relative to an
amino acid side chain. To correct the hydrogen exchange rate
of an amide to the right of side chain RR and to the left of side
chain RL the PDLA reference rate has to be corrected with the
‘‘R’’ value of side chain RR and the ‘‘L’’ value of side chain RL .
18.3 Protein Dynamics by Hydrogen Exchange in Native and Denaturing Conditions 641
H2 O 14.734 14.169
D2 O 15.653 15.049
M1 min1 . The amide is to the right of Phe, so from Table 18.2 AR ¼ 100:43 and
BR ¼ 10 0:06 . The amide is to the left of Gly, so AL ¼ 100:22 and BL ¼ 10 0:27 . From
Table 18.3 we get pOH ¼ pKW pH ¼ 15:049 7 ¼ 8:049. Calculating the acid,
base, and water terms in Eq. (1) separately gives:
Each term must be corrected for the temperature difference between the reference
conditions and the experimental conditions according to Eq. (2) and added to get
k int . However, k acid and k water are negligible compared with kB and only the base
term need be considered:
!
1 17 kcal mol1 ðð298 KÞ1 ð293 KÞ1 Þ
k int ¼ 141:6 min exp
1:987 103 kcal mol1 K1
¼ 231:1 min1
Hydrogen exchange rates may also be calculated using the web-service ‘‘Sphere’’
from Heinrich Roders laboratory at: http://www.fccc.edu/research/labs/roder/
sphere.
Protection factors, P, are defined as the ratio of the calculated intrinsic exchange
rate of an amide in the unstructured environment, k int , to the observed rate con-
stant of exchange, k obs , from the folded, intermediate, or unfolded state. Protection
factors are calculated individually for each backbone amide and are regarded as a
tool to predict the degree of structural protection against exchange, P ¼ k int /k obs
(see also Section 18.4.3).
18.3
Protein Dynamics by Hydrogen Exchange in Native and Denaturing Conditions
The hydrogen exchange reactions of amides in native state globular proteins are a
result of dynamic processes. From simple measurements of direct amide exchange
642 18 Application of Hydrogen Exchange Kinetics to Studies of Protein Folding
rates and the subsequent calculation of protection factors (see Section 18.1), the
stability and hydrogen bond structure of stable, populated, equilibrium protein
folding intermediates of several proteins have been described. Typically, the amides
of a protein in its stable native state are protected against exchange and the rates
are reduced by factors in the order of 10 6 to 10 10 , whereas the intermediate states
have protection factors in the order of 10 1 to 10 3 [43, 61–64]. For a few proteins
studied at high concentrations of chemical denaturants, for example in 8 M urea,
protection factors of less than five have been determined [38, 65].
At equilibrium under native conditions, a fraction of the protein population will
occupy high-energy intermediate and unfolded states, and this provides an excel-
lent opportunity to map the structure of protein folding intermediates in native
conditions, even when sparsely populated. The amide hydrogen exchange process
is a suitable process, since any perturbation of the native state stability occurring
from a change in any equilibrium in the direction of populating folding/unfolding
intermediates or the unfolded state will result in a change in the observed ex-
change rate. The perturbation of the equilibrium can occur by changing the pH of
the sample, by addition of small amounts of chemical denaturants, or by changing
the temperature. Information about global and local folding rates and populations
of intermediates in the protein folding reactions are all available from equilibrium
hydrogen exchange measurements. In the following studies, it is of importance
that the native state is always the most populated state.
18.3.1
Mechanisms of Exchange
Amides are protected against exchange either because they are involved in a hydro-
gen bond or because they are placed in a hydrophobic environment [66]. Exchange
of hydrogen-bonded amides requires opening of the hydrogen bond and separation
of donor and acceptor by at least 5 Å [67]. The breakage of any given hydrogen
bond can occur by three different mechanisms, each of which can be identified by
the use of the ‘‘Native State Hydrogen Exchange’’ strategy [9] described below. Ei-
ther the hydrogen bond is broken as a consequence of a cooperative global unfold-
ing process or by cooperative local unfolding reactions exposing sets of hydrogen
bonds [8, 53, 68]. The third possibility for exposure to solvent is by noncooperative
(limited) local fluctuations, which only expose one amide proton at a time for ex-
change [67, 69]. In a globular, native protein the amide hydrogen exchange rate
monitored for a particular amide proton will be a result of the sum of exchange
by all three mechanisms.
18.3.2
Local Opening and Closing Rates from Hydrogen Exchange Kinetics
Independent of the three mechanisms for exchange described above, and neglect-
ing any populations of intermediates, the amide exchange reaction can be de-
scribed by a two-step reaction (Figure 18.3) [1, 70]. The first step opens the amide
18.3 Protein Dynamics by Hydrogen Exchange in Native and Denaturing Conditions 643
d½CNH
¼ k op ½CNH þ k cl ½ONH ð3Þ
dt
d½ONH
¼ k op ½CNH ðk int þ k cl Þ½ONH ð4Þ
dt
1
l1 ¼ ðkop þ k cl þ k int ððkop þ k cl þ k int Þ 2 4kop k int Þ 1/2 Þ ð7Þ
2
and
1
l2 ¼ ðkop þ k cl þ k int þ ððkop þ k cl þ k int Þ 2 4kop k int Þ 1/2 Þ ð8Þ
2
In a hydrogen exchange experiment l2 is generally larger than 10 s1 and the fast
phase described by this rate constant will be over in the dead time of the experi-
ment, which typically will be measured in minutes. Thus, only l1 will be observed
and k obs ¼ l1 . This general description of the amide exchange rate was first de-
scribed by Linderstrøm-Lang [1] and is applicable under all conditions.
Two limiting cases of Eq. (9) may be considered. If k int g k cl , which is typical un-
der alkaline conditions, Eq. (9) reduces to:
k obs A k op ð10Þ
This is the EX1 limit [4, 70, 73, 74]. In this limit the amide will exchange in every
opening event as the rate-limiting step will be the conformational opening from
the closed form. In the EX1 limit, the observed rate of exchange, k obs , directly gives
k op .
If k cl g k int , Eq. (9) reduces to:
This is the EX2 limit [4, 73, 74]. In the EX2 limit the amide will not exchange in
every opening event as closing competes with exchange. From the observed rate of
exchange, k obs , and the intrinsic exchange rate, k int , the equilibrium constant for
opening, K op , may be obtained.
From Eqs (10) and (11) it is seen that if k op and k cl are both pH independent,
k obs for amides under EX1 conditions is also independent of pH. k obs for amides
under EX2 conditions will be pH dependent through the pH dependence of k int .
Taking the logarithm of Eqs (10) and (11) yields:
18.3 Protein Dynamics by Hydrogen Exchange in Native and Denaturing Conditions 645
log k obs ¼ log k int þ log K op ¼ pH pKW þ log BL þ log BR þ log kB þ log K op
ð13Þ
for EX2. Thus, in a plot of log k obs versus pH a slope of zero will reflect an amide
with exchange in the EX1 limit and a slope of one will reflect an amide with ex-
change in the EX2 limit (Figure 18.4).
For a given protein, the residue-dependent variations of k op ; k cl , and k int will re-
sult in observed exchange rates varying over several orders of magnitudes. For a
given amide in a fixed amino acid sequence, a set of simulations of the observed
exchange rate, k obs , with different opening and closing rates is shown in Figure
–1
log (kint (s ))
-2.0 -1.0 0.0 1.0 2.0 3.0 -3
-3.0 (kop,kcl),(10 ,10)
-4
-4.0 (kop,kcl),(10 ,10)
EX(1)
EX(1/2)
-5
-5.0 (kop,kcl),(10 ,10)
log (kobs (s ))
–1
-6.0
EX(2)
-7.0
-8.0
4.0 5.0 6.0 7.0 8.0 9.0
pH
Fig. 18.4. Simulated pH dependence of k obs 10ðpH-14Þ M expected for the amide in
for amide exchange according to Eq. (7). Data the sequence Ala-NH-Ala is on the top axis.
are simulated in the pH range from pH 4.0 to The three situations of exchange are illustrated
pH 10.0 for the situation that only the base- in the figure: EX2 (pH dependence ¼ 1),
catalyzed reaction is active and the rate EX(1/2) (0 < pH dependence < 1), and EX1
constants k op and k cl are not pH dependent. (no pH dependence).
The chemical exchange rate k int ¼ 10 8 s1 M1
646 18 Application of Hydrogen Exchange Kinetics to Studies of Protein Folding
18.4. Four lines are shown representing simulated pH-dependent exchange in four
different situations with different values of k op and k cl . The variation in k op will
shift the value for the observed rate constant, log k obs , by a simple transposition
where increasing k op (and/or k int ) shift log k obs to higher values (bold solid line to
thin solid line). Variation in k cl will change the profile of the line and increasing k cl
will shift log k obs to the left and lead to exchange in the EX1 limit at lower pH
(bold solid line to bold dashed line).
negative. If the stability is decreasing with pH, the pH dependence will be larger
than one for an EX2 amide (a pH > 1) and larger than zero for an EX1 amide
(a pH > 0). An amide that has a pH < 1 is therefore either an EX2 amide whose en-
vironment is stabilizing with increasing pH, an EX1 amide that is destabilizing
with increasing pH, or an amide that exchanges in the EX(1/2) range. When deter-
mining the segmental opening and closing rate constants, k op and k cl , these issues
must be considered. One way is to assure that exchange takes place by EX1 mecha-
nism at higher pH and by EX2 mechanism at lower pH. Optical techniques may
also be invoked to determine the global stability with pH.
pH dependence of k obs and resulting exchange limits When the rate constants k op
and k cl are not themselves pH dependent, amides exchanging in the EX(1/2) range
can easily be distinguished from amides in the EX1 and the EX2 range by the slope
a pH ¼ D log k obs /DpH of the pH dependence of the hydrogen exchange kinetics.
EX2 amides a pH ¼ 1
EX1 amides a pH ¼ 0
EX(1/2) amides a pH is between 0 and 1
18.3.3
The ‘‘Native State Hydrogen Exchange’’ Strategy
The structure and stability of exchange-competent states, which also includes the
structure and stability of low populated, partially unfolded states of a protein, can
be deduced and investigated using the ‘‘Native State Hydrogen Exchange’’ strategy
[9, 69, 81]. In this strategy, the denaturant dependence of hydrogen exchange rates
is measured under EX2 conditions such that the denaturant concentration is varied
only in the range favoring the native state. Through the perturbations of the equi-
librium between the folded state and any unfolded state by chemical denatur-
ants or by temperature, the dominant exchange process for a given amide can be
mapped and cooperative units in protein unfolding, the so-called native-like par-
tially unfolded forms, PUFs (see below) [9] can be located. Importantly, no infor-
mation regarding the order of the folding reaction can be provided by this strategy,
as kinetic mechanisms cannot be derived from an equilibrium analysis [47, 82]. It
has been of some debate whether the slow-exchanging amides identify the protein
folding nucleus. There seems only to be a correlation for a subset of proteins [44,
83] and certainly not for others [54, 84]. It must be stressed that the order of events
cannot be determined from equilibrium measurements. Only when independent
and complementary kinetic folding experiments are provided can distinctions be-
tween on- and off-pathway intermediates be made.
In the EX2 regime of exchange, the transient equilibrium constant for the open-
ing reaction, K op , can be used to calculate the apparent free energy of exchange (or
opening), DG HX , (or DGop ) as
DG HX ¼ RT ln K op ð14Þ
Earlier studies have demonstrated that the free energy of unfolding is linearly de-
pendent on the concentrations of denaturant [85–87], which is expressed as
where m is a measure of the change in exposed surface area upon unfolding and
DG unf is the unfolding free energy at 0 M denaturant (see Chapters 12.1, 12.2 and
13). The exchange can be regarded as a sum of two different processes [8, 68].
First, exchange can occur as a result of local fluctuations that do not expose addi-
tional surface area upon exchange and that are independent of the concentration of
denaturant. Secondly, exchange can occur as a result of an unfolding reaction,
which can be either global or local, depends on the concentration of denaturant,
and exposes additional surface area. These two processes can be separately defined
and the free energy of opening described as the sum of the processes.
Combining Eqs (16) and (17), an expression for the free energy of exchange that
accounts for both fluctuation and unfolding mechanisms for exchange is obtained
as
DG fluc DG unf þ m½denaturant
DG HX ¼ RT ln exp þ exp ð18Þ
RT RT
8 A
B
6
C1
4 C
-DGHX
D
2
-2
0 0.5 1 1.5 2
[Denaturant] (M)
Fig. 18.5. Simulation of different hydrogen is due to local fluctuations. The exchange is
exchange mechanisms according to Eq. (18). governed by global unfolding at higher
For amide A the slope of the line (m) equals denaturant concentrations. The curve coincide
the m-value from equilibrium unfolding studies with the curve of amide A. Amides C and C1
monitoring global unfolding (for example shift to local unfolding with increasing
fluorescence and CD spectroscopy). For amide denaturant concentration and merge to a
A, hydrogen exchange takes place entirely by common isoenergetic unfolding reaction and is
global unfolding mechanism, and this has the part of the same cluster in the structure.
highest free energy determined for the protein. Amide D is an example where global unfolding
Amide B has an m-value of zero at low is never the dominating mechanism of
concentration of denaturant and thus exchange exchange (Adapted from Ref. [135].)
650 18 Application of Hydrogen Exchange Kinetics to Studies of Protein Folding
8 Amide A: Global unfolding. Amide hydrogen exchange takes place by global un-
folding and represents the highest free energy at 0 M guanidinium chloride
(GdmCl) determined in the protein. The slope of the line equals the m-value in
Eq. (18) and corresponds to the m-value obtained from classical equilibrium
unfolding studies using, for example, circular dichroism (CD) or fluorescence
spectroscopy.
8 Amide B: Local fluctuations to global unfolding. At low concentrations of denatur-
ant, this amide exchange by local fluctuations. This is seen by the m-value of
zero. At higher concentrations of denaturant, there is a shift for amide B to ex-
change by global unfolding, with an m-value equal to the m-value determined
from CD or fluorescence.
8 Amide C and C1: Local fluctuations ! local unfolding. At low denaturant concen-
tration amides C and C1 exchange by local fluctuations. At intermediate denatur-
ant concentrations amides C and C1 exchange through local unfolding events
with similar change in free energy (see below (PUF)). At higher concentrations
of denaturant (> 2 M) it is expected that both C and C1 will exchange by global
unfolding and their lines will merge with the lines for A and B.
8 Amide D: Local unfolding. For this amide global unfolding is never the dominat-
ing mechanism of exchange in the shown region of denaturant concentrations.
This amide exchanges predominantly by local unfolding.
When fitting to Eq. (18), three parameters are determined: the m-value (slope),
DG unf , which is the free energy of exchange at 0 M denaturant (extrapolated inter-
cept y-axis, dotted line), and DG fluc , which is the free energy arising from fluctua-
tions. The m-value provides direct information on the amide exchange mechanism:
Fluctuation mechanisms are characterized by m-values of zero or very low m-
values, local unfolding mechanisms have intermediate m-values lower than the
m-values describing global unfolding, and finally global unfolding mechanisms
give m-values equal to the m-values determined by global unfolding monitored by
spectroscopic measures.
18.4
Hydrogen Exchange as a Structural Probe in Kinetic Folding Experiments
18.4.1
Protein Folding/Hydrogen Exchange Competition
V/Vref 1
I¼ f ð19Þ
Vc /V c; ref H2 O
Quench conditions
0.38 M GdmCl, pH ~5
6%D
Mixer 2
Exchange/refolding conditions
0.55 M GdmCl, pH 7 - 11
Delay line
tpulse
9%D
Syringe 3
Mixer 1
Unfolding conditions
6 M GdmCl, pH 5 - 6
100 % D
Syringe 2 Syringe 1
Fig. 18.6. Outline of the folding/exchange @7 to @11 is performed. Hydrogen exchange
competition experiment. The experiment is and protein folding competes while the protein
carried out in a three-syringe quenched flow flows from mixer 1 to mixer 2. After a period of
apparatus. Syringe 1 contains fully deuterated time, t pulse the sample reaches mixer 2 where
protein in 100% D2 O under conditions quench buffer is added from syringe 3. The pH
favoring the unfolded state. In mixer 1 the is now lowered to 5 or less and hydrogen
protein is diluted with 10 volumes of hydro- exchange becomes slow. The sample is
genated folding and exchange buffer from collected and desalted prior to analysis by
syringe 2. A series of experiments where pH of NMR.
the refolding buffer is varied in the range from
d½UD
¼ ðk int þ kUN Þ½UD ð22Þ
dt
dI dð½UH þ ½NH Þ
¼ ¼ kUN ½UD ð23Þ
dt dt
Solving Eqs (22) and (23) with the assumption that all protein is found as UD for
t pulse ¼ 0 yields:
k int
I¼ ð1 expððkUN þ k int Þt pulse ÞÞ ð24Þ
kUN þ k int
The proton occupancy versus pH curve of an amide that gets protected from ex-
change will thus be shifted to the right of the curve calculated for an unstructured
amide using Eq. (21) (Figure 18.8). Selecting t pulse g 1/ðkUN þ k int Þ Eq. (24) re-
duces to:
k int
I¼ ð25Þ
kUN þ k int
By fitting the measured proton occupancies to Eqs (24) or (25) the rate of struc-
ture (hydrogen bond) formation, kUN , can be obtained.
18.4 Hydrogen Exchange as a Structural Probe in Kinetic Folding Experiments 655
0.8
Proton occupancy
0.6
0.4
Unstructured
0.2 U=N
U = I = N (KUI = 2)
U = I = N (KUI = 50)
0
7 8 9 10 11 12
pH
Fig. 18.8. Simulated proton occupancy Eq. (24); ( ) exchange from an amide in a
profiles from a protein folding/hydrogen three-state process with KUI ¼ 2, kIN ¼ 50 s1 ,
exchange competition experiment. ( ) k int ¼ 1:67 10 8 s1 M1 [OH ], and k ex; I ¼
exchange from an unprotected amide with 0 according to Eq. (29); (——) exchange from
k int ¼ 1:67 10 8 s1 M1 [OH ] according to an amide in a three-state process with KUI ¼
Eq. (21); ( ) exchange from an amide in a 50, kIN ¼ 50 s1 , k int ¼ 1:67 10 8 s1 M1
two-state process with kUN ¼ 50 s1 , k int ¼ [OH ], and k ex; I ¼ 0 according to Eq. (29).
1:67 10 8 s1 M1 [OH ] according to
For a protein that folds through an intermediate (Figure 18.7), the folding/
competition protocol can be used to estimate the stability of the hydrogen bonds
formed in the intermediate. If kUI ; kIU g k int ; kIN , U and I can be approximated
to be in equilibrium at all times. Thus ½U ¼ 1/ð1 þ KUI Þð½U þ ½IÞ and ½I ¼
KUI /ð1 þ KUI Þð½U þ ½IÞ. The rate equations become:
where XH is any hydrogenated state ð½XH ¼ ½UH þ ½IH þ ½NH Þ. During measure-
ment of the proton occupancy I only the N state is observed and essentially all hy-
drogenated species will be found in this state. Thus [XH ] is equivalent to I. Solving
these equations yields the following expression for I:
k int þ KUI k ex; I KUI ðkIN þ k ex; I Þ þ k int
I¼ 1 exp t pulse ð29Þ
k int þ KUI ðkIN þ k ex; I Þ KUI þ 1
where k app; I ¼ k int /Papp with the apparent protection factor, Papp ¼
KUI þ 1
. If k ex; I f k int only little exchange occurs directly from I and
KUI k ex; I /k int þ 1
Papp A 1 þ KUI . If no protection from hydrogen exchange is present in I,
k ex; I ¼ k int and Papp ¼ 1. Great care should be taken when using the simplified
Eq. (30). Indeed, Gladwin and Evans [33] and Bieri and Kiefhaber [111] have
stressed the importance of accounting for later folding events (i.e. the flux to N)
in folding competition experiments. In a study of early folding intermediates in
lysozyme and ubiquitin Gladwin and Evans simplified Eq. (29) by considering the
U and I states as a single denatured state D in which some protecting structure
may be formed. A two-state reaction scheme as in Figure 18.7 was used to describe
the situation and an expression analogous to Eq. (24) can be obtained by setting
k ex ¼ ðk int þ KUI k ex; I Þ/ðKUI þ 1Þ and k f ¼ KUI kIN /ðKUI þ 1Þ in Eq. (29):
k ex
I¼ ð1 expððk f þ k ex Þt pulse ÞÞ ð31Þ
k f þ k ex
where k f is the apparent folding rate and k ex is the apparent exchange rate from D.
In order to describe the protective structure in D, a protection factor or inhibition
factor, k int /k ex , was reported [33].
An important prerequisite for good estimates of the stability of folding inter-
mediates from the folding/exchange competition protocol is that the stability and
kinetics of the folding process have low pH dependence. If the stability of an inter-
mediate changes significantly with pH analysis of the data is not straightforward
[111].
18.4.2
Hydrogen Exchange Pulse Labeling
Whereas the folding/exchange competition experiment has proven very useful for
characterizing early folding intermediates, the protocol does not have the time res-
olution required for characterizing the time course of a protein folding process
with multiple kinetic phases. The hydrogen exchange pulse labeling experiment is
the choice for studying the time course of hydrogen bond formation. The hydrogen
18.4 Hydrogen Exchange as a Structural Probe in Kinetic Folding Experiments 657
18.4.3
Protection Factors in Folding Intermediates
N D O C Quench conditions
0.29 M GdmCl, pH ~5
5%D
N H O C
Mixer 3
N D O C
Exchange conditions
0.38 M GdmCl, pH 9 - 11
Delay line 2
tpulse 6%D
N H O C
Syringe 4
Mixer 2
N D O C Refolding conditions
0.55 M GdmCl, pH 5 - 6
Delay line 1
tfold 9%D
N D O C
Syringe 3
Mixer 1
Unfolding conditions
6 M GdmCl, pH 5 - 6
N D O C 100 % D
Syringe 2 Syringe 1
Fig. 18.9. Outline of the quenched flow pH is added from syringe 3. At high pH
hydrogen exchange pulse labeling experiment. hydrogen exchange is fast and deuterium at
The experiment is carried out in a four-syringe amides not engaged in hydrogen bonds will
quenched flow apparatus. Syringe 1 contains exchange with hydrogen from the solvent. After
fully deuterated protein in deuterated buffer a period of time, t pulse the sample reaches
under conditions favoring the unfolded state. mixer 3 where quench buffer is added from
In mixer 1 the protein is diluted with 10 syringe 4. The pH is now lowered to 5 or less
volumes of hydrogenated refolding buffer. The and hydrogen exchange becomes slow. The
protein is allowed to fold while it flows from result is a snapshot of the hydrogen bonds
mixer 1 to mixer 2. The folding time will thus formed at mixer 2 as amides engaged in
be dependent on the volume between mixers 1 hydrogen bonds at this point will be labeled
and 2 and the flow rate. At mixer 2 five with deuterium. The sample is collected and
volumes of hydrogen exchange buffer with high desalted prior to analysis by NMR.
18.4 Hydrogen Exchange as a Structural Probe in Kinetic Folding Experiments 659
state, the protection factor is defined as P ¼ k int /k obs (Section 18.2.1). In analogy to
this, a structural protection factor Pstruc ¼ k int /k ex; I can be defined, where k ex; I is
the apparent rate of hydrogen exchange directly from I [114]. Pstruc thus reports
on the stability and local fluctuations of structure in I-like protection factors for
the native state. In Pstruc , exchange through preceding unfolding of I to U is not
considered. In order to get a reliable measure of Pstruc it is important to be able to
separate the exchange from the I-state from unfolding to and exchange from the
U-state. This may be achieved if the I-state only converts slowly to the U-state com-
pared with the exchange directly from I, or if previous knowledge of the folding
kinetics allows for correction of the contribution to exchange from other states.
An example of measurement of Pstruc for two intermediates in the refolding of
RNase A has been reported by Houry and Scheraga, who used an ingenious double
jump hydrogen exchange protocol to selectively populate two different intermedi-
ates and measure the hydrogen exchange from these [114, 115].
Another way of defining a protection factor for early folding intermediates is as
an apparent protection factor, Papp ¼ k int /k app; I [114], which was introduced in the
discussion of Eq. (30). In this definition, the apparent exchange rate k app , which
includes both direct exchange and exchange through unfolding processes, is con-
sidered. Papp is thus closely related to the overall stability of a hydrogen bond
formed in an intermediate state including both the direct exchange and unfolding
events. Often Papp is more easily measured than Pstruc , as the contributions from
the structure in I and the folding events do not have to be dissected.
18.4.4
Kinetic Intermediate Structures Characterized by Hydrogen Exchange
The hydrogen exchange labeling techniques described above have been used to
characterize kinetic folding intermediates in numerous proteins. Both transient in-
termediates formed very early in the folding process and more stable intermediates
formed later during folding have been observed by hydrogen exchange labeling.
One of the most well-studied systems is cytochrome c (cyt c) [116]. Three kinetic
phases of the folding process have been characterized by tryptophan fluores-
cence [19, 117]. The fastest phase occurring within the dead time of conventional
stopped-flow techniques was characterized by rapid mixing continuous-flow kinetic
experiments and has a time constant of 50 ms [117]. The structures formed in this
initial phase have been characterized by folding/hydrogen exchange competition.
Sauder and Roder [26] found that several amides from three segments of the poly-
peptide that form a-helices in the folded state acquire significant protection from
hydrogen exchange with the solvent within 2 ms of folding in 0.3 M GdmCl. Ap-
parent protection factors in the range from 1.5 to 8 relative to exchange in cyt c
unfolded in 2.5 M GdmCl were observed. Larger protection factors of 5–8 were
observed in the helical regions forming side-chain interactions between the N-
and C-terminal helices in the native structure. These protection factors agree well
with the protection factors of 4–5 expected from stopped-flow experiments [26].
The lower protection factors of 1.5–5 were observed at the helical ends, suggesting
660 18 Application of Hydrogen Exchange Kinetics to Studies of Protein Folding
exchange through local fluctuations in these regions in the intermediate. The fol-
lowing phase in the folding of cyt c has a time constant of 20 ms, during which the
structure in the N- and C-terminal helices consolidates. Tight native-like packing of
these two helix fragments in the intermediate probably stabilizes the hydrogen
bonds. The major fraction of the protein population reaches the native state from
the late intermediate by organizing the structure around the heme with a time con-
stant of 370 ms [19].
Lysozyme has also been extensively studied. Folding experiments at pH 5.2
showed a dead-time burst followed by two exponential phases using a variety of de-
tection techniques with time constants of 25 ms and 340 ms respectively [118]. By
folding/hydrogen exchange competition, protection factors of 2–15 in the burst
phase were observed for amides constituting the a-domain in the native structure
[33]. Hydrogen exchange pulse labeling resolved the kinetics of the hydrogen bond
formation and showed that the amides in the a-domain become significantly pro-
tected in a kinetic phase with varying time constants of 7 ms to 60 ms for the
different amides. The b-domain acquires protection in a later phase with a time
constant of around 400 ms [22]. The range of time constants observed for the for-
mation of protective structure in the a-domain apparently disagrees with kinetics
measured by optical detection. However, simulation of the hydrogen exchange
shows, when taking the folding kinetics of lysozyme during the exchange pulse at
pH 9.5 into account, that the experimental data are in agreement with the pro-
posed triangular folding model for lysozyme [111, 119]. The results on lysozyme
stress the importance of thorough comparison of data from different techniques.
In the case of apomyoglobin, hydrogen exchange pulse labeling was used to
identify the order of helix formation [41]. It was found that helices A, G, and H
and a part of helix B form very fast within the dead time of the experiment. The
rest of helix B forms stable structure at a later stage followed by helix C, the
CD loop, and helix E, which dock as a late event in the folding process [41]. Leg-
hemoglobin, which is a distant homolog of myoglobin (13% sequence identity),
has also been studied by hydrogen exchange pulse labeling [120]. Like apomyoglo-
bin, apoleghemoglobin forms a substantial amount of helical structure within the
dead time of an ordinary stopped-flow experiment. Interestingly, the hydrogen ex-
change pulse labeling experiment showed that helices E, G, and H are structured
in the burst phase intermediate whereas the A and B helices are not. It has thus
been demonstrated that though apomyoglobin and apoleghemoglobin both fold
through a compact intermediate with helical structure, the order in which the
helices form and dock are different [120]. These results have contributed to the dis-
cussion about evolutionary conservation in protein folding [121–124] and show
that the details of the folding process need not be evolutionary conserved.
On b-lactoglobulin, hydrogen exchange pulse labeling experiments have been
performed using both conventional quench-flow mixing and ultra-rapid mixing.
By this approach it was possible to resolve the kinetics of the hydrogen bond
formation from 240 ms to 1 s [125]. It has been found that during folding of b-
lactoglobulin some nonnative helical structure is formed [125, 126]. By hydrogen
18.5 Experimental Protocols 661
exchange labeling the location of this transient structure was mapped to the
N-terminal part of the protein, which forms a b-strand in the native structure. It
was suggested that this nonnative structure protects the N-terminal part of the
protein from aggregation during folding until most of the sheet, of which this
strand is a part, has formed.
For several other proteins, both well-defined (RNase A [112, 115, 127], barnase
[128], and RNase H from E. coli [129] and Thermus thermophilus [130]) and tran-
sient low-stability intermediates (protein G [131] and ACBP [132]) have been ob-
served by hydrogen exchange pulse labeling techniques.
18.5
Experimental Protocols
18.5.1
How to Determine Hydrogen Exchange Kinetics at Equilibrium
The following experiment can be used to measure directly the amide exchange ki-
netics from any protein state at equilibrium. Typically exchange of hydrogen is
measured with solvent deuterium, so any buffers used need to be deuterated be-
fore use.
where IðtÞ is the cross-peak intensity at time t after addition of D2 O to the lyophi-
lized protein, Ið0Þ is the cross peak intensity at time zero, IðyÞ is the final cross-
peak intensity, and k obs is the observed exchange rate constant. The time, t, is taken
662 18 Application of Hydrogen Exchange Kinetics to Studies of Protein Folding
as the midpoint of the NMR data acquisition for each data point. For amides with
incomplete exchange in the time course of the experiment, IðyÞ is omitted from
the fit. Values and standard deviations of IðyÞ; Ið0Þ, and k obs are obtained from
nonlinear fitting procedures, and the uncertainties obtained on peak integrals are
conveniently included.
18.5.2
Planning a Hydrogen Exchange Folding Experiment
two-mixer mode whereas the pulse labeling experiment require three mixers. The
easiest way of making proper mixing sequences is to use the manufactures soft-
ware, which calculates the aging times in the delay lines at the chosen flow rates.
In the pulse labeling experiment the apparatus must be operated in either of two
modes. In a continuous flow mode the folding time depends on the flow rate
and the volume of the first delay line. In interrupted mode the flow is stopped
when the first delay line has been filled with mixed protein/refolding buffer. In
this mode varying the length of the time the flow is stopped varies the folding
time.
In a folding/exchange competition experiment the volume of the solutions in
each step will typically be 1:10:5 (protein:refolding/exchange buffer:quench buffer).
In the pulse labeling hydrogen exchange experiment the mixing ratio will be
1:10:5:5 (protein:refolding buffer:exchange buffer:quench buffer).
18.5.3
Data Analysis
Transform the spectra and integrate each amide cross-peak. Choose a set of well-
resolved cross-peaks that show no protection against hydrogen exchange as refer-
ence peaks. Calculate the proton occupancy I according to Eq. (19). In a folding/
exchange competition experiment plot I as a function of pH and fit the data to
References 665
Eq. (30) or (31) depending on which assumptions apply to the system under inves-
tigation. In a hydrogen exchange pulse labeling experiment plot the data against
t fold and fit the data to one or more exponential decays.
Acknowledgments
We would like to thank Flemming Hofmann Larsen, Wolfgang Fieber, and Jens
Kaalby Thomsen for suggestions and for critically reading the manuscript.
References
1 Linderstrøm-Lang, K. (1955). application to bacitracin. FEBS Lett.
Deuterium exchange between peptides 49, 115–119.
and water. Chem. Soc. Spec. Publ. 2, 1– 8 Woodward, C. K. & Hilton, B. D.
20. (1979). Hydrogen exchange kinetics
2 Hvidt, A. (1955). Deuterium exchange and internal motions in proteins and
between ribonuclease and water. nucleic acids. Annu. Rev. Biophys.
Biochim. Biophys. Acta 18, 306–308. Bioeng. 8, 99–127.
3 Blout, E. R., Deloze, C. & 9 Bai, Y., Sosnick, T. R., Mayne, L. &
Asadourian, A. (1961). Deuterium Englander, S. W. (1995). Protein
exchange of water-soluble polypeptides folding intermediates: native-state
and proteins as measured by infrared hydrogen exchange. Science 269, 192–
spectroscopy. J. Am. Chem. Soc. 83, 197.
1895. 10 Arrington, C. B. & Robertson, A. D.
4 Hvidt, A. (1973). Isotope hydrogen (1997). Microsecond protein folding
exchange in solutions of biological kinetics from native-state hydrogen
macromolecules. In Dynamic aspects of exchange. Biochemistry 36, 8686–
conformation changes in biological 8691.
macromolecules (Sadron, C., ed), pp. 11 Chu, R. A., Takei, J., Barchi, J. J. &
103–115, Reidel and Dordrect, Bai, Y. (1999). Relationship between
Holland. the native-state hydrogen exchange
5 Richards, F. M. (1979). Packing and the folding pathways of barnase.
defects, cavities, volume fluctuations, Biochemistry 38, 14119–14124.
and access to the interior of proteins – 12 Parker, M. J. & Marqusee, S. (2001).
including some general-comments on A kinetic folding intermediate probed
surface-area and protein-structure. by native state hydrogen exchange. J.
Carlsberg Res. Commun. 44, 47–63. Mol. Biol. 305, 593–602.
6 Masson, A. & Wuthrich, K. (1973). 13 Sivaraman, T., Arrington, C. B. &
Proton magnetic resonance Robertson, A. D. (2001). Kinetics of
investigation of the conformational unfolding and folding from amide
properties of the basic pancreatic hydrogen exchange in native
trypsin inhibitor. FEBS Lett. 31, 114– ubiquitin. Nat. Struct. Biol. 8, 331–
118. 333.
7 Campbell, I. D., Dobson, C. M., 14 Roder, H., Wagner, G. &
Jeminet, G. & Williams, R. J. (1974). Wuthrich, K. (1985). Amide proton
Pulsed NMR methods for the exchange in proteins by EX1 kinetics:
observation and assignment of studies of the basic pancreatic trypsin
exchangeable hydrogens: inhibitor at variable p2H and
666 18 Application of Hydrogen Exchange Kinetics to Studies of Protein Folding
exchange in the molten globule and core? Trends Biochem. Sci. 18, 359–
native states of human alpha- 360.
lactalbumin. J. Mol. Biol. 253, 651– 45 Woodward, C. K. (1994). Hydrogen
657. exchange rates and protein folding.
35 Udgaonkar, J. B. & Baldwin, R. L. Curr. Opin. Struct. Biol. 4, 112–116.
(1995). Nature of the early folding 46 Bai, Y. (1999). Equilibrium amide
intermediate of ribonuclease A. hydrogen exchange and protein
Biochemistry 34, 4088–4096. folding kinetics. J. Biomol. NMR 15,
36 Jones, B. E. & Matthews, C. R. 65–70.
(1995). Early intermediates in the 47 Clarke, J. & Fersht, A. R. (1996). An
folding of dihydrofolate reductase evaluation of the use of hydrogen
from Escherichia coli detected by exchange at equilibrium to probe
hydrogen exchange and NMR. Protein intermediates on the protein
Sci. 4, 167–177. folding pathway. Fold. Des. 1,
37 Kotik, M., Radford, S. E. & Dobson, 243–254.
C. M. (1995). Comparison of the 48 Baldwin, R. L. (1996). On-pathway
refolding of hen lysozyme from versus off-pathway folding
dimethyl sulfoxide and guanidinium intermediates. Fold. Des. 1, R1–R8.
chloride. Biochemistry 34, 1714–1724. 49 Clarke, J., Hounslow, A. M.,
38 Buck, M., Radford, S. E. & Dobson, Bycroft, M. & Fersht, A. R. (1993).
C. M. (1994). Amide hydrogen Local breathing and global unfolding
exchange in a highly denatured state. in hydrogen exchange of barnase and
Hen egg-white lysozyme in urea. J. its relationship to protein folding
Mol. Biol. 237, 247–254. pathways. Proc. Natl. Acad. Sci. USA
39 Alexandrescu, A. T., Ng, Y. L. & 90, 9837–9841.
Dobson, C. M. (1994). Characterization 50 Matouschek, A., Kellis, J. T., Jr.,
of a trifluoroethanol-induced partially Serrano, L., Bycroft, M. & Fersht,
folded state of alpha-lactalbumin. J. A. R. (1990). Transient folding
Mol. Biol. 235, 587–599. intermediates characterized by
40 Alexandrescu, A. T., Evans, P. A., protein engineering. Nature 346,
Pitkeathly, M., Baum, J. & Dobson, 440–445.
C. M. (1993). Structure and dynamics 51 Fersht, A. R. (2000). A kinetically
of the acid-denatured molten globule significant intermediate in the folding
state of alpha-lactalbumin: a two- of barnase. Proc. Natl. Acad. Sci. USA
dimensional NMR study. Biochemistry 97, 14121–14126.
32, 1707–1718. 52 Takei, J., Chu, R. A. & Bai, Y. (2000).
41 Jennings, P. A. & Wright, P. E. Absence of stable intermediates on the
(1993). Formation of a molten globule folding pathway of barnase. Proc. Natl.
intermediate early in the kinetic Acad. Sci. USA 97, 10796–10801.
folding pathway of apomyoglobin. 53 Englander, S. W. (2000). Protein
Science 262, 892–896. folding intermediates and pathways
42 Radford, S. E., Buck, M., Topping, studied by hydrogen exchange. Annu.
K. D., Dobson, C. M. & Evans, P. A. Rev. Biophys. Biomol. Struct. 29, 213–
(1992). Hydrogen exchange in native 238.
and denatured states of hen egg-white 54 Clarke, J. & Itzhaki, L. S. (1998).
lysozyme. Proteins 14, 237–248. Hydrogen exchange and protein
43 Jeng, M. F. & Englander, S. W. folding. Curr. Opin. Struct. Biol. 8,
(1991). Stable submolecular folding 112–118.
units in a non-compact form of 55 Englander, S. W., Mayne, L., Bai, Y.
cytochrome c. J. Mol. Biol. 221, 1045– & Sosnick, T. R. (1997). Hydrogen
1061. exchange: the modern legacy of
44 Woodward, C. K. (1993). Is the slow Linderstrøm-Lang. Protein Sci. 6,
exchange core the protein folding 1101–1109.
668 18 Application of Hydrogen Exchange Kinetics to Studies of Protein Folding
19
Studying Protein Folding and Aggregation
by Laser Light Scattering
Klaus Gast and Andreas J. Modler
19.1
Introduction
The term laser light scattering is used here for recent developments in both static
light scattering (SLS) and dynamic light scattering (DLS). In principle, DLS can be
performed with conventional light sources [1], however, laser light is mandatory in
practice, and recent progress in laser technology has greatly improved the capabil-
ity of both DLS and SLS.
Static or ‘‘classical’’ light scattering, which attains molecular parameters from
the angular and concentration dependence of the time-averaged scattering inten-
sity, was one of the main tools for the determination of molecular mass in the mid-
dle of the last century. However by the 1970s the method had somewhat lost its
significance when it became possible to measure the molecular mass of proteins
and other small biological macromolecules with high precision by electrospray ion-
ization (ESI) or matrix-assisted laser desorption/ionization (MALDI) mass spec-
trometry. Furthermore these alternative methods required less material and less
careful preparation of the samples. SLS is now restoring its popularity due mostly
to significant technical improvements. Well-calibrated instruments equipped with
lasers and sensitive solid-state detectors allow precise measurements over a wide
range of particle masses (10 3 –10 8 kDa) using sample volumes of the order of
10 mL. The coupling with separation devices (e.g., size exclusion or other chro-
matographic techniques and field-flow fractionation) has provided the biomolec-
ular community with a powerful molecular analyzer, particularly for polydispersed
materials.
The advent of DLS in the early 1970s led to great expectations concerning the
study of dynamic processes in solution. A readily measurable quantity by this pro-
cedure is the translational diffusion coefficient of macromolecules from which the
hydrodynamic Stokes radius R S can be calculated. Up to the 1990s, however, the
method appeared to be experimentally much more difficult than SLS. Laboratory-
built or commercially available DLS photometers consisted of gas lasers, expensive,
huge correlation electronics and subsequent data evaluation was tedious because of
severe requirements on computation: powerful online computers were not gener-
ally available. The first ‘‘compact’’ devices (DLS-based particle sizers) were only ap-
plicable to strongly scattering systems. This situation has fundamentally changed
during the last 10 years. A sophisticated single-board correlator, which can be eas-
ily plugged into a desktop computer, is essentially the only additional element
needed to upgrade an SLS to a DLS instrument.
As a consequence of these developments, laser light scattering can now be con-
sidered as a standard laboratory method. Several compact instruments are on the
market, which are valuable tools for protein analysis. Nevertheless, for special
applications, including also particular studies of protein folding and aggregation,
a more flexible laboratory-built set-up assembled of commercially available compo-
nents is preferred.
We would like to stress that laser light scattering investigations are entirely non-
invasive and can be done under any solvent conditions. However, the measure-
ments must be done with solutions that are free of undesired large particles and
bubbles. Particular care has to be taken with proteins that absorb light in the visi-
ble region. It is worth mentioning that SLS measurements can also be performed
using conventional fluorescence spectrometers including stopped-flow fluores-
cence devices in the case of kinetic experiments.
Important information in the case of protein folding studies can be obtained es-
pecially from DLS, which yields the molecular dimensions of proteins in terms of
the Stokes radius. However, the combined application of SLS and DLS is very use-
ful for distinguishing changes in the dimensions of individual protein molecules
due to molecular folding or association.
In this review we will first deal with the measurement of hydrodynamic dimen-
sions, intermolecular interactions, and the state of association of proteins in differ-
ently folded states. The application of kinetic laser light scattering to the study of
folding kinetics is also discussed. A special section is dedicated to the study of pro-
tein aggregation accompanying misfolding. Useful protocols for some standard ap-
plications of light scattering to protein folding studies are listed in Section 19.6.
19.2
Basic Principles of Laser Light Scattering
19.2.1
Light Scattering by Macromolecular Solutions
Much of the experimental methodology, skilful practice, and basic theoretical ap-
proaches for static light scattering had already been developed in the first half of
the last century. The birth of dynamic light scattering can be dated to the middle
of the 1960s, and the next decade yielded much of the fundamental groundwork
on the principles and practice of DLS from researchers in the fields of physics,
macromolecular biophysics, biochemistry, and biology. This short introduction to
both methods is an attempt to supply fellow researchers who are not familiar with
these techniques with an intuitive understanding of the basic principles and the
19.2 Basic Principles of Laser Light Scattering 675
4p 4 n04 a 2
IS ¼ 2 I0 ð1Þ
l4 r
Fig. 19.1. Schematic diagram of the optical part of an light scattering instrument.
the primary beam and the aperture of the detector. The instantaneous intensity
fluctuates in time for nonfixed particles, like macromolecules in solution, due to
phase fluctuations f i ðtÞ ¼ ~q ~
r i ðtÞ caused by changes in their location ~
r i ðtÞ.
SLS measures the time average of the intensity, thus the term ‘‘time-averaged
light scattering’’ would be more appropriate, although ‘‘static light scattering’’ ap-
pears the accepted convention now. SLS becomes q-dependent for large particles
when light waves emitted from scattering elements within a individual particle
have noticeable phase differences.
DLS analyses the above-mentioned temporal fluctuations of the instantaneous
intensity of scattered light. Accordingly, DLS can measure several dynamic pro-
cesses in solution. It is evident that only those changes in the location of scattering
elements lead to intensity fluctuations that produce a sufficiently large phase shift.
This is always the case for translational diffusion of macromolecules. The motion
of segments of chain molecules and rotational motion can be studied for large
structures. Rotational motion of monomeric proteins and chain dynamics of un-
folded proteins are practically not accessible by DLS.
19.2.2
Molecular Parameters Obtained from Static Light Scattering (SLS)
The expression for the light scattering intensity from a macromolecular solution
can be derived from Eq. (1) by summation over the contributions of all macromo-
lecules in the scattering volume n and substituting the polarizability a by related
physical parameters. The latter can be done by applying the Clausius-Mosotti equa-
tion to macromolecular solutions, which leads to n 2 n02 ¼ 4p N 0 a, where n
and n 0 are the refractive indices of the solution and the solvent, respectively. N 0 is
the number of particles (molecules) per unit volume and can be expressed by
N 0 ¼ N A c/M. N A is Avogadro’s number, c and M are the weight concentration
and the molecular mass (molar mass) of the macromolecules, respectively. Using
19.2 Basic Principles of Laser Light Scattering 677
4p 2 n02 ðqn/qcÞ 2 n n
I ex ¼ 2 c M I0 ¼ H 2 c M I0 ð2Þ
l4NA r r
Hc 1
¼ þ 2A2 c ð4Þ
Rq M PðqÞ
predominant repulsive (covolume and electrolyte effects) and negative for predom-
inant attractive intermolecular interactions.
In general, both extrapolation to zero concentration and zero scattering angle
(q ¼ 0) are done in a single diagram (Zimm plot) for calculations of M from Eq.
(4). The primary mass ‘‘moment’’ or ‘‘average’’ obtained is the weight-average mo-
P P
lar mass M ¼ i c i M i / i c i in the case of polydisperse systems. This has to be
taken into consideration for proteins when monomers and oligomers are present.
Molar masses of proteins used in folding studies (i.e., M < 50 000 g mol1 ) can be
determined at 90 scattering angle because the angular dependence of R q is negli-
gible. Measurements at different concentration are mandatory, however, because
remarkable electrostatic and hydrophobic interactions may exist under particular
environmental conditions. In the following, the values of parameters measured at
finite concentration are termed apparent values, e.g., M app .
19.2.3
Molecular Parameters Obtained from Dynamic Light Scattering (DLS)
kT
RS ¼ ð6Þ
6p h D
19.2 Basic Principles of Laser Light Scattering 679
PL
where S ¼ i¼1 a i is the normalization factor. The weights a i ¼ c i M i ¼ n i M i2
reflect the c M dependence of the scattered intensity (see Eq. (2)). n i is the num-
ber concentration (molar concentration) of the macromolecular species. Accord-
ingly, even small amounts of large particles are considerably represented in the
measured g ð1Þ ðtÞ. The general case of an arbitrary size distribution, which results
in a distribution of D, can be treated by an integral
ð
2
gðtÞ ¼ aðDÞ eq Dt
dD S ð7bÞ
Equation (7b) has the mathematical form of a Laplace transformation of the dis-
tribution function aðDÞ. Thus, an inverse Laplace transformation is needed to
reconstruct aðDÞ or the related distribution functions cðDÞ and nðDÞ. This is an
ill-conditioned problem from the mathematical point of view because of the experi-
mental noise in the measured correlation function. However, there exist numerical
procedures termed ‘‘regularization’’, which allow a researcher to obtain stabilized,
‘‘smoothed’’ solutions. A widely used program package for this purpose is ‘‘CON-
TIN’’ [8]. Nevertheless, the distributions obtained can depend sensitively on the ex-
perimental noise and parameters used during data evaluation procedure in special
cases. Thus, it might be more appropriate to use simpler but more stable data eval-
uation schemes like the method of cumulants [9], which yields the z-averaged dif-
fusion coefficient D and higher moments reflecting the width and asymmetry of
the distribution. D can be obtained simply from the limiting slope of the logarithm
d
of g ð1Þ ðtÞ, viz. D ¼ q 2 ðlnjg ð1Þ ðtÞjÞt!0 . This approach is very useful for rather
dt
narrow distributions.
D, like M, exhibits a concentration dependence, which is usually written in the
form
DðcÞ ¼ D0 ð1 þ k D cÞ ð8Þ
19.2.4
Advantages of Combined SLS and DLS Experiments
Many of the modern laser light scattering instruments allow both SLS and DLS
to be measured in one and the same experiment. This experimental procedure is
more complicated, since the optimum optical schemes for SLS and DLS are differ-
ent. Briefly, DLS needs focused laser beams and only a much smaller detection ap-
erture can be employed because of the required spatial coherence of the scattered
light. This reduces the SLS signal and demands higher beam stability.
Combined SLS/DLS methods have two main advances for protein folding
studies. The first one concerns the reliability of measurements of the hydrody-
namic dimensions during folding or unfolding. The observed changes of R S could
partly or entirely result from an accompanying aggregation reaction, which can be
excluded if the molecular mass remains essentially constant.
The second advantage is the capacity to measure molecular masses of proteins in
imperfectly clarified solutions that contain protein monomers and an unavoidable
amount of aggregates. This situation is often met, for example during a folding re-
action or when the amount of protein is too small for an appropriate purification
procedure. In this case, SLS alone would measure a meaningless weight average
mass. The Stokes radii of monomers and aggregates are mostly well separated,
allowing the researcher to estimate the weighting a i for monomers and aggre-
gates fitting the measured g ð1Þ ðtÞ by Eq. (7a). Knowledge of the weighting allows
the contributions of monomers and aggregates to the total scattering intensity to
be distinguished and, therefore, the proper molecular mass of the protein to be
obtained.
19.3
Laser Light Scattering of Proteins in Different Conformational States – Equilibrium
Folding/Unfolding Transitions
19.3.1
General Considerations, Hydrodynamic Dimensions in the Natively Folded State
applied with great care and clearly contradicts the application of DLS to folding
studies.
For correct estimations of molecular dimensions in terms of R S before and after
a folding transition, extrapolation of D to zero protein concentration should be
done before applying Eq. (6). The concentration dependence of D indicated by k D
possibly changes during unfolding/refolding transitions as it is shown in Figure
19.2 for thermal unfolding of RNase T1. Strong changes in k D are often observed
(a)
R
(b)
(cm2 s–1)
D
c (g L–1)
Fig. 19.2. Heat-induced unfolding of ribonu- two-state transition yielding Tm ¼ 51:0 G
clease T1 in 10 mM sodium cacodylate, pH 7, 0:5 C and DHm ¼ 497 G 120 kJ mol1
1 mM EDTA. a) Temperature dependence of (average over all three fits). b) Concentration
the apparent Stokes radius at concentrations dependence of the diffusion coefficient D at
of 0.8 mg mL1 (open circles), 1.9 mg mL1 20 C (open square) and 60 C (filled square).
(filled squares), and 3.9 mg mL1 (filled dia- Linear fits to the data yield the diffusive con-
monds). The continuous lines were obtained centration dependence coefficient kD (see text).
by nonlinear least squares fits according to a
682 19 Studying Protein Folding and Aggregation by Laser Light Scattering
in the case of pH-induced denaturation. Apparent Stokes radii, R S; app , are mea-
sured if extrapolation is not possible due to certain experimental limitations.
19.3.2
Changes in the Hydrodynamic Dimensions during Heat-induced Unfolding
The first DLS studies of the thermal denaturation of proteins were reported by
Nicoli and Benedek [13]. These authors investigated heat-induced unfolding of
lysozyme in the pH range between 1.2 and 2.3 and different ionic strength. The
size transition coincided with the unfolding curve recorded by optical spectro-
scopic probes. The average radius increased by 18% from 1.85 to 2.18 nm. The
same size increase was also found for ribonuclease A (RNase A) at high ionic
strength. The proteins had intact disulfide bonds.
A good candidate for thermal unfolding/refolding studies is RNase T1, since the
transition is completely reversible even according to light scattering criteria [14].
The unfolding transition curves measured at three different protein concentrations
at pH 7 are shown in Figure 19.2a. The concentration dependence of D in the na-
tive and unfolded states is shown in Figure 19.2b. The increase of the diffusive
concentration dependence coefficient from 49 to 87 mL g1 on the transition to
the unfolded state indicates a strengthening of the repulsive intermolecular inter-
actions. The Stokes radii obtained after extrapolation to zero protein concentration
are 1.74 nm and 2.16 nm for the native and unfolded states, respectively. The cor-
responding increase in R S of 24% is somewhat larger than those for lysozyme and
RNase A. This might be due to the fact that RNase T1 differs in the number and
position of the disulfide bonds as compared with lysozyme and RNase A.
The thermal unfolding behavior of the wild-type l Cro repressor (Cro-WT) is
more complex, but it allows us to demonstrate the advantage of combined SLS/
DLS experiments. The native Cro-WT is active in the dimeric form. Spectroscopic
and calorimetric studies [15] (and references cited therein) revealed that thermal
unfolding proceeds via an intermediate state. The corresponding changes of the
apparent Stokes radius and the relative scattering intensity, which reflects changes
in the average mass, are quite different at low and high concentrations (Figure
19.3). At low concentration, Cro-WT first unfolds partly, but remains in the dimeric
form between 40 C and 55 C according to the increase in R S between 40 C and
55 C and the nearly constant scattering intensity. Above 55 C, both R S and the
relative scattering intensity decrease due to the dissociation of the dimer. At high
concentration, the increase of R S during the first unfolding step is much stronger
and is accompanied by an increase in the scattering intensity. This is due to the
formation of dimers of partly unfolded dimers. At temperatures above 55 C, dis-
sociation into monomers is indicated by the decrease of both R S and scattering in-
tensity. The transient population of a tetramer is a peculiarity of thermal unfolding
of Cro-WT.
Unfortunately, many proteins aggregate upon heat denaturation, thus prevent-
ing reliable measurements of the dimensions.
19.3 Laser Light Scattering of Proteins in Different Conformational States 683
RI
IT
T
Fig. 19.3. Heat-induced unfolding and association/
dissociation of Cro repressor wild-type at 1.8 mg mL1 (filled
squares) and 5.8 mg mL1 (open squares) in 10 mM sodium
cacodylate, pH 5.5.
19.3.3
Changes in the Hydrodynamic Dimensions upon Cold Denaturation
Since the stability curves for proteins have a maximum at a characteristic tempera-
ture, unfolding in the cold is a general phenomenon [16]. However, easily attain-
able unfolding conditions in the cold exist only for a few proteins. Cold denatura-
tion under destabilizing conditions was reported for phosphoglycerate kinase from
yeast (PGK) [17, 18] and barstar [19, 20]. The increases in R S upon cold denatura-
tion for PGK [18] and barstar (unpublished results) are shown in Figure 19.4. The
size increase is a three-state transition in the case of PGK, because of the indepen-
dent unfolding of the two domains in the cold. Both proteins aggregate on heating
preventing to compare the results with those for heat-induced unfolding.
684 19 Studying Protein Folding and Aggregation by Laser Light Scattering
R
R
0 5 10 15 20 25 30
T
Fig. 19.4. Relative expansion, RS; app /RS; ref , mL1 , in 20 mM sodium phosphate, pH 6.5,
upon cold denaturation for barstar (pseudo 10 mM EDTA, 1 mM DTT, 0.7 M GdmCl (filled
wild-type), 2.4 mg mL1 , in 50 mM Tris/HCl, squares). RS; ref is the Stokes radius in the
pH 8, 0.1 M KCl, 2.2 M urea (open circles) and reference state at the temperature of maximum
phosphoglycerate kinase from yeast, 0.97 mg stability under slightly destabilizing conditions.
19.3.4
Denaturant-induced Changes of the Hydrodynamic Dimensions
Before the advent of DLS, most data relating to the hydrodynamic dimensions of
proteins under strongly denaturing conditions had been obtained by measuring in-
trinsic viscosities [21]. In a pioneering DLS experiment on protein denaturation,
Dubin, Feher, and Benedek [22] studied the unfolding transition of lysozyme in
guanidinium chloride (GdmCl). These authors meticulously analyzed the transi-
tion at 31 GdmCl concentrations between 0 M and 6.4 M in 100 mM acetate buffer,
pH 4.2, at a protein concentration of 10 mg mL1 . It was found that R S; app in-
creases during unfolding by 45% and 86% in the case of intact and reduced disul-
fide bonds, respectively. Meanwhile, the hydrodynamic dimensions at high concen-
trations of GdmCl have been measured for many proteins lacking disulfide bonds
by using DLS. The results will be discussed below in connection with scaling (i.e.,
power) laws, which can be derived from this data. Here we consider briefly the in-
fluence of disulfide bonds and temperature on the dimensions of proteins in the
highly unfolded state. Such experiments were done with RNase A [23]. Figure 19.5
shows the temperature dependence of the relative dimensions (R S; app /R S; native Þ for
RNase A with intact and with reduced disulfide bonds. The slight compaction with
increasing temperature is typical for proteins in highly unfolded states: for exam-
ple, similar results are obtained with RNase T1 (Figure 19.5). Surprisingly, RNase
19.3 Laser Light Scattering of Proteins in Different Conformational States 685
R
R
T
Fig. 19.5. Temperature dependence of the 1 mM EDTA, 6 M GdmCl (open squares), and
relative expansion RS; app /RS; nat of RNase A and RNase A (broken disulfide bonds), 2.9 mg
RNase T1 in the unfolded state at high mL1 , in 50 mM MES, pH 6.5, 1 mM EDTA,
concentrations of GdmCl. RNase T1, 2.2 mg 6 M GdmCl (filled squares). RNase A, 2.5 mg
mL1 , in 10 mM sodium cacodylate, pH 7, mL1 , with broken disulfide bonds is highly
1 mM EDTA, 5.3 M GdmCl (open circles), unfolded even in the absence of GdmCl (filled
RNase A, 2.9 mg mL1 , in 50 mM MES, pH 5.7, diamonds).
A without disulfide bonds is already unfolded in the absence of GdmCl and has
larger dimensions than RNase A with intact disulfide bonds in 6 M GdmCl.
19.3.5
Acid-induced Changes of the Hydrodynamic Dimensions
Many proteins can be denatured by extremes of pH, and most of the studies on
this have focused on using acid as denaturant. Proteins respond very differently
to acidic pH. For example, lysozyme remains native-like, some other protein
adopt the molten globule conformation with hydrodynamic dimensions about
10% larger than in the native state, and many proteins unfold into an expanded
conformation. Fink et al. [24] have introduced a classification scheme for the
unfolding behavior under acidic conditions. The molecular mechanisms of acid
denaturation have been studied in the case of apomyoglobin by Baldwin and co-
workers [25]. The dimensions of selected acid-denatured proteins are listed below
in Table 19.1. Some proteins are more expanded in the acid-denatured than in the
chemical-denatured state, e.g., PGK (R S; unf /R S; native ¼ 2:49) and apomyoglobin
(R S; unf /R S; native ¼ 2:05). Furthermore, those proteins exhibit strong repulsive inter-
molecular interactions, as reflected by large values of k D (e.g., 550 mL g1 for PGK
at pH 2). This means, that D app is twice as large as D0 at a concentration of about
2 mg mL1 . This underlines the importance of extrapolation of the measured
quantity to zero protein concentration.
686 19 Studying Protein Folding and Aggregation by Laser Light Scattering
Tab. 19.1. Stokes radii RS , relative compactness RS /RS; native , and diffusive concentration
dependence coefficients kD of selected proteins in differently folded states determined by DLS.
Bovine a-lactalbumin
Native state (holoprotein) 1.88 7 29
Unfolded state (5 M GdmCl) 2.46 1.31 9 29
A-state (pH 2, 50 mM KCl) 2.08 1.11 15 29
Molten globule (neutral pH) 2.04 1.09 62 29
Molten globule (15% v/v TFE) 2.11 1.12 100 33
TFE-state (40% v/v TFE) 2.25 1.20 10 33
Kinetic molten globule 1.99 1.06 60 33
PGK
Native state 2.97 0.8 44
Unfolded state (2 M GdmCl) 5.66 1.91 – 44
Cold denatured state (1 C, 0.7 M GdmCl) 5.10 1.72 – 44
Acid denatured state (pH 2) 7.42 2.50 552 34
TFE-state (50% v/v TFE, pH 2) 7.76 2.61 1030 34
Apomyoglobin
Native state 2.09 26 30
Acid denatured (U-form, pH 2) 4.29 2.05 520 30
Molten globule (I-form, pH 4) 2.53 1.21 104 30
RNase T1
Native state 1.74 49 14
Unfolded state (5.3 M GdmCl) 2.40 1.38 – 14
Heat denatured (60 C) 2.16 1.24 87 14
RNase A
Native state 1.90 A0 56
Unfolded state (6 M GdmCl, nonreduced SS) 2.60 1.37 – 56
Unfolded state (6 M GdmCl, reduced SS) 3.14 1.65 – 56
TFE-state (70% v/v TFE) 2.28 1.20 15 33
Barstar
Native state 1.72 5 87
Unfolded state (6 M urea) 2.83 1.65 50 87
Cold-denatured state (1 C, 2.2 M urea) 2.50 1.45 3 87
19.3.6
Dimensions in Partially Folded States – Molten Globules and Fluoroalcohol-induced
States
From the wealth of partly folded states existing for many proteins under different
destabilizing conditions, DLS data obtained for some particular types will be con-
sidered here. Molten globule states [26–28] have been most extensively studied
19.3 Laser Light Scattering of Proteins in Different Conformational States 687
(see Chapter 23). DLS data on the hydrodynamic dimensions were published for
the molten globule states of two widely studied proteins, a-lactalbumin [29] and
apomyoglobin [30]. The results including data for the kinetic molten globule of a-
lactalbumin are shown in Table 19.1. Data also exist on the geometric dimensions
in terms of the radius of gyration, R G , for both a-lactalbumin [31] and apomyoglo-
bin [30, 32] in different conformational states.
Only a few DLS studies on the hydrodynamic dimension of proteins in aqueous
solvents containing structure-promoting substances like trifluoroethanol (TFE) and
hexafluoroisopropanol (HFIP) have been reported so far. The Stokes radii at high
fluoroalcohol content, at which the structure stabilizing effect saturates, are very
different for various proteins. An increase in R S of only about 20% was found for
a-lactalbumin and RNase A [33] with intact disulfide bonds. By contrast, PGK at
pH 2 and 50% (v/v) TFE has a Stokes radius R S ¼ 7:76 nm exceeding that of the
native state by a factor of 2.6 [34]. According to the high helical content estimated
from CD data and the high charge according to the large value of k D it is conceiv-
able that the entire PGK molecule consists of a long flexible helix. Fluoroalcohols
may have a different effect at low volume fractions. In the case of a-lactalbumin, a
molten globule-like state was found at 15% (v/v) TFE [33]. This state revealed a
native-like secondary structure and a Stokes radius 10% larger than that of native
protein. Furthermore, strong attractive intermolecular interactions were indicated
by a large negative diffusive concentration dependence coefficient, k D ¼ 100 mL
g1 , which result presumably from the exposure of hydrophobic patches in this
state. k D was close to zero both in the absence and at high percentage of TFE.
19.3.7
Comparison of the Dimensions of Proteins in Different Conformational States
19.3.8
Scaling Laws for the Native and Highly Unfolded States, Hydrodynamic Modeling
globular proteins in the native state and also for proteins lacking disulfide bonds in
the unfolded state induced by high concentrations of GdmCl or urea. Scaling (i.e.,
‘‘power’’) laws for native globular proteins have been published by several authors.
1/3
Damaschun et al. [37] obtained the relation R S ½nm ¼ 0:362 N aa on the basis of
the Stokes radii calculated for more than 50 globular proteins deposited in the pro-
tein databank. Uversky [38] found the dependence R S ½nm ¼ 0:0557 M 0:369 using
experimental data from the literature and Stokes radii measured with a carefully
calibrated size exclusion column (this is consistent with R S ½nm ¼
0:369
0:325 N aa for a mean residue weight of 119.4 g mol1 ). An advantage of DLS is
that no calibration of the methods is needed. The relation between R S and N aa de-
0:29
termined by Wilkins et al. [39] is R S ½nm ¼ 0:475 N aa . The scaling laws for
the native state are useful for the classification of proteins for which the high-
resolution structure is not known or cannot be obtained. Deviation from the scal-
ing law indicates that the protein has no globular shape or does not adopt a com-
pactly folded structure under the given environmental conditions.
In contrast to the native state, proteins in unfolded states populate a large en-
semble of configurations, which are restricted only by the allowed f; c angles and
nonlocal interactions of the polypeptide chain. In the absence of (or at balanced)
nonlocal interactions (i.e., in the unperturbed or theta state), polypeptide chains
behave as random chains and their conformational properties can be described by
the rotational isomeric state theory [40]. Whether or not unfolded proteins are in-
deed true random chains is a matter of debate [41]. The dimension of a denatured
protein is an important determinant of the folding thermodynamics and kinetics
[42]. Thus, the chain length dependence of the hydrodynamic dimension in the
form of scaling laws is an important experimental basis for the characterization of
unfolded polypeptide chains.
A first systematic analysis of the chain length dependence of the hydrodynamic
dimensions was done by Tanford et al. [43] by measuring the intrinsic viscosities
of unfolded proteins. The Stokes radii measured by DLS [44] for 12 proteins dena-
tured by high concentrations of GdmCl are shown in Figure 19.6. From the data
0:5
the scaling law R S ½nm ¼ 0:28 N aa was obtained. Scaling laws were also pub-
lished by other authors. Uversky [38] derived R S ½nm ¼ 0:0286 M 0:502 (corre-
0:502
sponding to R S ½nm ¼ 0:315 N aa ) from Stokes radii measured by size exclu-
0:57
sion chromatography. Wilkins et al. [39] found R S ½nm ¼ 0:221 N aa using
Stokes radii mostly measured by pulse field gradient NMR. By analysing published
Stokes radii measured by different methods Zhou [42] obtained the relation
0:522
R S ½nm ¼ 0:2518 N aa .
Two aspects that are closely related to measurements of the hydrodynamic di-
mensions of proteins are hydrodynamic modeling [45, 46] and the hydration prob-
lem [47, 48]. Particularly for globular proteins, both are important for the link be-
tween high-resolution data from X-ray crystallography or NMR and hydrodynamic
data. Protein hydration appears as an adjustable parameter in recent hydrodynamic
modeling procedures [49]. The average values over different proteins are 0.3 g
water per g protein or in terms of the hydration shell 0.12 nm. Model calculations
are also in progress for unfolded proteins [42].
19.4 Studying Folding Kinetics by Laser Light Scattering 689
N
Fig. 19.6. Stokes radii RS of 12 proteins 6: glyceraldehydes-3-phosphate dehydrogenase,
unfolded by 6 M GdmCl in dependence on the 7: aldolase, 8: pepsinogen, 9: streptokinase,
number of amino acid residues Naa . The linear 10: phosphoglycerate kinase, 11: bovine serum
0:5
fit yields the scaling law RS ½nm ¼ 0:28 Naa . albumin, 12: DnaK. Disulfide bonds, if existing
1: apocytchrome c, 2: staphylokinase, 3: RNase in native proteins, were reduced.
A, 4: apomyoglobin, 5: chymotrypsinogen A,
19.4
Studying Folding Kinetics by Laser Light Scattering
19.4.1
General Considerations, Attainable Time Regions
First, unlike other methods including SLS, the enhancement of the signal is not
sufficient to increase the (S/N), since the measured signal is itself a statistical
quantity. At light levels above the shot-noise limit the signal-to-noise ratio of the
measured time correlation function depends on the ratio of T A to the correlation
time t C of the diffusion process by ðS/NÞ ¼ ðT A /t C Þ 1/2. t C is of the order of tens
of microseconds for proteins, leading to a total acquisition time T A @ 5–10 s in
order to achieve an acceptable (S/N). This time can be shortened by splitting it
into N S ‘‘shots’’ in a stopped-flow experiment. N S ¼ 100 is an acceptable value
leading to acquisition times of 50–100 ms during one ‘‘shot.’’ This corresponds to
the time resolution in the case of the stopped-flow technique.
The second limiting factor becomes evident when the continuous-flow method is
used. Excellent time resolutions have been achieved with this method in the case
of fluorescence measurements [52] and small-angle X-ray scattering [53, 54]. Con-
tinuous-flow experiments are especially useful for slow methods, since the time
resolution depends only on the aging time between mixer and detector and not
on the speed of data acquisition. Increasing the flow speed in order to get short
aging times also reduces the resting time of the protein molecules within the scat-
tering volume. However, to avoid distortions of the correlation functions, this time
must be longer than the correlation time corresponding to the diffusional motion
of the molecules. This limits the flow speed to about 15 cm s1 under typical exper-
imental conditions [51], resulting in aging times of about 100 ms. The benefit of
continuous-flow measurements over stopped-flow measurements does not exist
for DLS. Thus, DLS is only suitable for studying late stages of protein folding in
the time range > 100 ms. However, this time range is important for folding and
oligomerization or misfolding and aggregation (see Section 19.5). Two examples
for the compaction of monomeric protein molecules during folding will be now
be considered.
19.4.2
Hydrodynamic Dimensions of the Kinetic Molten Globule of Bovine a-Lactalbumin
R 2.46 nm
(a)
R
R
(cm2 s–1)
(b)
R
D
c (mg mL–1)
Fig. 19.7. a) Kinetics of compaction of bovine coefficient D for the apoform of bovine a-
a-lactalbumin after a GdmCl concentration lactalbumin under native conditions (filled
jump from 5 M to 0.5 M, final protein squares), in the presence of 5 M GdmCl (filled
concentration 1.35 mg mL1 , 50 mM sodium diamonds), and for the kinetic molten globule
cacodylate, pH 7, 50 mM NaCl, 2 mM EGTA, (open circles). The buffer was same as in (a).
T ¼ 20 C. The data can be fitted by a single The Stokes radii obtained from D extrapolated
exponential with a decay time t ¼ 43 G 6 s. to zero concentration are shown.
b) Concentration dependence of the diffusion
19.4.3
RNase A is Only Weakly Collapsed During the Burst Phase of Folding
In the previous section, it was demonstrated that a-lactalbumin collapses very rap-
idly into the molten globule with dimensions close to that of the native state. A dif-
692 19 Studying Protein Folding and Aggregation by Laser Light Scattering
t
Fig. 19.8. Kinetics of compaction of RNase A after a GdmCl
concentration jump from 6 M to 0.67 M, final protein
concentration 1.9 mg mL1 , 50 mM sodium cacodylate, pH 6,
T ¼ 10 C. The single exponential fit yields a decay time
t ¼ 28 G 5 s.
ferent folding behavior was observed for RNase A [56]. The Stokes radius of RNase
is 2.56 nm in the unfolded state at 6 M GdmCl and decreases only to 2.34 nm dur-
ing the burst phase (Figure 19.8). Most of the compaction towards the native state
occurs at later stages in parallel with the final arrangement of secondary structure
and the formation of tertiary structure.
19.5
Misfolding and Aggregation Studied by Laser Light Scattering
19.5.1
Overview: Some Typical Light Scattering Studies of Protein Aggregation
Misfolding and aggregation of proteins have been regarded for a long time as un-
desirable side reactions and the importance of these processes became evident only
recently in connection with recombinant protein technology and the observation
that the development of particular diseases correlates with protein misfolding and
misassembly [57]. The basic principles of protein aggregation have been consid-
ered for example by De Young et al. [58].
Light scattering is the most sensitive method in detecting even small amounts of
aggregates in macromolecular solutions. From the numerous applications of light
scattering to aggregation phenomena only a few selected applications to proteins
will be discussed briefly before turning to the more special phenomenon of the
combined processes of misfolding and aggregation.
19.5 Misfolding and Aggregation Studied by Laser Light Scattering 693
19.5.2
Studying Misfolding and Amyloid Formation by Laser Light Scattering
changes in secondary structure (CD, FTIR, NMR) and methods giving evidence of
the morphology of the growing species such as electron microscopy (EM) and/or
atomic force microscopy (AFM). Nevertheless, we will concentrate mainly upon
the contribution from the light scattering studies in the following.
Amyloid formation comprises different stages of particle growth. The informa-
tion that can be obtained from light scattering experiments strongly depends on
the quality of the initial state [68]. A starting solution containing essentially the
conversion competent monomeric species is a prerequisite for studies of early
stages involving the formation of defined oligomers, later called critical oligomers,
and protofibrillar structures. These structures became interesting recently because
of their possible toxic role. The size distributions and the kinetics of formation of
these rather small particles can be characterized very well by light scattering. A
careful experimental design (e.g., multi-angle light scattering or model calculations
regarding the angular dependence of the scattered light intensity) is needed for the
characterization of late products such as large protofibrils or long ‘‘mature’’ fibrils.
The problems involved in light scattering from long fibrillar structures have been
discussed in a review article by Lomakin et al. [69]. Quantitative light scattering
studies of fibril formation have been reported for a few protein systems. Some of
them will be discussed now.
(a)
M t M
R t R
t
(b)
R t R
M t M
(c)
M t M
R R
Fig. 19.9. Amyloid assembly of PGK and Stokes radius (circles) during assembly of
recombinant Syrian hamster PrP 90a232 . recombinant Syrian hamster PrP 90a232 ,
a) Relative increases in mass (squares) and c ¼ 0:59 mg mL1 , in 20 mM sodium acetate,
Stokes radius (circles) during assembly of pH 4.2, 50 mM NaCl, 1 M GdmCl, 20 C.
PGK, c ¼ 2:7 mg mL1 , into fibrillar structures Structural conversion was initiated by a GdmCl
in 10 mM HCl, pH 2, 9.1 mM sodium trichloro- concentration jump from 0 to 1 M. c)
acetate, T ¼ 20 C. Structural conversion was Dimensionality plots, MðtÞ/Mmon versus
initiated by adding sodium trichloroacetate. RS ðtÞ/RS; mon for the data shown in (a) and (b)
Mmon and RS; mon are the molecular mass and for PGK (squares) and Syrian hamster PrP 90a232
the Stokes radius of the monomer, respectively. (circles), respectively. The fit for the data of
b) Relative increases in mass (squares) and PGK corresponds to a dimensionality of 1.6.
19.5 Misfolding and Aggregation Studied by Laser Light Scattering 697
(a)
A
(b)
A
[79]. The subsequent slow increases in size and mass result from the formation of
protofibrillar structures [79]. The change in particle morphology is again indicated
by a kink in the dimensionality plot (Figure 19.9c).
The concentration dependence of the growth process revealed remarkable ki-
netic differences as compared to PGK according to the estimated apparent reac-
tion order of 3. The differences in the formation of the critical oligomers are
well documented by size exclusion chromatography (Figure 19.10). Different oligo-
meric states can be detected during aggregation of PGK, while only monomers and
the critical oligomer (octamer) are populated in the case of ShaPrP 90a232 . Accord-
ingly, the octamer is the smallest stable oligomer. Complementary FTIR and CD
698 19 Studying Protein Folding and Aggregation by Laser Light Scattering
studies [79] have led to the conclusion that the misfolded secondary structure is
stabilized within the octamer. The sole appearance of monomers and octamers
and/or higher oligomers tells us that the dimensionality of the first growth process
in the case of ShaPrP 90a232 (Figure 19.9c) is due to changes in the population of
two discrete species.
Kinks in the dimensionality plot are also observed when various growth stages
are not clearly separated in plots of M and R S versus time (Figure 19.9a,b) because
of comparable growth rates. This is a clear advantage of this data representation.
However, one must be careful in relating the (apparent) dimensionality derived
from these plots to simple geometrical models.
19.6
Experimental Protocols
19.6.1
Laser Light Scattering Instrumentation
tant feature of APDs is the much higher quantum yield (>50%) within the wave-
length range of laser light between 400 and 900 nm.
The expense of a light scattering system depends essentially on the type of the
laser and the complexity of the detection system. For example, simple systems de-
tect the light only at 90 or in the nearly backward scattering direction, while so-
phisticated devices are able to monitor the scattered light simultaneously at many
selectable scattering angles. An important feature is the construction of the sample
cell holder, which should allow the use of different cell types. As discussed earlier,
for folding studies and molecular weight determinations of proteins detection
at one scattering angle is sufficient. Multi-angle detection is recommended for
studies of protein assembly, when the size of the formed particles exceeds about
50 nm.
A criterion for choosing the appropriate cell type is the decision between batch
and flow experiments and the available amount of protein. Batch experiments re-
quire the lowest amount of substance, since standard microfluorescence cells with
volumes of the order of 10 mL can be used. These cells have the further advantage
that the protein concentration can be measured in the same cell. Flow-through
cells, which are essential for stopped-flow experiments and on-line coupling to par-
ticle separation devices, like HPLC, FPLC, and field-flow fractionation, are also use-
ful for batch experiments in the following respect. A direct connection of these cell
to a filter unit are the best way to obtain perfectly cleaned samples.
The cells must be cleaned first by flushing a large amount of water and solvent
through the filter and the cell. The protein sample is then applied without remov-
ing the filter unit. Several cell types can be used including standard fluorescence
flow-through cells, sample cells used in stopped-flow systems, or special flow-
through cells provided by manufacturers of light scattering equipments. Most of
these flow cells have the disadvantage of a restricted angular range. An exception
is the cell used in the Wyatt multiangle laser light scattering (MALLS) device. Cy-
lindrical cells in connection with an index matching bath have to be used for pre-
cise measurements at different, particularly small scattering angles.
In the case of SLS experiments, measurements of the photocurrent with subse-
quent analog-to-digital conversion as well as photon-counting techniques are used.
The latter technique is preferred in DLS experiments. An appropriate correlation
electronics is no longer a problem for modern dynamic light scattering devices
since practically ideal digital correlators can be built up.
The minimum protein concentration needed in a DLS experiment depends on
the molecular mass and the optical quality of the instrument, particularly on the
available laser power. But, even at sufficiently high laser power of about 0.5 W
the experiments become impractical below a certain concentration. This happens
when the excess scattering of the protein solution falls below the scattering of
pure water, e.g., for a protein with M ¼ 10 kDa at concentrations below 0.25 mg
mL1 . Thus, DLS experiments can be done without special efforts at concentra-
tions satisfying the condition c M b 5 kDa mg mL1 . For SLS experiments the
value of c M can be about one order of magnitude lower.
Removal of undesired contaminations, such as dust, bubbles, or other large par-
700 19 Studying Protein Folding and Aggregation by Laser Light Scattering
ticles, is a crucial step in preparing samples for light scattering experiments. Large
particles can readily removed by membrane filters having pore sizes of about 0.1
mm or by centrifugation at 10 000 g or higher, while size-exclusion chromatography
or similar separation techniques have to be applied when a protein solution con-
tains oligomers or small aggregates only slightly larger than the monomeric pro-
tein. Fluids used to clean the scattering cell must also be free of all particulates.
In an SLS experiment, the time-averaged scattering intensities of solvent, solu-
tion, and reference sample (scattering standard) have to be measured. Data evalua-
tion has to be done according to Eqs (3) and (4). Commercial instruments are in
general equipped with software packages that generate Zimm plots according to
Eq. (4). Additionally, the software contains the Rayleigh ratios for scattering stan-
dards at the wavelength of the instrument. A widely used scattering standard is tol-
uene. The Rayleigh ratios of toluene and other scattering standards at different
wavelength are given in Ref. [5]. Instrument calibration can also be done using an
appropriate protein solution. However, the protein must be absolutely monomeric
under the chosen solvent conditions.
DLS measurements demand slightly more experience concerning experiment
duration and data evaluation. Calibration is not needed. The basis for obtaining
stable and reliable results is an adequately measured correlation function. A visual
inspection of this primary data is recommended before any data evaluation is done.
The signal-to-noise ratio that is needed depends on the complexity of the system
under study. For example, short measurement times are acceptable only in the
case of unimodal narrow size distributions, when a noisy correlation function can
be well fitted by a single exponential. In practise, two data evaluation schemes are
mainly used, the method of cumulants [9] yielding an average Stokes radius and
the second and third moments of the size distribution and/or the approximate re-
construction of the size distribution by an inverse Laplace transformation, mostly
using the program CONTIN [8]. The corresponding software packages are avail-
able from the author or are provided by the manufacturers of the instruments.
CONTIN allows the reconstruction of a distribution function in terms of scattering
power, weight concentration, or number (molar) concentration. Though the latter
two types are more instructive, they should only be calculated when a reasonable
particle shape can be anticipated within the considered size range.
For calculations of R S from DLS data via Eq. (6) the dynamic viscosity h of the
solvent must be known. h can be obtained from the kinematic viscosity n (e.g.,
measured by an Ubbelohde type viscometer) and the density r (measured by a dig-
ital densitometer, e.g., DMA 58, Anton Paar, Austria) by h ¼ n r.
A very useful option for light scattering instruments is the on-line coupling
with FPLC, HPLC, or field-flow fractionation (FFF) and a concentration detector,
which might be a UV absorption monitor or a refractive index detector. This allows
direct measurements of M during the flow. In the case of known qn/qc for the
particular solvent conditions, the molecular mass can be obtained from the out-
put signals of the SLS and refractive index detectors by M ¼ k e ðqn/qcÞ1
ðoutputÞ SLS /ðoutputÞ RI , where k e is the instrument calibration constant. A parallel
estimation of R S is possible, when the scattering is strong enough allowing a suffi-
ciently precise DLS experiment during a few minutes.
A further option is the coupling to a mixer allowing kinetic light scattering ex-
periments. Two schemes can be used. Either a mixing device is coupled to a light
scattering apparatus equipped with a flow cell [66] or the light scattering apparatus
is built-up around a stopped-flow device originally constructed for fluorescence de-
tection [56].
19.6.2
Experimental Protocols for the Determination of Molecular Mass and Stokes Radius
of a Protein in a Particular Conformational State
strument provides an average over time T of the scattered intensity I T and a set of
N normalized correlation functions g ð2Þ ðt; T A Þ calculated during an accumulation
time T A and is equipped with the software to reconstruct the size distribution
(CONTIN or similar packages).
Protocol 1
1. Prepare the protein solution of the required concentration (suitable range 1–5
mg mL1 ).
2. Wet the Anotop filter with distilled water and connect it to the flow-through
cell (do not disconnect until the end of the experiment).
3. Flush at least 5 mL distilled water through filter and cell.
4. Insert the cell into the light scattering instrument and check for purity (the
scattering intensity must be very low and ‘‘spikes’’ in intensity should be to-
tally absent. Otherwise, repeat step 3.
5. Fill a syringe with 1–2 mL buffer and flush it through filter and cell (note that
the actual dead volume of filter and cell is larger than the specified volume of
the cell).
6. Insert the cell into the light scattering instrument and measure the scattering
intensity of the buffer, I T; buffer . In some cases, it could be useful to run a short
DLS experiment in order to check that the baseline is flat.
7. Measure the scattering intensity of the reference sample (toluene) in a separate
cuvette of similar geometry.
19.6 Experimental Protocols 703
8. Insert the flow-through cell with buffer into the UV spectrometer and measure
the blank spectrum (note that the center height of the cell fits both the SLS/
DLS and UV spectrophotometer).
9. Measure the refractive index of the buffer with the Abbe-type refractometer.
10. Flush about 1 mL protein solution slowly through filter and cell in order to re-
place the buffer by the protein solution (the protein solution that has passed
filter and cell can be used to prepare a sample of lower concentration).
11. Measure the absorption spectrum of the protein solution in the flow-through
cell, subtract the blank spectrum, and calculate the actual protein concentra-
tion in the cell on the basis of known extinction coefficients.
12. Insert the cell into the SLS/DLS instrument and start data acquisition. It is rec-
ommended to record the correlation function not in a single long run, but
rather by superposition of N short runs (of about 10 s). This enables one to
get a better overview about the stability and quality of the sample. Further-
more, good acquisition software allows to discard distorted runs. Though a cor-
relation function is clearly built up within 10 s, it is mostly too noisy for ade-
quate data evaluation.
13. Store all data and prepare data evaluation by entering additional parameters
like temperature, refraction index, and viscosity.
14. Data evaluation, especially estimations of size distribution, should be done im-
mediately giving the chance to continue the experiment. This could be useful
if the resulting size distribution is still very unstable against variation in data
evaluation parameters.
15. The size distribution should consist of one dominating peak in the present
case. Otherwise, the protein is strongly prone to aggregation. The influence of
stable aggregates of clearly distinct size can be eliminated in many cases as
discussed at the end of Section 19.2.
16. Calculate M using Eqs (2), (3), and (4), the Rayleigh ratio of the reference
sample (the value depends on the wavelength used in the instrument), and
qn/qc ¼ 0:19 mL g1 .
Although Scheme A is superior from the light scattering point of view, a rather
large amount of protein is needed. This can be circumvented by using ultra-micro
cells. Here, we repeat only those steps that concern the preparation of buffer and
protein solutions. Two approaches are applicable.
1. Insert a dry filter membrane into the filter holder and wet the filter with buffer
and filter about 100 mL buffer.
2. Flush the fluorescence cell with a strong nitrogen stream in order to remove
any dust.
3. Filter about 50 mL buffer into the cell and check for purity in the light scattering
instrument. Do the same measurements with the buffer and reference sample
as indicated under Scheme A. (Empty the cell, wash is with distilled water, dry it
with ethanol (Uvasol) and nitrogen and repeat steps 2 and 3 if dust or bubbles
can be detected.)
4. Empty the cell carefully (washing and drying is not necessary) and filter about
25 mL protein solution into the cell.
5. Close the cell, determine the concentration and measure SLS and DLS as under
Scheme A.
6. This procedure is nearly as good as filtration according to Scheme A, but the
presence of dust and bubbles cannot be totally excluded.
This procedure has to be applied when only about 15 mL of protein solution are
available and differs from Scheme B1 only by step 4. The protein solution centrifu-
gated about 15 min at 10 000 g or more has to be transferred into the scattering
cell. An alternative is careful centrifugation within the scattering cell (note that
ultra-micro cells are rather expensive!).
Protocol 2
Acknowledgments
The authors are grateful to Stephen Harding, University Nottingham, for valuable
comments and suggestions.
References
20
Conformational Properties of Unfolded Proteins
Patrick J. Fleming and George D. Rose
20.1
Introduction
20.1.1
Unfolded vs. Denatured Proteins
The term unfolded protein is generic and inclusive, and it can range from protein
solutions in harsh denaturants to protein subdomains that undergo transitory ex-
1
Fraction Unfolded
0
[Denaturant]
Fig. 20.1. The folding transition. The folding experiment [37], the population is followed by a
reaction of a typical, small biophysical protein conformational probe (e.g., circular dichroism
is a highly cooperative, all-or-none transition. or fluorescence) as a function of denaturant
At the transition midpoint, half the ensemble concentration. Upon addition of sufficient
is folded and half is unfolded; the population denaturant, the probe signal reaches a plateau,
of partially folded/unfolded molecules is indicating that the transition is complete.
negligible. In this idealized plot of an actual
cursions from their native format via spontaneous fluctuations. While conceptually
complete, this range is too diverse to be practically useful, and it requires further
specification. Accordingly, the field has focused more specifically on denatured pro-
teins, the population of unfolded conformers that can be studied at equilibrium
under high concentrations of denaturing solvents, high temperature, high pres-
sure, high/low pH, etc. Early experiments of Ginsburg and Carroll [37] and Tan-
ford [93] demonstrate that such conditions can give rise to a defined equilibrium
population in which the unfolding transition is complete (Figure 20.1). In this
chapter, we use both terms and rely on the context for specificity.
20.2
Early History
The fact that protein molecules can undergo a reversible disorder T order transi-
tion originated early in the last century, in ideas proposed by Wu [29, 101] and
Mirsky and Pauling [59]. Both papers propose that a theory of protein structure
is tantamount to a theory of protein denaturation. In particular, these authors
recognized that many disparate physical and chemical properties of proteins are
abolished coordinately upon heating. This was unlikely to be mere coincidence.
Both Wu and Mirsky and Pauling hypothesized that such properties are a conse-
quence of the protein’s structure and are abolished when that structure is melted.
Their hypothesis was later confirmed by Kauzmann and Simpson [86], at which
712 20 Conformational Properties of Unfolded Proteins
point the need for an apt characterization of the melt became clear, and protein
denaturation emerged as a research discipline.
A widely accepted view assumes that unfolded polypeptide chains can explore
conformational space freely, with constraints arising only from short-range local re-
strictions and longer range excluded volume effects. To a good first approximation,
short-range local restrictions refer to repulsive van der Waals interactions between
sequentially adjacent residues (i.e., steric effects) captured by the well-known Ram-
achandran map for a dipeptide [70]. Longer range excluded volume effects also re-
fer to repulsive van der Waals interactions, in this case those between nonbonded
atoms that are distant in sequence but juxtaposed in space as the chain wanders at
random along a Brownian walk in three dimensions [32, 91]. This random coil
model has conditioned most of the thinking in the field.
It is important to realize that the random coil model need not imply an absence
of residual structure in the unfolded population. Kauzmann’s famous review raised
the central question about structure in the unfolded state explicitly [45]:
20.3
The Random Coil
A chain molecule is a freely jointed random coil if it traces a random walk in three-
dimensional space, in incremental steps of fixed length. The random coil model
has enjoyed a long and successful history in describing unfolded proteins. By defi-
nition, a random coil polymer has no strongly preferred backbone conformations
because energy differences among its sterically accessible backbone conformations
are of order @kT. Accordingly, the energy landscape for such a polymer can be vi-
sualized as an ‘‘egg crate’’ of high dimensionality, and a Boltzmann-weighted en-
semble of such polymers populates this landscape uniformly.
More than others, this elegant theory was developed by Flory [33], pp. 30–31;
[17], pp. 991–996) and advanced by Tanford [91–93], who demonstrated that pro-
teins denatured in 6 M guanidinum chloride (a strong denaturant) appear to be
structureless, random chains. Tanford’s pioneering studies established a compel-
ling framework for interpreting experimental protein denaturation that would sur-
vive largely unchallenged for the next 30 years.
Often, the term random coil is used synonymously with the freely jointed chain
model (described below), in which there is no correlation between the orientation
of two chain monomers at any length scale. That is, configurational properties of a
freely jointed chain, such as its end-to-end distance, are Gaussian distributed at all
20.3 The Random Coil 713
chain lengths. In practice, no actual polymer chain is freely jointed. More realistic
models, such as Flory’s rotational isomeric-state model, have Gaussian-distributed
chain configurations only in the infinite chain limit ([33], pp. 30–31; [17], pp. 991–
996). These distinctions notwithstanding, the main characteristic of the random
coil holds in all cases, both ideal and real: the unfolded state is structurally feature-
less because the number of available conformers is large and the energy differen-
ces among them are small.
20.3.1
The Random Coil – Theory
Xn
1
R 2G ¼ R2 ð2Þ
n þ 1 i¼0 Gi
where R Gi is the distance of link i from the center of gravity and n is the number of
links in the polymer. According to a theorem of Lagrange in 1783, R G can be re-
written in terms of the individual link vectors, r ij , without explicit reference to the
center of gravity ([33], appendix A).
1 X N X N
R 2G ¼ r2 ð3Þ
2n 2 i¼1 j¼1 ij
hr 2 i
hR 2G i ¼ as n ! y ð4Þ
6
For a freely jointed chain, the values of such configurational measures are Gaus-
sian distributed.
Of course, no real chain is freely jointed. The chemical bonds in real chains re-
strict motion; bond rotations are never random. Also, each link of a real chain oc-
cupies a finite volume, thereby reducing the free volume accessible to remaining
links. Accordingly, ideal chains descriptions must be modified if they are to accom-
modate such real-world constraints.
A strategy for accommodating restricted bond motion is to depart from physical
chain links and instead re-represent the chain as though it were comprised of
longer, uncorrelated virtual links. The idea underlying this strategy is as follows: a
short chain segment (e.g., a dipeptide) is somewhat rigid [70], but a sufficiently
long segment is flexible. Therefore, the chain becomes flexible at some length be-
tween the dipeptide and the longer peptide. This leads to the idea of an effective
segment, l effective , also called a Kuhn segment, the length scale at which chain seg-
ments approach independent behavior and correlated orientations between them
dwindle away. A chain of length L contains L/l effective Kuhn segments and can be
approximated as a freely jointed chain of Kuhn segments:
20.3 The Random Coil 715
L
hr 2 i ¼ l effective
2
¼ l effective L ð5Þ
l effective
A closely related idea is defined in terms of the chain’s persistence length, the
length scale over which correlations between bond angles ‘‘persist’’. In effect, the
chain retains a ‘‘memory’’ of its direction at distances less than the persistence
length. Stated less anthropomorphically, the energy needed to bend the chain
through a 90 angle diminishes to @kT/2 at its persistence length, and thus
ambient-temperature fluctuations can randomize the chain direction beyond this
length. The size of a Kuhn segment is approximately two persistence lengths (i.e.,
directional correlations die away in either direction).
Current models strive to capture the properties of real chains with more detail
than idealized, freely jointed chains can provide. For example, no actual chemical
bond is a freely swiveling joint. To treat bond restrictions more realistically, Flory
devised the rotational isomeric state approximation ([33], p. 55), in which bond
angles are restricted to discrete values, chosen to correspond to known potential
minima (e.g., gaucheþ , gauche , and trans).
A real polymer chain cannot evade itself. Inescapably, the volume occupied by a
chain element is excluded from occupancy by other chain elements. Otherwise, a
steric clash would ensue: nonbonded atoms cannot occupy the same space at the
same time. This excluded volume effect is substantial for proteins and results in a
major departure from ideal chain dimensions (Eqs (1)–(4)).
As real polymers fluctuate, contracted coils have more opportunities to experi-
ence excluded volume steric clashes than expanded coils, perturbing the chain to-
ward larger dimensions than those expected for ideal polymers.
Chain dimensions are also perturbed by the nature of the solvent. A good solvent
promotes chain expansion by favoring chain:solvent interactions over chain:chain
interactions. Conversely, a poor solvent promotes chain contraction by favoring
chain:chain interactions over chain:solvent interactions. Flory introduced the idea
of a Y-solvent in which, on average, chain:chain interactions exactly counterbal-
ance chain:solvent interactions, leading to unperturbed chain behavior. He pointed
out that the notion of a Y-solvent for a liquid is analogous to the Boyle point for a
real gas, the temperature at which a pair of gas molecules follow an ideal isotherm
because repulsion arising from volume exclusion is compensated exactly by mu-
tual attraction ([33], p. 34).
Flory provided a simple relationship that relates the coil dimensions to solvent
quality [33]. For a random coil polymer with excluded volume, the radius of gyra-
tion, R G , is given by:
R G ¼ R 0n n ð6Þ
quality. Values of n range from 0.33 for a collapsed, spherical molecule in poor sol-
vent through 0.5 at the Y-point (Eq. (1)) to 0.6 in good solvent.
Protein molecules are amphipathic, and their interactions with solvent are com-
plex. However, on balance, denaturing agents such as urea and guanidinum chlo-
ride can be considered good solvents. Using Eq. (6), the degree to which unfolded
proteins are random coil polymers in denaturing solvents can assessed by measur-
ing n, the main topic of Section 20.3.2.
1. The kinetic question: How does a protein discover its native conformation in bio-
logical real time? If restricted solely to the two most populated regions for a di-
peptide, a 100-residue backbone would have 2 100 G 10 30 conformers. With bond
rotations of order 1013 s, the mean waiting time en route to the native confor-
mation would be prohibitive just for the backbone. In actuality, experimentally
determined folding times range from milliseconds to seconds.
2. The thermodynamic question: How does a protein compensate for the large con-
formational entropy loss on folding? With 2 100 G 10 30 conformers, the entropic
price required to populate a single conformation G 30 R lnð10Þ G 40 kcal
mol1 at room temperature, a conservative estimate.
3. The dynamic question: How does a protein avoid meta-stable traps en route to its
native conformation? An equivalent way of asking this question is: why do pro-
teins have a unique native conformation instead of a Boltzmann-distributed en-
semble of native conformations?
Many investigators have sought to provide answers to these questions. Two nota-
ble examples are mentioned here, though only in bare outline.
traps can be avoided if the funnel walls are sufficiently smooth [23], answering
question 3. As a corollary, it is postulated that evolutionary pressures screen pro-
tein sequences, selecting those which can fold successfully in a funnel-like manner
[94]. The folding funnel evokes a graphic portrait of folding dynamics and thermo-
dynamics but is not intended to address specific structural questions, such as
whether a region of interest will be helix or sheet.
20.3.2
The Random Coil – Experiment
½h ¼ Kn x ð7Þ
where K is a constant that depends upon the molecular weight, but only slightly.
Intrinsic viscosity is closely related to R G , and Eqs (6) and (7) have a similar form.
If unfolded proteins retain residual structure, each in their own way, the relation
between [h] and n is expected to be idiosyncratic. Conversely, a series of proteins
that conform to Eq. (7) is indicative of random coil behavior.
720 20 Conformational Properties of Unfolded Proteins
In fact, for a series of 15 proteins denatured in GdmCl, a plot of log½h vs. log n
describes a straight line with slope 0.666, the exponent in Eq. (7) ([91], figure 6).
The linearity of this series and the value of the exponent are strong evidence in
favor of random coil behavior.
1 XN X N
sinh r ij
PðyÞ ¼ 2
ð8Þ
n i¼1 j¼1 hr ij
where n is the number of scattering centers, r ij is the distance between any pair of
centers i and j, and h is a function of the wavelength and scattering angle:
4p y
h¼ sin ð9Þ
l 2
The double sum over all scattering centers is immediately reminiscent of Eq. (3),
in which R G is rewritten in terms of individual vectors, without explicit reference
to their center of gravity. Van Holde et al. ([99], p. 321) show that
1 X N X N
h2 X N X N
PðyÞ ¼ ð1Þ r2 ð10Þ
n 2 i¼1 j¼1 6n 2 i¼1 j¼1 ij
where the first term is unity and R G is directly related to the double sum in the
second term, as in Eq. (3).
Millett et al. used SAXS to determine R G for a series of proteins under both de-
naturing and native conditions ([58], table I and fig. 4). Disulfide cross-links, if any,
were reduced in the denatured species. Their experimentally determined values of
R G were fit to Eq. (6), giving values of n ¼ 0:61 G 0:03 for the denatured proteins
and n ¼ 0:38 G 0:05 for their native counterparts (Figure 20.3). These values are re-
markably close to those expected from theory, viz., n ¼ 0:6 for a random coil with
excluded volume in good solvent and n ¼ 0:33 for a collapsed, spherical molecule
in poor solvent.
20.4 Questions about the Random Coil Model 721
80
70
60
50
RG ( A)
40
30
20
10
0
0 100 200 300 400 500
Length (residues)
Fig. 20.3. The relationship between chain has a value of n ¼ 0:61 G 0:03, in close
length and the radius of gyration, R G , for a agreement with theory. This figure reproduces
series of denatured proteins is well described the one in Millett et al. ([58], fig. 4), but with
by Eq. (6). Data points were taken from table 1 omission of outliers.
in [58], obtained using SAXS. The fitted curve
The SAXS data provide the most compelling evidence to date in favor of the ran-
dom coil model for denatured proteins.
20.4
Questions about the Random Coil Model
The random coil model would seem to be on firm ground at this point. However,
recent work from both theory and experiment has raised new questions about the
validity of the model – questions that provoke considerable controversy. Are they
mere quibbles, or are they the prelude to a deeper understanding of the unfolded
state?
Familiarity conditions intuition. At this point, the random coil model has condi-
tioned our expectations for several decades. Should we be surprised that the di-
mensions of unfolded proteins are well described by a single exponent? Size mat-
ters here. As Al Holtzer once remarked, a steel I-beam behaves as a Gaussian coil
if you make it long enough. But, at relevant length scales, the fact that proteins and
polyvinyl behave similarly is quite unanticipated. After all, proteins adopt a unique
folded state, whereas nonbiological polymers do not.
722 20 Conformational Properties of Unfolded Proteins
20.4.1
Questions from Theory
Superficially, the question of whether polypeptide chains are true random coils
seems amenable to straightforward analysis by computer simulation. In principle,
chains of n residues could be constructed, one at a time, using some plausible
model (e.g., the Flory rotational isomeric model) to pick backbone dihedral angles.
The coil dimensions and other characteristics of interest could then be analyzed by
generating a suitable population of such chains. In practice, the excluded volume
problem precludes this approach for chains longer than @20 residues, where the
likelihood of encountering a steric clash increases sharply, killing off nascent
chains before they can elongate. Naively, one might think that the problem can be
solved by randomly adjusting offending residues until the clash is relieved, but this
tactic biases the overall outcome. In fact, the only unbiased tactic is to rebuild the
chain from scratch, resulting almost invariably in other clashes at new sites for
chain lengths of interest. Such problems have thwarted attempts to analyze the un-
folded population via simulation and modeling.
Fig. 20.4. Testing the Flory isolated-pair mesostate and testing for steric collisions
hypothesis. f; c space for a dipeptide was [65]. Twenty-two meso-states have no allowed
subdivided into 36 alphabetically labeled population; in each of these cases, every f; c
coarse-grain grid squares, called mesostates. value results in a steric collision. In the 14
Treating the atoms as hard spheres, a remaining mesostates, the fraction of sterically
Ramachandran plot (shown in gray) was allowed samples, 0 < L i a 1, was determined,
computed by generating 150 000 randomly as shown.
chosen f; c conformations within each
folded and unfolded states alike. The failure of the hypothesis for contracted chains
implies that such conformers will be reduced selectively in the unfolded state, re-
sulting in a population that is more extended than random coil expectations. Struc-
turally, this shift in the population will result in a more homogeneous ensemble of
unfolded conformers, and thermodynamically, it will reduce the entropy loss ac-
companying folding. But is it significant?
Studies of van Gunsteren et al. [97] and Sosnick and his colleagues [104] concur
that the size of conformational space that can be accessed by unfolded molecules is
restricted in peptides. However, Ohkubo and Brooks [63] argue that restrictions be-
come rapidly insignificant as chain lengths grow beyond n G 7, with negligible
consequences for the random coil model.
In an inventive approach to the problem, Goldenberg simulated populations of
protein-sized chains [39] by adapting a standard software package that generates
three-dimensional models from NMR-derived distance constraints. He analyzed
the resultant unfolded state population using several measures, including coil di-
mensions, and found them to be well described as random coils. A note of caution
is in order, however, because a substantial fraction of the conformers generated by
this method fall within sterically restricted regions of f; c space ([39], table 1).
0.8
0.4
100
0
-100
0 ψ
0 -100
φ 100
Fig. 20.5. A single energy basin can span upper left quadrant of f; c space and
multiple conformers. Most sterically accessible partitions into two distinct energy basins, one
conformers of short polyalanyl chains in good that includes polyproline II helices (larger) and
solvent are extended. Using a soft-sphere another that includes b-strands (smaller)
potential, the Boltzmann-weighted population [65]. At 300 K, conformers with each basin
for the alanyl dipeptide is predominantly in the interconvert spontaneously.
spersed with bends [95, 96]. They were led to this prescient proposal by the similar-
ity between the optical spectra of P II helices and nonprolyl homopolymers. Even ear-
lier, Schellman and Schellman had already argued that the spectrum of unfolded
proteins was unlikely to be that of a true random coil [79]. Following these early
studies, the ensuing literature disclosed a noticeable similarity between the spectra
of P II and unfolded proteins, but such suggestive hints failed to provoke wide-
spread interest – until recently. See Shi et al. for a thorough review [84].
The designation ‘‘polyproline’’ can be misleading. The circular dichroism (CD)
spectrum, characteristic of actual polyproline or collagen peptides, has a pro-
nounced negative band near 200 nm and a positive band near 220 nm. However,
similar spectra can be obtained from peptides that are neither ‘‘poly’’ [54] nor pro-
line-containing [96].
The P II conformation is a left-handed helix with three residues per turn
(f; c G 75 ; þ145 ), resulting in three parallel columns spaced uniformly around
the long axis of the helix. This helix has no intrasegment hydrogen bonds, and, in
solution, significant fluctuations from the idealized structure are to be expected.
The P II conformation is forced by sterics in a polyproline sequence, but it is
adopted readily by proline-free sequences as well [21].
Only three repetitive backbone structures are sterically accessible in proteins: a-
helix, b-strand and P II -helix [70]. In the folded population, a-helices and b-strands
are abundant, whereas P II -helices are rare. More specifically, isolated residues with
P II f; c values are common in the non-a, non-b regions, accounting for approxi-
mately one-third of the remaining residues, but longer runs of consecutive resi-
dues with P II f; c-values are infrequent [90].
726 20 Conformational Properties of Unfolded Proteins
This finding can be rationalized by the fact that P II -helices cannot participate in
hydrogen bonds in globular proteins. Hydrogen bonds are eliminated because the
spatial orientation of backbone donors and acceptors is incompatible with both in-
trasegment hydrogen bonding within P II -helices and regular extra-segment hydro-
gen bonding between P II helices and the three repetitive backbone structures.
Upon folding, those backbone polar groups deprived of hydrogen-bonded solvent
access can make compensatory hydrogen bonds in a-helices and strands of b-sheet,
but not in P II -helices.
Recent work by Creamer and coworkers focused renewed attention on P II [21,
73, 90], raising the question of whether fluctuating P II conformation might con-
tribute substantially to the unfolded population in proteins [96]. Studies performed
during the past few years lend support to this idea, as described next.
20.4.2
Questions from Experiment
Early NMR studies provided evidence for residual structure in the denatured state
of both proteins [36, 61] and peptides excised from proteins [28]. However, the
structured regions seen in proteins were not extensive. Furthermore, most isolated
peptides lacked structure, and the few exceptions did not always retain the confor-
mation adopted in the native protein [27].
Peptide studies tell a similar story. A prime example involves the assessment of
autonomous stability in the a-helix. Early evidence indicated that the cooperative
unit for stable helix formation is @100 residues [105], a length that exceeds the
average protein helix (@12 residues) by almost an order of magnitude. Conse-
quently, the prevailing view in the 1970s was that protein-sized helical peptides
would be random coils in isolation. This view was reversed in the 1980s, after Bier-
zynski et al. [12], expanding upon earlier work by Brown and Klee [15], demon-
strated helix formation in water at near-physiological temperature for residues 1–
13 of ribonuclease, a cyanogen bromide cleavage product. This finding prompted
a re-evaluation of helix propensities in peptides [53, 56, 62, 64] and motivated
numerous biophysical studies of peptides [80]. Summarizing this large body of
work, there is evidence for structure in some short peptides in aqueous solvent at
physiological temperature, but it is marginal at best and, more often, undetectable
altogether.
is too short to form a stable a-helix and should therefore be a random coil. Con-
trary to this expectation, the peptide is largely in P II conformation, in agreement
with predictions from theory [65]. While not all residues are expected to favor the
P II conformation [73], this result shows that the unfolded state is predominantly a
single conformer, at least in the case of polyalanine.
20.4.3
The Reconciliation Problem
The measured radii of gyration, R G , of denatured proteins have values [58] that are
consistent with those expected for a random coil with excluded volume in good sol-
vent (Section 20.3.2.2). Yet, experimental evidence in both proteins [85] and pepti-
des [83] suggests the presence of residual structure in the unfolded population.
How are these seemingly contradictory findings to be reconciled? Millett et al. refer
to this as the reconciliation problem ([58], see their discussion, p. 257).
Paradox is often a prelude to perception. Equation (6), in its generality, necessar-
ily neglects the chemical details of any particular polymer type. Accordingly, the re-
sultant chain description is insensitive to short-range order, apart from the propor-
tionality constant R 0 , which is a function of the persistence length (Section 20.3.1).
Sterically induced local order, encapsulated in R 0 , is surely present in unfolded
proteins [31, 66], but can it rationalize the apparent contradiction between random
coil R G values and global residual structure [85]? One possible explanation is that
multiple regions of local structure dominate the ensemble average to such an ex-
tent that they are interpreted as global organization [52, 103].
The coil library may provide a useful clue to the resolution of this puzzle. The
coil library is the collection of all nonrepetitive elements in proteins of known
structure, that fraction of native structure which remains after a-helix and b-sheet
are removed. Given that the library is composed of fragments extracted from solved
structures, it is surely not ‘‘coil’’ in the polymer sense. However, the term ‘‘coil li-
brary’’ is intended to convey the hypothesis that such fragments do, in fact, repre-
sent the full collection of accessible chain conformers in unfolded proteins [87].
Taken to its logical conclusion, this hypothesis posits that the coil library is a col-
lection of structured fragments in folded proteins and, at the same time, a collec-
tion of unstructured fragments in unfolded proteins. If so, this library, together
with a-helix, b-strand, and P II helix, represents an explicit enumeration of acces-
sible conformers from which the unfolded ensemble might be reconstructed [5].
At this writing, the reconciliation problem remains an ongoing question. Re-
gardless of the eventual outcome, this paradox appears to be moving the field in
an informative direction.
20.4.4
Organization in the Unfolded State – the Entropic Conjecture
Are there general principles that lead to organization in the unfolded state? If
accessible conformational space is vast and undifferentiated, the entropic cost of
20.4 Questions about the Random Coil Model 729
populating the native basin exclusively will be large. However, if the unfolded state
is largely restricted to a few basins, with nonuniform, sequence-dependent basin
preferences, then entropy can function as a chain organizer.
Consider two thermodynamic basins, i and j. The Boltzmann-weighted ratio of
their populations, n i /n j , is given by ðw i /w j ÞebDU, where n i and n j designate the
number of conformers in i and j, w i and w j are the degeneracies of state, b is the
Boltzmann factor, and DU is the energy difference between the two basins. Both
entropy and enthalpy contribute to this ratio. If w i and w j are conformational
biases (i.e., the number of isoenergetic ways the chain can adopt conformations
i and j), and w i /w j is the dominant term in the Boltzmann ratio, the entropy dif-
ference, DS conf ¼ R lnðw i /w j Þ, would promote organization in the unfolded popu-
lation.
In particular, if P II is a dominant conformation in polyalanyl peptides, then it is
also likely to be favored in unfolded proteins, in which case the unfolded state is
not as heterogeneous as previously believed. The usual estimate of about five acces-
sible states per residue in an unfolded protein is based on a familiar argument: the
free energy difference between the folded and unfolded populations, DG conf ,
is a small difference between large value of DH conf and TDS conf [13, 14]. If
DG conf G 10 kcal mol1 (a typical value) and DH conf G 100 kcal mol1 , then the
counterbalancing TDS conf is also @100 kcal mol1 . Then DS conf G 3:33 entropy
units per residue for a 100-residue protein at 300 K. Assuming DS conf ¼ R ln W,
the number of states per residue, W, is 5.34.
However, instead of a reduction in the number of distinct states, this entropy
loss on folding could result from a reduction in the degeneracy of a single state,
providing the f; c space of occupied regions in the unfolded population is fur-
ther constricted upon folding. For example, a residue in P II is within a room-
temperature fluctuation of any sterically allowed f; c value in the upper left quad-
rant of the dipeptide map ([57], table 1). Consequently, different f; c values from
these regions would be thermodynamically indistinguishable and therefore not dis-
tinct states at all. As a back-of-the-envelope approximation, consider a residue that
can visit any allowed region of the upper left quadrant in the unfolded state. Upon
folding, let this residue be constrained to lie within G30 of ideal b-sheet f; c val-
ues. The reduction in f; c space would be a factor of 5.58, approximating the value
attributed to distinct states. Similar, but less approximate, estimates can be ob-
tained when the unfolded populations are Boltzmann weighted.
What physical factors might underwrite such entropy effects?
20.5
Future Directions
The early analysis of steric restrictions in the alanyl dipeptide (more precisely, the
compound Ca-CO-NHaCaHRaCO-NH-Ca, which has two degrees of backbone
freedom like a dipeptide) by Ramachandran et al. [71] has become one of those
rare times in biochemistry where theory is deemed sufficient to validate experi-
ment [49]. The fact that the dipeptide map is based only on ‘‘hard sphere’’ repul-
sion alone led some to underestimate the generality of this work, but not Richards,
who commented [72]:
Acknowledgments
We thank Buzz Baldwin, Nicholas Fitzkee, Haipeng Gong, Nicholas Panasik, Kevin
Plaxco, and Timothy Street for stimulating discussion and The Mathers Founda-
tion for support.
References
34 Garcia, P., Serrano, L., Durand, D., 44 Jalkanen, K. J. and Suhai, S. 1996.
Rico, M., and Bruix, M. 2001. NMR N-Acetyl-L-Alanine N 0 -methylamide: a
and SAXS characterization of the density functional analysis of the
denatured state of the chemotactic vibrational absorption and birational
protein CheY: implications for protein circular dichroism spectra. Chem.
folding initiation. Protein Science 10, Phys. 208, 81–116.
1100–1112. 45 Kauzmann, W. 1959. Some factors in
35 Garcia, A. E. 2004. Characterization the interpretation of protein denatura-
of non-alpha helical conformations in tion. Adv. Protein Chem. 14, 1–63.
Ala peptides. Polymer 45, 669–676. 46 Kazmirski, S. L., Wong, K. B.,
36 Garvey, E. P., Swank, J., and Freund, S. M. V., Tan, Y. J., Fersht,
Matthews, C. R. 1989. A hydrophobic A. R., and Daggett, V. 2001. Protein
cluster forms early in the folding of folding from a highly disordered
dihydrofolate reductase. Proteins denatured state: the folding pathway
Struct. Funct. Genet. 6, 259–266. of chymotrypsin inhibitor 2 at atomic
37 Ginsburg, A. and Carroll, W. R. resolution. Proc. Natl Acad. Sci. USA
1965. Some specific ion effects on the 98, 4349–4354.
conformation and thermal stability of 47 Kentsis, A., Gindin, T., Mezei, M.,
ribonuclease. Biochemistry 4, 2159– and Osman, R. 2004. Unfolded state
2174. of polyalanine is a segmented poly-
38 Go, N. 1984. The consistency proline II helix. Proteins 55, 493–
principle in protein structure and 501.
pathways of folding. Adv. Biophys. 18, 48 Koltun, W. L. 1965. Precision space-
149–164. filling atomic models. Biopolymers 3,
39 Goldenberg, D. P. 2003. Computa- 665–679.
tional simulation of the statistidal 49 Laskowski, R. A., MacArthur,
properties of unfolded proteins. M. W., Moss, D. S., and Thornton,
J. Mol. Biol. 326, 1615–1633. J. M. 1993. PROCHECK: a program to
40 Grant, J. A., Williams, R. L., and check the stereochemical quality of
Scheraga, H. A. 1990. Ab initio self- protein structures. J. Appl. Cryst. 26,
consistent field and potential- 283–291.
dependent partial equalization of 50 Lazaridis, T. and Karplus, M. 1997.
orbital electronegativity calculations of Science 278, 1928–1931.
hydration properties of N-acetyl-N 0 - 51 Levinthal, C. 1969. How to fold
methyl-alanineamide. Biopolymers 30, graciously. Mossbauer spectroscopy in
929–949. Biological Systems, Proceedings.
41 Han, W.-G., Jalkanen, K. J., Elstner, University of Illinois Bull. 41, 22–24.
M., and Suhai, S. 1998. Theoretical 52 Louhivuori, M., Paakkonen, K.,
study of aqueous N-acetyl-L-alanine Fredriksson, K., Permi, P., Lounila,
N 0 -methylamide: structures and J., and Annila, A. 2003. On the origin
raman, VCD and ROA spectra. J. Phys. of residual dipolar couplings from
Chem. B 102, 2587–2602. denatured proteins. J. Am. Chem. Soc.
42 Hoang, T. X., Trovato, A., Seno, F., 125, 15647–15650.
Banavar, J., and Maritan, A. 2004. 53 Lyu, P. C., Liff, M. I., Marky, L. A.,
What determines the native state folds and Kallenbach, N. R. 1990. Side
of proteins? submitted. chain contributions to the stability of
43 Itzhaki, L. S., Otzen, D. E., and alpha-helical structure in peptides.
Fersht, A. R. 1995. The structure of Science 250, 669–673.
the transition state for folding of 54 Madison, V. and Schellman, J. A.
chymotrypsin inhibitor 2 analysed by 1970. Diamide model for the optical
protein engineering methods: activity of collagen and polyproline
evidence for a nucleation-condensation I and II. Biopolymers 9, 65–94.
mechanism for protein folding. J. Mol. 55 Maritan, A., Micheletti, C.,
Biol. 254, 260–288. Trovato, A., and Banavar, J. 2000.
734 20 Conformational Properties of Unfolded Proteins
101 Wu, H. 1931. Studies on denaturation 104 Zaman, M. H., Shen, M. Y., Berry,
of proteins. XIII. A theory of denat- R. S., Freed, K. F., and Sosnick, T. R.
uration. Chin. J. Physiol. V, 321–344. 2003. Investigations into sequence and
102 Yi, Q., Scalley-Kim, M. L., Alm, E. J., conformational dependence of back-
and Baker, D. 2000. NMR charac- bone entropy, inter-basin dynamics
terization of residual structure in the and the Flory isolated-pair hypothesis
denatured state of protein L. J. Mol. for peptides. J. Mol. Biol. 331, 693–
Biol. 299, 1341–1351. 711.
103 Zagrovic, B., Snow, C. Khaliq, S., 105 Zimm, B. H. and Bragg, J. K. 1959.
Shirts, M., and Pande, V. 2002. Theory of the phase transition
Native-like mean structure in the between helix and random coil in
unfolded ensemble of small proteins. polypeptide chains. J. Chem. Phys. 31,
J. Mol. Biol. 323, 153–164. 526–535.
737
21
Conformation and Dynamics
of Nonnative States of Proteins studied
by NMR Spectroscopy
Julia Wirmer, Christian Schlörb, and Harald Schwalbe
21.1
Introduction
21.1.1
Structural Diversity of Polypeptide Chains
From a structural point of view, protein folding is one of the most fascinating as-
pects in structural biology. Protein folding proceeds from the disordered random
coil polypeptide chain defined by its primary sequence to intermediates with in-
creasing degree of conformational and dynamic order to the final native state of
the protein (Figure 21.1). While the starting point of protein folding, probably
best described as a statistical coil, but often called the random coil state of a pro-
tein, can be defined as a state in which few if any nonlocal interactions exist, the
native state can be described by one predominant conformation around which only
local fluctuations of low amplitude occur. Order exists and forms on various levels
of the protein structure, the primary structure describing the conformational pref-
erences of the amino acid residues differs from secondary structure elements,
which are defined by the conformations adopted by consecutive residues to the
final arrangement of secondary structure elements in the three-dimensional fold
of the protein.
During folding, the polypeptide chain therefore adopts very different conforma-
tions and its interaction with the surrounding solvent changes considerably. This
variability in conformational space of a polypeptide chain has become even more
interesting with the observation of protein misfolding revealing that proteins can
also adopt highly ordered, oligomeric states, so-called fibrillic states of proteins, in
which residues adopt repetitive conformations very different to the native fold and
in which hydrogen bonding is satisfied in an intermolecular fashion. In addition,
the conformational equilibria between the various states of a protein are influenced
by interactions with chaperones in the cell. The given sequence of a protein there-
fore can adopt a continuum of different conformational states and degrees of oligo-
merization, those states interconvert at different rates and therefore vary widely in
persistency (Figure 21.1).
Fig. 21.1. The different conformations formation of folding intermediates (b) whose
adopted by a polypeptide chain can range from structure and dynamics may be modulated by
the native, often monomeric state (c), in which protein-protein interactions with molecular
a single conformation exists and which is build chaperones such as GroEL (e) shown in the
up from secondary structure elements and figure. At the other extreme of conformational
their specific arrangements, to the ensemble of states, proteins can aggregate and form
conformers representing the random coil state oligomeric states called fibrils (d). Liquid and
of a protein (a). The individual members of solid state NMR spectroscopy can provide
this ensemble have widely different detailed information on structure and
compaction, dynamics, local and nonlocal dynamics in all of these states.
conformations. Protein folding preceeds via
dynamics simulations. They are being developed to aid the interpretation of NMR
parameters such as chemical shifts d, spin–spin coupling constants J, homo-
nuclear NOE data (NOEs), residual dipolar couplings (RDCs) and heteronuclear
relaxation properties such as relaxation rates ( 15 N-R1 , 15 N-R2 ) and heteronuclear
NOEs (f 1 Hg- 15 N-NOE).
21.1.2
Intrinsically Unstructured and Natively Unfolded Proteins
Nonnative states of proteins have also received attention recently because of the ob-
servation that an axiomatic linking of the function of a protein to persistent fold
might not be general but a number of proteins have been identified that lack in-
trinsic globular structure in their normal functional form [1–3]. The expression
‘‘intrinsically unstructured’’ and ‘‘natively unfolded’’ are used synonymously, the
latter being coined by Schweers et al. in 1994 in the context of structural studies
of the protein tau [4]. Intrinsically unstructured proteins are extremely flexible,
noncompact, and reveal little if any secondary structure under physiological condi-
tions. In 2000, the list of natively unfolded protein comprised 100 entries [5] (see
Table 21.1b). Natively unfolded proteins are implied in the development of a num-
ber of neurodegenerative diseases including Alzheimer’s disease (deposition of
amyloid-b, tau-protein, a-synuclein), Down’s syndrome, and Parkinson disease to
name a few (quoted after Ref. [5]). They are predicted to be ubiquituous in the pro-
teome [6, 7] and algorithms available as a web-program (http://dis.embl.de/) have
been developed to predict protein disorder [8]. According to the predictions, 35–
51% of eukaryotic proteins have at least one long (> 50 residues) disordered region
and 11% of proteins in Swiss-Prot and between 6 and 17% of proteins encoded by
various genomes are probably fully disordered (quoted from Ref. [6]). Proteins pre-
dicted to be intrinsically unstructured show low compositional complexity. These
regions sometimes correspond to repetitive structural units in fibrillar proteins.
Therefore, it seems likely that the unstructureness of the polypeptide chains in
some states of a protein plays an important role in the development of fibrillar
states and this further supports the importance for detailed structural and dynamic
investigations of nonnative states of proteins. Chapter 8 in Part II is dedicated to
the discussion of natively disordered proteins, and Chapter 33 in Part II discusses
protein folding diseases.
It has also been noted by Gerstein that the average genomically encoded protein
is significantly different in terms of size and amino acid composition from folded
proteins in the PDB [9]. This difference would indicate that the structures depos-
ited in the PDB are not random and in turn that they cannot be taken as represen-
tative for the entire structural diversity of polypeptide chains.
In this article, we discuss NMR investigations, primarily of nonnative states of
proteins based on our investigations in the past using lysozyme and its mutants
as well as a-lactalbumin and ubiquitin as model proteins. A number of excellent
review articles have been published reporting on the topic of NMR investigations
of nonnative states of proteins (see, for example, Refs [10–14]). In order to under-
stand and determine unambiguously those parts of the polypeptide chain in which
740 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
21.2
Prerequisites: NMR Resonance Assignment
c)
b)
a)
Fig. 21.2. One-dimensional 1 H-NMR range from the random coil state (a) with
experiments showing the differences in sharp NMR resonance lines to the molten
chemical shift dispersion and linewidth for globule state (b) with extensive line
different states accessible for the protein a- broadening to the native state with well
lactalbumin, a protein with 123 amino acid resolved peaks at least for some atomic groups
residues and four native disulfide bridges. They (c).
21.2 Prerequisites: NMR Resonance Assignment 741
15
N
a) b) c) 105.00
110.00
115.00
(ppm)
120.00
125.00
130.00
1
10.00 9.00 8.00 7.00 10.00 9.00 8.00 7.00 10.00 9.00 8.00 7.00 H
(ppm) (ppm) (ppm)
Fig. 21.3. Two-dimensional 1 H, 15 N-NMR chemical shifts of the different types of amino
correlation experiments showing the acids when in unfolded conformations. For
differences in chemical shift dispersion for comparison, the similar spectrum for the
different states accessible for the protein a- native protein is also shown (a). At pH 2, the
lactalbumin, a protein with 123 amino acid protein forms a molten globule state and the
residues and four native disulfide bridges. For spectrum shows only very few peaks, while the
the random coil state of the protein (c), the other peaks are unobservable due to
resonances cluster in three regions in the considerable line broadening (b).
spectra, largely reflecting the characteristic 15 N
742 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
Fig. 21.4. Two-dimensional 1 H, 13 C-NMR region, and b) aliphatic carbon region. Distinct
correlation experiment of hen egg white regions of 13 C resonances have been labeled
lysozyme denatured in 8 M urea at pH 2. The with the one-letter amino acid code. The
chemical shift resolution in 1 H, 13 C correlation chemical shifts of the 13 C resonances are very
is considerable lower than the 1 H, 15 N-NMR similar to literature values for linear denatured
correlation experiment. a) Aromatic carbon hexapeptides [37].
Tab. 21.1a. Representive set of proteins that were investigated using heteronuclear NMR
spectroscopy in their nonnative state.
Protein References
Tab. 21.1b. Representive set of natively unfolded proteins that were investigated using
heteronuclear NMR spectroscopy in their nonnative state.
Protein References
Protein References
30.00
40.00
50.00
0
60.00
0
70.00
0
8.29 8.25 8.23 8.21 8.11 7.86 8.20 8.37
7 8.15
1H (ppm)
Fig. 21.5. ðo 1 ; o 3 Þ strips from the HNCACB Asn27, Trp28, Val29, Cys30, Ala31, and Ala32
experiment of cysteine reduced and methylated (the additional methionine residue is labeled
lysozyme with the N-terminal additional M1 to keep the numbering of lysozyme).
methionine, a nonnative state of lysozyme in Intra- and interresidual correlation peaks are
H2 O and at pH 2, taken at the 1 H N ðo 3 Þ and marked and clearly visible.
15
N ðo 1 Þ frequencies of Ser24, Leu25, Gly26,
cal shift in protein structures [22] and from quantum chemical calculations [23].
Resonance assignment following this procedure have been reported for a number
of different proteins as summarized in Table 21.1a. Table 21.1b gives an overview
of natively unfolded proteins for which lack of structure has been identified by
NMR spectroscopy. Table 21.1c summarizes solid state NMR investigations of fi-
brillar states of peptides.
Figure 21.5 shows the parts of three-dimensional spectra for hen egg white lyso-
zyme for which sequential assignment is shown for a stretch of neighboring resi-
dues. The resonance assignment for random coil states of proteins with sharp res-
onance lines is straightforward provided the amino acid sequence is nonrepetitive.
21.3
NMR Parameters
In Section 21.3, the most important NMR parameters summarized in Table 21.2
will be introduced with an emphasis on the relevant aspects in the context of con-
formational averaging observed in nonnative states of proteins. It is important to
stress here that in contrast to native proteins, the dependence of the measurable
21.3 NMR Parameters 745
21.3.1
Chemical Shifts d
[28]. Secondary and tertiary structures contribute to the observed chemical shifts
and therefore can be investigated by analysis of the chemical shifts [29]. Deviations
from random coil chemical shifts are also called secondary chemical shifts and are
indicative for the content of residual structures in unfolded or partially folded pro-
teins. An overall averaging over the deviations from random coil chemical shifts
gives an impression of the extent of residual structure in denatured states of
proteins.
Proper referencing of the chemical shifts to standard substances like DSS (2,2-
dimethyl-2-silapentane-5-sulfonic acid) or TSP (3-(trimethylsilyl) propionate) is re-
quired for the comparison of experimental spectra to random coil chemical shifts
from the literature [30, 31]. In general, the dispersion of the 15 NH , 1 HN , and 13 CO
resonances in unfolded proteins is much greater than for the 13 Ca, 13 Cb, 1 Ha, and
1
Hb resonances, reflecting the sensitivity of the former nuclei to the nature of the
neighboring amino acids in the primary sequence of the protein [32]. Both 13 Ca
and 13 CO chemical shifts are shifted downfield when they are in a-helical struc-
tures and upfield, when they are located in b-sheets, while 13 Cb and 1 Ha reso-
nances experience upfield shifts in a-helices and downfield shifts in b-structures
[30, 31, 33].
So-called chemical shift index (CSI) calculations [33] can routinely be used to de-
tect secondary structure elements in proteins. In this method, chemical shift devi-
ations are normalized to 1, 0, and 1, depending on the extent and direction of the
deviation, and then plotted against the residues of the protein sequence. The result
of this is a topology plot of the protein from which one easily can spot a-helical
structures and b-sheet regions, which differ in the direction of the secondary chem-
ical shifts.
X
d obs ¼ pid i ð1Þ
i
where d obs is the observed chemical shift, d i is the chemical shift in the ith confor-
P
mation with weights p i and i p i ¼ 1. The increase in linewidth at half height
DG1=2 due to fast exchange between two site equally populated is given by:
pðdnÞ 2
DG1=2 ¼ ð2Þ
2k
21.3 NMR Parameters 747
Fig. 21.6. Schematic picture indicating the population between the two states, e.g.,
appearance of NMR spectra for different at a given denaturant concentration. b)
chemical exchange regimes: a) Slow chemical Intermediate chemical exchange regime
exchange between conformers, e.g., in the (molten globule state). c) Fast chemical
nonnative (A) and native state of a protein (B). exchange (random coil state).
The height of the signals indicates the
jdnj
k ¼ pffiffiffi ð3Þ
8
748 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
21.3.2
J Coupling Constants
Tab. 21.3. Overview of Karplus parametrization for different 1 J; 2 J, and 3 J coupling constants.
b-sheet
9 3J(HN, Ha) (φ) =
H 6.4 cos2(φ -60°) - 1.4 cos(φ - 60°) + 1.9
Ha) (Hz)
H
6.5 H
3J(HN,
4
H
a-helix
1.5
-180 -120 -60 0 60 120 180
φ (°)
Fig. 21.7. Dependence of the 3 JðHN ,Ha Þ coupling constants on
the backbone torsion angle f using the parametrization by
Vuister et al. (see Table 21.3).
750 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
X
J obs ðaÞ ¼ p i J i ða i Þ ð4Þ
i
where J obs is the observed scalar coupling constant, J i is the scalar coupling con-
P
stant in the ith conformation with torsion angle a i and weights p i and i p i ¼ 1.
Two approaches have been developed to interpret the averaged scalar coupling con-
stant. In one approach, random coil scalar coupling constants have been deter-
mined for random coil peptides GGXGG in close similarity with the reference
chemical shift values [53].
Different to chemical shifts, the dependence of scalar coupling constants on the
intervening torsion angle is well established (see Table 21.3) and therefore, cou-
pling constants can be predicted using models for the distribution of torsion angle
space in the random coil state of a protein (see Section 21.4).
Table 21.4 shows a comparison of coupling constants predicted from the random
coil model and those measured for random coil peptide GGXGG in 6 M GdmCl.
21.3.3
Relaxation: Homonuclear NOEs
Tab. 21.4. Random coil 3 JðHN ,Ha Þ coupling constants predicted on the basis of the random
coil model (see Section 21.4) and measured experimentally. Values for 3 JðHN,Ha Þpred: are based
on the distribution of f angles in the protein database and represent an average over all the
different adjacent amino acids, rather than for residues with adjacent glycines. Experimental
coupling constants were measured using a GGXGG peptide in 6 M GdmCl at 20 C, pH 5.0.
3
J(H N ,Ha ) pred. (Hz) 3
J(H N ,Ha ) exp. (Hz)
r
NOE
Fig. 21.8.The NOE interaction sH1; H2 between two protons
H1 and H2 is the most important NMR parameter to
determine structures of proteins.
752 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
NOE
Fig. 21.9. Dependence of the cross-peak intensity sH1; H2 in
NOESY spectra on the distance between protons H1 and H2.
Characteristic secondary structure elements are indicated, the
cross-peak for two protons 3.4 Å apart has arbitrarily been
scaled to 1.
Spectral density function The spectral density function J q ðoq Þ is obtained by eval-
uating the correlation function of the spherical harmonics
ðy
J q ðoq Þ ¼ dtF ðqÞ ðtÞF ðqÞ ðt þ tÞ expðioq tÞ ð6Þ
0
The bar indicates time average over t. q can assume values of 0, G1, G2. The corre-
lation function describes the probability of a given internuclear vector to stay in a
fixed orientation in dependence of the time t. Due to rotational tumbling, the ori-
entation is lost; for proteins of 15 kDa size, a typical time constant tc for this pro-
cess is of the order of 5 ns. However, depending on the model of rotational reorien-
tation (isotropic, axially symmetric, asymmetric), the spectral density function has
21.3 NMR Parameters 753
Spherical top molecules The spectral density function for isotropic rotational dif-
fusion has been derived by Hubbard and coworkers [54–57]:
" #
q 2 6D
J ðoq Þ ¼
5 ð6DÞ 2 þ oq2
" #
1 2tc
¼ ð7Þ
5 1 þ ðoq tc Þ 2
where D denotes the diffusion constant, with 1=tc ¼ 6D. The rotational diffusion
constant is inversely proportional to the correlation time tc . The correlation time
for brownian rotational diffusion can be measured by time-resolved fluorescence
spectroscopy, light scattering, or NMR relaxation [58] and can be approximated
4phw rH3
for isotropic reorientation by the Stoke’s-Einstein correlation: tc ¼ in which
3kB T
hw is the viscosity, rH is the effective hydrodynamic radius of the protien, kB is the
Boltzmann constant, and T the temperature.
Axially symmetric top molecules For the case that Dxx ¼ Dyy ¼ D? , one obtains
the spectral density function of a symmetric top rotator (Dzz ¼ Dk ) or of an axially
symmetric top molecule whose rotational diffusion is defined by a second rank dif-
fusion tensor with polar coordinates y and f:
1
J q ðoq Þ ¼ fS 2 J q; 0 þ S12 J q; 1 þ S22 J q; 2 g
20 0
3
S22 ¼ hðsin 2 yÞðcos 2fÞi 2 þ hðsin 2 yÞðsin 2fÞi 2
4
2tc; m
J q; m ¼ ð9Þ
1 þ ðoq tc; m Þ 2
have been used. The correlation times tc; m can be rewritten as diffusion constants
Dk and D? according to
754 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
NOE
The analysis shows that of cross-peak intensities sH1; H2 will therefore be very dif-
ferent depending on the overall compactness of the conformer, ranging from
nearly isotropic to highly extended and anisotropic shape, in the ensemble in the
random coil state of a protein.
S 2 tc ð1 S 2 Þt
JðoÞ ¼ þ ð11Þ
1 þ o tc 1 þ o 2 t 2
2 2
In Eq. (11), tc is the overall rotational correlation time of the molecule and
1=t ¼ 1=tc þ 1=te . In the absence of internal motion (S 2 @ 1 and t ¼ tc ), JðoÞ sim-
plifies to:
21.3 NMR Parameters 755
Fig. 21.10. Model-free parameters shown for correlation time te . The overall rotation is
a NaH spin system within a molecule that assumed to be isotropic. B) Two-time scale
tumbles isotropically with the overall rotational extended model with two S 2 parameters, SS2
correlation time tc . A) Simple model with the and Sf2 for slow and fast motions. C) like A but
order parameter S 2 and the effective with anistropic overall rotation.
tc
JðoÞ ¼ ð12Þ
1 þ o 2 tc2
In cases where different fast motions take place, whose time scales differ by at least
one order of magnitude, an extended model-free formalism has been developed
with a separate S 2 for each motion, SS2 and Sf2 (Figure 21.10b, Eq. (13)).
S 2 tc ðSf2 S 2 Þt
JðoÞ ¼ þ ð13Þ
1 þ o 2 tc2 1 þ o2t2
In this two-time scale model it is assumed that the contribution of the faster of
the two motions can be neglected, therefore, while it contributes to the overall
S 2 since S 2 ¼ SS2 Sf2 , the term containing the fast effective correlation time tf
is left out. The time scale of the slower internal motion ts is included in
t ð1=t ¼ 1=tc þ 1=ts Þ similar to te of the single-time scale model (it will therefore
not be discriminated from te in the following).
NOE
Figure 21.11 shows the dependence of the cross-peak intensity sH1; H2 in NOESY
experiments on the overall correlation tc , the internal correlation time te and the
order parameter S 2 as defined in the single-time scale model. It can be seen that
for global correlation time of 2 ns, an order parameter S 2 of 0.4, and an internal
NOE
correlation time te of 150 ps, sH1; H2 is scaled down from –0.36 in the absence of
any internal motion ðS 2 ¼ 1:0Þ to 0.12. If one reduces tc to 1 ns, sH1;
NOE
H2 is further
reduced to 0.04. It is interesting to note that for longer internal correlation times
NOE
te , sH1; H2 increases again. The partition of tc and te is difficult to assess by NMR.
In the case, in which local and global motions cannot be entirely uncoupled and in
756 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
NOE
Fig. 21.11.Dependence of the cross-peak intensity sH1; H2 in
NOESY spectra on the correlation time for local motions te
assuming different correlations times t2 for the global motion
and different order parameters S 2 .
NOE
which local motions have relative larger amplitudes, interpretation of sH1; H2 in
terms of arising from an average distance is difficult. In addition, the correlation
times for both internal and global motion may be affected in a different manner
depending on the specific experimental conditions such as temperature and sol-
vent viscosity; these effects make it difficult to argue that the absence of a specific
cross-peak could be taken as evidence for specific conformation sample as is some-
times done in the literature.
I N i,i+3
I N i,i+3 0.17
coop
Fig. 21.12. Model of a consecutive stretch of cross-peak Iða; NÞi; iþ3 is 50% of the cross-peak
three amino acid residues. For the sake of the observed for a single static conformation. In
argument, it is assumed that the residues can the case of noncooperative conformational
non coop
adopt only two different conformations, a and sampling, the intensity Iða; NÞi; iþ3 decreases
b. In the case of cooperative sampling of to 17%, for which only two out of the eight
conformational space, which is indicated by possible conformation contribute (see Table
the dashed lines, the intensity of the NOE 21.5).
tide (aaa; baa; aba; bba; aab; bab; abb; bbb) are each sampled by 12.5%. Obviously,
the J coupling constant in either of the two cases is the average ð J a þ J b Þ=2 and J
couplings therefore do not differentiate between different degrees of cooperativity.
This is different for the measurement of an NOE, e.g., between a residue i and
i þ 3. In the cooperative case, there are two populations present (aaa; bbb), each by
50%. Let us assume that the aaa conformer represents a helical fragment and the
bbb conformer an extended fragment. Then, Figure 21.9 shows that no cross-peak
can be observed for bbb conformer. In the cooperative case, in which 50% of all
molecules in the ensemble have an aaa conformation, the cross-peak intensity is
50% of the cross-peak intensity predicted for a helix (Table 21.5). In the noncoop-
erative case, in which all eight conformations are equally populated, the cross-peak
intensity drops to 0.17 and only two conformers (aaa; baa) contribute to the cross-
peak intensity.
21.3.4
Heteronuclear Relaxation (15 N R1 , R2 , hetNOE)
Tab. 21.5. Contribution to NOE cross-peak intensities from different conformations for
two models: (i) cooperative sampling of conformational space defined by two different
conformations, a and b, or (ii) noncooperative sampling of conformational space (see Figure
21.12).
aab – – 12.5% a
aba – – 12.5% a
bab – – 12.5% a
bba – – 12.5% a
d2
R1 ¼ ½ JðoH oN Þ þ 3JðoN Þ þ 6JðoH þ oN Þ þ c 2 JðoN Þ
4
d2
R2 ¼ ½4Jð0Þ þ JðoH oN Þ þ 3JðoN Þ þ 6JðoH Þ þ 6JðoH þ oN Þ ð14aacÞ
8
c2
þ ½4Jð0Þ þ 3JðoN Þ þ R ex
6
100
R1 , R2 (Hz), hetNOE
R2
10
R1
1
hetNOE
0.1
0 2 4 6 8 10
τc (ns)
Fig. 21.13. Dependence of R1 ; R2 and the heteronuclear NOE
(hetNOE) on tc . Calculations are based on the assumption of a
rigid molecule tumbling isotropically.
R s No internal motion
chain
stiffness
(ns)
15
Fig. 21.14. N heteronuclear R2 rates as underlying model-free approach for the
function of correlation time tc for global description of the dynamics in nonnative states
motions, te for internal motions and the order of proteins is discussed in the text.
parameters S 2 . The appropriateness of the
760 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
approach for random coil states of proteins is controversial, for a detailed discus-
sion see, for example, Ref. [64]: the extraction of motional parameters using the
Lipari-Szabo approach is based on two (single time scale model) or three (two
time scale model) discrete correlation times that can be separated and are uncorre-
lated to the other motions. In other words, the approach assumes that the time
scales of global and local motions can clearly be separated, an assumption ques-
tionable in the context of random coil states of proteins.
Alexandrescu and Shortle [65] have used a local model-free approach, according
to which one global correlation time is fitted per residue. In the case of extensive
averaging over a range of frequency in the nanosecond range, a continuous distri-
bution of correlation times (and subsequently spectral density functions) can be in-
troduced. One of such functions, the Cole-Cole distribution, has been used by Bue-
vich and Baum to describe the relaxation behavior of the unfolded propeptide of
subtilisin PPS [66]. An improved treatment of this problem has been proposed by
Ochsenbein et al., in which a lorentzian distribution of correlation times (see Eq.
(15)) is used to describe the underlying dynamics of the ensemble of conformers.
ðy
JðoÞ ¼ FðtÞJðo; tÞ dt
0
D
FðtÞ ¼ K for 0 a t a tmax ð15Þ
D þ ðt t0 Þ 2
2
21.3.5
Residual Dipolar Couplings
Fig. 21.15. A two-spin system of spin 1 H and the static magnetic field Bo . The dipolar field of
15
N connected by an internuclear vector rNH. the 15 N nucleus can increase or decrease the
The 15 N nucleus exerts a dipolar field, which static magnetic field active at the position of
acts at the side of the 1 H nucleus. The the 1 H nucleus depending on the orientation
magnetic moments of the nuclei are aligned of the NH vector and the spin state of the
parallel or antiparallel (not shown) relative to proton (parallel or antiparallel to Bo ).
for native proteins. A number of excellent review articles have been published [70–
73]. Only recently have first measurements using RDCs in the case of nonnative
states of proteins [74] been published.
In our description of the conformational dependence of RDCs, we follow a re-
cent review article by Bax [73]: Dipolar couplings between NMR-active nuclei such
as a spin NaH are caused by the magnetic dipole field exerted by the N nucleus at
the site of nucleus H (Figure 21.14). In the following, we keep only the component
of the magnetic dipole field that is parallel to the direction (the z-direction) of the
magnetic field Bo . The dipole field exerted by the N nucleus will change the reso-
nance frequency of the H nucleus. The dipolar coupling D NH depends on the
internuclear distance rNH and on the orientation y of the internuclear vector NH
(bold, italics indicating the vector property) relative to Bo (Eq. (16)). For a fixed ori-
entation of the vector, the dipolar field exerted by N can either increase or decrease
the magnetic field of nucleus H, depending on whether the equally populated spin-
states a and b of H are parallel or antiparallel relative to Bo .
D NH ¼ Dmax
NH
hð3 cos 2 y 1Þ=2i ð16Þ
where y is the angle between the internuclear vector NH and the magnetic field Bo ,
the angle brackets denote averaging over the ensemble of molecules, and
NH
Dmax ¼ m 0 ðh=2pÞgH gN =ð4p 2 rNH
3
Þ ð17Þ
is the splitting observed for y ¼ 0; the additional constants have the same meaning
as explained for Eq. (14a–c).
In the case of isotropic rotational tumbling of a molecule in solution (Figure
21.16a), brownian motions average the magnetic dipole effects to zero, while in
the solid state spectrum of a molecule, such dipolar couplings are not averaged
and every nucleus couples with a large number of other nuclei, resulting in consid-
erable line broadening.
762 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
However, there are now various methods that impart partial alignment to a dis-
solved molecule, Many of them are additives to the solution such as liquid crystal-
line media [75] and partial alignment is induced by steric and/or ionic interaction
between the aligned medium and the dissolved macromolecule (Figure 21.16b). In
the case of NMR investigations of nonnative states of proteins, the use of polyacry-
lamide gels that have been compressed in an anisotropic manner have been suc-
cessful [76, 77].
Yet another different way of inducing alignment of a molecule is to use para-
magnetic factors bound to the protein, either by using natural paramagnetic cofac-
tors or metals [78, 79] or specifically designed binding sites for paramagnetic ions
[80]. In contrast to external alignment media, alignment is not caused by intermo-
lecular interactions but is rather a property of the molecule itself. This is advan-
tageous in the case of nonnative states of proteins, since different interactions be-
tween the members of the ensemble of conformers representing these states and
the aligning medium will lead to different extents of alignment for each member
of the ensemble.
The aligning of the proteins has the effect that rotational tumbling is not iso-
tropic, and not all orientations are equally likely to occur. The extent of difference
in this tumbling can be expressed as an alignment tensor A. Its principal compo-
nents A xx ; A yy , and A zz reflect the probabilities (along the x-, y-, and z-direction) for
the diffusional motion relative to the external magnetic field Bo .
The dipolar coupling depends on the polar coordinates of the NH vector in the
frame of the alignment tensor:
21.3 NMR Parameters 763
D NH ðy; jÞ ¼ 3=4Dmax
NH
½ð3 cos 2 y 1ÞA zz þ sin 2 y cos 2jðA xx A yy Þ ð18Þ
NH
where Da ¼ 3=4Dmax A zz is the magnitude of the dipolar coupling tensor and
R ¼ 2=3ðA xx A yy Þ=A zz is the rhombicity.
In the context of the analysis of nonnative states based on residual dipolar cou-
plings, it is of importance to discuss the ability to predict the dipolar couplings
from a given conformation. In Eq. (19), the magnitude and rhombicity of the
alignment tensor relative to the molecular frame need to be known. For a given
conformation, this may be predicted on the basis of the molecular shape [81, 82].
However, best results have been obtained so far if the alignment mechanism is
steric and due to the large number of members in the ensemble of structures,
such approach is particularily difficult for nonnative states of proteins. A second
approach is to obtain a tensor that fits the experimental data best in a linear fitting
algorithm [83]. This approach, however, cannot be used to predict RDCs from the-
oretical models for the torsion angle distributions in nonnative states of proteins.
n
N-n
h
Fig. 21.17. Model for a random flight chain the origin of the molecular frame, the chain is
consisting of N þ 1 segment represented by partioned into two half chains containing n
dots and connected by straight lines. The chain and N n segments. RDCs are functions of
tumble at distance h z from a disk representing the average angle y of i, the Bo field is parallel
the aligning medium. At the locus i, placed at to the z-axis.
The random flight chain is composed of N þ 1 segments. For any given element of
the chain i, there are n segments before i and N n segment after i. Therefore, the
probability function contains two parts, Wnobs ; WNn
obs
, and is given by:
1
Wðz; h z ; c; n; NÞ ¼ fnWnobs ðz; h z ; c; nÞ þ ðN nÞWnobs ðz; h z ; c; N nÞg
N
1
¼ fn½Wnfree ðz; c; nÞ Wnbar ðz; h z ; c; nÞ
N
þ ðN nÞ½Wnfree ðz; c; N nÞ Wnbar ðz; h z ; c; N nÞg
pffiffiffiffiffiffiffiffiffiffi ( " # " #)
2n=p ð2 þ 1=2cÞ 2 ð2h z z þ 1=2cÞ 2
¼ exp exp
N 2n 2n
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ( " #
2ðN nÞ=p ðz 1=2cÞ 2
þ exp
N 2ðN nÞ
" #)
ð2h z z 1=2cÞ 2
exp ð21Þ
2ðN nÞ
where c ¼ cos y.
For any arbitrary pair of nuclei A and B in the segment i separated by the inter-
nuclear vector rAB, the vector rAB will be at an angle aAB with respect to the internal
coordinate system of the seqment. The segment itself will be oriented at an angle y
with respect to the magnetic field Bo . The distribution W describing the averaging
of the ith segment of the chain is a function of the position of each segment, each
segment has its own averaged hP2 i polynomial. As a consequence, each segment
21.3 NMR Parameters 765
has its own alignment tensor A, which is axially symmetric due to the distribution
functions for a random flight chain. The dipolar coupling DAB is proportional to
the axial component of the alignment tensor:
The average of cos 2 y contained in the expression of the dipolar coupling will be
obtained by integration over all angles and space weighted by the distribution and
normalized. The following observations can be made: For longer chains, hP2 i will
become smaller because the distributions of larger number of uncorrelated seg-
ments will become more spherical. Only polypeptide chains up to 100–200 resi-
dues will give rise to observable dipolar couplings. These observed residual dipolar
couplings for a random flight chain are a consequence of the fact that such chain is
restrained by the covalent structure of the chain, each segment is coupled and not
free. The RDC originates from the individual loci along the chain, not from the
whole chain average; each individual segment exhibits a nonvanishing hP2 i, it is
time and ensemble averaged, but not averaged over all loci.
The treatment of residual dipolar couplings for random coil states of proteins is
yet another example in which the prediction of what to expect for such state is of
importance before deviations from random coil behavior can be interpreted with
any vigor.
21.3.6
Diffusion
1
(ppm)
Fig. 21.18. Signal intensity in a 1 H 1D NMR spectrum as
function of gradient strength (g.s.). a) lysozyme mutant W62G;
b) dioxan.
21.3.7
Paramagnetic Spin Labels
Paramagnetic spin labels in proteins can be used to test long-range order or con-
tacts. Spin labels such as nitroxide cause significant enhancement of relaxation
rates of resonances that are within a distance of 15 Å away from the label [91, 92].
Spin labels can be attached to a number of residues containing reactive side chains
such as Cys [93], His [91], or Lys [92]. Also the N-terminus of the protein can be
used to attach a spin label [92]. Solvent-exposed residues especially in folded pro-
teins have to be selected to prevent structure perturbations [94, 95], also paramag-
netic metal ions such as Mn 2þ or Gd 3þ can be used if specific binding sites exist
[96].
In folded proteins, distances can be extracted from the differential line broaden-
ing of the diamagnetic protein (reduction of the nitroxide with ascorbate to nitro-
xylamin) [94] to the paramagnetic protein [91, 92], whereas only a qualitative
picture for nonnative states of proteins is gained, since they are composed of fluc-
tuating conformations [97–99]. Using paramagnetic spin labels, long-range inter-
actions were identified (e.g., in highly denatured myglobin) [99], in staphylococcal
nuclease [97, 100], and in protein L [98]. Introduction of new binding sites for
paramagnetic ions [80] and tagging those to nonnative states of proteins will allow
detailed investigations using paramagnetic effects.
21.3 NMR Parameters 767
21.3.8
H/D Exchange
Hydrogen exchange methods can give insight into protein structure and dynamics
by looking at the exchange of amide protons of the protein backbone with solvent
protons. In general, protons that are involved in stable hydrogen bonds or are
buried within the protein core are protected from exchange with the solvent. If ei-
ther the buffer contains D2 O or the protein has been deuterated, the exchange of
protons for deuterons or vice versa can be monitored by NMR. Hydrogen exchange
techniques are valuable for the investigation of protein folding kinetics [101–104],
folding intermediates, and unfolded states [105–107], since they can provide infor-
mation on the degree and dynamics of secondary structure formation, which de-
pends on the formation of stable hydrogen bonds. Hydrogen exchange is slow at
low pH and faster at higher pH. In pulse labeling hydrogen exchange experiments
[101], the process of folding or refolding could be investigated for some proteins. A
typical approach of pulse-labeling hydrogen exchange is to let a deuterated protein
initially refold for a short time (ms) (e.g., by diluting into D2 O refolding buffer)
and give short pulses of protons at distinct times and high pH. This way, unpro-
tected amide deuterons will exchange for protons and can be detected by 2D-NMR
[108]. The reverse approach, namely refolding of nondeuterated protein in H2 O
buffer and giving D2 O pulses, is also used [109–111]. The process of protein un-
folding has been investigated in pulse-labeling hydrogen exchange NMR studies
as well [112]. This topic is further described in Chapter 18.
21.3.9
Photo-CIDNP
Accessible aromatic side chains in proteins can be monitored using the photo-
CIDNP (chemically induced dynamic nuclear polarization) NMR technique [113,
114]. The photo-CIDNP NMR technique relies on the temporary interaction of a
photoexcited dye – mostly a flavin such as FMN (Figure 21.19) – with an aromatic
amino acid side chain (Trp, Tyr, and His). Applying the photo-CIDNP technique to
a protein results in signal enhancement as well as a better resolution in the aro-
matic region of the NMR spectrum (see Figure 21.19). Intensities of photo-CIDNP
signals depend on (a) the polarization efficiency of a certain amino acid which
is Trp > Tyr g His g g Met and (b) the accessibility of the amino acid. Emissive
photo-CIDNP signals are observed for Tyr residues, whereas the signals of Trp
and His are absorptive – the sign of a signal therefore reports on the type of amino
acid.
Using photo-CIDNP to achieve polarization enhancement is mostly seen in pro-
ton NMR spectroscopy – nevertheless higher enhancements are observed for 15 N
nuclei [115]. One particular advantage of photo-CIDNP NMR for unfolded states
is the better resolution due to the fact that only few residues are polarized. Using
unlabeled protein, it is therefore often possible to assign peaks to the respective
amino acid side chain and furthermore following changes in accessibility in differ-
768 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
NMR
CIDNP
R
Me N N O
N
Me N H
O
HO
2-
R= OPO3
OH OH
ent unfolded states [116]. Furthermore, photo-CIDNP NMR is being used in ki-
netic investigations of protein folding to monitor folding intermediates [117–119]
– besides the better resolution since only few amino acids are polarized, better time
resolution compared with normal NMR methods can be achieved due to the
fact that the interscan delay depends only on the relaxation time of an electron
(@10 ms). In addition, CIDNP experiment have been performed that transferred
information on accessibilities in nonnative states of proteins onto the native
protein using photo-CIDNP by polarizing the protein in the nonnative state,
folding it rapidly, and observing polarized spins in the native state [120].
21.4
Model for the Random Coil State of a Protein
Fig. 21.21. 3
JðHN Ha Þ coupling constants in Model I: all f angles are equally likely (dashed-
unfolded lysozyme (8 M urea, pH 2). Coupling dotted line); model II: 50% a and 50% b-sheet
constants expected for an a-helical and a b- conformation are populated (small dashed
sheet conformation are indicated as black line); model III: populations are equally to the
lines. Models I–III indicate different averaging Ramachandran distribution (large dashed line).
models that could be present in a random coil.
Fig. 21.22. The distribution of f; c torsion and gleft ) regions of f; c space are labeled.
angles for all amino acids not located in Similar diagrams can be obtained with
secondary structure elements in a database of sufficient statistic for each amino acid
402 high-resolution crystal structures of native individually.
folded proteins. The a; b and aL (combined aleft
21.5 Nonnative States of Proteins: Examples from Lysozyme, a-Lactalbumin, and Ubiquitin 771
residues in a random coil are derived from coil regions of 402 proteins from the
protein data bank (Figure 21.22). It has been found that the distribution of f; c
torsion angles in the protein data bank resembles the experimental f; c energy
surface [24, 45, 122, 123], the effects of specific nonlocal interactions present in
individual proteins are thus averaged out over the set of structures.
The amino acid-specific distribution obtained for each amino acid can also be
used to calculate residue specific J coupling constants and also to calculate NOE
intensities using a Monte-Carlo procedure and a simplified pentapeptide model.
The pentapeptide model consists of explicit treatment of the backbone angles f
and c and a simplistic treatment of the side chains as a pseudoatom. Distributions
of interproton distances can then be extracted from the conformational ensembles,
and NOE intensities calculated assuming a two-spin approximation and r 6 averag-
ing. Predicted values of NOE intensities converged with ensemble sizes of 10 5 con-
formers.
A different approach is chosen for the interpretation of dynamics in unfolded
proteins. Backbone dynamics in proteins can be monitored by 15 N amide hetero-
nuclear relaxation measurements. These heteronuclear relaxation measurements
are sensitive to motions on a subnanosecond time scale and to slow conforma-
tional exchange in the millisecond time scale. Heteronuclear relaxation data un-
folded proteins without disulfide bridges show only small variations over the se-
quence: Values approach a plateau at the middle of the protein while lower values
are found at the termini of the sequence (Figure 21.23). This implies that the relax-
ation properties of a given amide are not influenced by its neighbor, but just by the
fact that its part of the polypeptide chain. This relaxation behavior has been found
for a number of unfolded proteins [21, 124] and in synthetic polymers [125] (see
also Figure 21.35 for random coil state of ubiquitin). It was shown for atactic poly-
styrene that plateau values of R1 relaxation rates remain constant for homopoly-
mers between 10 kDa and 860 kDa. Individual segments of the long polymer chain
therefore move independently in solutions – the apparent correlation time is there-
fore largely independent of the overall tumbling rate and only affected by segmen-
tal motions.
21.5
Nonnative States of Proteins: Examples from Lysozyme, a-Lactalbumin, and Ubiquitin
After the overview of NMR parameters that can measure aspects of conformation
and dynamics of nonnative states of proteins, we wish to show the measurement of
these NMR parameters primarily for one particular protein, lysozyme. Many of the
experiments have been done together with Chris Dobson and former members of
his group.
The native state of lysozyme consists of two domains, the a-domain, which spans
from residue 1 to 35 and 85 to 129 and the b-domain that involves residues 36–84.
Four disulfides bridges are present in the native structure; two of them in the a-
domain of the protein, holding the N- and the C-termini together (C6–C127, C30–
C115) one in the b-domain (C64–C80) and one between the domains (C76–C94).
772 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
Fig. 21.23. Relaxation dynamics model for the distinguish a) the unbranch polymer chain and
random coil state. The relaxation properties of b) the branched polymer chain, in which
a given residue is influenced by the number of disulfide bridges cause additional cross-linking
neighboring residues along the chain. For the of the polypeptide chain and lead to increased
interpretation of relaxation rates, one has to relaxation rates.
Lysozyme has been studied under a range of denaturing conditions by NMR tech-
niques [25, 42, 107, 126–129, 183, 253].
21.5.1
Backbone Conformation
shown in Figure 21.25. In the following section, it will be shown how these chem-
ical shifts of random coil proteins can be used to (a) identify residual secondary
structure (HN and H a ) and to (b) investigate mean residue-specific properties of
chemical shifts in unfolded states of proteins ( 15 N chemical shift).
HN and H a chemical shift perturbations (Dd) in unfolded variants of hen lyso-
zyme from chemical shifts measured in short unstructured peptides are shown in
Figure 21.24. The mean chemical shift pertubations are very close to the values
measured in small unstructured peptides for all three variants. Figure 21.24a
shows perturbations in a lysozyme variant in which all disulfide bridges have
been reduced and subsequently methylated. This variant is unfolded at pH 2 in
the presence and even in the absence of the denaturant urea. Large downfield de-
viations are found for Gly22 (HN ), for Val29 and Cys-SME 30 (both H a ) around
Trp62/Trp63 (R61 H a ; W62, W63 and C64 HN ), around Trp108/Trp111 (V109,
R112 H a ; W111, N113, R114 HN ), and Trp123 indicating residual secondary struc-
774 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
ture. All these pertubations are found in aromatic patches of the protein, indicating
that aromatic amino acids cluster and hereby form secondary structure elements,
presumably to prevent contact with water. The cluster around Gly22 consists of the
sequence Tyr-X-Gly-Tyr (where X can be any amino acid) and is also found to form
a nonrandom cluster in BPTI [130–132]. Comparison of these chemical shift devi-
ations to deviations of unfolded hen lysozyme in 8 M urea (oxidized and reduced)
reveal that the formation of residual secundary structure around aromatic residues
is a feature of the protein even detectable in 8 M urea.
Single point mutations of residues involved in patches of residual secondary
structure in the unfolded state (A9G, W62G, W62Y, W111G, and W123G) do not
change the pattern of chemical shift perturbations observed in methylated protein
in water at pH 2 (see Figure 21.26) [253].
For an assessment of residue-specific properties of chemical shifts in random
coil states of proteins, a data set of three proteins in 8 M urea at pH 2 was com-
pared: human ubiquitin, reduced and methylated hen egg lysozyme and all-Ala a-
lactalbumin [138]. As mentioned in Section 21.2, chemical shift dispersion in non-
native states of proteins is considerably smaller than in folded states of proteins,
amide 15 N chemical shifts, however, display significant chemical shift dispersion
and therefore have been analyzed as discussed below. Aromatic amino acids, cys-
teine, and methionine residues were excluded from the investigations because of
the lack of experimental data and due to the fact that aromatic residues have been
found to be involved in nonrandom structures in at least one of the three proteins.
Mean observed chemical shifts in native proteins of the BMRB (as of February
1999) are very close to the mean experimental 15 N chemical shifts in the three
unfolded proteins (R ¼ 0:98, see Figure 21.27). However, the spread of chemical
shifts in the native state is very large, while there are only small variations in chem-
ical shifts of the unfolded proteins.
number
Fig. 21.25. Differences in 13 C 0 , 13 Ca and 13 Cb one-letter amino acid code and residue
chemical shifts for lysozyme with its four number excluding terminal residues.
disulfides intact denatured in 8 M urea Appropriate corrections for random coil shifts
from empirical values measured in short of Thr69 preceded by Pro70 were employed.
unstructured peptides ([37] Dd ¼ dexp demp ). Secondary chemical shifts for glutamate and
13
C chemical shifts in the present work are aspartate residues have been omitted as the
referenced indirectly to the internal standard chemical shifts are affected by the protonation
TSP at pH 2 at 20 C in 8 M urea. Mean of the amino acid side chains at the low pH
secondary 13 C chemical shifts are (with value of 2 used in this study. This is reflected
standard deviations given in parentheses): by significant average deviations from random
hDdCa i ¼ 0:2 (0.3) ppm, hDdCb i ¼ 0:2 coil values for Asp, hDdCa i ¼ 1:6 ppm,
(0.2) ppm, hDdC 0 i ¼ 0:9 (0.6) ppm. hDdCb i ¼ 3:3 ppm, hDdC 0 i ¼ 1:7 ppm,
Residues with Dd b 0:38 ppm, DdCa; b a 0:8 and Glu, hDdCa i ¼ 1:3 ppm, hDdCb i ¼ 1:0
ppm, and DdC 0 a 1:9 ppm are labeled by ppm, hDdC 0 i ¼ 1:6 ppm.
776
Fig. 21.26. Normalized chemical shift 3 units and Dd > 0:8 ppm by 4 units. Ha :
perturbations in methylated lysozyme single Dd > 0:1 ppm by 1 unit, Dd > 0:2 ppm by
point mutants: A9G, W62G, W62Y, W111G and 2 units, Dd > 0:3 ppm by 3 units, and
21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
W123G. The following normalization scheme Dd > 0:4 ppm by 4 units. Dots indicate
was applied: HN : Dd > 0:2 ppm by 1 unit, missing assignments.
Dd > 0:4 ppm by 2 units, Dd > 0:6 ppm by
21.5 Nonnative States of Proteins: Examples from Lysozyme, a-Lactalbumin, and Ubiquitin 777
(ppm)
(ppm)
Fig. 21.27.Comparison of experimental 15 N chemical shifts in
unfolded ubiquitin, hen lysozyme and all-Ala lactalbumin with
chemical shifts in native proteins deposited in the databank for
biomolecular NMR data (BMRB) (as of February 1999).
tance of 4.2 Å in the random coil state is shorter than in either of the secondary
structure types. aN (i; i þ 1), NN (i; i þ 1), and aN (i; i þ 2) as well as NN (i; i þ 2),
aN (i; i þ 3), and NN (i; i þ 3) distances have been predicted for hen lysozyme
taking into account the amino acid sequence of hen lysozyme and using residue-
specific f; c distributions extracted from the PDB. The actual observability of NOE
cross-peaks in the NOESY-HSQC of hen lysozyme (ox þ red) in 8 M urea at pH 2 is
dependent on the signal-to-noise ratio in the experiment and the effective correla-
tion time of the protein, which could vary along the protein sequence (for the good
signal-to-noise obtainable in the spectra of hen lysozyme, see Figure 21.28) [25].
Tab. 21.6. Comparison of the effective interproton distances in the ensemble of random coil
conformers with interproton distances in regions of regular secondary structure. Effective
interproton distances hr 6i1=6 are given for an ensemble of random coil conformers
generated using the f; c torsion angle populations of all residues in the protein database.
i,i+2
i,i+3
W111
A W V A W R N R C K
V109
A110
2.0 2.0
W111
V109
4.0 4.0 A110
H (ppm)
W111
1
6.0 6.0
W111arom
8.0 8.0 W111 NH
A110 NH
107 108 109 110 111 112 113 114 115 116
residue number
Fig. 21.28. Left: 1 H, 1 H strips of residues 107– different contour levels (¼ signal-to-noise
116 from a 3D NOESY spectrum of reduced levels), and 1D trace of Trp111. The different
and denatured lysozyme in 8 M urea at pH 2 noise levels are indicated at the 1D trace,
(mixing time ¼ 200 ms). Arrows indicated cross-peaks and the diagonal peak (Trp111
ði; i þ 2Þ and ði; i þ 3Þ cross-peaks as indicated. NH) are indicated.
Right: 1 H, 1 H strip of Trp111 is plotted at four
The overall agreement between the predicted and experimental data from
NOESY-HSQC spectra is remarkably good (Figure 21.29). aN (i; i þ 1Þ and NN
(i; i þ 1Þ NOEs are predicted and observed throughout the protein sequence, fur-
thermore 56 out of 82 predicted aN (i; i þ 2) NOEs are seen while two of the aN
(i; i þ 2) NOEs are observed but not predicted. No correlation is observed between
predicted and observed NN (i; i þ 2) NOEs; while only 11 of the 27 predicted NN
(i; i þ 2) NOEs are observed, 10 NN (i; i þ 2) cross-peaks are observed that are not
predicted. A similarly poor correlation is observed for aN (i; i þ 3) cross-peaks: two
out of eight predicted NOEs are observed and seven NOEs are observed but not
predicted. However, it is worth noting, that both the aN (i; i þ 3) and the NN
(i; i þ 2) crosspeaks are predicted to be weak, and that the observability therefore
depends very strongly on the signal-to-noise of the spectrum, which can vary lo-
cally. In summary, the considerable amount of predicted and observed NOEs in un-
folded hen lysozyme, in particular aN (i; i þ 1), NN (i; i þ 1), and aN (i; i þ 2) NOEs
indicate that the averaged backbone conformation predicted from all proteins in
the PDB describes the local preferences in lysozyme very well. It is important to
note that this presence of NOEs is not a sign of persistent nonrandom structure
10 20 30 40 50 60
NN
N
i,i+1
scN
NN
i,i+2
N
scN
NN
i,i+3
scN
i,i+1
scN
NN
i,i+2
N
scN
i,i+3
21.5 Nonnative States of Proteins: Examples from Lysozyme, a-Lactalbumin, and Ubiquitin
scN
779
Fig. 21.29. Observed short- and medium- states, whereas open bars indicated NOEs
range NOEs in oxidized and reduced lysozyme observed in the reduced state and dashed bars
(8 M urea, pH 2) from 3D NOESY-HSQC in the oxidized state only. NOEs to any side
spectra (mixing time ¼ 200 ms). Black bars are chain are summarized as scN NOEs.
indicative for NOEs that are present in both
780 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
in the unfolded state, but arises from statistical averaging of possible conforma-
tions in a random coil. This average is dominated by the shorter distances since
the NOE is averaged as hr 6 i1=6 .
Fig. 21.31. Experimental versus predicted 3 JðC 0 CbÞ coupling constants using the coil model.
and the experimental coupling constants, reflecting the quality of the model used
for the predictions. Investigations of the individual data sets (–Glu, –Asp and –
aromatics) give rise to similar correlation coefficients (R ¼ 0:96 for ubiquitin;
R ¼ 0:94 for lysozyme and R ¼ 0:90 for all-Ala a-lactalbumin). These results
indicate the independence of both the primary sequence and the native structure
of the protein ubiquitin is composed of mainly b-sheets, while lysozyme and
a-lactalbumin contain a significant number of a-helices in the native state.
The variation of 3 JðC 0 Cb Þ and 3 JðC 0 C 0 Þ coupling constants is smaller than the
variation in 3 JðHN Ha Þ coupling constants (Figure 21.31). The smallest variations
are found in the 3 JðC 0 C 0 Þ values, which range from 0.5 to 1 Hz. Nevertheless clus-
tering of the experimental values is observable: values for eight Gly residues range
from 0.5 to 0.7 Hz. A more detailed analysis of residue-specific properties of
3
JðC 0 C 0 Þ coupling constants is not appropriate due to the small range of measured
coupling constants.
Values of 3 JðC 0 Cb Þ in unfolded oxidized lysozyme range from 1.3 to 2.6 Hz. As
observed for the 3 JðHN Ha Þ coupling constants, large variations along the sequence
are found. These variations again cluster for different residue types. However a
different trend is seen to that for the 3 JðHN Ha Þ coupling constants: the largest
3
JðC 0 C 0 Þ are found for alanine residues, which have a high probability of being
found in a conformation in the f; c Ramachandran space, while the smaller values
are found for residues with larger side chains such as histidine, asparagine, and
valine residues. Figure 21.31 shows a correlation of experimental coupling con-
stants with predicted coupling constants (Hz) from the coil theory. A correlation
coefficient of R ¼ 0:63 is observed for the complete data set (axis intercept ¼ 0:07,
782 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
slope ¼ 1), R ¼ 0:69 (axis intercept ¼ 0:0949, slope ¼ 1) if Asp and Glu residues
(that are protonated at pH 2 but not in the data sets used for the predictions) are
left out.
The discussion thus far has focused on the determination of coupling constants
that report on the backbone angle f. Figure 21.32 shows a comparison of the pre-
diction of 3 JðNH ,Ha Þ coupling constants taking (a) all residues into account or (b)
Fig. 21.32. Correlation between experimental elements in the database of native folded
3
JðHN ,Ha Þ averaged for a given residue type, protein structures with f; c torsion angles
e.g., all alanines in ubiquitin denatured in 8 M covering all regions of the Ramachandran
urea, pH 2, and 3 JðHN ,Ha Þ coupling constants diagram. b) Comparison with predictions for
predicted for amino acids excluding aromatic residues that are not located in secondary
amino acids and Asp and Glu residues. a) structure elements and that have positive c
Comparison with predictions for residues that torsion angles only.
are not located in secondary structure
21.5 Nonnative States of Proteins: Examples from Lysozyme, a-Lactalbumin, and Ubiquitin 783
Pro19
Pro38
J(Ca,Ca) (Hz)
3
Residue
3
Fig. 21.33. JðCa ,Ca Þ coupling constants in unfolded (dots)
and folded (crosses) ubiquitin. Dashed line indicates expected
coupling constants in a b-sheet, shaded area indicates expected
values in a random coil (rc), and dotted line indicates values of
coupling constants in the a-conformation.
only residues with positive c angle into account (compare Figure 21.22). A similar
correlation (R ¼ 0:96 in both cases) is observed showing that 3 JðNH ,Ha Þ coupling
constants cannot differentiate between a situation in which only extended con-
formations are being sampled (positive c values) and a situation in which the
polypeptide chain samples both a- and b-regions of the Ramachandran space. The
presence of experimental NOE contacts HN i , HN iþ1 and of Hai , HN iþ1 of similar
intensity in nonnative states of proteins indicate that averaging involves sampling
of positive and negative c torsion angles, which is compatible with predictions of
our model. However, a quantitative analysis of populated rotamers from NOE in-
tensities is difficult due to the r 6 averaging and complex dynamical properties
that are potentially nonuniform along the denatured peptide chain.
3
JðCa ,Ca Þ coupling constants can probe the conformation around the backbone
angle c directly and are shown in Figure 21.33 for the random coil state and for
the native state of ubiquitin [254]. The dependence of 3 JðCa ,Ca Þ on the angle c is un-
expected and in contradiction to a simple Karplus-type dependence on the interven-
ing torsion angle o but was found for data obtained for native ubiquitin. The mean
experimental coupling constants 3 JðCa ,Ca Þ for residues in b-sheet regions of native
ubiquitin are 1:69 G 0:1 Hz while coupling constants are too small to be observed
in a-helical regions. In the random coil state of ubiquitin, 3 JðCa ,Ca Þ are larger than
0.7 Hz throughout the sequence, including residues 23–34 that are located in the
a-helical region in the native state. The mean 3 JðCa ,Ca Þ coupling constant is
0:85 G 0:2 Hz with a pairwise rmsd of 0.1 Hz for the 41 resonances that are suffi-
ciently resolved to allow coupling constants to be determined. As an exception,
784 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
large 3 JðCa ,Ca Þ values are observed for Pro19 and Pro38. This might reflect a
strong preference of prolines for f; c torsion angles in the polyprolyl region of
f; c space (f @ 60 ; c @ þ150 ) and potentially restricted c sampling.
The correlations found here confirm the previous conclusion based on the
3
JðHN ,Ha Þ coupling constants, that the main-chain torsion angle populations in
denatured lysozyme predominantly reflect the intrinsic f; c preferences of the
amino acids involved, and shows that coupling constant measurements can be
used successfully to probe main-chain conformations in the nonnative state of a
protein.
21.5.2
Side-chain Conformation
3
Jexp ðN,Cg Þ ¼ p180 3 Jtrans ðN,Cg Þ þ ð1 p180 Þ 3 Jgauche ðN,Cg Þ
3
Jexp ðC 0 ,Cg Þ ¼ p60 3 Jtrans ðC 0 ,Cg Þ þ ð1 p60 Þ 3 Jgauche ðC 0 ,Cg Þ ð22Þ
p60 ¼ 1 p180 p60
Using Eqs (22a–c), the relative populations of the w1 conformers of 60 , 60 ,
and 180 have been calculated for each residue from the experimental 3 JðC 0 ,Cg Þ
and 3 JðN,Cg Þ coupling constants. Analysis of the populations of the three staggered
rotamers derived from the 3 JðC 0 ,Cg Þ and 3 JðN,Cg Þ coupling constants has enabled
21.5 Nonnative States of Proteins: Examples from Lysozyme, a-Lactalbumin, and Ubiquitin 785
Fig. 21.34. Comparison of the mean in 8 M urea and predicted from the statistical
percentage populations of the three staggered model for a random coil. The aromatic
side chain w1 rotamers for different amino residues are excluded from this comparison
acids calculated from the experimental due to their anomalous behavior.
coupling constant data for lysozyme denatured
(class IV) the 60 w1 rotamer is most populated (64%) with the lowest population
for 180 w1 (3%). The behavior of the residues with aromatic side chains (class V)
varies considerably with a range of populations being seen for a given type of aro-
matic side chain and different mean populations being observed for the various
aromatic residues.
The behavior of the amino acid side chains observed experimentally in the ran-
dom coil state of lysozyme has been compared with the predictions for a random
coil from the statistical model. In particular the w1 populations derived from the
experimental coupling constants have been compared with the w1 distributions in
the PDB for residues that are not in recognized regions of secondary structure
(COIL parameter set). The choice of this set of residues is important because the
steric restraints of regular secondary structure elements can affect the w1 distribu-
tions significantly [122, 134, 135]. As the Pachler analysis does not consider the
population of side-chain conformations other than those of the three staggered ro-
tamers, we compare the relative proportions of w1 torsion angles falling within the
ranges w1 ¼ 60 G 30 , w1 ¼ þ60 G 30 and w1 ¼ 180 G 30 in the PDB with
the analogous distributions calculated from the coupling constants. In general
only a small number of residues in the database have w1 torsion angles that do
not fall in the three ranges considered (e.g., 87 (5%) of the 1628 aspartate residues
in the database). The PDB populations are listed in Table 21.7.
Many of the features of the side-chain populations derived from the experimen-
tal coupling constants are also clearly observed in the populations predicted for a
random coil from the PDB. Examples of comparisons of the two sets of popula-
tions are shown in Figure 21.34. The PDB distributions predict that for all amino
acids in a random coil, excluding serine, threonine, valine, isoleucine, cysteine, as-
paragine, and aspartic acid, the 60 w1 rotamer should have the highest popula-
tion (> 50%) and the þ60 w1 rotamer the lowest population (a 20%). The 60
w1 rotamer is most favored as it places the side chain in a position where the steric
clash with the main chain is minimized while the þ60 w1 rotamer is the most
sterically restricted and therefore the least populated.
21.5.3
Backbone Dynamics
Tab. 21.7. Predicted 3 JðC 0 ,Cb Þ and 3 JðC 0 ,C 0 Þ coupling constants and populations of the three
staggered side-chain w1 rotamers from the statistical model for a random coil. The coupling
constant values and were calculated using the residues in all secondary structure elements
(ALL) and for residues found neither in a nor b (COIL) in a database of 402 high-resolution
crystal structures. The w1 populations were calculated using the COIL parameter set considering
only the resides whose w1 torsion angles fall into 60% ranges centered about the distinct
staggered rotamers (w1 ¼ 60 G 30 ; w1 ¼ 60 G 30 or w1 ¼ 180 G 30 ). No side-chain
populations are given for proline.
ation rates. Besides that, the polypeptide chain of ubiquitin in 8 M urea, pH 2 does
not exhibit any amino acid-specific variation of the heteronuclear relaxation rates.
This relaxation behavior has also been found for a number of unfolded proteins
[21, 124] and in synthetic polymers [125]. It was shown for atactic polystyrene
that plateau values of R1 relaxation rates remain constant for homopolymers be-
tween 10 kDa and 860 kDa. Individual segments of the long polymer chain there-
fore move independently in solutions – the apparent correlation time is therefore
largely independent of the overall tumbling rate and only affected by segmental
motions.
The relaxation rates for the random coil state of lysozyme are shown in Figure
21.36.
Using a model of segmental motions the relaxation properties can be described,
assuming that relaxation contributions due to neighboring residues in a given
788 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
Tab. 21.8. Summary of the populations for w1 rotamers for denatured lysozyme calculated from
the experimental coupling constants for the different classes of amino acids. The experimentally
derived populations are mean values for each amino acid, the corresponding standard
deviations being given in parentheses. Values for residues where the population of only one of
the staggered rotamers could be defined have not been included when calculating the means
(except of lysine). For isoleucine the populations listed are a mean of those calculated from
coupling constants involving Cg1 and Cg2 .
polypeptide chain decay exponentially as the distance from a given residue in-
creases [25].
X
N
R2rc ðiÞ ¼ R int eðjijj=l 0 Þ ð23Þ
j¼1
R (s )
R
R
(s )
hetNOE
R
R
Fig. 21.36. Relaxation data of backbone triangles) and R1r relaxation rates (open
amides of oxidized and reduced lysozyme in circles) of oxidized lysozyme, and d) R2
8 M urea at pH 2. a) R1 relaxation rates of relaxation rates (filled triangles) and R1r
reduced lysozyme, b) hetNOE of oxidized relaxation rates (open circles) of reduced
lysozyme, c) R2 relaxation rates (filled lysozyme.
790 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
R2(s–1)
Fig. 21.37. a) 15 N R2 in WT-SME . The (R int ¼ 0:2 s1 and l 0 ¼ 7) are shown as
experimental rates are shown as a scatterplot dashed lines. Translation of the relaxation data
and the rates fitted by a model of segmental on the native structure of lysozyme is shown
motion expected for a random coil with R2 rc in b).
from the segmental motions model can be done introducing Gaussian clusters to
the segmental motion model:
exp
X
N X 2
R2 ðiÞ ¼ R int eðjijj=l 0 Þ þ R cluster eððixcluster Þ=2l cluster Þ ð24Þ
j¼1 cluster
X
N Ncys
X
exp
R2 ðiÞ ¼ R int eðdm ij =l 0 Þ þ R exch eðjiCysk j=l2 Þ ð25Þ
j¼1 k¼1
(i.e., unbranched polypeptide chains). Figure 21.37 shows R 2 relaxation rates mea-
sured in reduced and methylated lysozyme in water. Six hydrophobic clusters were
identified in WT-SME using the extended segmental motion model (Eq. (24)). These
clusters are centered at residues (1) L8, (2) C30, (3) S60, (4) S85, (5) W111, and (6)
W123. The most prominent cluster, cluster 3 is found in the middle of the protein
sequence which belongs to the b-domain in folded lysozyme. All other clusters
(cluster 1, 2, 4, 5, and 6) are found in what is to be the a-domain in the folded state
of the protein. While clusters 3, 5, and 6 are also observed in urea, cluster 2 which
is near Trp28 is destroyed by the addition of urea. Furthermore the rather small
clusters 1 and 4 cannot be identified in the presence of urea. The importance of
weak interactions in protein folding is reviewed in Chapter 6.
Stabilization of these clusters can be tested by single point mutations in conjunc-
tion with relaxation measurements [184, 253]: Replacement of Trp62 in cluster 3
by glycine essentially abolishes not only its own cluster 3 but also clusters 1
through 4, and diminishes clusters 5 and 6 (as seen in Figure 21.38). The disrup-
tion of the clusters far away in sequence clearly indicates the presence of long-
range interactions in WT-SME . Interestingly, clusters also seem to play an im-
portant role in refolding: hydrogen exchange measurements have shown that the
a-domain of lysozyme folds a lot faster than the b-domain, with exception of
residue W63, which becomes exchange-protected as quickly as the a-domain. An
association of the region around W62/W63 with the a-domain as shown by
relaxation measurements is therefore not only present in the unfolded state, but
seems to be important for the folding nucleus of the protein.
In order to test whether this property of Trp62 to stabilize the folding nucleus is
unique to tryptophan or may be more generally attributable to aromatic amino
acids, the effect of replacing Trp62 by tyrosine is shown in Figure 21.38c. The
W62Y mutation has a very minor effect on the relaxation properties of WT-SME
compared with W62G-SME . This result suggests that aromatic residues play an
important role in the stabilization of long-range interactions in unfolded states of
lysozyme.
A9G-SME even more so displays WT-SME properties. There is not even an effect
on its own cluster, cluster 1. The hydrophobic clusters remain of virtually identical
size and position, indicating that Ala9 is not important for stabilizing the hydro-
phobic core.
In contrast, results of the replacement of tryptophan residues at positions 111
and 123 are very significant. When Trp111 (cluster 5) is replaced by glycine, the
cluster in which the mutation is located disappears entirely, and cluster 6 also de-
creases significantly in intensity. Loss is also observed in the intensity of cluster 2
(surrounding Trp28) and a very small decrease in the intensity of cluster 3 (sur-
rounding Trp62 and Trp63). Similarly, when Trp123 at the C-terminus (cluster 6)
is changed to a glycine, the cluster around the mutation site essentially disappears,
and the intensities of clusters 2, 3, and 5 are lessened. The effect of W123G re-
placement is less significant than that of W111G, as seen by the complete loss of
cluster 5 in W111G-SME but not in W123G-SME . The largest changes in R 2 distri-
bution as compared to WT-SME is observed with W62G-SME . W62G-SME on one,
792
R
R R R
R
Fig. 21.38. 15 N R2 in WT-SME and W62G-SME , motion expected for a random coil with R2 rc
W62Y-SME , A9G-SME , W111G-SME , and W123G- (R int ¼ 0:2 s1 and l 0 ¼ 7) are shown as
SME are shown in panels a–e, respectively. The dashed lines; black lines show Gaussian fits for
21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
experimental rates are shown as a scatter plot WT-SME (a,b,c) W111G-SME (e) and W123G-
and the rates fitted by a model of segmental SME (f ).
21.6 Summary and Outlook 793
21.6
Summary and Outlook
8 The investigations show that NMR parameters can be predicted using models for
the random coil state of a protein. The applied model treats the polypeptide
chain as a (branched or unbranched) homopolymer for the prediction of aniso-
tropic and global parameters such as relaxation rates or RDCs. On the other
hand, for the prediction of local parameters, only explicit models such as the ran-
dom coil model (Section 21.4) with amino acid-specific torsion angle distribution
will allow correlation with experimental data such as J-couplings and NOEs,
since the experiment is sensitive enough to detect small local differences in the
conformational sampling. Second-order effects revealing the more detailed de-
pendence on nearest neighbor effect will need a larger statistical database for
their firm interpretation.
8 Random coil states of proteins sample the f; c and w1 conformational space ac-
cording to distribution functions derived from the analysis of structures preva-
lent in all protein structures [24, 45, 122, 123]. The assumption that the distribu-
tion of f; c torsion angles in the PDB resembles the experimental f; c energy
surface and that the effects of specific nonlocal interactions present in individual
proteins are averaged out over the set of structures seems to be valid. More so-
phisticated approaches including molecular dynamics simulations of nonnative
states of proteins will be of importance in the future to go beyond the current
conformational and dynamic description of these states.
8 On the basis of the delineation of the ‘‘random coil’’ behaviour, significant devia-
tions identifying nonrandom structures in nonnative states can be found for a
number of different proteins. These deviations include both nonrandom confor-
mational preferences and differences from rapid dynamics that would intercon-
vert the ensemble of states. The nonrandom structure can be shown to be stabi-
lized both by local and global interactions and same of the global interactions
trap the polypeptide chain into conformations not present in the native structure
of the protein. In these investigations, combined mutational and NMR spectro-
scopic investigations have been particularily powerful tools in showing changes
in overall compaction in these states. It is tempting to speculate that mutations
do alter the energetics of the ensemble of states representing the unfolded states
of a protein.
Natively unfolded proteins form a significant part of the proteom and reveal
794 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
that the polypeptide chain may not necessarily adopt a globular, rigid fold to exert
its function. Bioinformatic analysis suggests these states to be more abundant
than initially assumed.
8 At the same time, misfolded, yet highly regular fibrillar states of proteins are
formed in many protein folding diseases. In light of these considerations, hetero-
nuclear NMR spectroscopy seems to be the most versatile and powerful tech-
nique to characterize aspects of conformation and dynamics in all of these states
of proteins. In all of these investigations, the clear definition of what to expect for
a random coil protein will form a firm baseline to predict deviations in a reliable
manner.
Acknowledgments
H.S. would like to acknowledge the fruitful and fascinating collaboration and dis-
cussions with C. M. Dobson, K. Fiebig, M. Buck, L. Smith, M. Hennig, W. Peti, S.
Grimshaw, C. Redfield, J. Jones, S. Glaser, and C. Griesinger in characterizing non-
native states of proteins. The work in the group of H.S. has been carried out by
a number of extremely creative and bright students, who we would like to thank:
T. Kühn, E. Collins, J. Klein-Seethamaran, E. Duchardt, and K. Schlepckow. We
would like to thank the DFG, the European Commission, the Fonds der Chemi-
schen Industrie, the state of Hesse and the University of Frankfurt for financial
Support.
References
89 Jones, J. A., Wilkins, D. K., Smith, 99 Lietzow, M. A., Jamin, M., Jane
L. J. & Dobson, C. M. (1997). Dyson, H. J. & Wright, P. E. (2002).
Characterisation of protein unfolding Mapping long-range contacts in a
by NMR diffusion measurements. highly unfolded protein. J Mol Biol
J Biomol NMR 10, 199–203. 322, 655–662.
90 Pan, H., Barany, G. & Woodward, C. 100 Gillespie, J. R. & Shortle, D. (1997).
(1997). Reduced BPTI is collapsed. A Characterization of long-range
pulsed field gradient NMR study of structure in the denatured state of
unfolded and partially folded bovine staphylococcal nuclease. I. Para-
pancreatic trypsin inhibitor. Protein Sci magnetic relaxation enhancement by
6, 1985–1992. nitroxide spin labels. J Mol Biol 268,
91 Schmidt, P. G. & Kuntz, I. D. (1984). 158–169.
Distance measurements in spin- 101 Roder, H. & Wuthrich, K. (1986).
labeled lysozyme. Biochemistry 23, Protein folding kinetics by combined
4261–4266. use of rapid mixing techniques and
92 Kosen, P. A., Scheek, R. M., Naderi, NMR observation of individual amide
H. et al. (1986). Two-dimensional 1H protons. Proteins 1, 34–42.
NMR of three spin-labeled derivatives 102 Baldwin, R. (1993). Pulsed H/D-
of bovine pancreatic trypsin inhibitor. exchange studies of folding inter-
Biochemistry 25, 2356–2364. mediates. Curr Opin Struct Biol 3,
93 Hubbell, W. L., McHaourab, H. S., 84–91.
Altenbach, C. & Lietzow, M. A. 103 Miranker, A., Radford, S. E.,
(1996). Watching proteins move using Karplus, M. & Dobson, C. M. (1991).
site-directed spin labeling. Structure 4, Demonstration by NMR of folding
779–783. domains in lysozyme. Nature 349,
94 Matthews, B. W. (1995). Studies on 633–536.
protein stability with T4 lysozyme. Adv 104 Radford, S. E., Dobson, C. M. &
Protein Chem 46, 249–278. Evans, P. A. (1992). The folding of
95 McHaourab, H. S., Lietzow, M. A., hen lysozyme involves partially
Hideg, K. & Hubbell, W. L. (1996). structured intermediates and multiple
Motion of spin-labeled side chains in pathways. Nature 358, 302–307.
T4 lysozyme. Correlation with protein 105 Roder, H., Wagner, G. &
structure and dynamics. Biochemistry Wuthrich, K. (1985). Individual
35, 7692–7704. amide proton exchange rates in
96 Lee, L. & Sykes, B. D. (1983). Use of thermally unfolded basic pancreatic
lanthanide-induced nuclear magnetic trypsin inhibitor. Biochemistry 24,
resonance shifts for determination 7407–7411.
of protein structure in solution: 106 Robertson, A. D. & Baldwin, R. L.
EF calcium binding site of carp (1991). Hydrogen exchange in
parvalbumin. Biochemistry 22, 4366– thermally denatured ribonuclease A.
4373. Biochemistry 30, 9907–9914.
97 Gillespie, J. R. & Shortle, D. (1997). 107 Buck, M., Radford, S. E. & Dobson,
Characterization of long-range C. M. (1994). Amide hydrogen
structure in the denatured state of exchange in a highly denatured state.
staphylococcal nuclease. II. Distance Hen egg-white lysozyme in urea.
restraints from paramagnetic J Mol Biol 237, 247–254.
relaxation and calculation of an 108 Jennings, P. A. & Wright, P. E.
ensemble of structures. J Mol Biol 268, (1993). Formation of a molten globule
170–184. intermediate early in the kinetic
98 Yi, Q., Scalley-Kim, M. L., Alm, folding pathway of apomyoglobin.
E. J. & Baker, D. (2000). NMR Science 262, 892–896.
characterization of residual structure 109 Udgaonkar, J. B. & Baldwin, R. L.
in the denatured state of protein L. (1988). NMR evidence for an early
J Mol Biol 299, 1341–1351. framework intermediate on the
800 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
ethanol. Biochemistry 34, 13219– 137 Schulman, B. A., Kim, P. S., Dobson,
13232. C. M. & Redfield, C. (1997). A
128 Buck, M., Boyd, J., Redfield, C. et al. residue-specific NMR view of the non-
(1995). Structural determinants of cooperative unfolding of a molten
protein dynamics: analysis of 15N globule. Nat Struct Biol 4, 630–634.
NMR relaxation measurements for 138 Peti, W., Smith, L. J., Redfield, C. &
main-chain and side-chain nuclei of Schwalbe, H. (2001). Chemical shifts
hen egg white lysozyme. Biochemistry in denatured proteins: resonance
34, 4041–4055. assignments for denatured ubiquitin
129 Buck, M., Schwalbe, H. & Dobson, and comparisons with other denatured
C. M. (1996). Main-chain dynamics of proteins. J Biomol NMR 19, 153–165.
a partially folded protein: 15N NMR 139 Eliezer, D. & Wright, P. E. (1996).
relaxation measurements of hen egg Is apomyoglobin a molten globule?
white lysozyme denatured in Structural characterization by NMR.
trifluoroethanol. J Mol Biol 257, 669– J Mol Biol 263, 531–538.
683. 140 Eliezer, D., Jennings, P. A., Dyson,
130 Kemmink, J. & Creighton, T. E. H. J. & Wright, P. E. (1997).
(1993). Local conformations of Populating the equilibrium molten
peptides representing the entire globule state of apomyoglobin under
sequence of bovine pancreatic trypsin conditions suitable for structural
inhibitor and their roles in folding. characterization by NMR. FEBS Lett
J Mol Biol 234, 861–878. 417, 92–96.
131 Kemmink, J. & Creighton, T. E. 141 Yao, J., Chung, J., Eliezer, D.,
(1995). The physical properties of local Wright, P. E. & Dyson, H. J. (2001).
interactions of tyrosine residues in NMR structural and dynamic
peptides and unfolded proteins. J Mol characterization of the acid-unfolded
Biol 245, 251–260. state of apomyoglobin provides
132 Lumb, K. J. & Kim, P. S. (1994). insights into the early events in
Formation of a hydrophobic cluster in protein folding. Biochemistry 40, 3561–
denatured bovine pancreatic trypsin 3571.
inhibitor. J Mol Biol 236, 412–420. 142 Kitahara, R., Yamada, H., Akasaka,
133 Hennig, M., Bermel, W., Spencer, K. & Wright, P. E. (2002). High
A., Dobson, C. M., Smith, L. J. & pressure NMR reveals that
Schwalbe, H. (1999). Side-chain apomyoglobin is an equilibrium
conformations in an unfolded protein: mixture from the native to the
chi1 distributions in denatured hen unfolded. J Mol Biol 320, 311–319.
lysozyme determined by heteronuclear 143 Schwarzinger, S., Wright, P. E. &
13C, 15N NMR spectroscopy. J Mol Dyson, H. J. (2002). Molecular hinges
Biol 288, 705–723. in protein folding: the urea-denatured
134 McGregor, M. J., Islam, S. A. & state of apomyoglobin. Biochemistry
Sternberg, M. J. (1987). Analysis of 41, 12681–12686.
the relationship between side-chain 144 Arcus, V. L., Vuilleumier, S.,
conformation and secondary structure Freund, S. M., Bycroft, M. &
in globular proteins. J Mol Biol 198, Fersht, A. R. (1994). Toward solving
295–310. the folding pathway of barnase: the
135 Dunbrack, R. L., Jr. & Karplus, M. complete backbone 13C, 15N, and 1H
(1993). Backbone-dependent rotamer NMR assignments of its pH-
library for proteins. Application to denatured state. Proc Natl Acad Sci
side-chain prediction. J Mol Biol 230, USA 91, 9412–9416.
543–574. 145 Arcus, V. L., Vuilleumier, S.,
136 Shortle, D. R. (1996). Structural Freund, S. M., Bycroft, M. &
analysis of nonnative states of proteins Fersht, A. R. (1995). A comparison
by NMR methods. Curr Opin Struct of the pH, urea, and temperature-
Biol 6, 24–30. denatured states of barnase by
802 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
using 15N relaxation data exclusively. roles of the folded and unfolded
J Biomol NMR 6, 153–162. states. J Mol Biol 307, 913–928.
163 Farrow, N. A., Zhang, O., Forman- 172 Choy, W. Y., Mulder, F. A.,
Kay, J. D. & Kay, L. E. (1997). Crowhurst, K. A. et al. (2002).
Characterization of the backbone Distribution of molecular size within
dynamics of folded and denatured an unfolded state ensemble using
states of an SH3 domain. Biochemistry small-angle X-ray scattering and pulse
36, 2390–2402. field gradient NMR techniques. J Mol
164 Yang, D., Mok, Y. K., Forman-Kay, J. Biol 316, 101–112.
D., Farrow, N. A. & Kay, L. E. (1997). 173 Crowhurst, K. A., Tollinger, M. &
Contributions to protein entropy and Forman-Kay, J. D. (2002). Cooperative
heat capacity from bond vector interactions and a nonnative buried
motions measured by NMR spin Trp in the unfolded state of an SH3
relaxation. J Mol Biol 272, 790–804. domain. J Mol Biol 322, 163–178.
165 Zhang, O. & Forman-Kay, J. D. 174 Crowhurst, K. A. & Forman-Kay, J.
(1997). NMR studies of unfolded D. (2003). Aromatic and methyl NOEs
states of an SH3 domain in aqueous highlight hydrophobic clustering in
solution and denaturing conditions. the unfolded state of an SH3 domain.
Biochemistry 36, 3959–3970. Biochemistry 42, 8687–8695.
166 Zhang, O., Forman-Kay, J. D., 175 Tollinger, M., Forman-Kay, J. D. &
Shortle, D. & Kay, L. E. (1997). Kay, L. E. (2002). Measurement of
Triple-resonance NOESY-based side-chain carboxyl pK(a) values of
experiments with improved spectral glutamate and aspartate residues in an
resolution: applications to structural unfolded protein by multinuclear
characterization of unfolded, partially NMR spectroscopy. J Am Chem Soc
folded and folded proteins. J Biomol 124, 5714–5717.
NMR 9, 181–200. 176 Logan, T. M., Olejniczak, E. T., Xu,
167 Mok, Y. K., Kay, C. M., Kay, L. E. & R. X. & Fesik, S. W. (1993). A general
Forman-Kay, J. (1999). NOE data method for assigning NMR spectra of
demonstrating a compact unfolded denatured proteins using 3D
state for an SH3 domain under non- HC(CO)NH-TOCSY triple resonance
denaturing conditions. J Mol Biol 289, experiments. J Biomol NMR 3, 225–
619–638. 231.
168 Kortemme, T., Kelly, M. J., Kay, L. E., 177 Nordstrand, K., Ponstingl, H.,
Forman-Kay, J. & Serrano, L. (2000). Holmgren, A. & Otting, G. (1996).
Similarities between the spectrin SH3 Resonance assignment and structural
domain denatured state and its folding analysis of acid denatured E. coli [U-
transition state. J Mol Biol 297, 1217– 15N]-glutaredoxin 3: use of 3D 15N-
1229. HSQC-(TOCSY-NOESY)-15N-HSQC.
169 Choy, W. Y. & Forman-Kay, J. D. Eur Biophys J 24, 179–184.
(2001). Calculation of ensembles of 178 Radford, S. E., Woolfson, D. N.,
structures representing the unfolded Martin, S. R., Lowe, G. & Dobson,
state of an SH3 domain. J Mol Biol C. M. (1991). A three-disulphide
308, 1011–1032. derivative of hen lysozyme. Structure,
170 Tollinger, M., Skrynnikov, N. R., dynamics and stability. Biochem J
Mulder, F. A., Forman-Kay, J. D. & 273(Pt 1), 211–217.
Kay, L. E. (2001). Slow dynamics in 179 Radford, S. E., Buck, M., Topping,
folded and unfolded states of an SH3 K. D., Dobson, C. M. & Evans, P. A.
domain. J Am Chem Soc 123, 11341– (1992). Hydrogen exchange in native
11352. and denatured states of hen egg-white
171 Mok, Y. K., Elisseeva, E. L., lysozyme. Proteins 14, 237–248.
Davidson, A. R. & Forman-Kay, J. D. 180 Roux, P., Delepierre, M., Goldberg,
(2001). Dramatic stabilization of an M. E. & Chaffotte, A. F. (1997).
SH3 domain by a single substitution: Kinetics of secondary structure
804 21 Conformation and Dynamics of Nonnative States of Proteins studied by NMR Spectroscopy
recovery during the refolding of 190 Bentrop, D., Bertini, I., Iacoviello,
reduced hen egg white lysozyme. R. et al. (1999). Structural and
J Biol Chem 272, 24843–24849. dynamical properties of a partially
181 Chung, E. W., Nettleton, E. J., unfolded Fe4S4 protein: role of the
Morgan, C. J. et al. (1997). Hydrogen cofactor in protein folding.
exchange properties of proteins in Biochemistry 38, 4669–4680.
native and denatured states monitored 191 Harding, M. M., Williams, D. H. &
by mass spectrometry and NMR. Woolfson, D. N. (1991).
Protein Sci 6, 1316–1324. Characterization of a partially
182 Buck, M. (1998). Trifluoroethanol and denatured state of a protein by two-
colleagues: cosolvents come of age. dimensional NMR: reduction of the
Recent studies with peptides and hydrophobic interactions in ubiquitin.
proteins. Q Rev Biophys 31, 297–355. Biochemistry 30, 3120–3128.
183 Hennig, M., Bermel, W., Dobson, C. 192 Stockman, B. J., Euvrard, A. &
M., Smith, L. J. & Schwalbe, H. Scahill, T. A. (1993). Heteronuclear
(1999). Determination of side chain three-dimensional NMR spectroscopy
conformations in unfolded proteins: 1 of a partially denatured protein: the A-
Distribution for lysozyme by state of human ubiquitin. J Biomol
heteronuclear 13C,15N NMR NMR 3, 285–296.
spectroscopy. J Mol Biol 288, 705–723. 193 Nash, D. P. & Jonas, J. (1997).
184 Klein-Seetharaman, J., Oikawa, M., Structure of the pressure-assisted cold
Grimshaw, S. B. et al. (2002). Long- denatured state of ubiquitin. Biochem
range interactions within a nonnative Biophys Res Commun 238, 289–291.
protein. Science 295, 1719–1722. 194 Permi, P. (2002). Intraresidual HNCA:
185 Baldwin, R. L. (2002). Making a an experiment for correlating only
network of hydrophobic clusters. intraresidual backbone resonances.
Science 295, 1657–1658. J Biomol NMR 23, 201–209.
186 Noda, Y., Yokota, A., Horii, D. et al. 195 Mizushima, T., Hirao, T., Yoshida,
(2002). NMR structural study of two- Y. et al. (2004). Structural basis of
disulfide variant of hen lysozyme: sugar-recognizing ubiquitin ligase. Nat
2SS[6–127, 30–115] – a disulfide Struct Mol Biol 11, 365–370.
intermediate with a partly unfolded 196 Fletcher, C. M., McGuire, A. M.,
structure. Biochemistry 41, 2130–2139. Gingras, A. C. et al. (1998). 4E
187 Bhavesh, N. S., Panchal, S. C., binding proteins inhibit the
Mittal, R. & Hosur, R. V. (2001). translation factor eIF4E without folded
NMR identification of local structural structure. Biochemistry 37, 9–15.
preferences in HIV-1 protease tethered 197 Daughdrill, G. W., Hanely, L. J. &
heterodimer in 6 M guanidine Dahlquist, F. W. (1998). The C-
hydrochloride. FEBS Lett 509, 218– terminal half of the anti-sigma factor
224. FlgM contains a dynamic equilibrium
188 Panchal, S. C., Bhavesh, N. S. & solution structure favoring helical
Hosur, R. V. (2001). Improved 3D conformations. Biochemistry 37, 1076–
triple resonance experiments, HNN 1082.
and HN(C)N, for HN and 15N 198 Mogridge, J., Legault, P., Li, J., Van
sequential correlations in (13C, 15N) Oene, M. D., Kay, L. E. &
labeled proteins: application to Greenblatt, J. (1998). Independent
unfolded proteins. J Biomol NMR 20, ligand-induced folding of the RNA-
135–147. binding domain and two functionally
189 Tafer, H., Hiller, S., Hilty, C., distinct antitermination regions in the
Fernandez, C. & Wuthrich, K. phage lambda N protein. Mol Cell 1,
(2004). Nonrandom structure in the 265–275.
urea-unfolded Escherichia coli outer 199 De Guzman, R. N., Martinez-
membrane protein X (OmpX). Yamout, M. A., Dyson, H. J. &
Biochemistry 43, 860–869. Wright, P. E. (2004). Interaction of
References 805
22
Dynamics of Unfolded Polypeptide Chains
Beat Fierz and Thomas Kiefhaber
22.1
Introduction
During protein folding the polypeptide chain has to form specific side-chain and
backbone interactions on its way to the native state. An important issue in protein
folding is the rate at which a folding polypeptide chain can explore conformational
space in its search for energetically favorable interactions. Conformational search
within a polypeptide chain is limited by intrachain diffusion processes, i.e., by the
rate at which two points along the chain can form an interaction. The knowledge
of the rates of intrachain contact formation in polypeptide chains and their depen-
dence on amino acid sequence and chain length is therefore essential for an under-
standing of the dynamics of the earliest steps in protein folding and for the charac-
terization of the free energy barriers for protein folding reactions. In addition,
intrachain diffusion provides an upper limit for the speed at which a protein can
reach its native state just like free diffusion provides an upper limit for the rate
constant of bimolecular reactions. Free diffusion of particles in solution was
treated extensively almost 100 years ago by Einstein and von Smoluchowski [1–3]
but until recently, only little was known about the absolute time scales of intra-
chain diffusion in polypeptides. Numerous theoretical studies have been made to
investigate the process of intrachain diffusion in polymers [4–10]. These studies
derived scaling laws for intrachain diffusion and made predictions on the kinetic
behavior of the diffusion process but they were not able to give absolute numbers.
This chapter will briefly introduce some theoretical concepts used to treat chain
diffusion and then discuss experimental results on intrachain diffusion from dif-
ferent model systems.
22.2
Equilibrium Properties of Chain Molecules
Since chain dynamics strongly depend on the chain conformation we will briefly
present some polymer models for chain conformations. This topic is discussed in
H Ri-1 H O H R i+1
C
Ci-1 Ni
ωi ϕ Ci Ni+1
i
ψ
i
Ri H H
O
°
3.8A
Fig. 22.1. Schematic representation of a polypeptide chain.
The Ri denote the different amino acid side chains, the dashed
lines denote the virtual chain segment. Adopted from Ref. [12].
more detail in Chapter 20 and in Refs [11–15]. The most simple description on an
unfolded state of a protein is the idealized notion of a random coil. In a random
coil no specific interactions between residues or more generally chain segments
persist and a large conformational space is populated. Polypeptide chains are
rather complex polymers as sketched in Figure 22.1. Usually simplified models
are applied to calculate general properties of random coils.
22.2.1
The Freely Jointed Chain
hr 2 i ¼ nl 2 ð1Þ
For such an ideal chain the end-to-end vectors are normally distributed. The often-
used notion of a Gaussian chain, however, refers to a chain model with Gaussian
distributed bond lengths. Thus the end-to-end vector distribution is also Gaussian
and Eq. (1) holds. This chain model does not describe the local structure of a chain
correctly but the global properties of long chains are modeled in a realistic way.
22.2.2
Chain Stiffness
In a chain with restricted angles between two neighboring segments the correla-
tion between segments n; m is no longer zero for n 0 m but asymptotically ap-
22.2 Equilibrium Properties of Chain Molecules 811
proaches zero with increasing distance. The chain is not as flexible as the freely
jointed chain because every segment is influenced by its neighbors. This behavior
can be described in terms of chain stiffness. The end-to-end distance for such a
chain is larger than calculated from Eq. (1). Flory introduced the characteristic ratio
(C n ) as a measure for the dimensions of a stiff chain compared to a freely jointed
chain [12]
hr 2 i ¼ C n nl 2 ð2Þ
For short chains C n increases with chain length due to preferential chain propaga-
tion into one direction (see Figure 22.2). For long chains ðn ! yÞ C n reaches a
constant limiting value ðCy Þ.
hr 2 i ¼ Cy nl 2 ð3Þ
In this limit hr 2 i grows proportional to n, like an ideal random-walk chain (see Eq.
(1)). However, in a real chain hr 2 i increases by a factor of Cy more per segment
compared with an ideal random-walk chain. The value of Cy gives the average
number of consecutive chain segments that propagate in the same direction (‘‘sta-
tistical segment’’). Consequently, for an ideal Gaussian chain where there is no
correlation between the bond vectors, the limiting characteristic ratio is 1 (see Eq.
(1)). For real chains Cy is larger than 1 and larger values of Cy indicate stiffer
chains.
Kuhn showed that for chains with limited flexibility and n bonds of length l an
equivalent to the freely jointed chain can be defined introducing an hypothetical
bond length b, the Kuhn length (b > l) [16, 17]. The Kuhn length is the length of
chain segments that can move freely (i.e., without experiencing chain stiffness).
The maximal length of the chain (rmax ) is the same as for a freely jointed chain
and thus rmax ¼ nl ¼ n 0 b with n 0 < n. The Kuhn length b is defined as
Another widely used term is the persistence length l p . It is a measure for the
distance that an infinitely long chain continues in the same direction (i.e., is
persistent). The Kuhn and the persistence lengths are connected by the simple
relationship
b ¼ 2l p ð5Þ
22.2.3
Polypeptide Chains
Any real macromolecule like the polypeptide chain has defined segments which
are connected by chemical bonds. Bond angles are constrained and the distribution
812 22 Dynamics of Unfolded Polypeptide Chains
8
polyserine
6
Cn
4
poly(glycine-serine)
2
polyglycine
0 20 40 60 80 100
n
Fig. 22.2. Effect of chain length (n) on the transformation matrices given by Flory [12].
characteristic ratio (C n ) for polyserine, The transformation matrix for alanine was used
polyglycine and a 50% mixture of serine and to calculate the properties of serine.
glycine. Cn was calculated using average
22.2.4
Excluded Volume Effects
In real (physical) chains two chain segments cannot occupy the same space. The
excluded volume effect was first discussed by Kuhn [16] and leads to a non-ideal
behavior of the chain. The chain dimensions increase as a result of the excluded
volume, which leads to larger end-to-end distances. Flory obtained an approximate
exponent for the chain dimensions including the contributions from the excluded
volume effect by simple calculations [11].
pffiffiffiffiffiffiffiffiffiffi
hr 2 i m l n n ; n ¼ 0:59 A or 3=5 ð6Þ
22.3 Theory of Polymer Dynamics 813
This shows that the excluded volume effect is especially important for long chains.
The excluded volume chain has been a subject of much research and is discussed
in detail in the literature [13, 23–25] (see also Chapter 20).
For long real chains the end-to-end distribution function, pðrÞ, depends on the
solvent conditions and on the temperature of the system if the intrachain interac-
tions have enthalpic contributions. pðrÞ for real chains can reasonably well be ap-
proximated by a skewed Gaussian function [26]
2
pðrÞ m 4pr 2 eððrr0 Þ=sÞ ð7Þ
where r is the end-to-end distance, s is the half-width of the distribution and r0 in-
dicates the shift of the skewed Gaussian from the origin. Polypeptides in solution
are complex molecules which may interact strongly with the solvent and with
themselves enabling proteins to fold. The unfolded state of natural proteins was
shown to contain both native and nonnative short-range and long-range interac-
tions [27–39] (see Chapter 21). But many of the properties of GdmCl-unfolded pro-
teins may be approximated by statistical chain models (see Section 22.4.2.4),
although in some proteins specific intrachain interactions have also been found
under strongly denaturing conditions [28, 39]. (see Chapter 21).
22.3
Theory of Polymer Dynamics
One of the major interests in polymer chemistry and biology is to elucidate the dy-
namic behavior of polymers in solution. Usually the dynamics of such molecules
are complex and cannot be easily described by classical concepts. This is especially
true for protein folding where the polypeptide chain not only moves freely in solu-
tion but also undergoes large cooperative structural transitions involving partially
folded and native states. In the following we will give a short introduction to the
theoretical concepts of polymer dynamics. More detailed information is given in
Refs [8, 13, 23, 24].
22.3.1
The Langevin Equation
The energy surface on which a system moves is often complex with sequential and
parallel events during the barrier crossing process (see Chapter 12.2). The motions
of the system exhibit a diffusional character. The polymer chain in solution is im-
mersed in solvent molecules which supply the energy for the movement by collid-
ing with the polymer chain and at the same time dissipate this energy by exerting a
frictional force on the molecule. The system can be described by the Langevin
equation for Brownian motion if one assumes that the relaxation time scale for
the solvent fluctuations is extremely short compared with the time scale of polymer
motion [40]:
814 22 Dynamics of Unfolded Polypeptide Chains
qUðxÞ
x€ ¼ M 1 gx_ þ M 1 Ffluc ðtÞ ð8Þ
qx
where M is the particle mass, x is the reaction coordinate, UðxÞ denotes the en-
ergy, and g is a friction coefficient, which is determined by the interactions of the
system with its environment and couples the system to the environment. It can be
related to solvent viscosity in real solutions. Ffluc ðtÞ is the random force which rep-
resents the thermal motion of the environment. The random force is modeled by
Gaussian white noise of zero mean and a d correlation function
hFfluc ðtÞi ¼ 0
hFfluc ðtÞ Ffluc ðt 0 Þi ¼ 2kB TgMdðt t 0 Þ ð9Þ
22.3.2
Rouse Model and Zimm Model
For long unfolded polypeptide chains, the simplified models used in polymer
theory should be applicable. The system, i.e., the peptide immersed in the solvent,
encounters no large barriers and undergoes random conformational changes. The
dynamic behavior of Gaussian chains can be described analytically using classical
polymer theory. Models of a higher complexity are only accessible, however, by
solving the equations numerically in computer simulations. Two models have
been initially proposed to describe polymer dynamics in dilute solutions: the Rouse
model [5] and the Zimm model [6]. In both models the polymer is treated as a set
of beads connected by harmonic springs. The Langevin equation (Eq. (8)) is used to
describe the brownian motion of the connected beads in the Rouse model. The
frictional and activating force from the solvent independently acts on all beads.
The movement of the chain is described as a set of n relaxation modes. These
modes can be compared to the vibrational modes of a violin string. Each mode p
is coupled to a relaxation time t p and involves the motion of a section of the chain
with n=p monomers. The mode with the longest relaxation time describes the over-
all motion (rotational relaxation) of the polymer chain. All other modes represent
internal motions.
Yet, the scaling laws obtained for Rouse chains are not consistent with experi-
mental results [24]. When a chain segment moves through a viscous solvent it
has to drag surrounding solvent molecules along. This creates a flow field around
the particle. Surrounding monomers are additionally affected by this flow field. As
a result, the motion of one chain segment influences the motion of all other seg-
ments by hydrodynamic interactions.
The resulting chain model including hydrodynamic interactions was put forward
by Zimm [6]. This model also gives internal relaxation modes, but the hydrody-
namic interactions lead to significant deviations from the Rouse model. The sol-
vent viscosity influences the system in two ways. First, it affects the random force
that delivers the energy and dissipates it at the same time (g, in the Langevin equa-
22.3 Theory of Polymer Dynamics 815
22.3.3
Dynamics of Loop Closure and the Szabo-Schulten-Schulten Theory
One particularly intensively treated dynamic event in polymer chains is the loop
closure or end-to-end diffusion reaction. Organic chemists are interested in loop
closure probabilities for the synthesis of cyclic compounds. Also in nucleic acids
end-to-end diffusion is important during formation of cyclic DNA. The kinetics of
such cyclization reactions critically depend on the rate of intrachain contact forma-
tion. A similar situation is encountered in protein folding, where residues have to
come together and form specific non-covalent interactions. Jacobsen and Stock-
mayer [4] derived a scaling law for end-to-end ring closure probability as a function
of the length of the polymer chain. The calculations were based entirely on en-
tropic contributions and yielded
where pring denotes the probability of ring formation and n is the number of
monomers in such a ring.
Szabo, Schulten, and Schulten [7] discussed the kinetics that can be expected
from a diffusion controlled encounter of two groups attached to the ends of a
Gaussian chain (SSS theory). They treated the loop closure reaction as an end-to-
end diffusion of a polymer on a potential surface. It was shown that the kinetics
of such a process can be approximated by an exponential decay of the probability
SðtÞ that the system has not reacted (i.e., has not made contact) at time t
The mean first passage time of contact formation can be related to the root mean
square end-to-end distance, r, by expanding Eq. (12):
pffiffiffiffiffiffiffiffiffiffi
t1 ¼ k contact m Dð hr 2 iÞ3 ð13Þ
for an ideal chain. The effect of viscosity on the rate of reaction is included in the
diffusion coefficient D, which is assumed to be inversely proportional to the sol-
vent viscosity. Hydrodynamic interactions, the excluded volume effect and other
non-idealities (e.g., a diffusion coefficient dependent on chain conformation) are
not considered in these approximations and probably contribute to the dynamics
and scaling laws in real systems.
22.4
Experimental Studies on the Dynamics in Unfolded Polypeptide Chains
22.4.1
Experimental Systems for the Study of Intrachain Diffusion
It is now widely accepted that the description of the unfolded state as a complete
random coil where the polypeptide chain behaves like an ideal polymer is not
accurate. Short stretches along the polypeptide chain behave in a highly non-ideal
manner because of chain stiffness, solvent interactions, and excluded volume. In
addition, there is evidence for residual short-range and long-range interactions in
unfolded proteins in water and even at high concentrations of urea and GdmCl
[27, 28, 34, 43, 44]. For the understanding of the dynamic properties of unfolded
proteins it is therefore essential to compare scaling laws and absolute rate con-
stants for chain diffusion in homopolymer models with the results from unfolded
proteins or protein fragments. Suitable experimental systems to study intrachain
diffusion should allow direct measurements of the kinetics of formation of van
der Waals contacts between specific groups on a polypeptide chain. To obtain abso-
lute rate constants, the applied method must be faster than chain diffusion and the
detection reaction itself should be diffusion controlled (see Appendix).
In the following, we will describe various experimental systems that have been
used to study the dynamics of intrachain diffusion. In Section 22.4.2 results from
these studies will be discussed in more detail in terms of dynamic properties of
unfolded polypeptide chains. A summary of the results is given in Tables 22.1 and
22.2.
k app (sC1 ) Loop size (n) a Method Labels b Sequence Conditions Reference
3:3 10 8 –1:1 10 8 4–8, 50c FRET Dansyl/naphthalene (heGln)n d Glycerol/TFE 48, 50c
2:8 10 4 60 Bond formation Heme/Met Unfolded cytochrome c 5.4 GdmCl, 40 C 52
5 10 7 –1:2 10 7 3–9 TTET Thioxanthone/NAla (Gly-Ser) x EtOH 50
9:1 10 6 10 Triplet quenching Trp/Cys (Ala-Gly-Gln)3 H2 O, phosphate 59
1:1 10 7 10 Triplet quenching Trp/cystine (Ala-Gly-Gln)3 H2 O, phosphate 59
1:7 10 7 10 Triplet quenching Trp/lipoate (Ala-Gly-Gln)3 H2 O, phosphate 59
6:2 10 4 10 Triplet quenching Trp/lipoate Pro9 H2 O, phosphate 59
2:4 10 7 10 Triplet quenching Trp/lipoate (Ala)2 Arg(Ala)4 ArgAla H2 O, phosphate 59
4:1 10 7 –1:1 10 7 1–21 Fluor. quenching DBO/Trp (Gly-Ser) x D2 O 65
3:9 10 7 –1 10 5 7 Fluor. quenching DBO/Trp Xaa6 e D2 O 66
6:6 10 6 8 Single molecule fluor. MR121/Trp Part of human p53 H2 O, PBS, 25 C 67
quenching
8:3 10 6 9 Single molecule fluor. MR121/Trp Part of human p53 H2 O, PBS, 25 C 67
quenching
1:8 10 8 –6:5 10 6 3–57 TTET Xanthone/NAla (Gly-Ser) x H2 O, 22.5 C 54
8 10 7 –3:4 10 7 3–12 TTET Xanthone/NAla (Ser) x H2 O, 22.5 C 54
2:5 10 8 –2:0 10 7 4 TTET Xanthone/NAla Ser-Xaa-Serf H2 O, 22.5 C 54
4:0 10 6 15 Triplet quenching Zn-porphyrin/Ru Unfolded cytochrome c 5.4 M GdmCl, 22 C 64
2:0 10 7 17 TTET Xanthone/NAla Carp parvalbumin H2 O, 22.5 C 55
an is the number of peptide bonds between the reacting groups.
b The structures of the labels are shown in Figure 22.3.
c Calculated in Ref. [50] from the diffusion coefficients and the donor–
f Xaa ¼ Gly, Ala, Ser, Gly, Arg, His, Ile, Pro. The highest (cis Pro) and
lowest values (trans Pro) for the observed rate constants are given. See
817
Figure 22.9.
818
Tab. 22.2. Comparison of end-to-end contact formation rates observed in different systems measured or extrapolated to n A 10.
k app (sC1 ) Loop size (n) Method Labels Sequence Conditions b Reference
N ðr; tÞ
with Nðr; tÞ ¼ ð15Þ
N0 ðr; tÞ
where R 0 is the distance of 50% FRET efficiency, which is a property of the FRET
labels, N0 ðrÞ denotes the equilibrium distribution of distances between donor and
acceptor, and t is the fluorescence lifetime of the donor. The first term (equilib-
rium term) on the right-hand side describes the change of donor excitation with
time through spontaneous decay and nonradiative energy transfer. The second
term (diffusion term) describes the replenishment of excited labels through brow-
nian motion. This dynamic component of FRET was exploited to determine diffu-
sion coefficients of segmental motion in short homopolypeptide chains (poly-N 5 -
(2-hydroxyethyl)-l-glutamine) in glycerol and trifluoroethanol mixtures [48, 49]
(see Chapter 17). Naphthyl- and dansyl-groups were used as donor and acceptor,
respectively (Figure 22.3A).
These studies allowed the first estimates of the time scale of chain diffusion pro-
cesses in polypeptide chains. Converting the reported average diffusion constants
reported by Haas and coworkers [48] into time constants (t ¼ 1=k) for contact
formation at van der Waals distance gives values between 9 and 3 ns for a donor–
acceptor separation of between 4 and 8 peptide bonds, respectively [50]. The dy-
namic component of FRET is sensitive to motions around the Förster distance
(R 0 ), which is around 22 Å for the dansyl–naphthyl pair used in these studies. It
therefore remained unclear whether the same diffusion constants represent the dy-
namics for formation of van der Waals contact between two groups. In addition,
analysis of the FRET data critically depends on the shape of the donor–acceptor
distribution function, which had to be introduced in the data analysis and thus
did not allow model-free analysis of the data. It was found that this distribution
can be described most accurately by the skewed Gaussian function mentioned
above [51]. It later turned out that the chain dynamics measured by FRET are sig-
nificantly faster than rate constants for contact formation (see below), which might
indicate a distance-dependent diffusion constant in unfolded polypeptide chains.
820 22 Dynamics of Unfolded Polypeptide Chains
A B
N(CH3)2 OH
O OH
HN O O Ri O O
H H H
S N N
N N C OH
H H H H2
H
O S N O H2C O
N H
O H O n
n
HN O
OH
C D
OH O Ri O
O O Ri O O H
H H N
H HS
N N N NH2
N C OH H H
N
H H H H2 NH2
H O
O H2C O N
O n
n
E
F N
S S O Ri O N
H
N
N NH2 HN
H H H
N
O
N
n O
O
H H
N OH
H2N N
H
O Ri
O
n
CH3
G CH3 S C Cys 14
H2
Ala 15
His
CH2 CH3 18
Xaa
CH2
n
COOH
22.4 Experimental Studies on the Dynamics in Unfolded Polypeptide Chains 821
FRET was also used in several proteins to determine donor–acceptor distance dis-
tribution functions and diffusion coefficients (see Chapter 17).
Rates of intrachain diffusion have also been estimated from the rate constant for
intramolecular bond formation in GdmCl-unfolded cytochrome c [52]. The reac-
tion of a methionine with a heme group separated by 50–60 amino acids along
the chain gave time constants around 35–40 ms. Extrapolating these data to shorter
distances using SSS or Jacobsen-Stockmeyer theory (k @ n1:5 ) gave an estimate of
1 ms for the time constant of contact formation at a distance of 10 peptide bonds,
which was predicted to be the distance for the maximum rate of contact formation
due to chain stiffness slowing down the diffusion process over shorter distances
[53]. A problem with this experimental approach is that the recombination reaction
of heme with methionine is not diffusion controlled. Thus, the chain dynamics
allow a faster breakage of the heme–methionine contact than formation of the
methionine–heme bond. Thus, not every contact was productive and the apparent
rate constants obtained in these measurements were slower than the absolute rate
constants measured with diffusion-controlled systems (see Experimental Protocols,
Section 22.7).
H——————————————————————————————————————————
Fig. 22.3. Structures of different experimental various homopolymer chains and natural
systems used to measure intrachain contact sequences with up to 57 amino acids between
formation. Results from these systems are donor and acceptor [54, 55]. D,E) Tryptophan
summarized in Table 22.1. A) Dansyl and and cysteine or lipoate-labeled peptides to
naphthyl-labeled poly-N 5 -(2-hydroxyethyl)-l- measure contact formation by Trp triplet
glutamine used to determine diffusion quenching in various short peptide chains [59,
constants by FRET [48]. B) Thioxanthone and 125, 126]. F) Short peptide chains (n < 10) for
naphthylalanine-labeled poly(Gly-Ser) peptides measuring DBO fluorescence quenching by
to measure contact formation by TTET [50]. C) tryptophan [65]. G) Quenching of Zn-porphyrin
Xanthone and naphthylalanine-labeled peptides triplets by Ru-His complexes in unfolded
to measure contact formation by TTET in cytochrome c [64].
822 22 Dynamics of Unfolded Polypeptide Chains
van der Waals contact to allow for electron transfer. This should be contrasted to
FRET, which occurs through dipole–dipole interactions and thus allows energy
transfer over larger distances (see above).
The triplet states of the labels have specific absorbance bands, which can easily
be monitored to measure the decay of the donor triplet states and the concomitant
increase of acceptor triplet states (Figure 22.5). The time constant for formation of
xanthone triplet states is around 2 ps [56]. TTET between xanthone and naph-
thyl acetic acid is faster than 2 ps and has a bimolecular transfer rate constant of
4 10 9 M1 s1 [54, 56], which is the value expected for a diffusion-controlled
reaction between small molecules in water (see Experimental Protocols, Section
22.7). Due to this fast photochemistry, TTET between xanthone and naphthalene
allows measurements of absolute rate constants for diffusion processes slower
than 10–20 ps (Figure 22.6 and Experimental Protocols, Section 22.7). The upper
22.4 Experimental Studies on the Dynamics in Unfolded Polypeptide Chains 823
0.08
Naphthylalanine
0.06
Absorbance
0.04
0.02
250
200
400 Xanthone
150
450
500 100 ns)
Wav 550 e(
eleng
th (n 600
50 Tim
m) 0
650
Fig. 22.5.Time-dependent change in the xanthone triplet absorbance band around
absorbance spectrum of a Xan-(Gly-Ser)14 - 590 nm is accompanied by a corresponding
NAla-Ser-Gly peptide after a 4 ns laser flash increase in the naphthalene triplet absorbance
at 355 nm. The decay in the intensity of the band around 420 nm. Adapted from Ref. [54].
limit of the experimental time window accessible by TTET is set by the intrinsic
lifetime of xanthone which is around 20–30 ms in water. Xanthone has a high
quantum yield for intersystem crossing (f ISC ¼ 0:99, e A 4000 M1 cm1 ) and the
triplet state has a strong absorbance band with a maximum around 590 nm in
water (eT590 A 10 000 M1 cm1 ). This allows single-pulse measurements at rather
low peptide concentrations (10–50 mM).
The low concentrations applied in the experiments also rule out contributions
from intermolecular transfer reactions that would have half-times higher than
50 ms in this concentration range [50, 54]. Also contributions from through-bond
transfer processes can be neglected, since this cannot occur over distances beyond
eight bonds [57, 58] and even the shortest peptides used in TTET studies had do-
nor and acceptor separated by 11 bonds.
Bieri et al. [50] initially used a derivate of thioxanthone as triplet donor and the
nonnatural amino acid naphthylalanine (NAla) as triplet acceptor (see Figure
22.3B). Because of the sensitivity of the triplet energy of the donor on solvent po-
larity the measurements had to be carried out in ethanol. TTET detected single ex-
ponential kinetics in all peptides with time constants of 20 ns for the shortest loops
in poly(glycine-serine)-based polypeptide chains [50]. The length dependence of
contact formation in these peptides did not exhibit a maximum rate constant for
transfer at n ¼ 10, in contrast to the predicted behavior [53] (see Section 22.4.2.2).
Based on these findings, the minimum time constant for intrachain contact forma-
tion was shown to be around 20 ns for short and flexible chains. The thioxanthone
used in the initial experiments as triplet donor was later replaced by xanthone [54],
which has a higher triplet energy than thioxanthone and thus allowed measure-
ments in water (see Figure 22.3C). These results showed two- to threefold faster
824 22 Dynamics of Unfolded Polypeptide Chains
rate constants for contact formation compared to the same peptides in ethanol
[54], setting the time constant for intrachain diffusion in short flexible chains in
water to 5–10 ns. The faster kinetics compared to the thioxanthone/NAla system
could be attributed to the effect of ethanol [54] (see below).
A disadvantage of TTET from xanthone to NAla in its application to natural pro-
teins is that Tyr, Trp, and Met interact with the xanthone triplet state either by
TTET or by triplet quenching and thus should not be present in the studied poly-
peptide chains [55].
Lapidus et al. [59] used a related system to measure chain dynamics in short
peptides. In this approach contact formation was measured by quenching of tryp-
tophan triplet states by cysteine (see Figure 22.3D). Tryptophan can be selectively
excited by a laserflash (f ISC ¼ 0:18, e266 A 3500 M1 cm1 , eT460 A 5000 M1 cm1 )
and its triplet decay can be monitored by absorbance spectroscopy [60]. The advan-
tage of this system is that donor chromophore and quencher groups are naturally
occurring amino acids, which can be introduced at any position in peptides and
proteins. As in the case on TTET from xanthone to naphthalene some amino acids
interfere with the measurements (e.g., Tyr, Met) since they interact with trypto-
phan triplets and should thus not be present in the studied polypeptide chains [61].
Major disadvantages of the Trp/Cys system are, (i) that the formation of trypto-
phan triplets is slow (t ¼ 3 ns) [62], (ii) that triplet quenching is accompanied by
the formation of S radicals [59], and (iii) that the quenching process is not diffu-
sion controlled [63]. These properties reduce the time window available for the
kinetic measurements and do not allow direct and model-free analysis of the
quenching time traces (see Experimental Protocols, Section 22.7). In addition, its
low quantum yield makes the detection of the triplet states and the data analysis
difficult, especially since the kinetics are obscured by radical absorbance bands. In
addition to cysteine quenching, two other systems were presented by the same
group [59] with cystine or the cyclic disulfide lipoate serving as quencher instead
of cysteine (Figure 22.3E). The advantage of using lipoate is that the quenching ki-
netics are much closer to the diffusion limit. Thus, the rates measured with this
system are generally faster. Still, all systems used for quenching of tryptophan trip-
lets gave significantly slower kinetics of intrachain contact formation compared to
TTET from xanthone to NAla in the same or in similar sequences (see Tables 22.1
and 22.2). This indicates that tryptophan triplet quenching does not allow mea-
surements of chain dynamics on the absolute time scale [63].
Another recent experimental approach investigated intrachain contact forma-
tion in unfolded cytochrome c using electron transfer from a triplet excited Zn-
porphyrine group to a Ru complex, which was bound to a specific histidine residue
(His33; Figure 22.3G). Since electron transfer is fast and close to the diffusion lim-
it these experiments should also yield absolute rate constants for chain diffusion.
Contact formation in the 15-amino-acid loop from cytochrome c was observed
with a time constant of 250 ns [64] in the presence of 5.4 M GdmCl. This is signif-
icantly faster than the dynamics in unfolded cytochrome c reported by Hagen et al.
[52] under similar conditions. However, the dynamics measured by Chang et al.
agree well with TTET measurements [50, 54, 55], when the kinetics are extrapo-
lated from 5.4 M GdmCl to water (see below).
22.4 Experimental Studies on the Dynamics in Unfolded Polypeptide Chains 825
22.4.2
Experimental Results on Dynamic Properties of Unfolded Polypeptide Chains
The different systems presented above were applied to measure intrachain contact
formation in a large variety of model peptides. The rates for contact formation have
kT kc k TT
2 ps < 2 ps
k -c
Fig. 22.6. Schematic representation of the allow determination of the absolute rate
triplet–triplet energy transfer (TTET) constant for contact formation (k c ) in the
experiments. Triplet donor and acceptor ensemble of unfolded conformation, if
groups are attached at specific positions on an formation of the triplet state (k T ) and the
unstructured polypeptide chain. Triplet states transfer process (k T T ) are much faster than
are produced in the donor group by a short chain dynamics (k T ; kT T g k c ; kc ; see
laser flash and transferred to the acceptor Experimental Protocols, Section 22.7). The rate
upon encounter at van der Waals distance in a constants given for k T and k T T were taken
diffusion-controlled process. The experiments from Ref. [56].
826 22 Dynamics of Unfolded Polypeptide Chains
been found to vary depending on the method applied, on the sequence of the pep-
tide and on the peptide length. Results from different groups are compared in
Tables 22.1 and 22.2. The results reveal that TTET and electron transfer, which were
shown to have fast photochemistry and diffusion-controlled transfer reactions, give
virtually identical rate constants and have the fastest contact rates of all applied
methods. These methods obviously allow the determination of absolute rate con-
stants for contact formation. In the following, we will discuss results from experi-
ments that used these methods to determine the effect of chain length, amino acid
sequence and solvent properties on the rate constants of contact formation.
0.12
n=9
n = 17
0.08
n = 29
A 590
0.04
0.00
0 1 . 10 -7 2 . 10 -7
Time (s)
5 . 10 -3
residuals
0
-5 . 10 -3
0 1 . 10 -7 2 . 10 -7
5 . 10 -3
residuals
0
-5 . 10 -3
0 1 . 10 -7 2 . 10 -7
5 . 10 -3
residuals
0
-5 . 10 -3
0 1 . 10 -7 2 . 10 -7
Time (s)
Fig. 22.7. Time course of formation and decay (n) between donor and acceptor are displayed.
of xanthone triplets in peptides of the form Additionally, single exponential fits of the data
Xan-(Gly-Ser) x -NAla-Ser-Gly after a 4 ns laser and the corresponding residuals are shown.
flash at t ¼ 0 measured by the change in The fits gave time constants shown in Figure
absorbance of the xanthone triplets at 590 nm. 22.8. Adapted from Ref. [54].
Data for different numbers of peptide bonds
sentative TTET kinetics for peptides from this series. As mentioned above, single
exponential kinetics for contact formation were observed for all peptides.
Figure 22.8 shows the effect of increasing loop size on the rate constant for con-
tact formation. For long loops (n > 20) the rate of contact formation decreases with
n1:7G0:1 (n is the number of peptide bonds between donor and acceptor). This in-
dicates a stronger effect of loop size on the rate of contact formation than expected
for purely entropy-controlled intrachain diffusion in ideal freely jointed Gaussian
828 22 Dynamics of Unfolded Polypeptide Chains
109
108
k c (s-1)
107
poly(Gly-Ser)
polyserine
parvalbumin loop
106
1 10 100
Number of peptide bonds (n)
Fig. 22.8. Comparison of the rate constants m ¼ 1:72 G 0:08 for the poly(glycine-serine)
(k c ) for end-to-end diffusion in poly(glycine- series and of k a ¼ ð8:7 G 0:8Þ 10 7 s1 ,
serine) (filled circles), polyserine (open circles) k b ¼ ð1:0 G 0:8Þ 10 10 s1 and m ¼ 2:1 G 0:3
and a parvalbumin loop fragment 85–102 for polyserine. The dashed lines indicate the
(triangles) measured by TTET between xanthone limiting cases for dynamics for short chains
and naphthylalanine. The data were fitted to (k c ¼ ka ) and long chains k c ¼ k b nm ). Data
Eq. (16) and gave results of k a ¼ ð1:8 G 0:2Þ were taken from Refs [54, 55].
10 8 s1 , k b ¼ ð6:7 G 1:6Þ 10 9 s1 and
chains, which should scale with k @ n1:5 (see above) [4, 7]. However, Flory already
pointed out that excluded volume effects should significantly influence the chain
dimensions [11]. Accounting for excluded volume effects in the end-to-end diffu-
sion model of Szabo, Schulten, and Schulten [7] gives k @ n1:8 , which is nearly
identical to the value found for the long poly(Gly-Ser) chains [54]. This indicates
that the dimensions and the dynamics of unfolded polypeptide chains in water
are significantly influenced by excluded volume effects, which is in agreement
with recent results on conformational properties of polypeptides derived from sim-
plified Ramachandran maps [25]. Additionally, hydrodynamic interactions and the
presence of small enthalpic barriers, which were observed in the temperature
dependence of intrachain diffusion might contribute to the length dependence
(F. Krieger and T. Kiefhaber, unpublished).
The observed simple scaling law breaks down for n < 20 and contact formation
becomes virtually independent of loop size for very short loops with a limiting
value of k a ¼ 1:8 10 8 s1 . As pointed out above, the limiting rate constant for
contact formation in short loops is not due to limits of TTET (k T T in Figure 22.6),
since this process is faster than 2 ps. Obviously, the intrinsic dynamics of polypep-
tide chains are limited by different processes for motions over short and over long
segments. This is in agreement with polymer theory, which suggests that the prop-
erties of short chains are strongly influenced by chain stiffness. This leads to a
breakdown of theoretically derived scaling laws for ideal chains [12]. To compare
the experimental results with predictions from polymer theory the effect of loop
size on the rate constants for contact formation (Figure 22.8) can be compared to
22.4 Experimental Studies on the Dynamics in Unfolded Polypeptide Chains 829
1
kc ¼ ð16Þ
1=k a þ 1=ðk b nm Þ
The fit of the experimental results to Eq. (16) is shown in Figure 22.8 and gives
a value of k a ¼ ð1:8 G 0:2Þ 10 8 s1 , k b ¼ ð6:7 G 1:6Þ 10 9 s1 , and m ¼
1:72 G 0:08 for poly(glycineserine), as discussed above.
ine are about two- to threefold slower than in the poly(glycine-serine) peptides,
which seems to be a small effect compared to the largely different properties ex-
pected for the stiffer polyserine chains (see Figure 22.2).
The decreased flexibility and the longer donor-acceptor distances in the polyser-
ine chains are probably compensated by the decreased conformational space avail-
able for polyserine compared to poly(glycine-serine) peptides. [54] For longer poly-
serine chains contact formation slows down with increasing loop size. The effect
of increasing loop size on the rates of contact formation seems to be slightly
larger in polyserine compared to the poly(glycine-serine) series (m ¼ 2:1 G 0:3;
k b ¼ ð1:0 G 0:8Þ 10 10 s1 ; see Eq. (16)). However, due to limitations in peptide
synthesis it was not possible to obtain longer peptides, which would be required
to get a more accurate scaling law for the polyserine peptides.
The kinetics of formation of short loops differ only by a factor of 2 for polyserine
compared with poly(glycine-serine), arguing for only little effect of amino acid se-
quence on local chain dynamics (Figure 22.8). The effect of other amino acids on
local chain dynamics was measured by performing TTET experiments on short
host–guest peptides of the canonical sequence Xan-Ser-Xaa-Ser-NAla-Ser-Gly using
the guest amino acids Xaa ¼ Gly, Ser, Ala, Ile, His, Glu, Arg, and Pro [54]. Figure
22.9 shows that the amino acid side chain indeed has only little effect on the rates
of contact formation. All amino acids except proline and glycine show very similar
dynamics. Interestingly, there is a small but significant difference in rate between
short side chains (Ala, Ser) and amino acids with longer side chains (Ile, Glu, Arg,
His). Obviously, chains that extend beyond the Cb -atom slightly decrease the rates
2.0×108
1.0×108
8.0×107
k c (s-1)
6.0×107
4.0×107
2.0×107
1.0×107
SGS SAS SSS SES SRS SHS SIS SPS SPS
trans cis
Sequence
Fig. 22.9.Effect of amino acid sequence on local chain
dynamics measured in host–guest peptides with the canonical
sequence Xan-Ser-Xaa-Ser-NAla-Ser-Gly. The guest amino acid
Xaa was varied and the rate constants for the different guest
amino acids are displayed. Data taken from Ref. [54].
22.4 Experimental Studies on the Dynamics in Unfolded Polypeptide Chains 831
of local chain dynamics, whereas charges do not influence the dynamics. Glycyl
and prolyl residues show significantly different dynamics compared with all other
amino acids, as expected from their largely different conformational properties
[19].
As shown before (Figure 22.8) glycine accelerates contact formation about
two- to threefold compared with serine. Proline shows slower and more com-
plex kinetics of contact formation with two rate constants of k 1 ¼ 2:5 10 8 s1
and k2 ¼ 2 10 7 s1 and respective amplitudes of A1 ¼ 20 G 5% and A2 ¼
80 G 5%. This essentially reflects the cis–trans ratio at the Ser-Pro peptide bond in
our host–guest peptide, which has a cis content of 16 G 2% as determined by 1D
1
H-NMR spectroscopy using the method described by Reimer et al. [72]. Since the
rate of cis–trans isomerization is slow (t @ 20 s at 22 C) there is no equilibration
between the two isomers on the time scale of the TTET experiments. This allows
the measurement of the dynamics of both the trans and the cis form. The re-
sults show that the two isomers significantly differ in their dynamic properties of
i; i þ 4 contact formation and that the cis-prolyl isomer actually shows the fastest
rate of local contact formation of all peptides (Figure 22.9).
1.0×108
8.0×107
A
6.0×107
k c (s-1) 4.0×107
2.0×107
1.0×107
0 2 4 6 8 10 12
Concentration (M)
B
108
k c (s-1)
107
106
1 10 100
Number of peptide bonds (n)
Fig. 22.10. A) Effect of various co-solvents on donor–acceptor distance (n) on the rate
the dynamics of intrachain contact formation constant of contact formation in series of
in a Xan-(Gly-Ser)4 -NAla-Ser-Gly peptide poly(Gly-Ser) peptides (cf. Figure 22.8). The
measured by TTET. Measurements were rate constants for contact formation in water
performed in aqueous solutions in the (filled circles) are compared with the values in
presence of ethanol (diamonds), urea (filled 8 M GdmCl (open circles). Data taken from
circles) and GdmCl (open circles). B) Effect of Ref. [54].
water, although the dynamics are significantly slowed down in the presence of
GdmCl both for formation of short and long loops (Figure 22.10). For long pepti-
des the effect of chain length on the rates of contact formation is similar in 8 M
GdmCl and water with k c @ n1:8G0:1 . These results revealed that denaturants like
GdmCl and urea slow down local chain dynamics but lead to more flexible chains
that behave like ideal polymers already at shorter donor–acceptor distances (Figure
22.10B). These seemingly contradicting findings can be rationalized based on
the effect of denaturants on the conformational properties of polypeptide chains.
Unlike water, solutions with high concentrations of denaturants represent good
solvents for polypeptide chains. This reduces the strength of intramolecular in-
22.4 Experimental Studies on the Dynamics in Unfolded Polypeptide Chains 833
teractions like hydrogen bonds and van der Waals interactions relative to peptide–
solvent interactions, which makes unstructured polypeptide chains more flexible
and leads to a behavior expected for an unperturbed chain.
In agreement with this interpretation the effect of loop size on the rates of con-
tact formation in 8 M GdmCl is close to the behavior of an unperturbed polypep-
tide chain predicted by Flory and coworkers [21] (see Figure 22.2). The decreased
rate of contact formation at high denaturant concentrations can only in part be
explained by an increased solvent viscosity. Additional effects like increased
donor–acceptor distance, which is expected in good solvents compared to water
and denaturant binding might also contribute to the decreased rate constants [74].
k0
k¼ ð17Þ
hb
with b a 1 (Figure 22.11). In the limit for long chains (n > 15) the rate of contact
formation was found to be inversely proportional to the viscosity (b ¼ 1) indepen-
dent of the co-solvent used to alter the viscosity of the solution. In the limit for
108
k c (s-1)
107
106
short chains (n < 10), b was smaller than unity (0.8–0.9) [50]. This is not compati-
ble with theoretical consideration which predict b ¼ 1 as a result from Stokes law
relating the solvent viscosity and the diffusion coefficient. b-values < 1 have also
been observed for motions of simpler polymers in organic solvents [75–77] and
for dynamics of native proteins [78–80].
Several origins have been proposed to explain such behavior like either position-
dependent [81] or frequency-dependent [82, 83] friction coefficients. In addition,
experiments on myoglobin [80] suggested that deviations from the Stokes law can
also be caused by the structure of protein–solvent interfaces, which preferably con-
tain water molecules and have the viscous co-solvent molecules preferentially ex-
cluded [84]. This results in a smaller microviscosity around the protein compared
with bulk solvent. Additional experiments will be needed to identify the origin of
fractional viscosity dependencies of dynamics in short chains.
II
III
ics are in agreement with dynamics expected from the host–guest studies shown
in Figure 22.9. The 18-amino-acid EF-loop of carp parvalbumin connects two a-
helices and brings two phenylalanine residues into contact in the native protein,
which were replaced by xanthone and naphthylalanine to measure TTET kinetics
(Figure 22.13) [55]. The measured kinetics were single exponential (Figure 22.14)
and the time constant for contact formation (t ¼ 50 ns) was comparable to the dy-
namics for polyserine chains of the same length (Figure 22.8). The EF-loop con-
tains several large amino acids like Ile, Val, and Leu but also four glycyl residues.
Obviously, the effects of slower chain dynamics of bulkier side chains and faster
dynamics around glycyl residues compensate, which leads to dynamics comparable
to polyserine chains. These results indicate that polyserine is a good model to esti-
mate the dynamics of glycine-containing loop sequences in proteins.
The Zn-porphyrine/Ru system was used to measure contact formation in cyto-
chrome c unfolded in 5.4 M GdmCl [64]. The donor and quencher groups were
separated by 15 amino acids (Figure 22.13) and contact formation was significantly
slower (t ¼ 250 ns) than for polyserine peptide chains of the same length. How-
ever, correcting for the effects of denaturant concentration (see Figure 22.10) and
end-extensions (Section 22.4.2.6) on chain dynamics gives rate constants for the cy-
tochrome c sequence which are comparable to the parvalbumin EF-loop in water.
836 22 Dynamics of Unfolded Polypeptide Chains
Phe85
Phe102
B
Residues 85-102 in carp parvalbumin:
O
H
N Lys Gly Ser Gly Gly Ile Val Glu
Leu Ala Asp Asp Asp Lys Gly Asp HN
O
acceptor
Chemically synthesized fragment with TTET labels:
O O
Lys Gly Ser Gly Gly Ile Val Glu Ser NH2
Leu Ala Asp Asp Asp Lys Gly Asp HN Gly
O
O
donor
C
Residues 14-33 in horse heart cytochrome c with triplet quenching labels:
quencher
3+
Ru(NH3)5 His
Ala Cys Thr Glu Gly Lys Lys Gly Asn
Cys Gln His Val Lys Gly His Thr Pro Leu
Zn
triplet label
22.5 Implications for Protein Folding Kinetics 837
1×10-1
8×10-2
∆A 590
4×10-2
1×10-8
0 2×10 -7 4×10 -7
0.01
Residuals
0.00
-0.01
0 2×10 -7 4×10 -7
Time (s)
Fig. 22.14. Time course of formation and dynamics of contact formation can be
decay of xanthone triplets in parvalbumin loop described by a single exponential with a time
fragment 85–102 after a 4 ns laser flash at constant of 54 G 3 ns. Data are taken from
t ¼ 0. The change of xanthone triplet Ref. [55].
absorbance is measured at 590 nm. The
22.5
Implications for Protein Folding Kinetics
22.5.1
Rate of Contact Formation during the Earliest Steps in Protein Folding
The results on the time scales of intrachain diffusion in various unfolded polypep-
tide chains show that the amino acid sequence has only little effect on local dynam-
ics of polypeptide chains. All amino acids show very similar rates of end-to-end dif-
fusion with time constants between 12 and 20 ns for the formation of i; i þ 4
H——————————————————————————————————————————
Fig. 22.13. A) Ribbon diagram of the structure region (residues 85–102) and the synthesized
of carp muscle b-parvalbumin [127]. Phe85 and fragment labeled with the two phenylalanine
Phe102 are shown as space-fill models. The residues at position 85 and 102 replaced by
phenylalanine residues have been replaced by donor (xanthone) and acceptor (naphthylalanine)
the triplet donor and acceptor labels, xanthonic groups for triplet–triplet energy transfer. C)
acid and naphthylalanine, respectively. The Sequence from cytochrome c between the
figure was prepared using the program MolMol Zn-pophyrine group (excited triplet group) and
[128] and the PDB file 4CPV [127]. B) Sequence the Ru-His complex used in triplet quenching
of the carp muscle b-parvalbumin EF loop experiments reported by Chang et al. [64].
838 22 Dynamics of Unfolded Polypeptide Chains
which involves formation of an i; i þ 4 interaction [88, 89]. Since helices are usually
free of glycyl and prolyl residues the initiation cannot occur faster than in about
12–20 ns (Figures 22.8 and 22.9). It should be noted that these values do not rep-
resent time constants for nucleation of helices and hairpins (Figure 22.15), which
most likely requires formation of more than one specific interaction and most like-
ly encounters additional entropic and enthalpic barriers. The results on the rates of
intrachain contact formation rather represent an upper limit for the dynamics of
the earliest steps in secondary structure formation (i.e., < for formation of the first
contact during helix nucleation). They represent the prefactors (k 0 ) in the rate
equation for secondary structure formation
z
k ¼ k 0 eDG =RT
ð18Þ
The measured contact rates thus allow us to calculate the height of the free en-
ergy barriers (DGz ) when they are compared to measured rate constants (k) for
secondary structure formation. However, up to date no direct data on helix or hair-
pin formation in short model peptides are available. The dynamics of secondary
structures were mainly studied by relaxation techniques like dielectric relaxation
[90], ultrasonic absorbance [91], or temperature jump [92–94] starting from
predominately folded structures (for a review see Ref. [95]). Since helix–coil and
hairpin–coil transitions do not represent two-state systems, neither thermodynam-
ically nor kinetically, the time constants for unfolding can not be directly related to
rate constants of helix formation [96]. However, the observation of relaxation times
around 1 ms for helix unfolding allowed the estimation of time constants for the
growth steps of nucleated helices of about 1 to 10 ns, depending on the experimen-
tal system [90, 91, 93].
It may be argued that intrachain contacts in a polypeptide chain with a stronger
bias towards folded structures can form faster than the observed dynamics in un-
structured model polypeptide chains. However, weak interactions like van der
Waals contacts between side chains and hydrogen bonds should dominate the early
interactions during the folding process and the data on the dynamics of unfolded
chains will provide a good model for the earliest events in folding. A stronger en-
ergy bias towards the native state will mainly increase the strength or the number
of these interactions but not their dynamics of formation.
22.5.2
The Speed Limit of Protein Folding vs. the Pre-exponential Factor
Many small single domain proteins have been found to fold very fast, some of
them even on the 10 to 100 ms time scale (for a review see Ref. [97]). These fast
folding proteins include a-helical proteins like the monomeric l-repressor [98, 99]
or the engrailed homeodomain [100], b-proteins like cold shock protein of Bacillus
subtilis or B. cacodylicus [101] or the WW-domains [102] and a,b-proteins like the
single chain arc repressor [103]. There has been effort to design proteins that fold
even faster [104, 105]. A folding time constant of 4.1 ms has been reported for a
designed Trp-cage.
840 22 Dynamics of Unfolded Polypeptide Chains
These very fast folding proteins sparked interest in finding the speed limit for
protein folding, which is closely related to the speed limit of the fundamental steps
of protein folding and thus also to the dynamics of the unfolded chain. The results
on dynamics of contact formation in short peptides showed that no specific inter-
actions between two points on the polypeptide chain can be formed faster than on
the 5–10 ns time scale. Considering that dynamics are slowed down if the two
groups are located in the interior of a long chain this time constant increases to
about 50 ns. Formation of intrachain interactions represents the elementary step
in the conformational search on the free energy landscape. Thus, the dynamics of
this process set the absolute speed limit for the folding reaction. However, it will
not necessarily represent the prefactor (k 0 ) for the folding process which repre-
sents the maximum rate for protein folding in the absence of free energy barriers
(DGz ¼ 0, see Eq. (18)). It was shown that even fast apparent two-state folding pro-
ceeds through local minima and maxima on the free energy landscape [106, 107]
and up to date it is unclear which processes contribute to the rate-limiting steps in
protein folding (see Chapters 12.1, 12.2 and 14).
The transition state for folding seems to be native-like in topology but still partly
solvated [108–111], Thus, both protein motions and dynamics of protein–solvent
interactions in a native-like topology might contribute to the pre-exponential factor
and k 0 will probably depend on the protein and on the location of the transition
state along the reaction coordinate, which may change upon change in solvent con-
ditions or mutation (see Section 5.3) [44, 106, 107].
22.5.3
Contributions of Chain Dynamics to Rate- and Equilibrium Constants for Protein
Folding Reactions
U(x)
kf
ku
ωb
ωU0 E fb E ub
ω0N
xU x TS xN
x
U TS N
Fig. 22.16. Potential U(x) with two states N the forward and back reaction. o denotes the
and U separated by an energy barriers. Escape frequencies of phase point motion at the
from state U to N occurs via the rates k f and extremes of the potential. The drawing was
k u . Ebf and Ebu are the activation energies for adapted from Ref. [114].
respond to most reactions in solution and also to protein folding reactions, the rate
of the reaction decreases when the friction is further increased. In this limit the
equation for the rate as a function of friction, barrier height and temperature can
be written as:
o0 oB ðEb =kTÞ
k¼ e ð19Þ
2pg
where o stands for the frequency of motion of the particle in the starting well (o0 )
and on top of the barrier (oB ). o depends on the shape of the barrier and on the
nature of the free energy surface [114]. By applying Stokes law this equation can
then readily be expressed in terms of thermodynamic parameters and solvent
viscosity.
C
k¼ expðDGz =RTÞ ¼ k 0 expðDGz =RTÞ ð20Þ
hðTÞ
It is not clear whether Kramers’ approach in its simple form as described above
is directly applicable to protein folding reactions, since it is difficult to measure
the viscosity-dependence of folding and unfolding rate constants. The co-solvents
used to modify solvent viscosity like glycerol, sucrose, or ethylene glycol generally
stabilize the native state [84] and in part also the transition state of folding. This
makes a quantification of the effect of solvent viscosity on protein folding rate con-
stants difficult [115, 116].
842 22 Dynamics of Unfolded Polypeptide Chains
The results from the viscosity dependence of end-to-end contact formation show
that the rate constants in short peptides (n < 10) are not proportional to h1 but
rather to h0:8 . This suggests that the presence of barriers in these peptides leads
to a more complex behavior than expected from Eq. (20) and indicates that the fric-
tional coefficient from Kramers’ theory (g, see. Eq. (19)) is not only determined by
solvent viscosity for protein folding reactions.
Application of Kramers’ theory to protein folding reactions shows the problems
with quantitatively characterizing the barriers for protein folding. It is very difficult
to determine pre-exponential factors for the folding and unfolding reaction, which
are most likely different for each protein. Since the pre-exponential factors depend
on the shape of the potential and on the dynamics in the individual wells (states),
o0 will be different for the refolding reaction starting from unfolded protein (oU 0)
and for the unfolding reaction starting from native protein (oN 0 ) and hence also the
prefactors will be different for the forward and backward reactions. However,
under all conditions Eq. (21) must be valid [117]:
kf
K eq ¼ ð21Þ
ku
DG ¼ DGz z
f DGu þ DGpref
ku0
DGpref ¼ RT ln ð23Þ
kf0
kf0 ðwtÞ
DDGz z
f ðappÞ ¼ DDGf þ RT ln ð24Þ
kf0 ðmutantÞ
As in the example above, a fivefold increase in k 0 of a mutant would apparently
decrease DGz 1
f by 4 kJ mol . This would have large effects on the interpretation of
the results in terms of transition state structure (f-value analysis, see Chapters 12.2
and 13), and might contribute to the large uncertainty in f-value from mutants
that lead to small changes in DG [111].
Mutants that disrupt residual interactions in the unfolded state will most likely
have only little effect on k 0 for unfolding (ku0 ), which should be determined by
chain motions in the native state. Most native protein structures and transitions
state structures were shown to be rather robust against mutations [44]. Thus, mu-
tations which change the dynamics of unfolded proteins will result in an apparent
change in DG , in analogy to the effects discussed above (Eq. (24)).
Figure 22.10 shows that denaturants like urea and GdmCl significantly decrease
the rate constants for intrachain diffusion (k c ). According to the considerations
discussed above, this will also affect the denaturant dependence of the rate con-
stants for protein folding (m f -values with m f ¼ qDGz f =q½Denaturant). Even if
pre-equilibria involving unstable intermediates may dominate the early stages of
folding, the observed denaturant dependence of intrachain diffusion will influence
these steps by changing the forward rate constants of these equilibria. The mc -
values (mc ¼ qðRT ln k c Þ=q½Denaturant for peptide dynamics vary only little
with chain length and show values around 0.35 kJ mol1 M1 for urea and 0.50
kJ mol1 M1 for GdmCl [74]. Typical m f -values for folding of the smallest fast
folding proteins with chain length between 40 and 50 amino acids are around 1.0
kJ mol1 M1 for urea, indicating that up to 30% of the measured m f -values may
arise from contributions of chain dynamics. Larger two-state folders consisting of
70–100 amino acids typically show m f -values around 3 kJ mol1 M1 for urea and
5 kJ mol1 M1 for GdmCl [107, 118], indicating that the denaturant dependence
of chain dynamics constitutes up to 10% of the experimental m f -values.
It is difficult to judge the effect of denaturants on the dynamics of the native
state, which will influence the experimental m u -values. However, it was observed
that the effect of denaturants on chain dynamics is mainly based on increased
chain dimensions (A. Möglich and T. Kiefhaber, unpublished results). This would
argue for only little effects of denaturants on the internal dynamics of the native
state and on the transition state structures, which were shown to be structurally
robust against changes in denaturant concentration [44, 107, 111]. Thus, the m u -
values should have only little contributions from the effects of denaturants on
chain dynamics.
Comparison of kinetic and equilibrium m-values (m eq ¼ qDG =q½Denaturant) is
frequently used to characterize protein folding transition states according to the
rate-equilibrium free energy relationship [44, 119, 120]:
mf
aD ¼ ð25Þ
m eq
844 22 Dynamics of Unfolded Polypeptide Chains
22.6
Conclusions and Outlook
Several experimental systems have been recently developed to measure rate con-
stants for intrachain diffusion processes. Experimental studies on the dynamics of
unfolded proteins have revealed single exponential kinetics for formation of spe-
cific intrachain interactions on the nanosecond time scale and have been able to
elucidate scaling laws and sequence dependence of chain dynamics. It has further
been shown that the results derived from homopolypeptide chains and host-guest
studies are in agreement with dynamics of loop formation in short natural se-
quences. Further studies should aim at the investigation of the dynamics in full
length unfolded proteins and in folding intermediates to obtain information on
the special equilibrium and dynamic properties of free energy landscapes for pro-
tein folding reactions.
22.7
Experimental Protocols and Instrumentation
22.7.1
Properties of the Electron Transfer Probes and Treatment of the Transfer Kinetics
The lifetime of excited states of the labels used in a specific system determine the
time scale on which experiments can be performed. Triplet states are very long
lived compared to fluorescence probes and thus allow studies in long chains and
in unfolded proteins. Donor groups which undergo very fast intersystem crossing
to the triplet state and which have a high quantum yield for intersystem crossing
are well-suited. The xanthone derivatives used by Bieri et al. [50] and Krieger et al.
[54, 55] form triplet states with a time constant of 2 ps and a quantum yield of 99%
[56]. Tryptophan in contrast, which was used by Lapidus et al., has about 18%
quantum yield [60] and forms triplet states with a time constant of 3 ns [62], which
sets a limit to the fastest processes that can be monitored using tryptophan as a
triplet donor. Triplet states usually have strong absorption bands that are easily ob-
servable. It is, however, important that the donor has an UV absorption band in
a spectral region where no other part of the protein absorbs (i.e., >300 nm). This
allows selective excitation of the donor. The requirements for the triplet acceptor
are that the molecule should have lower triplet energy than the donor to enable
exothermic TTET and it should not absorb light at the excitation wavelength of
the donor. Additionally, no radicals should be formed at any time in the reaction,
since they may damage the sample and their strong absorbance bands usually in-
terfere with the measurements.
Further requirements of experimental systems which allow to measure the
kinetics of contact formation on an absolute time scale are (i) that the process is
very strongly distance dependent to allow transfer only at van der Waals distance
between donor and acceptor and (ii) that the transfer reaction is faster than disso-
ciation of the complex so that each van der Waals contact between the labels leads
to transfer. The coupling of electron transfer to chain dynamics displayed in Fig-
ure 22.6 can be kinetically described by a three-state reaction if the excitation of the
donor is fast compared to chain dynamics (k c ; kc ) and electron transfer (k TT ).
kc k TT
O Ð C ! C ð26Þ
kc
O and C represent open chain conformations (no contact) and contact conforma-
tions of the chain, respectively. k c and kc are apparent rate constant for contact
formation and breakage, respectively. If the excited state quenching or transfer
(k TT ) reaction is on the same time scale or slower than breaking of the contact
(kc ), the measured rate constants (li ) are functions of all microscopic rate
constants.
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
B G B 2 4C
l1; 2 ¼
2
ð27Þ
B ¼ k c þ kc þ k TT
C ¼ k c k TT
846 22 Dynamics of Unfolded Polypeptide Chains
If the contact states have a much lower probability than the open conformations
(kc g k c ) the kinetics will be single exponential and the observed rate constant
corresponds to the smaller eigenvalue (l1 ) in Eq. (27). Since this is in agreement
with experimental results we will only consider this scenario in the following. We
can further simplify Eq. (27) if the time scales of electron transfer and chain dy-
namics are well separated. In the regime of fast electron transfer compared to
chain dynamics (k TT g k c ; kc ) Eq. (27) can be approximated by
l1 ¼ k c ð28Þ
This is the desired case, since the observed kinetics directly reflect the dynamics
of contact formation. In the regime of fast formation and breakage of the contact
compared to electron transfer (k TT f k c ; kc ) Eq. (27) can be approximated by
kc kc
l1 ¼ k TT G k TT ¼ K c k TT ð29Þ
k c þ kc kc
where K c reflects the ratio of contact conformations (C) over open chain conforma-
tions (O). In this limit the chain dynamics can not be measured but the fraction of
closed conformations can be determined. It should be kept in mind that these sim-
plifications only hold if the formation of excited states is fast compared to the fol-
lowing reactions. If the excitation is on the nanosecond time scale or slower, the
solutions of the linear four-state model have to be used to analyze the kinetics
[121, 122].
These considerations show that it is crucial to determine the rate constants for
the photochemical processes in the applied system to be able to interpret the ki-
netic data. The photochemistry of the donor excitation and of the electron transfer
process are usually characterized by investigating the isolated labels and the kinet-
ics should be studied on the femtosecond to the nanosecond time scale to gain
complete information on all photochemical processes in the system [56] (cf. Figure
22.6). To test whether each donor–acceptor contact leads to transfer it should be
tested whether the reaction is diffusion controlled. This can be done by studying
the bimolecular transfer process from the donor to acceptor groups and determine
its rate constants, its temperature-dependence and its viscosity dependence (F.
Krieger and T. Kiefhaber, in preparation). These tests will be described in detail in
the following. The results showed, that both the xanthone/NAla system [54–56]
and the porphyrin/Ru system [64] fulfill these requirements and thus allow mea-
surements of absolute rate constant for contact formation. The Trp/Cys triplet
quenching system, in contrast, is not diffusion controlled and gives significantly
slower apparent contact rates than the diffusion-controlled systems [59, 63]. Also
the DBO/Trp fluorescence quenching system does not seem to be completely diffu-
sion controlled and might have slow electron transfer kinetics, since it gives signif-
icantly slower dynamics [65] compared to the xanthone/NAla system in the same
peptides [54].
22.7 Experimental Protocols and Instrumentation 847
22.7.2
Test for Diffusion-controlled Reactions
k ¼ k q ½Q ð30Þ
k ¼ k 0 þ k q ½Q ð31Þ
Here, k 0 denotes the rate constant for triplet decay of the donor in absence of
quencher (acceptor) and k the time constant in presence of the acceptor or
quencher. For diffusion-controlled reactions of small molecules in water k q is
around 4–6 10 9 s1 .
2.0×10 7
1.5×10 7
k app (s-1)
1.0×10 7
5.0×10 6
0.0×10 0
0 1 2 3 4 5 6
[naphthylalanine] (mM)
Fig. 22.17. Pseudo first-order measurements of bimolecular
TTET from xanthonic acid to naphthylalanine in water.
Xanthone concentration was 30 mM. The slope gives a
bimolecular transfer constant (k q ) of 3 10 9 M1 s1 (see
Eq. (31)). The data were discussed in Ref. [54].
848 22 Dynamics of Unfolded Polypeptide Chains
kB T
D¼ ð32Þ
6prh
After correcting for the change in viscosity of the solvent with temperature, the
slope in an Arrhenius plot should not be higher than kB T. As any photophysical
reaction takes some time, diffusion control is only possible to a certain maximal
concentration of quencher molecules. After that the photochemistry of the quench-
ing or transfer process becomes rate limiting. This provides an upper limit of the
rate constants that can be measured, even if the system is diffusion controlled at
lower quencher concentrations. For the xanthone/naphthalene system TTET has
been shown to occur faster than 2 ps and thus this system is suitable to obtain ab-
solute time constants for all processes slower than 10–20 ps [56].
References 849
Nd:YAG laser
Oscilloscope
ll Arc lamp
Sample ce pulser
Computer
Mono-
chromator
mp
Xe arc la
tip lier
Photomul
22.7.3
Instrumentation
Acknowledgments
We thank Annett Bachmann, Florian Krieger, Robert Kübler and Andreas Möglich
for discussion and comments on the manuscript.
References
67 Neuweiler, H., Schulz, A., Böhmer, the chain local dynamics of pvc in
M., Enderlein, J. & Sauer, M. (2003). dilute solution. Carbon-13 NMR
Measurement of submicrosecond relaxation study. J. Polym. Sci. B 35,
intramolecular contact formation in 317–329.
peptides at the single-molecule level. 78 Beece, D., Eisenstein, L.,
J. Am. Chem. Soc. 125, 5324–5330. Frauenfelder, H. et al. (1980).
68 Zwanzig, R. (1997). Two-state models Solvent Viscosity and Protein
for protein folding. Proc. Natl Acad. Dynamics. Biochemistry 19, 5147–5157.
Sci. USA 94, 148–150. 79 Yedgar, S., Tetreau, C., Gavish, B.
69 Spörlein, S., Carstens, H., Satzger, & Lavalette, D. (1995). Viscosity
H. et al. (2002). Ultrafast spectroscopy dependence of O2 escape from
reveals sub-nanosecond peptide respiratory proteins as a function of
conformational dynamics and validates cosolvent molecular weight. Biophys. J.
molecular dynamics simulation. Proc. 68, 665–670.
Natl Acad. Sci. USA 99, 7998–8002. 80 Kleinert, T., Doster, W., Leyser, H.,
70 Bredenbeck, J., Helbing, J., Sieg, A. Petry, W., Schwarz, V. & Settles, M.
et al. (2003). Picosecond (1998). Solvent composition and
conformational transition and viscosity effects on the kinetics of CO
equilibration of a cyclic peptide. Proc. binding to horse myoglobin.
Natl Acad. Sci. USA 100, 6452–6457. Biochemistry 37, 717–733.
71 Schwalbe, H., Fiebig, K. M., Buck, 81 Gavish, B. (1980). Position-dependent
M. et al. (1997). Structural and viscosity effects on rate coefficients.
dynamical properties of a denatured Phys. Rev. Lett. 44, 1160–1163.
protein. Heteronuclear 3D NMR 82 Doster, W. (1983). Viscosity scaling
experiments and theoretical and protein dynamics. Biophys. Chem.
simulations of lysozyme in 8 M urea. 17, 97–103.
Biochemistry 36, 8977–8991. 83 Schlitter, J. (1988). Viscosity
72 Reimer, U., Scherer, G., Drewello, dependence of intramolecular
M., Kruber, S., Schutkowski, M. activated processes. Chem. Phys. 120,
& Fischer, G. (1998). Side-chain 187–197.
effects on peptidyl-prolyl cis/trans 84 Timasheff, S. N. (2002). Protein
isomerization. J. Mol. Biol. 279, 449– hydration, thermodynamic binding
460. and preferential hydration.
73 Pace, C. N. (1986). Determination and Biochemistry 41, 13473–13482.
analysis of urea and guanidine 85 Perico, A. & Beggiato, M. (1990).
hydrochloride denaturation curves. Intramolecular diffusion-controlled
Methods Enzymol. 131, 266–280. reactions in polymers in the optimized
74 Möglich, A., Krieger, F. & Rouse Zimm approach 1. The effects
Kiefhaber, T. (2004). Molecular basis of chain stiffness, reactive site
of the effect of urea and guanidinium positions, and site numbers.
chloride on the dynamics of unfolded Macromolecules 23, 797–803.
proteins. submitted. 86 Leszcynski, J. F. & Rose, G. D. (1986).
75 Zhu, W., Gisser, D. J. & Ediger, M. Loops in globular proteins: a novel
D. (1994). C-13 NMR measurements category of secondary structure.
of polybutadiene local dynamics in Science 234, 849–855.
dilute-solution – further evidence for 87 Wilmot, C. M. & Thornton, J. M.
non-Kramers behavior. J. Polym. Sci. B (1988). Analysis and prediction of the
Polym. Phys. 32, 2251–2262. different types of b-turns in proteins.
76 Zhu, W. & Ediger, M. D. (1997). J. Mol. Biol. 203, 221–232.
Viscosity dependence of polystyrene 88 Zimm, B. H. & Bragg, J. K. (1959).
local dynamics in dilute solutions. Theory of phase transition between
Macromolecules 30, 1205–1210. helix and random coil in polypeptide
77 Tylianakis, E. I., Dais, P. & Heatley, chains. J. Chem. Phys. 31, 526–535.
F. (1997). Non-Kramers’ behavior of 89 Lifson, S. & Roig, A. (1961). On the
854 22 Dynamics of Unfolded Polypeptide Chains
23
Equilibrium and Kinetically Observed Molten
Globule States
Kosuke Maki, Kiyoto Kamagata, and Kunihiro Kuwajima
23.1
Introduction
The molten globule state has captured the attention of researchers studying the
folding of globular proteins ever since it was proposed that this state is a general
intermediate of protein folding [1–4]. Historically, the apparent incompatibility be-
tween the efficient folding of a protein into the native structure and the availability
of an astronomically large number of different conformations, known as the Levin-
thal paradox [5], led to the premise that there must be specific pathways of folding,
so that by restricting the protein molecule to these pathways, the polypeptide chain
can reach its native structure efficiently. According to this view (the ‘‘classical’’
view), the folding occurs in a sequential manner with a series of intermediates
populated along the specific pathway of folding (the sequential model of protein
folding) [6–8]. Therefore, studies on protein folding were focused on detection
and characterization of the specific folding intermediates of globular proteins. The
discovery that the intermediates were formed during the kinetic folding of many
globular proteins convinced the researchers of the significance of the intermediates
in elucidating the mechanism of protein folding. The molten globule is the most
typical of these folding intermediates, and has been regarded as a general interme-
diate of protein folding.
Nevertheless, recent theoretical studies of protein folding have shown that what
is required for a protein to reach the unique native structure is not the presence of
the specific pathway of folding, but just the presence of a small bias of the free-
energy surface toward the native state in the multidimensional conformational hy-
perspace of the protein [9–11]. Therefore, according to this ‘‘new’’ view of protein
folding, the Levinthal paradox is no longer a real paradox. It is obvious that the at-
tempt to detect the folding intermediate to resolve the Levinthal paradox is now ir-
relevant, casting a fundamental doubt on the role of the molten globule state as a
productive intermediate in protein folding [11, 12]. Apparently in support of this
new perspective, kinetic folding of small, single-domain proteins with fewer than
typically 100 amino acid residues has been shown to occur simply in a two-state
manner without accumulation of the folding intermediates [13, 14]. Because the
folding can take place without the intermediates, they are apparently not prerequi-
sites for successful folding of these proteins. The molten globule state might not
be obligatory, but rather produced by misfolding events or kinetic traps at local
minima on the free-energy surface of the protein folding [3, 4, 15]. Therefore,
whether folding intermediates such as the molten globule state are necessary for
directing the folding reactions of globular proteins is still an open question, and
there has been intense debate on this issue.
However, it is probably more appropriate to address the following question: Is
the sequential model of protein folding truly in conflict with the simple two-state
folding of small globular proteins, which has been shown to support the new view
of protein folding? In spite of the argument in favor of the new perspective of fold-
ing, the experimental studies of the folding intermediates of various globular pro-
teins have clearly demonstrated that the molten globule is a real productive folding
intermediate in these proteins [16]. Therefore, there must be a more general model
that can accommodate both the sequential and the simple two-state folding reac-
tions of real globular proteins.
We have thus previously proposed a two-stage hierarchical model of protein fold-
ing [3, 4], in which the folding process of a protein is, in general, divided into at
least two stages: stage I, formation of the molten globule state from the unfolded
state; and stage II, formation of the native state from the molten globule, as shown
in Scheme 23.1.
I II
U → I →N
Scheme 23.1
The folding intermediates like the molten globule in this model are not to solve
the Levinthal paradox, but they are present because the protein folding takes place
in a hierarchical manner [17, 18], reflecting the hierarchy of three-dimensional
structure of natural proteins. In this model, the interactions that stabilize the mol-
ten globule intermediate should be approximately consistent with the interactions
that stabilize the native state. Whether or not the observed folding of a protein is a
two-state or non-two-state (multistate) transition may depend on the stability of the
intermediate (I) and/or the location of the rate-limiting step of folding. A main
purpose of this review is thus to test this hypothesis on the basis of recent studies
of various globular proteins.
In the present article, we thus first summarize the structural and thermody-
namic properties of the molten globules at equilibrium, and then describe how
the molten globule state has been identified as a productive intermediate in kinetic
refolding reactions of globular proteins. We also describe our recent statistical anal-
ysis of the kinetic refolding data of globular proteins taken from the literature,
which clearly demonstrates that the molecular mechanisms behind the non-two-
state and two-state protein folding are essentially identical. Finally, practical aspects
on the experimental study of the molten globule are presented.
858 23 Equilibrium and Kinetically Observed Molten Globule States
23.2
Equilibrium Molten Globule State
23.2.1
Structural Characteristics of the Molten Globule State
The molten globule state was originally an equilibrium unfolding intermediate as-
sumed by many globular proteins under mildly denaturing conditions [1, 19]. It is
structurally characterized by (i) the presence of a substantial amount of secondary
structure, (ii) the virtual absence of tertiary structure associated with tight pack-
ing of side-chains, (iii) the compact size of the protein molecule with a radius only
10–30% larger than that of the native state, (iv) the presence of a loosely packed
hydrophobic core that increases the hydrophobic surface exposed to solvent, and
(v) dynamic features of the structure that fluctuates in a time-scale longer than
nanoseconds. Thus, in short, the molten globule is a compact globule with a ‘‘mol-
ten’’ side chain structure [20].
The mildly denaturing conditions in which we can observe the molten globule
state at equilibrium include acid pH [1, 21, 22], moderate concentrations of dena-
turants [23–26], high temperatures [27, 28], low concentrations of alcohol and flu-
oroalcohol [29, 30], and high pressure [31–33]. It is, however, dependent on pro-
tein species whether the molten globule is stably populated under the conditions;
the protein may be still in the native state or more extensively unfolded. Certain
proteins extensively unfolded at acid pH are known to refold into the molten glob-
ule state in the presence of a stabilizing anion [34]. For ligand-binding proteins,
the partial unfolding induced by removal of a tightly bound ligand can bring about
the molten globule state [27, 35]. Chemical modification, site-directed mutation
and covalent bond cleavage of a natural protein often lead to the molten globule-
like partially unfolded state [36–40].
In conventional experimental studies, the molten globule state has been charac-
terized by peptide and aromatic circular dichroism (CD) spectra that detect second-
ary and tertiary structures, respectively [1, 41], by hydrodynamic techniques that
determine the molecular size of the protein [2, 42, 43], and by hydrophobic dye
(1-anilinonaphthalene-8-sulfonate (ANS))-binding experiments that detect forma-
tion of a loose hydrophobic core accessible to solvent [44].
In addition to these conventional techniques, new experimental techniques, such
as hydrogen/deuterium exchange NMR spectroscopy, NMR relaxation and hetero-
nuclear secondary chemical shifts and nuclear Overhauser enhancement (NOE)
measurements [45–53], solution small-angle X-ray scattering (SAXS) [54–57], and
protein engineering techniques [38–40, 58–63], have been used successfully to fur-
ther characterize the structure of the molten globule state in many globular pro-
teins during the past decade. Use of these techniques has enabled us to describe
the structure of the molten globule states more precisely, to at least the amino
acid residue level, and to reveal more detailed characteristics of this state.
Studies using these new techniques have shown the structure of the molten
globule state to be more heterogeneous than previously thought, that is, one por-
23.2 Equilibrium Molten Globule State 859
tion of the structure is more organized and native-like while the other portion is
less organized [49, 51, 64–66]. The degree of structural organization and the stabil-
ity of the molten globule state have both been found to be remarkably dependent
on the species of protein conformer, there being a range of species possible within
the basic molten globule characteristics of secondary structure, compactness and
hydrophobic core. Although the classical concept of little or no persistent tertiary
structure still often holds, it is not a universal requirement of the molten globule
state.
23.2.2
Typical Examples of the Equilibrium Molten Globule State
Representative examples for showing the diversity of the structure of the molten
globule state can be found in a-lactalbumin (a-LA) and Ca 2þ -binding lysozymes
from canine and equine milk [67–70] (Figure 23.1), which are homologous (123–
130 amino acid residues). These proteins exhibit the equilibrium molten globule
state at a moderate concentration of guanidinium chloride (GdmCl) as well as
acidic pH [1, 23–26, 71–73]. The overall structures of the molten globule state
of these proteins are similar; the a-helical domain (a-domain) composed of the
A-, B-, C-, and D-helices, and the C-terminal 310 -helix is more organized with the
b-domain less organized as revealed by hydrogen/deuterium exchange and other
techniques [22, 38–40, 47, 74–79]. Despite the similarity in overall structure, the
lysozymes are substantially more native-like than a-LA. Significant ellipticity in
the near-UV CD spectra and incomplete fluorescence quenching by acrylamide in-
dicate the presence of an immobile tryptophan residue buried inside the molecule
[22, 24, 25, 71, 80].
Apomyoglobin (apoMb), which is the apo-form of myoglobin, a 153-residue
heme protein with eight helices (A–H) (Figure 23.2) [81], illustrates the diversity
in the structure of the molten globule state [34, 82–84]. ApoMb forms a molten
globule at pH 4 (the pH 4 intermediate or I1 ) while the addition of a stabilizing
anion (trichloroacetate (TCA), citrate, or sulfate) refolds the protein to form the
second molten globule (I2 ) [85, 86]. A similar compact molten globule is formed
at acidic pH in the presence of chloride ion [34]. The structure of I1 has been stud-
ied by hydrogen/deuterium exchange NMR as well as by other NMR techniques
[45, 50, 66]. It has a compact subdomain consisting of the A-, G-, and H-helical
regions as well as part of the B-helix region.
Cytochrome c (cyt c), which is a 104-residue globular protein with a covalently
attached heme group surrounded by three a-helices (Figure 23.3) [87], forms a
compact intermediate that has the molten globule characteristics [20]. The molten
globule state stably populated at acidic pH in the presence of stabilizing anion
from salt or acid is more structurally organized and more stable than the molten
globule states of a-LA and apoMb [34, 88]. In contrast to the a-LA or apoMb molten
globules that correspond to the early folding intermediates, the cyt c molten glob-
ule corresponds to a late-folding intermediate that is present after the rate-limiting
step in the kinetic refolding [89, 90].
860 23 Equilibrium and Kinetically Observed Molten Globule States
Fig. 23.1.X-ray crystallographic structure of a) disulfide bonds are shown in yellow in ball-
human a-lactalbumin (PDB code: 1HML) and and-stick representation. One Ca 2þ ion is
b) canine milk lysozyme (PDB code: 1EL1) bound to a-LA and the lysozyme, shown as a
drawn using DS Modeling. Helices formed in green ball.
the molten globule state are in red. Four
23.2.3
Thermodynamic Properties of the Molten Globule State
Fig. 23.2. X-ray crystallographic structure of the structured molten globule state, is in pink.
sperm whale myoglobin in the holo form (PDB A bound heme and two tryptophan residues
code: 1MBC) drawn using DS Modeling. The (Trp7 and Trp14) are shown in ball-and-stick
A-, G-, and H-helix regions, which are stably representation. The heme is in green, and the
formed in the molten globule state, are in red. two tryptophan residues are in purple.
The B-helix region, which is partially formed in
23.2 Equilibrium Molten Globule State 861
Fig. 23.3. X-ray crystallographic structure of His18, Trp59, Met80, and a bound heme are
horse cyt c (PDB code: 1HRC) drawn using DS shown in ball-and-stick representation. His18
Modeling. The N-terminal and C-terminal and Met80 are in blue, and Trp59 is in purple.
helices, and the 60s helix, which are stably Heme is in green.
formed in the molten globule state, are in red.
ing from the native state. The best-characterized molten globules, such as the mol-
ten globule state of a-LA and the pH 4 intermediate of apoMb, do not apparently
exhibit cooperative thermal transitions [2, 27, 92–94], so that the absence of the
thermal transition was thought to be a characteristic of the molten globule state
for some time.
However, it is now clear that the molten globules of certain proteins do exhibit
a cooperative thermal unfolding transition with a significant increase in enthalpy
[24, 28, 73, 95–97]. Thus, the apparent absence of cooperative thermal unfolding
observed for the molten globules of a-LA and apoMb is due to their small enthalpy
change. The unfolding transitions of the molten globules are apparently noncoop-
erative for some proteins, but highly cooperative for others.
The most notable example of a cooperative thermal unfolding transition of a
molten globule is displayed by equine and canine milk lysozymes [24, 28, 73, 97].
Although the overall structure of these molten globule states is very similar to that
of the a-LA molten globule, the unfolding from these molten globules shows a
cooperative heat absorption peak. The molten globules of cyt c and staphylococcal
nuclease (SNase) also show cooperative thermal transitions [95, 96]. Therefore, it
appears that the cooperative thermal unfolding of the molten globule state is
more the rule than the exception.
The degree of cooperativity of the molten globule unfolding transition is, how-
ever, not only dependent on the protein species, but also, on the experimental con-
ditions or the nature of amino acid replacements for a given protein [59, 90, 98–
101]. The chloride-induced molten globule of apoMb at pH 2 shows a small but
distinct heat absorption peak on thermal unfolding, and the TCA-induced molten
globule is more stable and shows a still more cooperative unfolding transition [85,
99].
An important aspect concerning the variation in molten globule cooperativity is
862 23 Equilibrium and Kinetically Observed Molten Globule States
the correlation between the intrinsic stability and the degree of cooperativity of the
unfolding transition [101, 102]. The order of stability for the well-characterized mol-
ten globules for three proteins (a-LA, apoMb, and cyt c) is the same as the order of
the cooperativity of the molten globule unfolding, that is, a-LA < apoMb < cyt c
[23, 83, 86, 100, 103]. In apoMb, the molten globule state has been investigated
for different mutants in different anion conditions, and there is again a good cor-
relation between stability and cooperativity [101].
The correlation between the intrinsic stability and the degree of cooperativity of
the molten globule state may reasonably be interpreted in terms of the two-stage
hierarchical model of protein folding (see Scheme 23.1). In this model, the local
interactions and nonspecific hydrophobic interactions that stabilize the molten
globule state approximate, to a varying degree, the specific tertiary packing interac-
tions that stabilize the native structure. Thus, the presence of specific native inter-
actions further organizes structurally and stabilizes the molten globule state, and
makes the unfolding transition more cooperative.
23.3
The Kinetically Observed Molten Globule State
23.3.1
Observation and Identification of the Molten Globule State in Kinetic Refolding
23.3.2
Kinetics of Formation of the Early Folding Intermediates
The folding intermediates of globular proteins have been observed and character-
ized as the burst-phase intermediate in most cases because they already accumu-
lated during the dead time of typical stopped-flow instrumentations, which limited
the kinetic folding measurements to a time resolution of a few milliseconds. How-
ever, recent development of new techniques, which have dead times much shorter
than that of conventional stopped-flow instrumentations, allowed us to directly
monitor the folding kinetics, including the formation of the molten globule, in
submillisecond time region. These techniques include laser photochemically trig-
gering method [121, 122], the temperature-jump method with electrical or laser
triggering [123, 124], and the continuous-flow method [125–128].
The folding kinetics of apoMb from the cold-denatured state were monitored by
the nanosecond laser temperature-jump method with fluorescence detection, and
the subdomain assembly consisting of the A-, G-, and H-helices from the G-H
loop and nascent A helix was observed in 5–20 ms after initiation of the folding
[129]. CD and SAXS measurements of apoMb using the continuous-flow and
stopped-flow techniques have revealed that significant compaction and helix forma-
864 23 Equilibrium and Kinetically Observed Molten Globule States
tion occurs within @300 ms after initiation of the pH-induced refolding, which is
followed by the formation of an intermediate similar to the TCA-induced struc-
tured molten globule state of this protein [130].
The continuous-flow fluorescence measurements of cyt c revealed the accumula-
tion of a series of intermediates [131]. The earliest intermediate (IC , see Section
23.3.3) whose Trp fluorescence is fully quenched by the heme is formed with a
time constant of @65 ms, which is much shorter than a typical dead time (a few
milliseconds) of the stopped-flow measurement [131]. In fact, this intermediate
has been observed as the burst-phase intermediate when the stopped-flow method
is used. The continuous-flow CD and SAXS measurements also revealed two inter-
mediates along the sequential folding pathway of cyt c [132, 133]. The first inter-
mediate was, however, observed as the burst-phase intermediate because the dead
times of the measurements, 160–390 ms, were much longer than the time constant
of formation of the first intermediate. The first intermediate is more compact than
the unfolded state with @20% helicity of the native state, and may correspond to
the IC intermediate. The second intermediate formed with a time constant of
@400 ms is further more compact with a radius of gyration @30% larger than the
native state with @70% helicity of the native state, and hence its size is close to that
of the molten globule state of this protein [132, 133]. The second intermediate may
be more similar to the a-LA and apoMb molten globules that are known to be less
structurally organized and less stable than the salt-induced cyt c molten globule
(see Section 23.2.2).
23.3.3
Late Folding Intermediates and Structural Diversity
Although the molten globule states are formed in less than a few milliseconds as
burst-phase intermediates, in many of the proteins shown above, the equilibrium
molten globule states occasionally correspond to late-folding intermediates formed
after the burst-phase and other early intermediates have accumulated. The late-
folding intermediates, which are usually more structured than the burst-phase in-
termediates, often correspond to the structured molten globules formed by addi-
tion of stabilizing anions at equilibrium.
The best-studied example is the late-folding intermediate of cyt c [90, 134, 135].
Cyt c at neutral pH folds by a sequential mechanism, which involves three inter-
mediates (IC , INC , N*) as shown in Scheme 23.2 [89, 90, 108, 134].
U → IC → INC → N* → N
Scheme 23.2
deuterium exchange and mutagenesis techniques [89, 108, 136]. The last interme-
diate, N*, is known to affect the kinetics of unfolding, but does not accumulate
during refolding. It is structurally very close to the native state, but lacks the native
Met80 heme ligand. Studies on the KCl-induced kinetic refolding of the protein
from the acid-unfolded to the compact molten globule state suggest that the acid
compact equilibrium molten globule state may correspond to N* [90].
Similarly, apoMb accumulates at least two intermediates, Ia and Ib, in the kinetic
refolding, and the late intermediate, Ib, may correspond to the anion-induced,
more structured, molten globule state observed at equilibrium [86]. These inter-
mediates of apoMb may also correspond to the two folding intermediates detected
by the continuous-flow experiments (see Section 23.3.2).
Apparently, there is a diversity of molten globule states in terms of where they
accumulate along the folding pathway, and this is analogous to the diversity of
the equilibrium molten globule state that depends on the protein species and the
experimental conditions, as described in Section 23.2. When accumulation takes
place at a late stage of refolding, it is more structured than the early interme-
diate and may correspond to the structured molten globule state observed at
equilibrium.
23.3.4
Evidence for the On-pathway Folding Intermediate
The folding kinetics of certain globular proteins have been analyzed very carefully,
and hence we can address the question of whether the molten globule intermedi-
ate formed during kinetic folding is a productive, on-pathway intermediate, or not.
If the intermediate is productive, it must be placed between the fully unfolded and
the native states along the folding pathway. Observation of a lag phase during the
refolding is known to provide firm evidence that the intermediate is productive.
Jennings and her coworkers have studied kinetic refolding of interleukin-1b by
pulsed hydrogen/deuterium exchange mass spectrometric analysis, stopped-flow
CD, and fluorescence. The transient folding intermediate accumulates with a time
constant of 126 ms, and there is a lag phase in the production of the native state of
at least 400 ms, clearly indicating that the intermediate is a productive on-pathway
intermediate [137]. A similar lag phase has also been reported in SNase and dihy-
drofolate reductase (DHFR) [138–141], the refolding reactions of which have been
characterized by tryptophan fluorescence and methotrexate (an inhibitor of DHFR)
absorbance, respectively. In apoMb, two transient intermediates, Ia and Ib, accu-
mulate during refolding from the acid-unfolded state, and the kinetics analyzed
by interrupted refolding and interrupted unfolding experiments have been shown
to be consistent with a linear folding pathway (U ! Ia ! Ib ! N), in which both
are productive on-pathway intermediates [86].
A detailed analysis of Im7 folding and unfolding by monitoring the whole fluo-
rescence signal change using the continuous-flow and stopped-flow methods un-
ambiguously revealed that the kinetic folding intermediate of this protein is an
on-pathway, obligatory intermediate [142]. Similar quantitative analysis of the fold-
866 23 Equilibrium and Kinetically Observed Molten Globule States
ing and unfolding reactions of several proteins, including the B1 domain of pro-
tein G, ubiquitin, cyt c, and interleukin-1b, indicates that the experimentally ob-
served folding intermediates are on-pathway and obligatory [16, 143–146].
23.4
Two-stage Hierarchical Folding Funnel
Here, we consider the structural and energetic aspects of protein folding repre-
sented by the two-stage hierarchical model, and this together with the funnel rep-
resentation of protein folding will give us a reasonable picture of the structural
diversity of the molten globule state observed experimentally.
Structural changes and their associated energetic properties are different be-
tween the two stages in the two-stage hierarchical model of protein folding [3, 4].
In stage I, the protein molecule forms native-like secondary structure, tertiary fold
and compact shape, lacking specific side-chain packing, thus acquiring the broad
structural architecture of the native molecule. Local interactions that determine
preference for secondary structure, and nonspecific hydrophobic interactions that
determine the overall backbone topology and the compact shape are important in
this stage. On the other hand, in stage II the specific side-chain packing is orga-
nized, and specific van der Waals contacts are dominant in this process. Further-
more, the structure formed in stage I may be similar to an expanded version of
the structure formed in stage II, and the overall tertiary fold formed in stage I
brings the appropriate side chains close in space to enable the specific native pack-
ing interaction between them to take place and stabilize the overall native-like ter-
tiary fold.
Taking advantage of the funnel picture of protein folding, it is useful to describe
the two-stage hierarchical folding model in terms of the folding funnel. The pres-
ence of the two stages indicates the presence of two corresponding folding funnels,
I and II (Figure 23.4). The differences in the structural classes and energetics asso-
ciated with the two stages are reflected in differences in shape between the two
folding funnels. The compaction of protein, which occurs in stage I, results in a
much larger decrease in the conformational entropy, suggesting that funnel I is
much wider at its top than funnel II. The multiple specific tertiary packing interac-
tions occurring in stage II are enthalpic in nature, and, because they are not always
native-like, a previously formed interaction must often be broken to form a more
stable set of interactions later in stage II. There must, therefore, be a number of
enthalpic barriers along funnel II. Because the free-energy barrier is much more
entropic in funnel I, the energy landscape must be much more rugged in funnel
II.
The two-stage folding funnel provides a nice picture of why the experimentally
observed molten globule states are structurally diverse. As described in Section
23.2, some molten globules have no observable tertiary structure, while others
often contain native-like side-chain packing in a limited region of the molecule.
The degree of structure in the molten globule state depends on where the largest
23.5 Unification of the Folding Mechanism between Non-two-state and Two-state Proteins 867
Fig. 23.4. A schematic representation of the hierarchical folding funnel. See the text for details.
energetic barrier is located along funnel II. When it is located at the top of funnel
II, there will be no specific side-chain packing in the molten globule, but when
close to the bottom, a number of the specific side-chain interactions may already
be formed. In cases like apoMb, there may be two large energetic barriers along
funnel II, resulting in accumulation of two structurally different molten globules.
23.5
Unification of the Folding Mechanism between Non-two-state and Two-state Proteins
Now we re-address the question raised at the beginning of this chapter, because a
number of small globular proteins with fewer than 100 amino acid residues have
been found to exhibit two-state folding without any accumulation of the intermedi-
ate [14]. These two-state proteins often correspond to a part (domain) of an entire
protein molecule, and there are now more than 30 examples of the two-state pro-
teins, including Src homology 3 domain [147–150], E9 colicin-binding immunity
domain [151], cold shock protein B [152], N-terminal domain of l repressor [153],
IgG-binding domain of protein L [154], and chymotrypsin inhibitor 2 [155]. Appar-
ently, the accumulation of the folding intermediate is not a prerequisite for the suc-
cessful folding of these two-state proteins.
Questions thus arise. (i) Is the sequential model of protein folding truly in con-
flict with the simple two-state folding of small globular proteins? (ii) What is the
role of the molten globule intermediate in protein folding if the successful folding
takes place for the two-state proteins? (iii) What is a unified view of protein fold-
ing? Two possible answers to these questions are: (i) Protein folding is in principle
a highly cooperative two-state transition, and any molten globule-like species ob-
served at early stages of folding are not productive, but produced by kinetic trap-
868 23 Equilibrium and Kinetically Observed Molten Globule States
ping and misfolding events; and (ii) non-two-state folding with the molten globule
as a productive folding intermediate is more common in globular protein folding,
and the two-state folding observed in the small proteins is rather a simplified ver-
sion of the more common non-two-state folding. Which answer better represents
real protein folding is crucial to our understanding of the mechanism of folding.
A recent paper by Kamagata et al. has shed light on this issue, and strongly sug-
gested that the second answer is more appropriate in real globular proteins [156].
They have found that the rate constants for formation of the intermediate and the
native state of non-two-state proteins both show essentially the same dependence
on the native backbone structure as the rate constant of folding of two-state pro-
teins. This indicates that the folding mechanisms behind the non-two-state and
the two-state protein folding are essentially identical.
23.5.1
Statistical Analysis of the Folding Data of Non-two-state and Two-state Proteins
Kamagata et al. have collected the kinetic folding data of globular proteins from the
literature, classified the proteins into non-two-state and two-state proteins, and
investigated the relationships between the folding kinetics and the native three-
dimensional structure of these proteins. Classification of the proteins as two-state
folders was based on the criteria: (i) single-exponential refolding kinetics after ex-
clusion of slow isomerization steps such as proline isomerization in the unfolded
state, (ii) the absence of rollover behavior in the logarithmic folding rate constant
as a function of denaturant concentration, and (iii) the agreement of unfolding pa-
rameters between the equilibrium and kinetic experiments. The proteins that did
not satisfy these criteria might be non-two-state folders, but they also employed a
more rigorous rule to identify non-two-state proteins; namely, the protein that
showed the single-exponential refolding kinetics with the rollover behavior was
classified as a non-two-state folder only if one of the following criteria was satisfied:
(i) the presence of a burst-phase (i.e., missing amplitude) in the folding kinetics,
and/or (ii) the accumulation of a well-characterized kinetic folding intermediate.
Furthermore, only the proteins for which the kinetic folding mechanism had
been determined clearly were used for the analysis, and the proteins with a heme
group or disulfide bonds were excluded from the analysis. As a results, there were
16 natural proteins classified as the non-two-state folders; however, if we include
proteins that satisfied only the condition of rollover behavior, additional five pro-
teins could be included as well, resulting in 21 non-two-state folders. For 10 pro-
teins among these 21 folders, the folding kinetics were multiphasic, and the rate
constant for the formation of the folding intermediate was reported. On the other
hand, there were 18 proteins that were classified as two-state folders.
Figure 23.5 shows the relationship between the rate constant (kI ) for formation
of the folding intermediate and the parameter (the number of sequence-distant na-
tive pairs, Q D ) that represents the backbone topology of the native structure [157],
and the relationship between the rate constant (kN ) for formation of the native state
and Q D , in the non-two-state proteins, and these relationships are compared with
23.5 Unification of the Folding Mechanism between Non-two-state and Two-state Proteins 869
Fig. 23.5. A comparison of the folding rates of the 16- and 21-protein data set, respectively, of
non-two-state and two-state folders. The non-two-state folders. The solid lines represent
logarithmic rate constants are plotted against the best linear fit for the formation of the
the number of sequence-distant native pairs. native state for 18 two-state folders, the
Filled squares represent the folding of two- intermediate for 10 non-two-state folders, and
state folders. Filled triangles represent the the native state for 16 non-two-state folders,
formation of the intermediate of non-two-state respectively. The dashed line represents the
folders. Filled circles and open circles best linear fit for the formation of the native
represent the formation of the native state for state for 22 non-two-state folders [156].
the relationship between the folding rate constant (kUN ) and Q D in the two-state
proteins. It can be seen that all the rate constants (kI ; kN , and kUN ) are significantly
correlated with Q D that is determined by the native backbone topology as given by:
Lp X
X i1
QD ¼ Dij ð1Þ
i¼2 j¼1
where i and j are the residues numbers of two contacting residues for which the
Ca–Ca distance in space was within 6 Å in the PDB structure, and L p is the total
number of amino acid residues of a protein excluding the disordered terminal re-
gions. Dij ¼ 1 if i j > 12, and otherwise Dij ¼ 0. Similar correlations of these rate
constants were also found with other structure-based parameters, the absolute con-
tact order (ACO) [158] and cliquishness [159], but there were no significant corre-
lations between the rate constants for the non-two-state proteins (kI and kN ) and
the relative contact order (RCO) [160], and this is in contrast with the significant
correlation found between kUN and RCO in the two-state proteins. The difference
between the non-two-state and the two-state proteins with respect to the correlation
with RCO was ascribed to a difference in the chain-length distribution between the
two types of proteins. Because RCO is given by ACO/L p , the correlations of the log-
arithmic rate constants with ACO and the chain length canceled each other out in
the RCO for the non-two-state proteins that had a much wider distribution of L p .
The significant correlations of both the logarithmic rate constants, log kI and
log kN , with Q D (the correlation coefficient r ¼ 0:74@0:83) in the non-two-state
870 23 Equilibrium and Kinetically Observed Molten Globule States
proteins indicate that both the processes from the unfolded state (U) to the inter-
mediate (I) and from I to the native state (N) are rate-limited by the process of
forming a more native-like backbone topology. Protein molecules thus become pro-
gressively more native-like during the folding process from U to N via I, clearly
demonstrating that the kinetic intermediate of refolding in the case of non-two-
state proteins is a real, productive folding intermediate. Furthermore, both the
value of log kUN and its dependence on Q D in the two-state proteins are very simi-
lar to those found in the plots of log kI and log kN versus Q D for the non-two-state
proteins (Figure 23.5). This clearly demonstrates that the mechanism of folding
does not differ between the two classes of proteins. The non-two-state folding,
with the accumulation of a productive folding intermediate, may be a more com-
mon mechanism of protein folding, and the two-state folding may be apparently a
simplified version of the more common non-two-state folding.
23.5.2
A Unified Mechanism of Protein Folding: Hierarchy
values of less than 25, suggesting the validity of the first explanation. However,
when Q d is larger than 25, the two-state kUN shows variation in either the non-
two-state kI or kN, or between the kI and kN , and hence both mechanisms given by
the two explanations may play a role in the folding of two-state proteins.
However, it still remains unclear why the kN for non-two-state folders shows es-
sentially the same dependence on Q d as does the kI (Figure 23.5). This question
may be even more difficult to answer than those discussed above; however, the
known structural characteristics of the folding intermediates indicate that stage II
includes the process of the specific tertiary packing of side chains. It is possible
that this packing process is also dominated by the organization of the native back-
bone topology. Thus, the folding that occurs in stage II would again be correlated
with the native backbone topology, but would take place in a concerted manner
together with the tertiary packing of side chains and the organization of the back-
bone topology.
23.5.3
Hidden Folding Intermediates in Two-state Proteins
Although many small globular proteins fold fast and apparently in a two-state
manner without detectable intermediates, it is now becoming increasingly clear
that hidden meta-stable folding intermediate may exist along the folding pathway,
but behind the rate-limiting transition state. Sanchez and Kiefhaber have shown
that nonlinear activation free-energy relationships (i.e., nonlinear relationships of
the logarithmic rate constant of folding or unfolding versus denaturant concentra-
tion) reported for 23 two-state proteins are caused by sequential folding pathways
with consecutive distinct barriers and a few obligatory intermediate that are hidden
from direct observation by the high free energies of the intermediates [161]. Bai
has also suggested from native-state hydrogen/deuterium exchange data of a num-
ber of two-state proteins that partially folded intermediates may exist behind the
rate-limiting transition state in these proteins and evade detection by conventional
kinetic methods [162]. He proposed two types of the hidden intermediates: type
I intermediates, more stable than the unfolded state but hidden behind the rate-
limiting transition state located between the unfolded state and the intermediate,
and type II intermediates that are less stable than the unfolded state. These are
fully consistent with our explanation for the observation of the two-state folding
proteins (see Section 23.5.2).
It is thus concluded that there is essentially no difference in mechanism be-
tween the non-two-state and the two-state protein folding, and the apparent two-
state folding is merely a simplified version of more common non-two-state folding
that accumulates obligatory folding intermediates. The major difference between
the non-two-state and the two-state folding is the relative stability of the folding in-
termediates. In fact, non-two-state proteins also exhibit apparent two-state folding
and unfolding kinetics under a solution condition where the intermediates are de-
stabilized or by introduction of amino acid replacements that destabilize the inter-
mediates [163].
872 23 Equilibrium and Kinetically Observed Molten Globule States
23.6
Practical Aspects of the Experimental Study of Molten Globules
23.6.1
Observation of the Equilibrium Molten Globule State
NTU
where fN ðcÞ and fU ðcÞ ð fN þ fU ¼ 1Þ are the fraction native and unfolded at dena-
turant concentration c; AN and AU are the ideal values of the parameter in the
native and the unfolded states, respectively, and these are often obtained by linear
extrapolations of the dependence of the parameter values on c from the pretran-
sition (native) and the posttransition (unfolded) regions as A n ¼ a1 c þ a2 and
AU ¼ a3 c þ a4 , where a i is constant. Because an equation essentially the same as
Eq. (2) is obtained for the pH- and temperature-induced unfolding transitions, the
description of the equilibrium unfolding transition shown below also applies to the
pH- and temperature-induced unfolding transitions. Equation (2) is rewritten as:
23.6 Practical Aspects of the Experimental Study of Molten Globules 873
AðcÞ AU
fN ðcÞ ¼ ð3Þ
AN AU
This shows that a single observation probe can determine the fraction native and
unfolded ð fU ¼ 1 fN Þ in the two-state model.
Equation (3) indicates that the fraction native and unfolded calculated from the
unfolding transition curves monitored by different probes should be superimpos-
able to each other if the two-state approximation holds. This characteristic of the
two-state unfolding transition is often used to distinguish between the two- and
multistate unfolding transitions. Figure 23.6A shows an example of the two-state
unfolding transition of hen egg white lysozyme [23].
The unfolding transitions approximated by the two-state model exhibit a specific
point (for example, at a specific wavelength) in the spectra of the native and the
fully unfolded state, where these two states exhibit the identical value of the param-
eter used. If there exists such a point(s) in the spectra, all the spectra involved in
the unfolding transition at different denaturant concentrations exhibit essentially
the same value at that point with the condition that a1 A a3 A 0. This is readily
evidenced by Eq. (2). Here we take an example of absorbance as an optical probe,
and suppose that the absorbance spectra of the native and unfolded states ex-
hibit the same value ðAl Þ at a wavelength l. According to the above assumption
ðAN ¼ AU ¼ Al Þ, the absorbance value ðAðcÞÞ at l should be essentially identical
ðAl Þ at any denaturant concentration [165].
Thus, the existence of these specific points (a single point in many cases), re-
ferred to as isosbestic and isodichroic points for absorbance and CD, respectively,
lends support to the two-state unfolding transition whereas we have to think about
a multistate unfolding transition in the absence of these kinds of points.
NTITU
where AI is the ideal value of the parameter in the intermediate state, fI ðcÞ is the
fraction of the intermediate state at denaturant concentration c, and fN þ fI þ
fU ¼ 1. When the intermediate is fully populated at an intermediate concentration
of denaturant, we can experimentally determine AI and extrapolate the values into
regions where the intermediate is not stably populated. However, this is not always
the case. For the molten globule intermediate, we can often assume that AI ¼ AN
for the parameters that represent the native secondary structure and that AI ¼ AU
for the parameters that represent the specific tertiary structure of side chains (see
below). Although the two-state unfolding transition in principle requires only a
single probe to determine the fraction native or unfolded, the three-state unfolding
transition requires two independent probes when the two transitions, N T I and
I T U are not separated from each other.
As described previously, the molten globule state has a native-like secondary
structure with virtually no specific side-chain packing. Taking advantage of these
properties of the molten globule state, CD spectroscopy in the far- and near-UV re-
gions, which are sensitive to the backbone secondary structure and the asymmetry
of the side chains in aromatic residues, respectively, is extremely useful because
they can monitor individually these two properties specific for the molten globule
state.
23.6 Practical Aspects of the Experimental Study of Molten Globules 875
exhibits a substantial amount of the secondary structure with virtual lacking the
specific side-chain packing, there is little change in the far- and the near-UV CD
intensities on the N T I and I T U transitions, respectively, which results in non-
coincident single-step unfolding transitions measured by these probes. On the
other hand, the apoMb molten globule is sufficiently stable at the intermediate
pH values, and exhibits the properties distinct from those in the native and the un-
folded states, so that it emphasizes the population of the molten globule state in
the unfolding transition curves.
23.6.2
Burst-phase Intermediate Accumulated during the Dead Time of Refolding Kinetics
Fig. 23.8. Kinetic trace of refolding of a-LA and U denotes the ellipticity value of the
monitored by CD at 222 nm. The refolding unfolded state at 1 M GdmCl obtained by
reaction was initiated by a GdmCl linear extrapolation of the baseline for the
concentration jump from 5 M to 1 M. N unfolded state. From [106] with permission
denotes the ellipticity value of the native state, from Elsevier.
23.6 Practical Aspects of the Experimental Study of Molten Globules 877
23.6.3
Testing the Identity of the Molten Globule State with the Burst-Phase Intermediate
Fig. 23.9. CD spectra of the molten globule globule state at pH 2.0; a) 4 and 5, the
state of a-LA at pH 2.0 in the far- and near-UV thermally unfolded states at 41 and 78 C,
regions compared with the CD spectra of the respectively; 6, the unfolded state by GdmCl;
native and unfolded states. Open circles and b) 4, the thermally unfolded state at 62.5 C; 5,
squares show the CD values obtained by the unfolded state by GdmCl. From [104] with
extrapolating to zero time of the refolding permission from the American Chemical
curves. 1 and 2, the native state of the holo Society.
and apo forms, respectively; 3, the molten
where U, I, and N represent the unfolded, the intermediate, and the native states,
respectively.
The unfolding transition curve of the intermediate state is obtained by measur-
ing the refolding reactions at varying denaturant concentrations and by investigat-
ing the dependence of the burst-phase CD spectrum on the denaturant concentra-
tion. Comparison of the unfolding transitions of the molten globule state and the
burst-phase provide an additional test for the identity of these two states in terms
of stability. Figure 23.10 shows the unfolding transition curves of the burst-phase
intermediate and the equilibrium molten globule state of a-LA measured by ellip-
References 879
Fig. 23.10. The normalized unfolding solid line shows the theoretical curve assuming
transition curves of the burst-phase a two-state transition between the molten
intermediate of a-LA (filled circles) compared globule state and the unfolded state. From
with that of the equilibrium molten globule [106] with permission from Elsevier.
state at neutral pH (open diamonds). The
ticity at 222 nm [106]. The unfolding transition curve of the burst-phase interme-
diate is superimposable to the unfolding transition of the molten globule state that
has been observed at equilibrium, indicating, again, that these two states of a-LA
are identical to each other.
Although this method is powerful to compare the stability of the burst-phase
intermediate with that of the molten globule, thermodynamic parameters of the
burst-phase intermediate may not be able to be reliably obtained. As described in
Section 23.2.3, the unfolding transitions of the molten globule state may not occur
in a cooperative manner [58], which means that the two-state model may not apply
to the transition between the unfolded state and the intermediates. Nevertheless, it
is worth measuring these unfolding transitions for the purpose of comparing the
molten globule and the burst-phase intermediate.
References
24
Alcohol- and Salt-induced Partially Folded
Intermediates
Daizo Hamada and Yuji Goto
24.1
Introduction
The structures and stabilities of proteins are perturbed by additives such as small
organic and inorganic molecules. In this chapter, we discuss the effects of alcohols
and salts on inducing partially folded intermediates of proteins. These co-solvents
are expected to stabilize otherwise unstable folding intermediates. They can also
reveal the conformational preference of the intermediate or unfolded states. This
information is valuable for characterizing the structure and stability of the folding
intermediates under physiological conditions in the absence of co-solvents. Since
this chapter focuses on the effects on the intermediate and unfolded states among
various effects on proteins, readers should refer to other reviews on the general ef-
fects of alcohols [1, 2] and salts [3–7] on proteins.
In the early 1960s, Tanford and coworkers performed pioneering studies about
the effects of organic solvents on the conformational properties of proteins using
optical rotatory dispersion (ORD) [3, 4, 8]. For example, the conformational transi-
tion of bovine ribonuclease A induced by the addition of 2-chloroethanol into aque-
ous solvent occurred in two distinct stages involving partial unfolding of the native
structure followed by the formation of an a-helical structure. A similar conforma-
tional change was also observed for bovine b-lactoglobulin [3, 4] and diisopropyl-
phosphoryl chymotrypsin [9]. Recently, circular dichroism (CD) spectroscopy has
become more popular than ORD because of the simplicity in the interpretation of
data. However, the results obtained by CD provide essentially the same informa-
tion as can be obtained by ORD. Further information about the effects of alcohols
can be analyzed even to atomic resolution by solution nuclear magnetic resonance
(NMR) spectroscopy.
Alcohols are basically categorized as denaturants, as are guanidinium hydro-
chloride (GdmCl) or urea. Although GdmCl and urea induce random coil struc-
tures, alcohols tend to stabilize well-ordered conformations such as a-helical
structures. According to the review by Tanford [3], Imahori and Doty first recog-
nized that a high proportion of a-helical proteins form a-helical structures in 2-
chloroethanol. Similar effects have been found for many other alcohols. Impor-
tantly, Herskovits and Mescanti [10] have reported that a8 -casein, a typical natively
unfolded protein [11–14] is also converted into an a-helical structure in the pres-
ence of 2-chloroethanol or methanol. Although a significant number of a-helical
structures are formed by alcohols, they are certainly not equivalent to the native
conformation and are classified as denatured conformations, which lack the native
protein’s functions.
The properties of alcohols that induce the conformational change in proteins
are also widely used for analyzing the intrinsic structural preference of proteins
and peptides, particularly to analyze the kinetic intermediates accumulating at the
early stages of folding. Although proteins assume significant levels of nonnative
a-helical conformations at high concentrations of alcohols, several short peptides
form b-turn or b-hairpin conformations according to the structural preference of
the amino acid sequence [15–22]. The structures formed in alcohol/water solvent
somehow mimic the properties of early intermediates in protein folding, since
both structures are stabilized predominantly by local interactions which are formed
between residues near each other in amino acid sequence (Figure 24.1) [23, 24].
Moreover, the molten globule state, a compact denatured state with a significant
amount of native-like secondary structures but with fluctuating tertiary structures
[25–29] (see Chapter 23), is sometimes observed during the alcohol-induced dena-
turation of several proteins under moderate concentrations of alcohols. Thus, alco-
hols have the potential to stabilize different types of partially folded structures of
polypeptide chains depending on the alcohol species and concentration ranges
used (Figure 24.2). It is noted that, in this chapter, our definition of the molten
globule state is not as strict as the original proposal and can be also used for the
compact denatured state with some native-like secondary structures.
Fig. 24.1. Local (a) and nonlocal (b) interac- acid sequence, whereas the nonlocal interactions
tions [23]. In the figure, circles represent each are formed between residues apart each other
amino acid. The local interactions are formed in the sequence.
between residues near each other in the amino
886 24 Alcohol- and Salt-induced Partially Folded Intermediates
Salts have been used widely for various purposes in the preparation and charac-
terization of proteins, including salt-induced precipitation (salting out), dissolution
of proteins (salting in), and stabilizing or destabilizing protein structures [5–7] (see
Chapter 3). Among them, denaturation of proteins by GdmCl is one of the most
important applications of salt effects on proteins; we can understand the confor-
mational stability of proteins on the basis of the salt-induced denaturation [1, 3,
4]. However, the findings that horse cytochrome c and several proteins form the
molten globule intermediate in the presence of salts, as will be described in detail
in this chapter, attracted much attention on the salt effects, suggesting that salts
might be useful for specifically stabilizing the folding intermediates. Salt-stabilized
molten globule states have been the target of extensive studies addressing the con-
formation and stability of the folding intermediates. In particular, characterization
of the salt-dependent conformational changes of acid-denatured proteins have re-
vealed a more clear view of the acid denaturation and effects of ions in modulating
the conformations of the intermediate and unfolded states.
24.2
Alcohol-induced Intermediates of Proteins and Peptides
24.2.1
Formation of Secondary Structures by Alcohols
tional transition varies substantially (see below). Among various alcohols, 2,2,2-
trifluoroethanol (TFE) is often used because of its relatively strong effects and low
absorbance in the far-UV region, which makes far-UV CD measurements feasible.
1,1,1,3,3,3-hexafluoro-2-propanol (HFIP) has a higher potential than TFE to induce
the alcohol-dependent effects, although the formation of alcohol clusters might
complicate the interpretation of the alcohol effects (see below).
For many disordered peptides, a-helical structures are induced because they are
stabilized by the most local interactions, but various peptides assume b-hairpin or
b-turn structures in alcohol/water solvents [15–22]. Importantly, these secondary
structures are all stabilized by local hydrogen bonds. The formation of b-sheet
structures, which need a network of nonlocal hydrogen bonds, is not usually ex-
pected for short peptides unless the peptide molecules are associated into oligo-
meric structures or aggregates. Nevertheless, there are several examples of longer
polypeptides or proteins, including tendamistat and leucocin A [30, 31], showing
stabilization of b-sheet structures.
Thus, the exact types of secondary structures stabilized in alcohol/water solvent
depend on the amino acid sequence. The intrinsic structural preference of each
amino acid sequence is an important factor in determining the types of secondary
structures in alcohol/water solvent [15–22, 30–39]. The ability of alcohols to induce
the particular secondary structures is, therefore, useful to analyze the intrinsic con-
formational propensity of a short peptide with no appreciable ordered conforma-
tion in aqueous solution. The peptide conformation induced in alcohol/water sol-
vent can be readily analyzed by conventional CD or solution NMR methods. These
methods are also useful in detecting the structural properties of early intermediate
of protein folding [32, 36, 39–42].
The same experimental system has been successfully applied to elucidate the
conformational stabilities and properties of less structured de novo designed pro-
teins in water solvent [19–20]. As is the case for short peptides, de novo proteins
often fail to assume ordered conformations in water. In such a case, the addition
of alcohol induces the formation of otherwise unstable secondary structures. By
analyzing the alcohol-dependent transition, one can estimate the stability of the or-
dered conformation in water. Such an analysis is helpful to optimize the sequence
of de novo proteins in order to create a rigid native-like structure even in the ab-
sence of alcohols. Furthermore, the structural properties of membrane proteins
can be analyzed in alcohol/water solvent by solution NMR spectroscopy [43–46].
Interestingly, a high correlation was found between the a-helical content ex-
pected from secondary structure prediction based on amino acid sequence and
the a-helical content estimated by CD in the presence of high concentrations of
TFE [47, 48]. This observation supports the idea that the local interactions are
dominant in the alcohol-induced protein conformations. Conventional secondary
structure predictions consider mainly local interactions. A lesser correlation was
found between the a-helical content in the native structure and that in the TFE
state, indicating the role of nonlocal interactions for stabilizing the native struc-
tures. The a-helical content of the alcohol-induced state tends to be larger than
that of the native state. Thus, the TFE-induced structures of polypeptide chains
should contain a significant amount of nonnative a-helices.
888 24 Alcohol- and Salt-induced Partially Folded Intermediates
24.2.2
Alcohol-induced Denaturation of Proteins
shift index or coupling constants, also provides the evidence for the presence of a-
helices. On the other hand, the backbone amide hydrogens are much less pro-
tected against hydrogen/deuterium exchange even though significant parts of the
protein molecule are in an a-helical conformation. Thus, the secondary structures
formed in the TFE state are mobile, and parts of polypeptide chain are rapidly in-
terconverted to the extended configurations. In accordance with this, small-angle
solution X-ray scattering indicated that the TFE-induced state of hen lysozyme pos-
sesses a chain-like character as is the case of fully unfolded structures in high con-
centrations of urea or GdmCl [51]. On the other hand, the radius of gyration of the
TFE state was between the fully unfolded state and the compact native state. Taken
together, the TFE state is considered to be an ‘‘open helical conformation’’ in which
the a-helical rods are exposed to the solvent because of the absence of strong hydro-
phobic attraction between them, contrasting with the compact molten globule sta-
bilized by salt. Similar results were obtained with the methanol-induced a-helical
states of horse cytochrome c [52] and horse apomyoglobin [53].
24.2.3
Formation of Compact Molten Globule States
Tab. 24.1. Alcohol or salt-induced molten globule state under equilibrium conditions.
24.2.4
Example: b-Lactoglobulin
Fig. 24.3. Structure of native b-lactoglobulin fragment 2 (triangle) and fragment 3 (square)
(A) and the transitions of its peptide fragments are shown. The transitions by intact b-
by TFE (B). A) The positions of the fragments lactoglobulin (reverse triangle) and I4 fragment
are represented by red (fragment 1), blue from hen lysozyme (cross; taken from Ref.
(fragment 2) and green (fragment 3). The [32]) are also shown for comparison. Panel B
cysteine residues are also shown by yellow. reproduced from Ref. [36] with permission.
B) The transitions by fragment 1 (circle),
spectrum of the burst phase intermediate (Figure 24.4). The spectrum shows a
clear minimum around 222 nm, confirming the presence of an increased amount
of a-helical structure. The addition of 9.8% (v/v) TFE in the refolding buffer further
increased the a-helical content of the burst phase intermediate, supporting the
mechanism that the nonnative a-helices are formed at the early stage of the folding
reaction of b-lactoglobulin.
A detailed analysis of the early intermediates of b-lactoglobulin was performed
using ultra-rapid mixing techniques in conjunction with fluorescence detection
892 24 Alcohol- and Salt-induced Partially Folded Intermediates
(A)
Time (s)
(B)
Wavelength (nm)
Fig. 24.4. Refolding kinetics of b-lactoglobulin dotted lines, respectively. B) Far-UV CD spectra
in the presence and absence of TFE detected of the burst phase intermediate in the absence
by stopped-flow CD. A) Time traces for (circle) and presence (square) of 9.8% TFE.
refolding kinetics. The raw data are indicated The CD spectra of the native and unfolded
by circles. The numbers refer to the final states are shown by solid and dotted lines,
concentration of TFE in % (v/v). The respectively. Reproduced from Ref. [69] with
ellipticities at 222 nm of the native and permission.
unfolded states are indicated by dashed and
play an important role in the initial stage of the folding reaction of b-lactoglobulin
[73]. On the other hand, the role of nonnative a-helices in the early intermediate
state is still unknown. Indeed, the formation of such nonnative secondary struc-
tures might be less effective in terms of rapid folding since the structure should
be disrupted at the later stage of folding. It is probable, however, that the formation
of nonnative a-helical segments is useful for preventing intermolecular aggrega-
tion since the polypeptide chain becomes relatively compact. The folding mecha-
nism of b-lactoglobulin is still intriguing to understand the interplay between the
local and nonlocal interactions during protein folding, and moreover, to under-
stand the a–b transition suggested from several biologically important processes
[74, 75].
24.3
Mechanism of Alcohol-induced Conformational Change
Although alcohols are often used to probe the structural preferences of polypeptide
chains, the detailed mechanism of how these co-solvents induce the conforma-
tional change of polypeptide chains is still debatable. The chemical properties of
alcohols are relatively similar to detergents in a sense that the molecules consist
of both hydrophobic (hydrocarbon or halogenated hydrocarbon) and hydrophilic
(hydroxyl) groups. Possible mechanisms by which alcohols induce conformational
changes in polypeptides, therefore, include (1) destabilizing intramolecular hydro-
phobic interactions and (2) promoting the formation of local backbone hydrogen
bonds [23, 24, 76–79]. In alcohol/water solvents with low polarity or low dielectric
constant, hydrophobic interactions stabilizing the native structure are weakened
and instead the local hydrogen bonds are strengthened, resulting in denaturation
and the simultaneous formation of an ‘‘open helical conformation’’ or ‘‘open heli-
cal coil’’, i.e., solvent-exposed helices.
In the case of nonhalogenated alcohols, the efficiency as a helix inducer linearly
increases with an increase in the number of carbon atoms in an alcohol molecule
[80, 81]. This confirms that the hydrophobic interactions play an important role in
the alcohol effects. Although an alcohol is defined by the presence of hydroxyl
(OH) group(s), the contribution of the OH group to the alcohol effects is negative.
In other words, alcohol effects arise from the dissolved hydrophobic groups, and
the OH group is important mainly for dissolving otherwise insoluble hydrocarbon
groups into water. In this sense, halogenated alcohols might be expected to be poor
alcohols in stabilizing a-helical structures in polypeptide chains, since the halogen
atoms have a larger electronegativity than hydrogen atoms. In contrast, halogen-
ated alcohols such as TFE and HFIP have higher potentials to induce the a-helical
conformation than nonhalogenated alcohols. Systematic analysis of various halo-
genated alcohols indicated that the effect of fluoride is the lowest of the various
halogens [80, 81]. Nevertheless, the low absorbance of fluoride in the far-UV region
makes TFE and HFIP the most useful alcohols to examine the alcohol effects on
proteins by CD.
894
Importantly, studies with solution X-ray scattering provided the evidence that
water/alcohol mixtures of some halogenated alcohols such as TFE or HFIP assume
micelle-like clusters [82]. This suggests that the halogenated alcohols in fact have
higher hydrophobicity than nonhalogenated ones, producing dynamically hydro-
phobic clusters of alcohols, which can interact effectively with protein molecules,
thus enhancing the alcohol effects on proteins. In other words, the direct interac-
tions between proteins and alcohol molecules by hydrophobic interactions play an
important role in the alcohol effects on proteins.
From a systematic analysis of the alcohol-induced transitions of melittin, a basic
and amphiphilic peptide of 26 amino acid residues present in honey bee venom,
Hirota et al. [80, 81] indicated that the effects of each alcohol can be rationalized
by the additive contribution from the hydrocarbon (CH), hydroxyl (OH), and halo-
gen groups (F, Cl, and Br), providing the following equations for the m-value, a
measure of the cooperativity for the alcohol-induced transition:
where ASA(X) corresponds to the accessible surface area of each group X and the
value of a; b; c; d, and e are the empirically determined proportionality coefficients
which are summarized in Table 24.2. Equation (2) provides a better fit than Eq. (1)
when the effects of TFE and HFIP are included. It is noted that the coefficient for
the OH group is negative while others are positive. A similar relationship was ob-
served for the alcohol-induced denaturation of b-lactoglobulin [83].
These results indicate the importance of the direct interaction between alcohols
and polypeptide molecules. Accordingly, the results from NMR spectroscopy and
molecular dynamics simulations suggested that TFE molecules preferentially bind
to the backbone carbonyl oxygen group and minimize the solvent exposure of
amide hydrogen groups, leading to the stabilization of intramolecular hydrogen
bonds in a polypeptide chain [84–88].
These results, moreover, suggest that alcohols will be useful for dissolving pro-
tein aggregates stabilized by hydrophobic interactions. In fact, some alcohols such
Tab. 24.2. Fitting coefficients of the m(ts)a and m(hc)b values for Eqs (1) and (2).
a b c d e
as TFE or HFIP are useful to dissolve aggregates formed during peptide synthesis.
HFIP is often used to dissolve prion and Alzheimer’s amyloid b-peptides which
tend to form aggregates or amyloid fibrils [89–91]. On the other hand, these alco-
hols, in particular TFE, are known to induce well-ordered fibrillar architectures
similar to the amyloid fibrils [92–95]. These observations are apparently inconsis-
tent with each other and understanding of these contrasting phenomena will be of
special importance to clarify the alcohol effects.
24.4
Effects of Alcohols on Folding Kinetics
The studies discussed above are mostly concerned with the behavior of polypeptide
chains under equilibrium conditions. During protein folding, many different types
of interactions including local hydrogen bonds play important roles. A simple
question, therefore, arises: how do proteins behave during kinetic refolding in
alcohol/water solvents? Interestingly, Lu et al. [96] first found that the folding
kinetics of hen lysozyme is accelerated by the presence of low concentrations of
TFE below 5.5% (v/v). In the presence of such a small amount of TFE, the protein
is still able to fold into the native structure judging from several spectroscopic data
including solution NMR. In addition, pulse-labeling studies suggested that the
structural properties of folding intermediates of lysozyme are almost unaffected
by the presence of TFE.
An extensive analysis using more than 10 proteins revealed that the acceleration
of folding kinetics in the presence of a small amount of TFE is common to all of
the proteins examined [97] (Figure 24.6). The logarithm of the refolding rate con-
stant linearly increased with an increase in TFE concentration, but then decreased
above a certain concentration of TFE. The concentration range that provides the
maximum rate of folding depended on the protein species and other conditions
such as pH or temperature. Interestingly, for proteins which fold by a two-state
manner without accumulating observable intermediates (two-state proteins), the
extent of acceleration by TFE correlates well with the amount of local backbone
hydrogen bonds in the native structure (Figure 24.7). This behavior could be ra-
tionalized by the fact that TFE stabilizes the local hydrogen bonds in the polypep-
tide chains, thus forming a-helical or b-turn conformations.
On the other hand, for multi-state proteins which accumulate transient inter-
mediates during the folding process, the acceleration of folding by the presence of
TFE was always much less than that anticipated from the correlation found for the
two-state proteins (Figure 24.7). The multi-state proteins often assume the molten
globule intermediate at the burst phase just after the initiation of folding. Accord-
ing to stopped-flow CD or NMR pulse-labeling analysis, a significant amount of
native-like secondary structures is formed in the molten globule intermediates. In
such intermediates, the native-like local hydrogen bonds should be already formed.
Since the rate-limiting step of folding of the multi-state protein is mainly associ-
ated with the rearrangement of partially folded subdomains, the number of local
24.4 Effects of Alcohols on Folding Kinetics 897
r = 0.96
p < 0.1 %
Extent of acceleration
hydrogen bonds which are formed during the rate-limiting step is less than the
number in the native structure. The NMR pulse-labeling studies for several pro-
teins have shown that the number of local hydrogen bonds formed in the interme-
diate states varies depending on proteins. Consistent with this, the acceleration of
folding by TFE for multi-state proteins differs significantly even though the num-
ber of local hydrogen bonds in the native structure is similar, as is the case for
lysozyme and a-lactalbumin.
In the case of acylphosphatase, a two-state protein, the refolding reaction ob-
served by CD indicated that the absolute intensity at 222 nm for the burst phase
intermediate increases upon addition of TFE (Figure 24.8), suggesting the forma-
tion of partially folded intermediates which do not accumulate in the absence of
TFE [98]. The maximum rate of folding was obtained at the TFE concentration
where the ellipticity at 222 nm for the burst phase intermediate is the same as
that of the native protein. This observation is consistent with the conclusion drawn
from the analysis of Hammond behavior and nonlinear activation free energy of
the folding reaction [99, 100] (see opposing result by Yiu et al. [101]). The analysis
indicates that the free energy barriers encountered by a folding polypeptide are
generally narrow with robust maxima and that the change in the folding rate by
the perturbation of folding landscape, for example, by mutations or changing con-
ditions, is mainly due to the induction of folding intermediates for two-state pro-
teins. Thus, the data suggest that the activation energy is decreased in the presence
of TFE because of the decreased stability of the early intermediate relative to the
transition state.
24.5
Salt-induced Formation of the Intermediate States
24.5.1
Acid-denatured Proteins
Fig. 24.9. Salt-dependent change of the CD molten globule state at pH 2.0 in 0.5 M KCl;
spectrum of the acid-denatured Bacillus cereus 3) the native state at pH 7.0. Reproduced from
b-lactamase at 20 C. 1) The acid-unfolded Ref. [102] with permission.
state at pH 2.0 in the absence of salt; 2) the
24.5.2
Acid-induced Unfolding and Refolding Transitions
Fig. 24.10. Acid-induced unfolding and parison. B,C) Effects of increasing concentra-
refolding transitions of cytochrome c and tion of HCl on the ellipticity at 222 nm of
apomyolobin at 20 C. A) Far-UV CD spectra of cytochrome c (B) and apomyologobin (C).
cytochrome c as a function of HCl concentra- Upon decreasing pH by addition of HCl, the N
tion. The numbers refer to the HCl concentra- (native) ! U (acid-unfolded) ! MG (molten
tion in millimolar units. The spectrum of the globule) transition is observed. Reproduced
native state (dotted line) is shown for com- from Ref. [111] with permission.
902 24 Alcohol- and Salt-induced Partially Folded Intermediates
Fig. 24.11. Consistency between the HCl- A) Bacillus cereus b-lactamase, B) horse
induced (open circles) and KCl-induced (filled apomyoglobin, C) horse cytochrome c.
circles) transitions of acid-unfolded proteins Reproduced from Ref. [111] with permission.
as monitored by change in the far-UV CD.
globin [103, 104], a clear boundary between the fully unfolded state and the molten
globule state was observed. For the phase diagram of apomyoglobin, moreover, the
unfolding intermediate was observed at moderately acidic pH regions. On the
other hand, it is expected for a protein with less charge repulsion that the acid
denaturation directly produces the molten globule intermediate without maximally
unfolding the protein. This is considered to be the case for bovine a-lactalbumin.
Cytochrome c species with various degrees of charge repulsion were prepared by
acetylating lysine amino groups [107, 114]. The phase diagrams of these variously
modifed cytochrome c species showed that the boundary between the fully un-
folded state and the molten globule state depends on the charge repulsion: de-
creasing the net charge (that is, decreasing pH) decreases the concentration of salt
24.5 Salt-induced Formation of the Intermediate States 903
required to stabilize the molten globule state [113]. Thus, we can understand con-
sistently the variation of acid denaturation among proteins.
An important implication is that to maximally unfold a protein by acid, pH 2 un-
der the low salt conditions will be best. These conditions would also be optimal for
dissolving proteins and peptides taking advantage of the net charge repulsion.
The role of charge repulsion and hydrophobic interactions were characterized by
analyzing the salt-dependent transition of horse cytochrome c [107, 114]. Confor-
mational transitions of variously acetylated cytochrome c derivatives, that is, modi-
fied cytochrome c species with different pI values, revealed that the net charge re-
pulsion destabilizes the molten globule state by about 400 cal mol1 per charge at
25 C. In addition, thermal unfolding of these acytylated cytochrome c species veri-
fied the contribution of hydrophobic interactions in the stability of the molten glob-
904 24 Alcohol- and Salt-induced Partially Folded Intermediates
ule state, which is about 30–40% of that of the native state [107]. The enthalpy
change upon formation of the molten globule state and its temperature depen-
dence were directly determined by isothermal titration calorimetry measurements
of the salt-dependent conformational transition, which is consistent with the value
determined by DSC measurements [110]. Similar measurements were performed
with horse apomyoglobin [111].
24.6
Mechanism of Salt-induced Conformational Change
Although the importance of anions in shielding the charge repulsive forces is evi-
dent, anions can work in various ways [115]. (1) Salts can shield the charge repul-
sion through Debye-Hückel screening effects. (2) Some anions, such as sulfate, are
well known to stabilize the native state, which has been interpreted in terms of the
effects on water structure (that is, structure maker) consequently strengthening the
hydrophobic interactions of proteins. (3) Anions can directly interact with positive
charges on proteins to shield the charge repulsion. These three possibilities can be
distinguished by examining the effects of various anion species [115].
Debye-Hückel effects are independent of anion species. It is known that the ef-
fects of anions in stabilizing protein structure follow the Hofmeister series [5, 6].
The representative series is:
sulfate > phosphate > fluoride > chloride > bromide > iodide
> perchlorate > thiocyanate ð3Þ
sulfite > sulfate > perchlorate > thiocyanate > iodide > nitrate
> bromide > chloride > acetate ¼ fluoride ð4Þ
The results showed convincingly that the effects depend on anion species,
and the order of effectiveness was consistent with the electroselectivity series (Eqs
(4) and (5)). Anions with multiple charges show stronger potentials. Among the
monovalent anions, chaotropic anions such as iodide or bromide show stronger ef-
fects while the effects of chloride are the weakest. These results demonstrate that
the electrostatic attraction between the positively charged proteins and negatively
charged anions causes the direct interactions between them, thus shielding the
charge repulsion.
24.7
Generality of the Salt Effects
Fig. 24.14. GdmCl-induced refolding and cytochrome c (C) measured by the change in
unfolding transitions of acid-unfolded proteins. ellipticity at 222 nm. Open and filled circles
A) Far-UV CD spectra of horse apomyoglobin indicate the GdmCl-induced transitions in the
as a function of GdmCl concentration in 20 absence and presence of 0.4 M NaCl,
mM HCl (pH 1.8) at 20 C. The numbers refer respectively. Open squares show the reverse
to the GdmCl concentration in molar units. transition, in which proteins were initially
The broken line shows the spectrum of the unfolded in 4 M GdmCl. For comparison,
molten globule state stabilized by 0.4 M NaCl NaCl-induced refolding transitions in the same
in 20 mM HCl. B,C) Conformational transitions buffer are shown by broken lines. Reproduced
of horse apomyoglobin (B) and horse from Ref. [123] with permission.
24.8
Conclusion
Alcohols and salts affect the conformational stability of partially folded intermedi-
ate in various ways. The effects of alcohols are basically explained by destabiliza-
908 24 Alcohol- and Salt-induced Partially Folded Intermediates
References
Pineda, A., Rico, M., Santoro, J., 24 Dill, K. A., Bromberg, S., Yue, K.,
and Nieto, J. L. (1994) NMR solution Fiebig, K. M., Yee, D. P., Thomas,
structure of the isolated N-terminal P. D., and Chan, H. S. (1995) Princi-
fragment of protein-G B1 domain. ples of protein folding: a perspective
Evidence of trifluoroethanol induced from simple exact models. Protein Sci.
native-like b-hairpin formation. 4, 561–602.
Biochemistry 33, 6004–6014. 25 Ohgushi, M. and Wada, A. (1983)
16 Cann, J. R., Liu, X., Stewart, J. M., ‘Molten-globule state’: a compact form
Gera, L., and Kotovych, G. (1994) A of globular proteins with mobile side-
CD and an NMR study of multiple chains. FEBS Lett. 164, 21–24.
bradykinin conformations in aqueous 26 Ptitsyn, O. B. (1991) How does
trifluoroethanol solutions. Biopolymers protein synthesis give rise to the
34, 869–878. 3D-structure? FEBS Lett. 285, 176–
17 Vranken, W. F., Budesinsky, M., 181.
Fant, F., Boulez, K., and Borremans, 27 Ptitsyn, O. B. (1995) Molten globule
F. A. (1995) The complete consensus and protein folding. Adv. Protein
V3 loop peptide of the envelope Chem. 47, 83–229.
protein gp120 of HIV-1 shows 28 Kuwajima, K. (1989) The molten
pronounced helical character in globule state as a clue for under-
solution. FEBS Lett. 374, 117–121. standing the folding and cooperativity
18 Wang, J., Hodges, R. S., and Sykes, of globular-protein structure. Proteins
B. D. (1995) Effect of trifluoroethanol 6, 87–103.
on the solution structure and 29 Arai, M. and Kuwajima, K. (2000)
flexibility of desmopressin: a two- Role of the molten globule state in
dimensional NMR study. Int. J. Pept. protein folding. Adv. Protein Chem. 53,
Protein Res. 45, 471–481. 209–282.
19 de Alba, E., Jimenez, M. A., Rico, M., 30 Fregeau-Gallagher, N. L., Sailer,
and Nieto, J. L. (1996) Conforma- M., Niemczura, W. P., Nakashima,
tional investigation of designed short T. T., Stiles, M. E., and Vederas, J. C.
linear peptides able to fold into (1997) Three-dimensional structure of
b-hairpin structures in aqueous leucocin A in trifluoroethanol and
solution. Folding Des. 1, 133–144. dodecylphosphocholine micelles:
20 Reiersen, H., Clarke, A. R., and spatial location of residues critical
Rees, A. R. (1998) Short elastin-like for biological activity in type IIa
peptides exhibit the same bacteriocins from lactic acid bacteria.
temperature-induced structural Biochemistry 36, 15062–15072.
transitions as elastin polymers: 31 Schönbrunner, N., Wey, J., Engels,
implications for protein engineering. J., Georg, H., and Kiefhaber, T.
J. Mol. Biol. 283, 255–264. (1996) Native-like b-structure in a
21 Bienkiewicz, E. A., Moon Woody, trifluoroethanol-induced partially
A., and Woody, R. W. (2000) Confor- folded state of the all-beta-sheet
mation of the RNA polymerase II protein tendamistat. J. Mol. Biol. 260,
C-terminal domain: circular dichroism 432–445.
of long and short fragments. J. Mol. 32 Segawa, S., Fukuno, T., Fujiwara,
Biol. 297, 119–133. K., and Noda, Y. (1991) Local
22 Ramirez-Alvarado, M., Blanco, F. J., structures in unfolded lysozyme and
and Serrano, L. (2001) Elongation of correlation with secondary structures
the BH8 b-hairpin peptide: electro- in the native conformation: helix-
static interactions in b-hairpin forma- forming or -breaking propensity of
tion and stability. Protein Sci. 10, 1381– peptide segments. Biopolymers 31,
1392. 497–509.
23 Dill, K. A. (1990) Dominant forces in 33 Sancho, J., Neira, J. L., and Fersht,
protein folding. Biochemistry 29, 7133– A. R. (1992) An N-terminal fragment
7155. of barnase has residual helical struc-
910 24 Alcohol- and Salt-induced Partially Folded Intermediates
comparative analysis by circular and the BSE crisis. Science 278, 245–
dichroism spectroscopy and limited 251.
proteolysis. Proteins 49, 385–397. 75 Jackson, G. S., Hosszu, L. L., Power,
66 Bongiovanni, C., Sinibaldi, F., A., Hill, A. F., Kenney, J., Saibil, H.,
Ferri, T., and Santucci, R. (2002) Craven, C. J., Waltho, J. P., Clarke,
Glycerol-induced formation of the A. R., and Collinge, J. (1999)
molten globule from acid-denatured Reversible conversion of monomeric
cytochrome c: implication for human prion protein between native
hierarchical folding. J. Protein Chem. and fibrilogenic conformations. Science
21, 35–41. 283, 1935–1937.
67 Sirangelo, I., Dal Piaz, F., Malmo, 76 Thomas, P. D. and Dill, K. (1993)
C., Casillo, M., Birolo, L., Pucci, P., Local and nonlocal interactions in
Marino, G., and Irace, G. (2003) globular proteins and mechanisms of
Hexafluoroisopropanol and acid alcohol denaturation. Protein Sci. 2,
destabilized forms of apomyoglobin 2050–2065.
exhibit structural differences. 77 Liu, Y. and Bolen, D. W. (1995) The
Biochemistry 42, 312–319. peptide backbone plays a dominant
68 Kuwajima, K., Yamaya, H., and role in protein stabilization by
Sugai, S. (1996) The burst-phase naturally occurring osmolytes.
intermediate in the refolding of b- Biochemistry 34, 12884–12891.
lactoglobulin studied by stopped-flow 78 Cammers-Goodwn, A., Allen, T. J.,
circular dichroism and absorption Oslick, S. L., McClure, K. F., Lee,
spectroscopy. J. Mol. Biol. 264, 806– J. H., and Kemp, D. S. (1996)
822. Mechanism of stabilization of helical
69 Hamada, D., Segawa, S., and Goto, conformations of poplypeptides by
Y. (1996) Non-native a-helical water containing trifluoroethanol.
intermediate in the refolding of b- J. Am. Chem. Soc. 118, 3082–3090.
lactoglobulin, a predominantly b-sheet 79 Luo, P. and Baldwin, R. L. (1997)
protein. Nat. Struct. Biol. 3, 868–873. Mechanism of helix induction by
70 Forge, V., Hoshino, M., Kuwata, K., trifluoroethanol: a framework for
Arai, M., Kuwajima, K., Batt, C. A., extrapolating the helix-forming
and Goto, Y. (2000) Is Folding of properties of peptides from
b-lactoglobulin non-hierarchic? trifluoroethanol/water mixtures back
Intermediate with native-like b-sheet to water. Biochemistry 36, 8413–8421.
and non-native a-helix. J. Mol. Biol. 80 Hirota, N., Mizuno, K., and Goto,
296, 1039–1051. Y. (1997) Cooperative a-helix forma-
71 Kuwata, K., Shastry, R., Cheng, H., tion of b-lactoglobulin and melittin
Hoshino, M., Batt, C. A., Goto, Y., induced by hexafluoroisopropanol.
and Roder, H. (2001) Structural and Protein Sci. 6, 416–421.
kinetic characterization of early 81 Hirota, N., Mizuno, K., and Goto,
folding events in b-lactoglobulin. Nat. Y. (1998) Group additive contributions
Struct. Biol. 8, 151–155. to the alcohol-induced a-helix
72 Yagi, M., Sakurai, K., Kalidas, C., formation of melittin: implication for
Batt, C. A., and Goto, Y. (2003) the mechanism of the alcohol effects
Reversible unfolding of bovine on proteins. J. Mol. Biol. 275, 365–378.
b-lactoglobulin mutants without a free 82 Hong, D.-P., Kuboi, R., Hoshino,
thiol group. J. Biol. Chem. 278, 47009– M., and Goto, Y. (1999) Clustering of
47015. fluorine-substituted alcohols as a
73 Katou, H., Hoshino, M., Batt, C. A., factor responsible for their marked
and Goto, Y. (2001) Native-like effects on proteins and peptides.
b-hairpin retained in the cold- J. Am. Chem. Soc. 121, 8427–8433.
denatured state of bovine b-lacto- 83 Hirota-Nakaoka, N. and Goto, Y.
globulin. J. Mol. Biol. 310, 471–484. (1999) Alcohol-induced denaturation
74 Prusiner, S. B. (1997) Prion diseases of b-lactoglobulin: a close correlation
References 913
25
Prolyl Isomerization in Protein Folding
Franz Schmid
25.1
Introduction
Peptide bonds in proteins can occur in two isomeric states: trans, when the dihe-
dral angle o is 180 , and cis, when o is 0 . For the bonds preceding residues other
than proline (nonprolyl bonds1) the trans conformation is strongly favored energet-
ically over cis, and therefore cis nonprolyl bonds are rare in folded proteins. Peptide
bonds preceding proline (prolyl bonds) are frequently in the cis conformation. In
this case the trans form is only slightly favored energetically over cis. Cis peptide
bonds in native proteins complicate the folding process, because the incorrect trans
forms predominate in the unfolded or nascent protein molecules, and because the
cis Ð trans isomerizations are intrinsically slow reactions. Incorrect prolyl isomers
do not block the folding of a protein chain right at the beginning, but they strongly
decelerate the overall folding process. This is clearly seen for small single-domain
proteins. Many of them refold within milliseconds or even less when they contain
correct prolyl isomers, but when incorrect isomers are present (typically trans iso-
mers of bonds that are cis in the native state) folding is decelerated from millisec-
onds to the time range of several minutes.
Several strategies are useful for identifying proline-limited steps in a folding re-
action. They include interrupted unfolding and refolding in double mixing experi-
ments as well as the use of prolyl isomerases, which are enzymes that catalyze
prolyl isomerizations (see Chapter 10 in Part II). The prolyl isomerases are ubiqui-
tous proteins. They are found in all organisms, and in bacteria a prolyl isomerase
1) To facilitate reading I use the terms cis tion of Xaa’’ for the isomerization of the peptide
proline and trans proline for proline residues bond preceding Xaa. Peptide bonds preceding
that are preceded by a cis or a trans peptide proline are referred to as ‘‘prolyl bonds,’’ those
bond, respectively, in the folded protein. preceding residues other than proline as
‘‘Native-like’’ and ‘‘incorrect, nonnative’’ ‘‘nonprolyl bonds.’’ The folding reactions that
denote whether in an unfolded state a involve Xaa-Pro isomerizations as rate-
particular prolyl peptide bond shows the limiting steps are denoted ‘‘proline-limited’’
same conformation as in the native state or reactions.
not. Further, I use the expression ‘‘isomeriza-
is associated with the ribosome. This enzyme, the trigger factor, accelerates pro-
line-limited folding reactions particularly well and might in fact also catalyze prolyl
isomerizations in the folding of nascent proteins. Prolyl isomerization in protein
folding and its catalysis by prolyl isomerases are covered in a number of review
articles [1–10].
25.2
Prolyl Peptide Bonds
In a peptide bond the distance between the carbonyl carbon and the nitrogen is
0.15 Å shorter than expected for a CaN single bond and both the C and the N
atom show sp2 hybridization [11], which indicates that the peptide bond has con-
siderable double bond character. Peptide bonds are thus planar, and the flanking
Ca atoms can be either in the trans or in the cis conformation (equivalent to dihe-
dral angles o of 180 and 0 , respectively). For peptide bonds preceding residues
other than proline the cis state is strongly disfavored. Fischer and coworkers found
cis contents between 0.11 and 0.48% for a number of nonprolyl peptide bonds in
oligopeptides. The highest propensity to adopt the cis conformation was observed
for a Tyr-Ala bond in a dipeptide [12].
In native, folded proteins nonprolyl cis peptide bonds are very rare [13–16]. Only
43 cis nonprolyl peptide bonds were found in a survey of 571 protein structures
[16]. Interestingly carboxypeptidase A contains three of them [17].
For Xaa-Pro peptide bonds (‘‘prolyl bonds’’) the cis and trans conformations differ
much less in energy because the Ca of Xaa is always arranged in cis with a C atom:
either with the Ca or the Cd of the proline (Figure 25.1). The trans isomer is thus
favored only slightly over the cis isomer and in short peptides frequently cis con-
tents of 10–30% are observed [18–21]. The cis/trans ratio depends on the properties
of the flanking amino acids and on ionic or van der Waals interactions that are pos-
sible in one isomeric state, but not in the other.
In small, well-folded proteins the conformational state of each prolyl bond is
usually clearly defined. It is either cis or trans in every molecule, depending on the
structural framework imposed by the folded protein chain. About 5–7% of all
prolyl peptide bonds in folded proteins are cis [13–16], and 43% of 1435 nonredun-
dant protein structures in the Brookhaven Protein Database contain at least one cis
proline [20]. Several proteins are heterogeneous and show cis/trans equilibria at
Fig. 25.1. Isomerization between the cis and trans forms of an Xaa-Pro peptide bond.
918 25 Prolyl Isomerization in Protein Folding
one or more prolyl bonds. Most of these heterogeneities were discovered by NMR
spectroscopy. They will be discussed later (see Section 25.5).
The energy barrier between the cis and the trans form is enthalpic in nature. The
average activation enthalpy is about 80 kJ mol1 and the activation entropy is close
to zero. Because of this high energy barrier prolyl isomerizations are intrinsically
slow reactions with time constants that range between 10 and 100 s (at 25 C).
The absence of an activation entropy indicates that the surrounding solvent is not
reorganized when the activated state of this intramolecular isomerization reaction
is reached.
Secondary amide [22] and prolyl isomerizations are faster in nonpolar solvents
than in water. This was traditionally explained by assuming that the transition state
for isomerization is less polar than the ground state because the peptide bond is
twisted and thus the resonance between the carbonyl group and the nitrogen is
lost [23]. Eberhardt et al. [24] found, however, that the rate constant of prolyl iso-
merization in the dipeptide Ac-Gly-Pro-OCH3 does not correlate with the dielectric
constant of the solvent, but with its ability to donate a hydrogen bond to the car-
bonyl oxygen of the peptide.
The nitrogen of the amide bond can be protonated by very strong acids. This pro-
tonation abolishes the resonance between CbO and N of the prolyl bond. Thus, the
partial double bond character is lost and the barrier to rotation is diminished.
Prolyl isomerization is in fact well catalyzed by a solution of acetic acid in acetic
anhydride [25] or by b7 M HClO4 [26].
In summary, prolyl isomerization is slow because the resonance energy of the
CN partial double bond must be overcome. The reaction is decelerated when the
resonance is increased, such as in solvents that donate a hydrogen bond or a pro-
ton to the carbonyl oxygen. It is accelerated when the resonance is decreased, in
particular by N protonation. The mechanism of nonenzymic prolyl isomerization
is thoroughly discussed by Stein [23] and Fischer [21].
25.3
Prolyl Isomerizations as Rate-determining Steps of Protein Folding
25.3.1
The Discovery of Fast and Slow Refolding Species
In seminal work Garel and Baldwin [27] discovered that unfolded ribonuclease
A (RNase A) consists of a heterogeneous mixture of molecules. They found fast-
folding UF molecules2 that refolded in less than a second, and slow-folding US
molecules that required several minutes to fold to completion. Similar UF and US
2) Abbreviations used: U, I, N, the unfolded, the t and l, apparent time constant and rate
intermediate, and native state of a protein, constant, respectively, of a reaction; k,
respectively; UF ; US , fast- and slow-folding microscopic rate constant.
unfolded forms, IN , native-like intermediate;
25.3 Prolyl Isomerizations as Rate-determining Steps of Protein Folding 919
species were later found in the folding of many other proteins [7, 28, 29]. In 1975,
two years after the discovery by Garel and Baldwin, Brandts and coworkers [30] for-
mulated the proline hypothesis and suggested that the UF and US molecules differ
in the cis/trans isomeric state of one or more Xaa-Pro peptide bonds.
In the native protein (N) usually each prolyl peptide bond is in a defined confor-
mation, either cis or trans, depending on the constraints dictated by the ordered
native structure. After unfolding (N ! UF ), however, these bonds become free to
isomerize slowly as in short oligopeptides (in the UF Ð US i reaction), as shown
in Scheme 25.1.
25.3.2
Detection of Proline-limited Folding Processes
These criteria are valid only when direct folding is much faster than proline-
limited folding and when partially folded intermediates with incorrect prolines
do not accumulate during folding. U species with several incorrect prolines can
enter alternative refolding routes, and the rank order of the re-isomerizations
may change with the refolding conditions. Guidelines for the kinetic analysis
of folding reactions that are coupled with prolyl isomerizations are given by
Kiefhaber et al. [31, 32].
3. Slow refolding assays (double jumps). Prolyl isomerizations in the unfolded form
of a protein (after the N ! UF unfolding step, Scheme 25.1) can be measured
by slow refolding assays. The UF Ð US isomerizations usually cannot be moni-
tored directly by a spectroscopic probe. Rather, the amount of US formed after
different times of unfolding is determined by slow refolding assays in ‘‘double
jump’’ experiments [30, 33]. Samples are withdrawn from the unfolding solu-
tion at different time intervals after the initiation of unfolding; these samples
are transferred to standard refolding conditions, and the amplitudes of the re-
sulting slow refolding reactions, US ! N, are determined. These amplitudes
are proportional to the concentrations of the US species present at the time
when the sample was withdrawn for the assay, and their dependence on the
time of unfolding reveals the kinetics of the UF Ð US reactions. An experimen-
tal protocol for double-jump experiments is given in Section 25.9 (Experimental
Protocols). Slow refolding assays work very well when conditions can be found
under which the conformational and the proline-limited events are well sepa-
rated in both the unfolding step and in the subsequent refolding assay. It can-
not be used when conformational folding (UF ! N) in the second step shows a
similar rate as the UF Ð US reactions. In this case the prolyl isomerizations
couple with the UF ! N reaction in the refolding assay, and the amplitudes of
refolding no longer reflect the concentration of the US molecules, as present at
the time of sample transfer.
4. Catalysis by prolyl isomerases. Prolyl isomerases are excellent tools to identify
proline-limited folding reactions. Since they catalyze only prolyl isomerizations
(see Chapter 10 in Part II), the acceleration of a particular folding reaction by a
prolyl isomerase provides compelling evidence that it involves a proline-limited
step. Partial folding may render prolyl bonds inaccessible for a prolyl isomerase.
Therefore a lack of catalysis does not strictly rule out prolyl isomerization as the
rate-limiting step of a particular folding reaction. Several families of prolyl iso-
merases with different substrate specificities are available and can be used as
tools in folding studies [3, 7, 8].
5. Replacement of prolines by other residues. Site-directed mutagenesis of proline
residues is an apparently simple und straightforward means to detect proline-
limited folding reactions and to identify the prolines that are involved in a par-
ticular folding reaction. The interpretation of the changes in the folding kinetics
is usually straightforward when a trans prolyl bond of the wild-type protein is
changed to a nonprolyl trans bond by the mutation, or when a cis prolyl bond
is replaced by a ‘‘normal’’ nonprolyl trans bond. The protein stability and the
rate of the direct folding reaction should not be changed strongly by the proline
25.3 Prolyl Isomerizations as Rate-determining Steps of Protein Folding 921
substitution. For cis prolines such conditions are rarely met. Often, mutations of
critical cis prolines are strongly destabilizing and the cis conformation of the
respective bond is maintained even after the mutation to a residue other than
proline. Trans ! cis isomerizations at nonprolyl peptide bonds are also slow.
An unchanged slow folding reaction after the replacement of a particular pro-
line is therefore not sufficient evidence to exclude its isomerization as a possible
rate-limiting reaction. It is necessary to analyze the unfolding and refolding
kinetics after a proline replacement in detail, in order not to draw incorrect
conclusions.
6. Analysis of the folding kinetics by NMR spectroscopy. Perhaps the best approach
would be to follow the isomerization of a particular proline during folding
directly by real-time NMR spectroscopy. The CaH resonance of the residue
preceding the proline and the CdH resonance of proline are sensitive to the
isomeric state of the prolyl bond and could be used to follow a prolyl isomeriza-
tion directly in the course of a protein folding reaction. Unfortunately, these
resonances are in a very crowded region of the NMR spectrum of a protein,
and therefore, to the best of my knowledge, this approach has not yet been
employed.
25.3.3
Proline-limited Folding Reactions
Proline-limited steps were found in the folding kinetics of many small single-
domain proteins. The direct folding reactions of these proteins are usually fast
(often in the time range of milliseconds) and therefore the proline-limited steps
can easily be identified in unfolding and in refolding either directly or by double-
mixing experiments, as described in Section 25.3.2 and in Section 25.9 (Experi-
mental Protocols).
As expected, proline-limited reactions with large amplitudes are found for pro-
teins with cis prolines, such as the immunoglobulin domains [34], thioredoxin
[35], barstar [36], staphylococcal nuclease [37, 38], ubiquitin [39], plastocyanin
[40], or pseudo-azurin [41]. In most of these cases partially folded intermediates
accumulate in refolding before the final slow prolyl trans ! cis isomerization.
Single-domain proteins with only trans prolines also show slow refolding reac-
tions. Their amplitudes are usually small, because the trans isomer is favored in
the unfolded molecules as well. Often, the amplitude of slow folding is smaller
than expected from the number of trans prolines. This may originate from the
fact that in unfolded polypeptides the trans form is more favored over cis than in
short peptides and that some trans prolines are nonessential for folding. In addi-
tion, native-like intermediates may accumulate prior to re-isomerization, which
also decreases the amplitudes of the slow folding reactions. Proteins with only
trans prolines include cytochrome c [42], barnase [43], the chymotrypsin inhibitor
CI2 [44], FKBP12 [45, 46], acylphosphatase [47], MerP [48] the immunity proteins
Im7 and Im9 [49], the cell cycle protein p13suc1 [50], and the amylase inhibitor
Tendamistat [51].
922 25 Prolyl Isomerization in Protein Folding
1. Not all prolines are necessarily important for the folding kinetics.
2. The trans form is usually favored over the cis, and therefore cis ! trans isomer-
izations show small amplitudes and are about 5–10 times faster than trans !
cis isomerizations.
25.3 Prolyl Isomerizations as Rate-determining Steps of Protein Folding 923
Carbonic anhydrase provides a good example. It contains 15 trans and 2 cis pro-
lines and after long-term denaturation virtually all molecules are in US states, i.e.,
they have nonnative prolines and refold slowly [63]. More than half of the US mol-
ecules form rapidly after unfolding in a reaction that is about tenfold faster than
expected for a single prolyl isomerization [64]. This increase in rate is probably
caused by the additive and independent isomerizations of the many trans prolines
of this protein. The remainder of the slow-folding molecules is then created very
slowly in a reaction that is probably caused by the cis Ð trans isomerizations of
one or both cis prolines. Short-term unfolding of carbonic anhydrase yields UF
molecules with correct prolines, but only a fraction of them can in fact refold rap-
idly, because molecules with incorrect isomers continue to form early in refolding,
in competition with the direct refolding reaction, which is only marginally faster
than the multiple cis Ð trans isomerizations. Addition of cyclophilin had a peculiar
effect on the refolding of the short-term denatured molecules. At the onset of
the experiment this prolyl isomerase catalyzed cis Ð trans isomerization in the un-
folded molecules and thus increased the fraction of molecules that fold on slow
proline-limited paths. These molecules, however, refolded more rapidly, because
cyclophilin accelerated the proline-limited steps in their refolding [64]. A similar
behavior was found for the single-chain Fv fragment of an antibody, which con-
tains four trans and two cis prolines [65].
25.3.4
Interrelation between Prolyl Isomerization and Conformational Folding
When they initially suggested the proline hypothesis, Brandts and coworkers as-
sumed that the nonnative prolyl isomers can block refolding right at its beginning
[30], and that the slow US ! UF isomerization in the unfolded molecules is the
first and rate-limiting step of folding. Since then many proline-limited folding re-
actions have been investigated (see Section 25.4), and it is clear now that conforma-
tional folding can start while some prolines are still in their nonnative states, and
that partially folded intermediates can tolerate incorrect prolines. The final folding
steps, however, require correct prolines and therefore they are limited in rate by the
prolyl isomerizations [7, 29, 66, 67].
The extent of conformational folding that can occur prior to prolyl isomerization
depends on the location of the nonnative prolyl bonds in the structure, on the
924 25 Prolyl Isomerization in Protein Folding
25.4
Examples of Proline-limited Folding Reactions
25.4.1
Ribonuclease A
(IN ! N) originates from the trans ! cis isomerization of the Tyr92-Pro93 bond
[71, 72]. It could be assigned before the advent of site-directed mutagenesis, be-
cause it is accompanied by a change in the fluorescence of Tyr92 [73].
The intermediate IN is less stable and unfolds much faster than native RNase A.
The interconversion between IN and N could thus be measured by unfolding as-
says which exploited this strong difference between the unfolding rates of IN and
N. In these experiments refolding was interrupted after various times, samples
were transferred to standard unfolding conditions and the amplitudes of the fast
and slow unfolding reactions were determined as a function of the duration of re-
folding. These amplitudes reflected the time courses of IN and N, respectively
during folding [74]. Such interrupted folding experiments are excellent tools for
discriminating native-like intermediates from the native protein and for following
the time courses of these species in complex folding reactions. The method is so
sensitive, because the native molecules are often separated from partially folded
ones by a high activation barrier, and thus they differ strongly in the rate of unfold-
ing. Section 25.9 gives a protocol for measuring silent slow prolyl isomerizations in
largely folded intermediates by such unfolding assays.
The isomeric states of the four prolines in the various U species remained un-
clear for a long time. The kinetic analyses of the Baldwin, Brandts, and Schmid
groups (for reviews see Refs [28, 75]) had indicated already that US is heteroge-
neous and consists of a minor US I species and a major US II species, which refolds
via the native like intermediate IN .
The Scheraga group re-evaluated the folding kinetics of RNase A, among others,
by making many mutants with alterations at the four prolines (see Ref. [9] for a
review). They found that the UF species is heterogeneous as well [76], being com-
posed of 5% of a very fast folding species Uvf and 13% of a fast-folding species Uf .
Uvf and Uf differ in the isomeric state of Pro114. However, they fold with different
rates only at low pH, where the conformational stability of RNase A is low. At neu-
tral pH Uvf and Uf fold fast with indistinguishable rates. This is consistent with
the original finding of 20% fast-folding UF molecules [27], and it confirms the sug-
gestion that Pro114 is nonessential for the folding of RNase A [66] under favorable
conditions. Under unfavorable conditions, such as at low pH, or when an addi-
tional proline is incorrect (such as in the intermediate IN with incorrect trans iso-
mers at both Pro93 and Pro114), a trans Pro114 might in fact retard folding. Raines
and coworkers replaced Pro114 by 5,5-dimethyl-l-proline, which stays almost exclu-
sively in the native cis conformation. This mimic of a cis proline slightly accelerated
the US II ! IN reaction. The authors interpreted this to suggest that an incorrect
trans Pro114 slightly decelerates the folding reaction of the US II form of the wild-
type protein [77]. Together all this suggests that Pro114 of RNase A is a conditional
proline. It is unimportant for folding under favorable conditions, but retards fold-
ing slightly under unfavorable conditions.
US II contains an incorrect trans Pro93. Comparative experiments with homol-
ogous RNases from various species [71, 78] and early directed mutagenesis work
had already suggested a crucial role of Pro93 for the major folding reaction
926 25 Prolyl Isomerization in Protein Folding
US II ! IN ! N [79]. The more recent results from the Scheraga group confirm
this [9]. They also show that, after the Pro93Ala substitution, Tyr92 and Ala93
form a nonprolyl cis peptide bond in the folded form of P93A-RNase A [80].
The minor form US I contains one or more incorrect prolines in addition to the
incorrect trans Pro93 [71]. This destabilizes partially folded structure further and
impairs the formation of intermediates such as IN . Scheraga et al. have now found
that US I contains both an incorrect trans Pro93 and an incorrect cis Pro117 [9, 80–
82]. According to their work, Pro42 is unimportant for folding.
25.4.2
Ribonuclease T1
39c
k43 39c
k32 39c
N 55c U 55c U 55t
k34 k 23
k21 k12 k21 k12
39t
k 32 39t
U 55c U 55t
k23
Scheme 25.2. Kinetic model for the unfolding state. In the denatured protein the two iso-
and isomerization of RNase T1. This model is merizations are independent of each other,
valid for unfolding only. The superscript and therefore the scheme is symmetric with iden-
the subscript indicate the isomeric states of tical rate constants in the horizontal and ver-
prolines 39 and 55, respectively, in the correct, tical directions, respectively. At 25 C and
native-like cis (c) and in the incorrect, nonna- 6.0 M GdmCl, pH 1.6 k43 ¼ 0:49 s1 , k12 ¼
tive trans (t) isomeric states. As an example, 2:0 103 s1 , k21 ¼ 22:6 103 s1 , k23 ¼
39t
U55c is an unfolded species with Pro39 in the 8:7 103 s1 , k32 ¼ 50:5 103 s1 [103].
incorrect trans and Pro55 in the correct cis
25.4 Examples of Proline-limited Folding Reactions 927
largely to the more favorable trans state in the unfolded protein. As a consequence,
39c
only 2–4% of all unfolded molecules remain in the U55c state with Pro39 and
39t
Pro55 in the correct cis state. U55t , the species with two incorrect isomers, predom-
39t
inates at equilibrium, and the two species with single incorrect isomers (U55c and
39c
U55t ) are populated to 10–20% each.
Under strongly native conditions refolding is not a reversal of unfolding and can-
not be explained solely with Scheme 25.2. Rather, refolding paths via intermediates
originate from the different unfolded species (Scheme 25.3). The two slow-folding
39t 39c
species with one incorrect prolyl isomer each (U55c and U55t ) and the species with
39t
both prolines in the incorrect isomeric state (U55t ) can regain rapidly most of their
secondary structure in the milliseconds range (the Ui ! Ii steps, Scheme 25.3)
[104]. Subsequently, slow folding steps follow that involve the isomerizations of
the incorrect prolyl isomers. A peculiar feature of the folding model in Scheme
39t
25.3 is that the major unfolded species with two incorrect isomers (U55t ) can enter
two alternative folding pathways (the upper or the lower pathway in Scheme 25.3),
depending upon which isomerization occurs first. The distribution of refolding
molecules on these two pathways is determined by the relative rates of the
trans ! cis isomerizations of Pro39 and Pro55 at the stage of the intermediate
39t
I55t . The kinetic models in Schemes 25.2 and 25.3 were supported by experimental
data on the slow folding of variants of RNase T1 where the cis prolines were re-
placed by other amino acids [99, 100, 105].
At equilibrium only 2–4% of all unfolded RNase T1 molecules have correct cis
39c 39c
isomers at Pro39 and Pro55. Their fast direct refolding reaction (U55c ! N55c )
could therefore not be measured initially.
39t
U 55c
fast
39t
170 s I 55c 6500 s
39t
fast 39t 39c
fast 39c
U 55t I 55t
N 55c U 55c
400 s 39c
250 s
I 55t
fast
39c
U 55t
Scheme 25.3. Kinetic model for the slow incorrect trans (t) isomeric states. The rate
refolding of RNase T1 under strongly native constants given for the individual steps refer to
conditions. U stands for unfolded species, I for folding conditions of 0.15 M GdmCl, pH 5.0,
intermediates of refolding, and N is the native 10 C [106]. The rate constant of the direct
protein. The superscript and the subscript 39c
U55c 39c
! N55c reaction is 5.7 s1 in 1.0 M
indicate the isomeric states of prolines 39 and GdmCl, pH 4.6, 25 C [103].
55, respectively, in the correct cis (c) and the
928 25 Prolyl Isomerization in Protein Folding
39c
Mayr et al. [103] exploited the sequential nature of unfolding (N ! U55c ) and
39c
prolyl isomerizations (cf. Scheme 25.2) to characterize the direct U55c ! N fold-
ing reaction. In the first step of a stopped-flow double-mixing experiment they
39c
produced U55c transiently at a high concentration under conditions, where the
39c
N ! U55c unfolding reaction is much faster than the subsequent prolyl isomeriza-
tions and then measured its refolding in the second step. In addition, they varied
the duration of the unfolding step and monitored the concentrations of all species
of the wild-type protein in Scheme 25.2 as a function of time. Thus the rates of the
individual prolyl isomerizations in the unfolded protein (cf. Scheme 25.2) could be
determined.
The direct refolding reaction of RNase T1 with correct prolyl isomers shows a
time constant of 175 ms (at 25 C, pH 4.6) [103]. This reaction is almost unaffected
by the proline substitutions. It depends nonlinearly on temperature with a maxi-
mum near 25 C, which suggests that the activated state for this reaction re-
sembles the native rather than the unfolded state in heat capacity. The folding of
39t 39t 39c 39c
the species with single incorrect prolyl isomers (U55c ! I55c and U55t ! I55t ) was
only about fivefold slower than direct folding and was also accompanied by a
strong decrease in the apparent heat capacity.
Conformational folding and the re-isomerizations of the prolyl peptide bonds are
thus tightly interrelated in the folding of RNase T1. As discussed above, rapid par-
tial folding is possible in the presence of nonnative prolyl isomers [104, 106], but
the final events of folding are coupled with the prolyl isomerizations and therefore
very slow.
The partially folded structure in the intermediates affects the isomerization ki-
39t
netics. In particular, the trans ! cis isomerization at Pro39 (in the I55c ! N step)
39t
is strongly decelerated by the folded structure in the intermediate I55c [106]. In
the folding of pancreatic RNase A, however, prolyl isomerization is accelerated in
the intermediate IN [66, 67], and in the folding of dihydrofolate reductase an intra-
molecular catalysis of a prolyl isomerization was proposed to occur [107].
25.4.3
The Structure of a Folding Intermediate with an Incorrect Prolyl Isomer
In the S54G/P55N variant of RNase T1 [99] the cis Ser54-Pro55 prolyl bond is re-
placed by a trans Gly54-Asn55 peptide bond (Hinrichs et al., unpublished results).
Thus a cis bond is abolished and the branched folding mechanism (Scheme 25.3)
is strongly simplified to a sequential mechanism (Scheme 25.4). Only two un-
folded species remain: 15% as a fast folding species with the native-like cis isomer
of Pro39 (U 39c ), and 85% as a slow folding species with the incorrect trans isomer
of Pro39 (U 39t ).
The U 39t molecules (Scheme 25.4) refold in two steps. First the intermediate I 39t
is formed, which then converts to the native protein. This second step is limited in
rate by the trans ! cis isomerization of Pro39, which at 10 C shows a time con-
stant of about 8000 s. The I 39t intermediate is thus long-lived enough to obtain
highly resolved structure information by kinetic NMR experiments [40, 108–110]
25.5 Native-state Prolyl Isomerizations 929
and 2D-NOESY spectroscopy. Balbach et al. determined those protons in I 39t that
are already surrounded by a native environment and thus show native distances to
protons in close vicinity [111]. Surprisingly, amide protons in nonnative environ-
ments were found to be located not only close to the incorrect trans Tyr38-Pro39
bond in I 39t , but were spread throughout the entire protein. The destabilization
caused by the incorrect trans Pro39 is thus not confined to the local environment
of this proline, but involves several regions in the entire molecule.
25.5
Native-state Prolyl Isomerizations
Most prolyl peptide bonds in folded proteins show well defined conformations, be-
ing either in the cis or in the trans conformation. There are, however, a growing
number of exceptions to this rule. High-resolution NMR spectroscopy, in particu-
lar, identified a series of prolines that are conformationally heterogeneous and exist
as a mixture of cis and trans isomers in the folded state. Examples include staphyl-
ococcal nuclease [112], insulin [113], calbindin [114, 115], scorpion venom Lqh-8/6
[116], human interleukin-3 [117] and the TB6 domain of human fibrillin-1
[118, 119]. In folded staphylococcal nuclease cis/trans equilibria exist at Pro117 as
well as at Pro47 [112, 120, 121], and the two equilibria seem to be independent of
each other.
There are two strongly diverging, but not mutually exclusive, interpretations of
these findings. The first is that conformationally heterogeneous proline residues
occur in locally unfolded regions. They lack strong tertiary interactions, which
would be necessary to stabilize one isomer over the other. Alternatively, heteroge-
neity at prolines might point to a functionally important cis/trans isomerization,
which might be used as a slow molecular switch, for example, to regulate the func-
tion of the corresponding protein. Such switching functions, possibly modulated
by prolyl isomerases, were suggested long time ago [122], but the evidence for pro-
line switches remained circumstantial.
Evidence for a proline-dependent conformational switch was obtained for the
SH2 domain of the tyrosine kinase Itk [123, 124]. Unlike other proteins with cis
and trans prolyl conformers in the native state [115, 119, 125–127] the SH2 do-
main of Itk shows a pronounced change in structure upon cis/trans isomerization.
930 25 Prolyl Isomerization in Protein Folding
The cis and the trans forms of Itk-SH2 differ slightly in the affinity for the Itk-SH3
domain and for a phosphopeptide, which represent alternative substrates for the
SH2 domain [128]. This conformational switch thus seems to modulate substrate
recognition and can be regulated by the prolyl isomerase cyclophilin 18. A review
of the proline switch in Itk and in other proteins is found in Ref. [10].
The prolyl isomerase, PIN1, catalyzes the isomerization of phosphoserine-proline
bonds, and there is compelling evidence that PIN1 integrates phosphorylation-
dependent and prolyl-isomerization-dependent switches for a wide range of signal-
ing reactions [129] (see also Chapter 10 in Part II).
25.6
Nonprolyl Isomerizations in Protein Folding
For nonprolyl bonds the trans state is strongly favored over cis, and the cis content
as measured for oligopeptides is far below 1% [12, 130]. A large amount of confor-
mational energy is therefore required to lock such a bond in the cis state. A shift in
the equilibrium from 0.2% to 99% cis requires a Gibbs free energy of 15 kJ mol1 .
Nevertheless cis nonprolyl bonds are found in folded proteins, but they are very
rare, and presumably they are important for the function of a protein. In principle,
a nonprolyl cis peptide bond would be well suited to construct a molecular switch
in a protein that can be actuated only under a very strong load. A tight-binding li-
gand could, for example, supply a sufficient amount of free energy from its own
binding to turn such a strong switch. A compilation of proteins with cis peptide
bonds is found in Ref. [16]. Because the trans isomer is so strongly favored over
the cis, trans ! cis isomerizations at nonprolyl bonds during refolding must occur
for such proteins in virtually all molecules.
The first evidence for a nonprolyl isomerization as a rate-determining step in
protein folding was obtained for a variant of RNase T1. The replacement of one of
the two cis prolines, cis-Pro39, of the wild-type protein by an alanine residue led to
a variant with a cis nonprolyl peptide bond between Tyr38 and Ala39 [131]. This cis
bond is located adjacent to His40, which is a catalytic residue. The Pro39Ala muta-
tion reduced the stability of RNase T1 by about 20 kJ mol1 and caused a major
change in the folding mechanism. As shown in Scheme 25.5, the conformational
39c
unfolding of P39A-RNase T1 occurs first (in the N ! U55c reaction) with a time
39c
constant of 20 ms. It leads to U55c , an unfolded form that has both Ala39 and
Scheme 25.5. Kinetic mechanism for the and in the incorrect, nonnative trans (t)
unfolding and isomerization of P39A-RNase isomeric states. The rate constants refer to
T1. The superscript and the subscript indicate unfolding in 6.0 M GdmCl (pH 1.6) at 25 C
the isomeric states of Ala39 and Pro55, [96].
respectively, in the correct, native-like cis (c)
25.6 Nonprolyl Isomerizations in Protein Folding 931
Pro55 still in the native-like cis conformation. Subsequently, the Tyr38-Ala39 bond
39c 39t
isomerizes from cis to trans (in the U55c Ð U55c step) in virtually all molecules.
This nonprolyl isomerization (t ¼ 730 ms) is about 60-fold faster than the corre-
sponding prolyl isomerization (of the Tyr38-Pro39 bond) in wild-type RNase T1
[96]. It shows almost the same time constant as the cis ! trans isomerization of
the Tyr-Ala bond in the pentapeptide AAYAA (t ¼ 560 ms [12]). In the unfolding
the Tyr-Ala isomerization is then followed by the slow cis Ð trans equilibration at
Pro55 (t ¼ 16 s), which is still present in P39A-RNase T1.
Unfolded molecules in which the Tyr38-Ala39 bond was still in the native-like cis
39c
conformation (U55c ) were produced in a stopped-flow double-mixing experiment
by a short 100-ms unfolding pulse. They refolded rapidly to the native state
with t ¼ 290 ms (in 1.0 M GdmCl, pH 4.6, 25 C). After the Ala39 cis ! trans iso-
merization in the unfolded state, refolding is 1600-fold retarded and shows a time
constant of 480 s. This slow refolding is limited in rate by the trans ! cis re-
isomerization of the Tyr38-Ala39 bond. Under the same refolding conditions the
trans ! cis isomerization of the Tyr38-Pro39 bond in the wild-type protein is only
twofold slower (t ¼ 1100 s) [100, 105]. This comparison with the folding kinetics of
wild-type RNase T1 indicates that the kinetics of Tyr38-Pro39 and of Tyr38-Ala39
isomerization differ predominantly in the rate of the cis ! trans, rather than of
the trans ! cis reaction. The ratio of the rate constants for the cis ! trans and the
trans ! cis reactions at the Tyr38-Ala39 bond gives an equilibrium constant of
0.0015, again in good agreement with the equilibrium constant of 0.0011 in the
peptide AAYAA [12].
Very similar results were obtained later for the Pro93Ala mutation in RNase A
[80, 132]. As in RNase T1, this mutation converted a cis Tyr-Pro bond into a cis
Tyr-Ala bond and abolished the fast-folding reaction. The cis ! trans isomerization
of the Tyr-Ala bond in the unfolded protein occurred with t ¼ 1400 ms at 15 C
[80], which agrees well with t ¼ 730 ms, as measured for RNase T1 at 25 C.
TEM-1 b-lactamase from E. coli contains a cis peptide bond between Glu166 and
Pro167, in a large W loop near the active site. It is required for the catalytic activity
of b-lactamase [133], and its trans ! cis isomerization is a slow step in refolding
[134–137]. Similar to Pro39 of RNase T1 the cis character of the 166–167 bond
is retained when Pro167 of b-lactamase is replaced by a Thr residue, and the
trans ! cis isomerization of the Glu166-Thr167 peptide bond becomes rate limit-
ing for the refolding of the P167T variant [135]. Cis peptide bonds between the res-
idues equivalent to Glu166 and Pro167 of TEM-1 b-lactamase of E. coli are found in
many b-lactamases, but interestingly, these cis peptide bonds are not always prolyl
bonds. The b-lactamase PC1 from Staphylococcus aureus shows a cis Glu-Ile bond at
this position [138].
In summary, the results obtained for RNase T1, RNase A and the b-lactamases
show that the trans ! cis isomerizations of nonprolyl peptide bonds are slow reac-
tions (with time constants of several hundred seconds at 25 C) and control the
folding of proteins with nonprolyl cis peptide bonds. The trans ! cis isomeriza-
tions of prolyl and nonprolyl bonds show similar rates, the reverse cis ! trans iso-
merizations are 50–100 fold faster, however, for the nonprolyl peptide bonds.
932 25 Prolyl Isomerization in Protein Folding
Because the trans conformation is so strongly preferred, only one out of 500 or
1000 nonprolyl peptide bonds is in cis in an unfolded protein chain. Furthermore,
it re-isomerizes to trans within less than a second when the protein folds at 25 C.
It is clear that such rare reactions are not easily identified in experimental folding
kinetics.
First evidence for nonprolyl cis ! trans isomerization in the refolding of a
protein was obtained for a proline-free mutant of the small protein tendamistat
[139]. Ninety-five per cent of all molecules of this protein fold rapidly with t ¼ 50
ms, but 5% refold more slowly with t ¼ 400 ms (at 25 C). The slow-refolding mol-
ecules are created by isomerizations in the unfolded protein, most likely by the
cis Ð trans isomerizations of nonprolyl peptide bonds. The fraction of molecules
with incorrect cis nonprolyl isomers is small, because the correct trans isomer is
favored so strongly over the cis.
25.7
Catalysis of Protein Folding by Prolyl Isomerases
Prolyl isomerases are ubiquitous enzymes that catalyze the cis $ trans isomeriza-
tion of prolyl peptide bonds both in oligopeptides and during the folding of pro-
teins. They are described in Chapter 10 in Part II. The first evidence for a catalysis
of protein folding by a prolyl isomerase (porcine cytoplasmic cyclophilin 18,
Cyp18) was obtained for the immunoglobulin light chain, porcine ribonuclease
(RNase) and the S-protein fragment of bovine RNase A [140]. The slow refolding
of RNase A could not be catalyzed, probably because the incorrect trans Tyr92-
Pro93 is shielded from prolyl isomerases in the native-like intermediate IN (cf. Sec-
tion 25.5.1).
25.7.1
Prolyl Isomerases as Tools for Identifying Proline-limited Folding Steps
The prolyl isomerases, in particular Cyp18, are valuable tools to examine whether
a slow folding reaction is rate-limited by a prolyl isomerization. A catalysis of
proline-limited steps was observed in the folding of many proteins, including bar-
nase [141], carbonic anhydrase [142, 143], b-lactamase (A. Lejeune, unpublished
results), chymotrypsin inhibitor CI2 [44], yeast iso-2 cytochrome c [144, 145], the
immunoglobulin light chain [57, 140], staphylococcal nuclease [146], and trp apore-
pressor [147]. Cis ! trans and trans ! cis isomerizations are catalyzed equally well
in the folding of these proteins. In the maturation of the collagen triple helix prolyl
and hydroxyprolyl isomerizations are rate-limiting steps, and this folding reaction
is also accelerated by Cyp18 [148–150].
Proteins with incorrect prolyl isomers can start to fold before prolyl isomeriza-
tion, and some of them reach a state that is native-like already. This often hinders
prolyl isomerases in their access to the incorrect prolines and impairs catalysis of
isomerization. The role of the accessibility of the prolines was studied for iso-2
cytochrome c. In aqueous buffer the folding of this protein is barely catalyzed by
25.7 Catalysis of Protein Folding by Prolyl Isomerases 933
25.7.2
Specificity of Prolyl Isomerases
The catalytic function of the prolyl isomerases is restricted to prolyl bonds. They do
not catalyze the isomerizations of ‘‘normal’’ nonprolyl peptide bonds. As outlined
in Section 25.6, the folded form of the P39A variant of RNase T1 has a cis Tyr38-
Ala39 bond, and the refolding of virtually all molecules is limited in rate by the
very slow trans ! cis re-isomerization of the Tyr38-Ala39 bond. This isomerization
is not catalyzed by prolyl isomerases of the cyclophilin, FKBP, and parvulin fami-
lies. It is also not catalyzed in the peptide Ala-Ala-Tyr-Ala-Ala [156].
Schiene-Fischer et al. discovered that DnaK, a chaperone of the Hsp70 family
catalyzes the isomerization of nonprolyl bonds in oligopeptides and in the folding
of the P39A variant of RNase T1. This new isomerase is described in Chapter 10 in
Part II.
934 25 Prolyl Isomerization in Protein Folding
As folding enzymes, the prolyl isomerases can catalyze their own folding. The
folding reactions of human cytosolic FKBP12 and of parvulin from E. coli are in-
deed autocatalytic processes [45, 155, 157].
25.7.3
The Trigger Factor
The trigger factor is a ubiquitous bacterial protein, which associates with the large
subunit of the ribosome (see Chapter 13 in Part II). Originally it was thought to
be involved in the export of secretory proteins [158, 159], but in 1995 Fischer and
coworkers [160] discovered that the trigger factor is a prolyl isomerase. This enzy-
matic activity originates from the central FKBP domain, which encompasses resi-
dues 142 to 251. Indeed, a weak sequence homology had been noted between this
region of the trigger factor and human FKBP12 [161].
Three domains have been identified in the trigger factor. The central FKBP do-
main harbors the prolyl isomerase activity of the trigger factor [162, 163] and the
N-terminal domain (residues 1–118) mediates the interaction with the ribosome
[164–168]. The function of the C-terminal domain is largely unknown. Its pres-
ence is required for the high activity of the trigger factor as a catalysts of protein
folding [169, 170].
The trigger factor catalyzes the folding of RCM-RNase T1 very efficiently. The ad-
dition of as low as 2.5 nM trigger factor led to a doubling of the folding rate of this
protein, and in the presence of 20 nM trigger factor this folding reaction is 14-fold
accelerated. This remarkable catalytic efficiency of the trigger factor as a folding en-
zyme is reflected in a specificity constant k cat /KM of 1:1 10 6 M1 s1 . This is al-
most 100-fold higher than the respective value for human FKBP12 [170]. The KM
value is 0.7 mM, and the catalytic rate constant k cat . is 1.3 s1 [170]. For Cyp18 the
catalysis of the trans $ cis prolyl isomerization in a tetrapeptide is characterized
by KM and k cat values of 220 mM and 620 s1 , respectively [64]. This comparison
indicates that the high activity of the trigger factor as a folding catalyst does not
originate from a high turnover number, but from a high affinity for the protein
substrate.
Permanently unfolded proteins are strong, competitive inhibitors of trigger-
factor-catalyzed folding. One of those, reduced and carboxymethylated bovine a-
lactalbumin (RCM-La) inhibits the trigger factor of Mycoplasma genitalium with a
KI value of 50 nM. This binding of inhibitory proteins is independent of proline
residues. Unfolded RCM-tendamistat (an a-amylase inhibitor with 74 residues)
binds to the trigger factor with equal affinity in the presence and in the absence
of its three proline residues [171]. The catalysis of folding of RCM-(-Pro55)-RNase
T1 occurs at Pro39. The good inhibition by a nonfolding variant of RNase T1 that
lacks Pro39 showed that this proline is dispensable for substrate binding [171].
The affinity of the trigger factor for short peptides is also independent of prolines
[172]. The isolated FKBP domain is fully active as a prolyl isomerase towards a
short tetrapeptide [162], but in protein folding its activity is about 800-fold reduced,
and, moreover, this low residual activity of the FKBP domain alone is not inhibited
by unfolded proteins, such as RCM-La [170]. The high enzymatic activity in protein
25.7 Catalysis of Protein Folding by Prolyl Isomerases 935
folding thus requires the intact trigger factor [169]. Together, these results suggest
that the high-affinity binding site for unfolded proteins is probably distinct from
the catalytic site of the trigger factor. It extends over several domains of the intact
trigger factor or requires the interaction of these domains.
The good binding of the intact trigger factor to unfolded chain segments proba-
bly decelerates the dissociation of the protein substrate from the trigger factor, and
the low k cat value of 1.3 s1 may reflect a change in the rate-limiting step from
bond rotation (in tetrapeptide substrates) towards product dissociation (in protein
substrates). In fact, an unfolded protein dissociates from the trigger factor with a
rate constant of 5.8 s1 [173], which is only fourfold higher than the k cat for cata-
lyzed folding. Since high-affinity binding to unfolded proteins is independent of
prolines, it is likely that many binding events are nonproductive, because the reac-
tive prolyl peptide bonds are not positioned correctly within the prolyl isomerase
site. Indeed, a lowering of both KM and k cat , as observed for the intact trigger fac-
tor, points to nonproductive binding of a substrate to an enzyme [174].
The trigger factor thus seems to have properties of a folding enzyme and of a
chaperone. The cooperation of both functions is required for its very high catalytic
efficiency in protein folding. Indeed, trigger factor can function as a chaperone in
vitro [175–177] and presumably also in vivo [178, 179]. A combination of the func-
tions as a protein export factor and as a prolyl isomerase seems to be required in
the secretion and maturation of a protease of the pathogenic bacterium Streptococ-
cus pyogenes [180].
25.7.4
Catalysis of Prolyl Isomerization During de novo Protein Folding
The evidence for a catalysis of proline-limited folding steps during de novo protein
folding in the cell remains circumstantial. Early evidence for a role of prolyl iso-
merization and of prolyl isomerases in cellular folding was provided by studies of
collagen folding, which is limited in rate by successive prolyl isomerizations both
in vitro and in vivo. Collagen maturation in chicken embryo fibroblasts is retarded
by cyclosporin A, possibly by the inhibition of the cyclophilin-catalyzed folding in
the endoplasmic reticulum [181]. A similar effect was found for the folding of luci-
ferase in rabbit reticulocyte lysate [182].
Proteins that are targeted to the mitochondrial matrix must unfold outside the
mitochondria, cross the two mitochondrial membranes and then refold in the
matrix. In a synthetic precursor protein the presequence of subunit 9 of the Neuro-
spora crassa F1 F0 -ATPase was linked to mouse cytosolic dihydrofolate reductase
(Su9-DHFR) and used to investigate the function of cyclophilins as potential cata-
lysts of protein folding in isolated mitochondria [183–185]. When measured by the
resistance against proteinase K, the refolding of DHFR inside the mitochondria
showed half-times of about 5 min and folding was about fivefold decelerated
when the mitochondria had been preincubated with 2.5–5 mM CsA to inhibit the
prolyl isomerase activity of mitochondrial cyclophilin. The refolding of DHFR
was similarly retarded in mitochondria that were derived from mutants that
lacked a functional mitochondrial cyclophilin. This suggests that the mitochondrial
936 25 Prolyl Isomerization in Protein Folding
25.8
Concluding Remarks
Cis peptide bonds decelerate folding reactions, but nevertheless they occur fre-
quently in folded proteins, mainly before proline and occasionally before other
amino acid residues. Cis prolines are very well suited to introduce tight turns into
proteins, but the structural consequences of cis peptide bonds cannot be the sole
reason for their widespread occurrence. The coupling with prolyl isomerization is
a simple means for increasing the energetic barriers of folding, and thus not only
refolding but also unfolding would be decelerated. In such a way a native protein
would be protected from sampling the unfolded state too frequently.
Nonprolyl cis peptide bonds are strongly destabilizing, because they introduce
strain into the protein backbone. They are found preferentially near the active sites
of enzymes where strained conformations might be important for the function.
In the overall folding process prolyl isomerizations and conformational steps are
linked. Incorrect prolines decrease the stability of partially folded intermediates,
and conformational folding can modulate the rate of prolyl isomerization.
Prolyl isomerizations are catalyzed by prolyl isomerases. Their functions reach
far beyond the acceleration of de novo protein folding. In addition to the well-
understood role in immunosuppression, prolyl isomerases modulate ion channels
and transmembrane receptors, participate in hormone receptor complexes, are re-
quired for HIV-1 infectivity, and participate in the regulation of mitosis. The PIN1
protein discriminates between phosphorylated and unphosphorylated Ser/Thr-Pro
sequences in its target proteins. Two principles of regulation are probably inte-
grated at this point: phosphorylation and prolyl cis/trans isomerization.
Prolyl isomerization in folded proteins is thus well suited for switching between
alternative states of a protein. The positions of the switch (cis and trans) can be de-
termined by the binding of effectors, the rate of swichting can be modulated by
prolyl isomerases. Some isomerases might be effector and isomerase at the same
time. For such isomerases the distinction between binding and catalytic function,
which has been discussed for a long time, may be inappropriate.
25.9
Experimental Protocols
25.9.1
Slow Refolding Assays (‘‘Double Jumps’’) to Measure Prolyl Isomerizations in an
Unfolded Protein
UF and US are usually equally unfolded species and show similar physical proper-
ties. Therefore the kinetics of the UF Ð US equilibration reactions cannot be mea-
sured directly. Rather, the amount of US formed after different times of unfolding
can be determined by slow refolding assays. Samples are withdrawn from the un-
folding solution at different time intervals after the initiation of unfolding; these
samples are transferred to standard refolding conditions, and the amplitude of the
slow refolding reaction, US ! N, is determined. Refolding of the UF species is
usually complete within the time of manual mixing. The amplitude of the slow
refolding reaction is proportional to the concentration of US which was present at
the time when the sample was withdrawn for the assay. The dependence of these
slow refolding amplitudes on the duration of unfolding yields the kinetics of the
UF Ð US reaction. The measured rate constant l is equal to the sum of the rate
constants in the forward (k FS ) and the reverse (k SF ) directions, l ¼ k FS þ k SF .
When the equilibrium constant K ¼ [US ]/[UF ] ¼ k FS /k SF is known, the individual
rate constants k FS and k SF can be derived.
Refolding conditions The slow refolding assays are best carried out under condi-
tions where (1) the UF ! N refolding reaction is complete within the time of man-
ual mixing, and (2) the slow US ! N reaction occurs in a single phase in a time
range that is convenient for manual mixing experiments. This facilitates the accu-
rate determination of the refolding amplitudes. After each refolding assay the
actual protein concentration should be determined to correct for variations in the
protein concentration in the individual assays. Usually the final absorbance or flu-
orescence reached after the refolding step is suitable for such a correction. As in
the unfolding step, the rate of slow refolding can be fine-tuned by varying the con-
centration of denaturant, the temperature and the pH.
In cases where more than one US species is formed after unfolding, the individ-
ual rates of formation of these species can be measured when the refolding assays
are carried out under conditions where the refolding of the various US species can
be separated kinetically.
938 25 Prolyl Isomerization in Protein Folding
Refolding assays After different time intervals 70-mL samples are withdrawn from
the unfolding solution and quickly diluted into 830 mL of the refolding buffer (1.2
M GdmCl in 0.1 M cacodylate, pH 6.2) in the spectrophotometer cell, which is kept
at 25 C. This gives final conditions of 1.5 M GdmCl, pH 6.0 and 25 C. These con-
ditions are chosen because slow refolding occurs in a single exponential phase in a
time range, which is convenient for manual sampling techniques. The slow refold-
ing reaction of US is monitored at 287 nm. At the end of each refolding assay the
absorbance of the refolded sample is recorded at 277 nm to determine the actual
concentration of RNase A in the assay.
Data treatment The refolding kinetics are analyzed. The time constant for refold-
ing in the assay should be independent of the time of sampling, because the final
folding conditions are the same in all assays. The amplitudes of refolding are cor-
rected for variations in the protein concentration (if necessary) and plotted as a
function of the time of unfolding. The increase in the refolding amplitude with
the duration of unfolding yields the kinetics of the formation of the slow-folding
species US .
25.9.2
Slow Unfolding Assays for Detecting and Measuring Prolyl Isomerizations in
Refolding
Often protein molecules with incorrect prolyl isomers can fold to a conformation
that appears native-like by spectroscopic properties or even by enzymatic activity.
In such cases the re-isomerization of the incorrect prolines in the final step of fold-
ing is silent and cannot be followed by spectroscopic probes or by a functional
assay.
To follow such silent isomerizations during refolding, a two-step assay is used. It
measures the kinetics of the formation of fully folded protein molecules during a
refolding reaction [33, 74]. This assay is based on the fact that native protein mole-
References 939
cules have passed beyond the highest activation barrier in their refolding and thus
are separated from the unfolded state by this energy barrier. They unfold slowly
when the conditions are switched to unfolding, because they must cross the high
barrier again, now in the reverse direction. Partially folded molecules (such as
those with incorrect prolyl isomers) have not yet passed the final transition state
and unfold rapidly.
This translates into a double-mixing procedure. First, unfolded protein is mixed
with refolding buffer to initiate refolding. Then, after variable time intervals, sam-
ples are withdrawn, transferred to standard unfolding conditions, and the ampli-
tude of the subsequent slow unfolding reaction is determined. It is a direct
measure for the amount of native molecules with correct prolyl isomers that had
been present at the time when refolding was interrupted, and the increase of the
amplitude of slow refolding with time gives the kinetics of formation of the native
protein.
References
Natl Acad. Sci. USA 1989, 86, 2195– Fulton, A. H. Andreotti, J. Am.
2198. Chem. Soc. 2003, 125, 15706–15707.
115 J. Kördel, S. Forsen, T. 129 M. B. Yaffe, M. Schutkowski, M. H.
Drakenberg, W. J. Chazin, Shen, X. Z. Zhou, P. T. Stukenberg,
Biochemistry 1990, 29, 4400–4409. J. U. Rahfeld, J. Xu, J. Kuang, M. W.
116 E. Adjadj, V. Naudat, E. Quiniou, Kirschner, G. Fischer, L. C.
D. Wouters, P. Sautiere, C. T. Cantley, K. P. Lu, Science 1997, 278,
Craescu, Eur. J. Biochem. 1997, 246, 1957–1960.
218–227. 130 C. Schiene-Fischer, G. Fischer, J.
117 Y. Feng, W. F. Hood, R. W. Forgey, Am. Chem. Soc. 2001, 123, 6227–6231.
A. L. Abegg, M. H. Caparon, B. R. 131 L. M. Mayr, D. Willbold, P. Rösch,
Thiele, R. M. Leimgruber, C. A. F. X. Schmid, J. Mol. Biol. 1994, 240,
McWherter, Protein Sci. 1997, 6, 288–293.
1777–1782. 132 M. A. Pearson, P. A. Karplus, R. W.
118 X. Yuan, A. K. Downing, V. Knott, Dodge, J. H. Laity, H. A. Scheraga,
P. A. Handford, EMBO J. 1997, 16, Protein Sci. 1998, 7, 1255–1258.
6659–6666. 133 N. C. Strynadka, H. Adachi, S. E.
119 X. Yuan, J. M. Werner, V. Knott, Jensen, K. Johns, A. Sielecki, C.
P. A. Handford, I. D. Campbell, K. Betzel, K. Sutoh, M. N. James,
Downing, Protein Sci. 1998, 7, 2127– Nature 1992, 359, 700–705.
2135. 134 M. Vanhove, X. Raquet, J.-M. Frere,
120 S. N. Loh, C. W. Mcnemar, J. L. Proteins Struct. Funct. Genet. 1995, 22,
Markley, Techn. Protein Chem. 1991, 110–118.
II, 275–282. 135 M. Vanhove, X. Raquet, T. Palzkill,
121 D. M. Truckses, J. R. Somoza, K. E. R. H. Pain, J. M. Frere, Proteins
Prehoda, S. C. Miller, J. L. Struct. Funct. Genet. 1996, 25, 104–
Markley, Protein Sci. 1996, 5, 1907– 111.
1916. 136 M. Vanhove, A. Lejeune, R. H. Pain,
122 F. X. Schmid, K. Lang, T. Kiefhaber, Cell Mol. Life Sci. 1998, 54, 372–377.
S. Mayer, R. Schönbrunner, Prolyl 137 M. Vanhove, A. Lejeune, G.
isomerase. Its role in protein folding Guillaume, R. Virden, R. H. Pain,
and speculations on its function in the F. X. Schmid, J. M. Frere,
cell. In Conformations and Forces in Biochemistry 1998, 37, 1941–1950.
Protein Folding (B. T. Nall and K. A. 138 O. Herzberg, J. Mol. Biol. 1991, 217,
Dill, Eds). AAAS, Washington, D.C. 701–719.
1991. 139 G. Pappenberger, H. Aygun, J. W.
123 R. J. Mallis, K. N. Brazin, D. B. Engels, U. Reimer, G. Fischer, T.
Fulton, A. H. Andreotti, Nat. Struct. Kiefhaber, Nat. Struct. Biol. 2001, 8,
Biol. 2002, 9, 900–905. 452–458.
124 K. N. Brazin, R. J. Mallis, D. B. 140 K. Lang, F. X. Schmid, G. Fischer,
Fulton, A. H. Andreotti, Proc. Nature 1987, 329, 268–270.
Natl Acad. Sci. USA 2002, 99, 1899– 141 A. Matouschek, J. T. Kellis, L.
1904. Serrano, M. Bycroft, A. R. Fersht,
125 P. A. Evans, C. M. Dobson, R. A. Nature 1990, 346, 440–445.
Kautz, G. Hatfull, R. O. Fox, Nature 142 P. O. Freskgård, N. Bergenhem,
1987, 329, 266–268. B. H. Jonsson, M. Svensson, U.
126 R. K. Gitti, B. M. Lee, J. Walker, Carlsson, Science 1992, 258, 466–468.
M. F. Summers, S. Yoo, W. I. 143 G. Kern, D. Kern, F. X. Schmid, G.
Sundquist, Science 1996, 273, 231– Fischer, FEBS Lett. 1994, 348, 145–
235. 148.
127 T. A. Ramelot, L. K. Nicholson, 144 S. Veeraraghavan, B. T. Nall,
J. Mol. Biol. 2001, 307, 871–884. Biochemistry 1994, 33, 687–692.
128 P. J. Breheny, A. Laederach, D. B. 145 S. Veeraraghavan, S. Rodriguez-
944 25 Prolyl Isomerization in Protein Folding
26
Folding and Disulfide Formation
Margherita Ruoppolo, Piero Pucci, and Gennaro Marino
26.1
Chemistry of the Disulfide Bond
R1SH
S bH R2 S bH
S S Sb
-S
a + R1 Sa
Sa
S
R2S -
R1
+
R2
S S
R1
R2S -
Sb S R1
R 1 S Sa
Fig. 26.1. Protein disulfide bond formation.
ily depends on the energetics of the protein conformational changes that bring the
two cysteines into proximity to form the disulfide bond [8]. Among intermediates
with the same number of intramolecular disulfide bonds, species with the most
stable disulfides will have the highest equilibrium concentration regardless the re-
dox buffer composition. The structures of intermediates that result in the forma-
tion of the most stable disulfide bond will be present at the highest concentration.
Since interconversion among the intermediates with the same number of disulfide
bonds constitute a rearrangements that does not involve the net use or production
of oxidizing equivalents, the equilibrium distribution of the intermediates with the
same number of disulfides will not be affected by the choice of the redox buffer or
its composition but it will be an intrinsic property of the protein itself.
26.2
Trapping Protein Disulfides
The strategy of using disulfide bonds as probes to study the folding of disulfide-
containing proteins was introduced and developed by Creighton and his group
[8]. His pioneering work deserves credit and constitutes a milestone in the field of
oxidative folding.
Disulfides present at any instant of time of folding can be trapped in a stable
948 26 Folding and Disulfide Formation
form by simply blocking, rapidly and irreversibly, all thiol groups in the solution.
The quenching reaction should be as rapid as possible to ensure that it faithfully
traps the species present at the time of addition of the quenching reagent. Trap-
ping with reagents like iodoacetamide or iodoacetic acid has the advantage of being
irreversible, but the reaction has to be carried out in highly controlled experimental
conditions to be effective. Once all free thiol groups are irreversibly blocked, the
trapped species are indefinitely stable and may be then separated and characterized
in details provided that disulfide shuffling is prevented.
Acidification also quenches disulfide bond formation, breakage, and rearrange-
ment. Acidification is very rapid and is not prevented by steric accessibility, but
it is reversible. There is then a possibility that separation methods used to sepa-
rate the acidified species will induce disulfide rearrangements. Finally, acid-trapped
species have the advantage that they can be isolated and then returned to the fold-
ing conditions, to define their kinetic roles.
The conformations that favor the formation of a particular disulfide bond are sta-
bilized to the same extent by the presence of that specific disulfide bond in trapped
intermediate. Because of this close relationship between the stability of disulfide
bonds and the protein conformations that favored them, the procedures described
above trap not only the disulfide bonds but also the conformations of the protein at
that time of folding. Some aspects of the conformation of folding intermediates
can then be deduced by identifying the cysteine residues that are involved in the
disulfides present in the trapped intermediates.
Although the trapping reaction has the advantage to stabilize the folding inter-
mediates, it has the disadvantage of altering the conformation of the intermediates
because of the presence of the blocking groups on the cysteine residues not in-
volved in disulfide bonds. Even intermediates trapped with acid have their free cys-
teine residues fully protonated, whereas the thiol groups should be at least partially
ionized during folding process. Some studies were therefore addressed to prepare
the analogs of the folding intermediates replacing the free cysteine residues with
Ser or Ala by protein engineering methods [9–12]. These methods have the addi-
tional advantage that intermediates that do not accumulate to substantial levels and
that they can be prepared in quantities sufficient for structural analysis.
26.3
Mass Spectrometric Analysis of Folding Intermediates
t1 t2 ti tn
REFOLDING …..….
PROTEIN
NATIVE
a1 a2 ai an
Alkylation
of the disulfide bonded species present at a given time can be used to establish the
folding pathway of the protein and to develop a kinetic analysis of the process. The
method can also be used to determine the effect of folding catalysts on each step of
the folding pathway. Any alteration in the relative distribution of the disulfide
bonded species present at a given time due to the catalysts action, in fact, can be
identified and quantitated [15].
It should be noted that the intermediates identified by the ESMS analysis are
populations of molecular species characterized by the same number of disulfide
bonds which may include nearly all possible disulfide bond isomeric species. Once
the folding intermediates have been characterized by ESMS in terms of their con-
tent of disulfide bonds, a further step would consist in the structural assignment
of the various disulfide bonds formed at different times in the entire process by
matrix-assisted laser desorption ionization mass spectrometry (MALDIMS) or
liquid chromatography ESMS. The experimental approach is based upon determi-
nation of the masses of disulfide-linked peptides of unfractionated or partially pu-
rified proteolytic digests of folding aliquots in mapping experiments [16, 17]. The
folding intermediates should be cleaved at points between the potentially bridged
cysteine residues under conditions known to minimize disulfide reduction and
reshuffling, using aspecific enzymes or a combination of enzymes, in attempts to
isolate any cysteine residue within an individual peptide. The disulfide mapping
approach is then used to search for any SaS bonded peptides, which are character-
ized by their unique masses. The interpretation is then confirmed by performing
reduction or EDMAN reactions followed by rerunning the MS spectrum [16].
26.4
Mechanism(s) of Oxidative Folding so Far – Early and Late Folding Steps
During the past three decades, the oxidative folding pathways of various small pro-
teins containing disulfide bonds have been studied by identifying the nature of the
intermediates that accumulate during their folding process [18–22]. At present,
950 26 Folding and Disulfide Formation
no single oxidative folding scenario seems to distinctly emerge from these detailed
studies. However some common principles have been highlighted. Disulfide fold-
ing pathways are not random: they converge to a limited number of intermediates
that are more stables than others. The greater thermodynamic stability of some in-
termediate conformations can have kinetic consequences, as further folding from
the stable conformations is favored. This is most clearly illustrated in the case of
bovine pancreatic trypsin inhibitor (BPTI) [8], where early folding intermediates
appear to be stabilized by strong native-like interactions. A pre-equilibrium occurs
very quickly when the protein is placed under folding conditions; the nature of the
pre-equilibrium mixture will only depend upon the final folding conditions. Any
favorable conformations present in the pre-equilibrium need not to be present
at detectable levels in the denatured state. Conversely, nonrandom conformations
that might be present in the denatured state may have no relevance for folding.
In the case of BPTI, the most stable partially folded conformations in the pre-
equilibrium appear to involve interactions between elements of secondary struc-
ture, and there is a considerable evidence for the rapid formation of some elements
of secondary structure during the folding of many proteins [8].
The rate-limiting step in disulfide folding occurs very late in the process before
the acquisition of the native conformation [8, 15, 23, 24]. The highest free energy
barrier separates the native conformation from all other intermediates. At late
stages in ‘‘structured’’ intermediates, the existing disulfides are sequestered from
the thiols of the protein and from the redox reagent, resulting in the slowing of
the thiol-disulfide rearrangement reaction by several orders of magnitude (as com-
pared to thiol-disulfide exchange in the ‘‘unstructured’’ intermediate at the early
stages). As a result, ‘‘structured’’ species frequently accumulate during the folding
process [23].
The rearrangements that take place in the slow steps of oxidative folding may
constitute examples of the conformational rearrangements that take place during
the rate-limiting steps in folding transitions not involving disulfide bonds. Such
conformational rearrangements are the result of the cooperativiity of protein
structure and the similarity of the overall folding transition state to the native
conformation.
26.5
Emerging Concepts from Mass Spectrometric Studies
The study of oxidative folding carried out by our group indicated that in quasi-
physiological conditions, the process occurs via reiteration of two sequential steps:
(i) formation of a mixed disulfide with glutathione, and (ii) internal attack of a
free SH group to form an intramolecular disulfide bond. This sequential pathway
seems to be a general mechanism for single-domain disulfide-containing proteins
as it has been observed in many different experimental conditions.
According to the above concepts, only a limited number of intermediates were
detected in the folding mixture and isomerization between species with the same
26.5 Emerging Concepts from Mass Spectrometric Studies 951
number of disulfides was found to be only extensive at late stages of the process
where slow conformational transitions become significant. Theoretically, the num-
ber of all the possible intermediates during the folding process should increase ex-
ponentially with the increase in the number of cysteine residues. In contrast, the
number of the populations of intermediates increases linearly with the increase in
the number of disulfide bonds. Within the frame of this general pathway, differ-
ences exist in the amount and accumulation rate of individual intermediates aris-
ing from the individual protein under investigation. When the folding pathways of
RNase A and toxin a, both containing four disulfide bonds, are compared, some
differences can be detected (see below). Species with two disulfide bonds predomi-
nate along the RNase A folding [15], while the intermediates containing three di-
sulfides represent the most abundant species in the case of toxin a [15].
Data obtained using different proteins suggests the occurrence of a temporal
hierarchy in the formation of disulfide bonds in oxidative folding, in which the se-
quential order of cysteine couplings greatly affect the rate of the total process. Even
the order of formation of native disulfide bonds is in fact important to prevent the
accumulation of nonproductive intermediates.
When the folding was carried out in the presence of protein disulfide isomerase
(PDI), the folding pathway of reduced proteins was unchanged, but the relative
distribution of the various populations of intermediates was altered. All the experi-
ments suggest that PDI catalyzes: (1) formation of mixed disulfides with gluta-
thione, (2) reduction of mixed disulfides, and (3) formation of intramolecular disul-
fide bonds. These results are not surprising considering the broad range of
activities shown by PDI [4].
The oxidative folding pathway of some proteins investigated in our laboratory
will be described in detail below.
26.5.1
Three-fingered Toxins
Snake toxins, which are short, all b-proteins, display a complex organization of di-
sulfide bonds. Two SaS bonds connect consecutive cysteine residues (C43-C54,
C55-C60) and two bonds intersect when bridging (C3-C24, C17-C41) to form a par-
ticular structure classified as ‘‘disulfide b-cross’’ [26] because cysteine residues tend
to make a cross symbol when viewed along the length of the b-strands. The general
organization of the polypeptide chain in snake toxins generates a trefoil structure
termed the ‘‘three-fingered fold’’ (Figure 26.3). ‘‘Three-fingered’’ snake toxins act
by blocking ion channels (neurotoxins), or enzymes such as acetylcholinesterase
(fascicullin) or Na/K-ATPase and protein kinase C (cardiotoxins), in addition to
less-defined targets.
We have shown that three-fingered snake toxins fold according to the general
sequential mechanism described above [25, 27]. A single mutation located in an
appropriate site of the neurotoxin structure is sufficient to substantially alter the
rate of the sequential folding process of the protein and to deeply modify the pro-
portion of intermediates that accumulate during the oxidative folding process. We
952 26 Folding and Disulfide Formation
α LECHNQQSSQPPTTKTC-PGETNCYKKVWRDHRGTIIERGCGCPTVKPGIKLNCCTTDKCNN
α 62 RICFNHQSSQPQTTKTCSPGESSCYNKQWSDFRGTIIQRGCGCPTVKPGIKLSCCESEVCNN
α 60 LECHNQQSSQPPTTKTC-PG-TNCYKKVWRDHRGTIIERGCGCPTVKPGIKLNCCTTDKCNN
-------loop I--------> <---------------loop II-----------------> -------loop III--------->
used three variants for which the length of a large turn, turn 2 (Figure 26.3), was
increased from three (variant a60, sequence CPG) to four (toxin a, a61, sequence
CPGE) and then to five (variant a62, sequence CSPGE) residues. The increase in
the length of loop 2 greatly decreased the rate of the folding process. After 2 hours
of incubation, about 90% of the variant a60 was folded, whereas no more than 70%
of the 61 residues of toxin a and 20% of the variant a62 had reached the native
state. Interestingly, three disulfide-containing intermediates were the predominant
species in the folding of all variants but this population was markedly more abun-
dant and persistent for the slowest-folding neurotoxin.
Two intermediates containing three disulfides were found to accumulate to de-
tectable levels during early and late stages of neurotoxin a62 folding. Both inter-
mediates consist of chemically homogeneous species containing three of the four
native disulfide bonds and lacking the C43-C54 and the C17-C41 coupling, re-
spectively. The des-[43–54] intermediate is the immediate precursor of the native
species. Conversely, the des-[17–41] species is unable to form the fourth disulfide
bond and has to rearrange into intermediates that can directly reach the native
state. These isomerization reactions provide an explanation for the accumulation
of three disulfide intermediates along the pathway, which caused the slow oxidative
folding observed for neurotoxin a62.
The particular pathway adopted by neurotoxin a62 shares a number of common
26.5 Emerging Concepts from Mass Spectrometric Studies 953
features with that followed by BPTI, although the two proteins differ both in the
number of disulfides and in their secondary structures. The main similarity is
that no scrambled fully oxidized species occur as folding intermediates. In addi-
tion, all-but-one-disulfide-containing intermediates (i.e., intermediates containing
three and two disulfides) exclusively possess native disulfide bonds [9, 28]. Finally,
the native form for both the variant a62 and BPTI results from an intramolecular
rearrangement within their respective all-but-one native disulfide populations, with
one predominantly productive species, des-[43–54] for the variant a62 and C30-C51
and C5-C55 for BPTI [7]. However, formation of the native form from the produc-
tive species seems much faster in BPTI [7] than in neurotoxin a62. It is noteworthy
that the folding pathway of variant a62 is much closer to that of BPTI than to those
of other all-b disulfide-containing proteins, such as epidermal growth factor (EGF)
[29]. In this case disulfide-scrambled isomers accumulate along the pathway and
they have to rearrange before forming the native structure.
On more general grounds, the observation that only intermediates with native
disulfide bonds accumulate strongly suggests that native disulfide bonds are domi-
nant early in the folding, resulting in funnelling the conformations towards the
native state.
26.5.2
RNase A
Bovine pancreatic ribonuclease A has been the model protein for folding studies
[30–36] since the landmark discovery by Anfinsen [37] that the amino acid se-
quence provides all the information required for a protein to fold properly. RNase
A contains four disulfide bonds (C26-C84, C40-C95, C65-C72, C58-C110), which
are critical both to the function and stability of the native enzyme [38, 39]. The
two disulfide bonds C26-C84 and C58-C110 that link a a-helix and a b-sheet in the
protein core are the most important to conformational stability. On the other hand,
the two disulfide bonds C40-C95 and C65-C72 that link surface loops greatly affect
the catalytic activity because of their proximity to active site residues.
The folding of RNase A has been studied in many laboratories, and different
mechanisms of folding have been proposed depending on the experimental condi-
tions used. Initial folding studies performed by Creighton and coworkers [40–43]
in the presence of reduced and oxidized glutathione suggested that the process
proceeds through a single pathway, the rate-determining step being the formation
of the C40-C95 disulfide bond.
Parallel studies were performed by Scheraga and coworkers in the presence of
reduced and oxidized dithiothreitol [5, 23, 44–46]. These authors suggested that
RNase A regenerates through two parallel pathways involving the formation of
two native-like species containing three disulfide bonds. The main pathway pro-
duces a species lacking C40-C95, in line with former results [40–43], while a
species lacking C65-C72 is produced in a minor regeneration pathway.
We showed that the folding of RNase A [15] proceeds throughout the sequen-
tial mechanism discussed above. The characterization of the population of one-
954 26 Folding and Disulfide Formation
containing species in the folding processes of these two mutants, thus suggesting
that the rate-limiting step might be the formation of the third disulfide bond. The
assignments of disulfide bonds during the folding of the C58A/C110A and C26A/
C84A mutants show that many nonnative disulfide bonds are still present in solu-
tion at late stages of the reaction, indicating that many scrambled molecular spe-
cies have not yet completed the folding process. The formation of C58-C110 or
C26-C84 disulfide bonds can function as a lock on the native structure, hamper-
ing the three-disulfide-containing species reshuffling in the folding of wild-type
RNase A.
Finally, the identification of disulfide bonds formed during the folding of RNase
A mutants showed that C110 is the most actively engaged cysteine residue in the
formation of disulfide bonds, as occurred in the folding of the wild-type protein.
C110 can then function as an internal catalyst able to promote reshuffling of disul-
fides in order to accelerate isomerization reactions in the regaining of the native
state.
26.5.3
Antibody Fragments
26.5.4
Human Nerve Growth Factor
Nerve growth factor (NGF), together with brain-derived neurotropic factor (BDNF),
neurotrophin-3 (NT-3) and NT-4/5, belongs to the neurotrophin family of protein
growth factors. Neurotrophins promote growth, survival, and plasticity of specific
neuronal populations during developmental and adult life phases [60, 61]. NGF,
BDNF, and other growth factors such as transforming growth factor (TGFb2),
bone morphogenetic protein 2 (BMP-2), and platelet-derived growth factor (PDGF)
share characteristic tertiary and quaternary features: the proteins are homodimeric
in their native and active conformations and have a typical tertiary fold, the cys-
teine knot. Our studies showed that the pro-region of recombinant human (rh)-
NGF facilitates protein folding. Folding yields and kinetics of rh-pro-NGF were sig-
nificantly enhanced when compared with the in vitro folding of mature rh-NGF.
The characterization of the folding pathway of rh-pro-NGF indicates that structure
formation is very efficient since a unique 3S species containing only the native di-
sulfide bonds was detected at early time points of folding. The characterization of
intermediates produced in the folding of mature rh-NGF revealed that the process
is very slow with the formation of multiple unproductive intermediates containing
nonnative cysteine couplings. Two out of the three native disulfide bonds formed
after 1 hour, while the formation of the complete set of native disulfide bonds was
reached only after 24 hours. Since denatured, mature rh-NGF is known to fold
quantitatively as long as the cysteine knot is intact [62], it has been suggested
that the pro-sequence assists the oxidative folding by possibly contributing to the
formation of the cysteine knot. However, unlike BPTI, the pro-sequence of NGF
does not contain a cysteine that could facilitate disulfide bond shuffling during
folding of NGF [63].
A possible function of the pro-peptide may be that it acts as a specific scaffold
during structure formation of the cysteine knot. Once the native disulfide bonds
are formed, the characteristic b-sheet structure is built on and the two monomers
can associate to form the rh-pro-NGF dimer. Alternatively, the pro-sequence of
NGF could passively confer solubility to aggregation prone folding intermediates
by shielding hydrophobic patches and thus enhancing folding yields.
26.6
Unanswered Questions
It is quite clear that many aspects of oxidative folding have been clarified so far.
However some questions remain unsolved. An aspect that is still to be clarified
26.8 Experimental Protocols 957
concerns how far the disulfide folding mechanism can be extrapolated to protein
folding mechanisms in general. Some common aspects have been highlighted
throughout the chapter. However oxidative folding shows peculiar features due to
the fact that disulfide-containing proteins constitute a unique class of proteins. In
our opinion, the main unanswered questions are related to the observation that
some disulfide-containing proteins fold with high efficiency into their native struc-
ture while others do not. These aspects are intimately related to the role played
in vivo by the multiple redox machineries that cooperate to make folding fast and
efficient. These aspects are discussed in Chapters 9 and 19 in Part II.
26.7
Concluding Remarks
Practical considerations drawn from the study of oxidative folding are very impor-
tant in the production of recombinant proteins with disulfide bonds, many of
which are therapeutically useful. A systematic exploration of pH, salt concentra-
tion, effectors, cofactors, temperature, and protein concentrations has been made
in order to define the best experimental conditions to achieve the highest yield of
folded protein. However, fast and high-throughtput methods are still required to
investigate optimal folding conditions. There is no doubt that mass spectrometry
can be a very useful tool in these kinds of studies. Few solid conclusions have
been reached that suggest widely applicable methods for folding denatured and
reduced proteins [64]. Guidelines for the optimization of folding methods will be
discussed in other chapters.
26.8
Experimental Protocols
26.8.1
How to Prepare Folding Solutions
26.8.2
How to Carry Out Folding Reactions
1. Reduce protein with the reduction and denaturation buffer by incubation with
reduced DTT (DTT mol/SaS mol ¼ 50/1) for 2 h at 37 C, under nitrogen
atmosphere.
2. Add 0.2 vol of 1 M HCl.
3. Desalt the reaction mixture on a gel-filtration column equilibrated and eluted
with 0.01 M HCl.
4. Recover the protein fraction, test for the SH content, lyophilize and store at
20 C.
Folding reactions
1. Dissolve reduced proteins in 0.01 M HCl and then dilute into the folding buffer
to the desired final concentration.
2. Add the desired amounts of GSH and GSSG stock solutions to initiate folding.
The choice of the concentrations of reduced and oxidized glutathione depends
on the protein under investigation and is based on conditions giving the highest
yield of native protein at the end of the folding process. Adjust the pH of the
solution to 7.5 with Tris–base and incubate at 25 C under nitrogen atmo-
sphere. In the case of antibody fragments start folding by diluting 100-fold the
reduced and denatured protein in 0.1 M Tris–HCl (pH 8.0), 1 mM EDTA, con-
taining 6 mM GSSG. Carry out the process at 4 C.
3. Monitor the folding processes by removing 50–100 mL samples of the folding
mixture at appropriate intervals.
4. Alkylate the protein samples as described below.
5. Purify from the excess of blocking reagent by rapid HPLC desalting.
6. Recover the protein fraction and lyophilize.
7. Alternatively, quench the folding by adding hydrochloric acid to a final concen-
tration of 3% (experimentally determined pH @ 2).
1. Add the folding aliquots (50–100 mL) to an equal volume of a 2.2 M iodoaceta-
mide solution. Perform the alkylation for 30 s, in the dark, at room temperature,
under nitrogen atmosphere as described [65].
26.8 Experimental Protocols 959
2. Add 100 mL of 5% trifluoroacetic acid, vortex the aliquots, and store on ice prior
loading on the HPLC for the desalting.
26.8.3
How to Choose the Best Mass Spectrometric Equipment for your Study
26.8.4
How to Perform Electrospray (ES)MS Analysis
100 8H 4S
100
1 min 4h
% %
1S6H
1S1G5H
3S2H
2S4H 1G7H
mass mass
0 0
6000 6500 7000 7500 8000 6000 6500 7000 7500 8000
Fig. 26.4. ESMS spectra of intermediates produced in the folding of toxin a.
960 26 Folding and Disulfide Formation
…CSPGETNC…
Turn 2
100
Relative percentage of intermediates
80
3S2H
60 4S
40
20
2S4H
0
0 40 80 120 160 200 240
Time (min)
26.8.5
How to Perform Matrix-assisted Laser Desorption Ionization (MALDI) MS Analysis
des [17-41]
Cys3-Cys24
2441.4 (40-51) + (52-62)
2 SS, 1 CAM 3084.2 (1-15) + (16-27) Cys43-Cys54
1 SS, 1 CAM
Cys55-Cys60
m/z 896.6
55 60 62
CTTDKCNN
3rd Edman step
Fig. 26.6.Disulfide mapping of intermediates des-[17–41]
accumulating in the folding of toxin a62.
Figure 26.6 shows the MALDI mass spectrum of the tryptic mixtures of peptides
derived by intermediate des-[17–41] accumulating in the folding of toxin a62. The
Edman strategy adopted to assign the disulfide bonds present in the intermediate
is also reported.
References
27
Concurrent Association and Folding
of Small Oligomeric Proteins
Hans Rudolf Bosshard
27.1
Introduction
Our current understanding of protein folding has been mostly obtained through
the study of monomeric, globular proteins. However, oligomeric proteins are
probably more abundant in nature. There is strong evolutionary pressure for
monomeric proteins to associate into oligomers [1]. Many results and conclusions
acquired from the study of monomeric proteins can be applied to oligomeric pro-
teins. For example, the stabilities of both monomeric and oligomeric proteins re-
sult from a delicate balance between opposing forces, the major contributors to
stability being the hydrophobic effect, van der Waals interactions, and hydrogen
bonds among polar residues [2–4]. In contrast to monomeric proteins, the stabiliz-
ing and destabilizing forces operate not only within single subunits but also be-
tween the subunits of an oligomeric protein; that means, both intramolecular and
intermolecular forces contribute to stability. Nevertheless, there are no fundamen-
tal differences between the mechanisms of folding of monomeric and oligomeric
proteins. For both types of proteins singular as well as multiple folding pathways
are observed with no, a few, or many intermediates and sometimes with nonnative,
dead-end products.
However, one difference is important: At least one step in the folding of an oligo-
meric protein must be concentration dependent. This is because the formation of
an oligomeric protein includes folding and association; and folding and association
are concurrent reactions. In some cases, folding and association can be studied as
separate and virtually independent reactions, in others they are fully coupled into
a single reaction step (‘‘two-state’’ mechanism or ‘‘single-step’’ mechanism), yet
most often they can be distinguished but are not independent of each other. The
degree of coupling between folding and association largely depends on the balance
between the intra- and intermolecular forces that stabilize the oligomeric structure.
If the isolated monomeric subunits are stabilized by strong intramolecular forces
and the folded monomeric subunits themselves are thermodynamically stable
folded structures, then folding and association are separate steps: formation of the
oligomer corresponds to the association of preformed folded monomers. If the iso-
lated monomeric subunits are unstable and stability is gained only by intermolec-
ular forces within the oligomeric structure, then folding and association are inti-
mately dependent on each other and are tightly coupled.
The concurrence of intra- and intermolecular reactions and the dependence of
folding on concentration adds an extra level of complexity to folding. This is per-
haps the main reason why our knowledge of the stability and folding of multisub-
unit proteins is still rudimentary and often of only a qualitative nature. Indeed,
quantitative thermodynamic and kinetic analyses have been restricted to mainly di-
meric proteins. Reliable quantitative thermodynamic and kinetic data on trimeric
and higher order oligomeric structures are still scarce.
The present chapter focuses on the detailed, quantitative in vitro analysis of the
thermodynamics and kinetics of folding of small dimeric proteins and protein mo-
tifs, with only brief and passing reference to trimeric and tetrameric structures.
The emphasis is on concepts and methods. We do not present a comprehensive re-
view of the assembly of dimeric proteins but select appropriate examples to illus-
trate experimental methods, concepts and principles. In vitro and in vivo folding
and association of oligomeric and multimeric proteins has been reviewed before
(see Refs [5–8]). In vivo folding and assembly of large, higher order multisubunit
proteins is covered in Chapter 2 in Part II.
27.2
Experimental Methods Used to Follow the Folding of Oligomeric Proteins
In principle, the methods used in the study of monomeric proteins can be used to
follow dissociation and unfolding of oligomeric proteins under equilibrium condi-
tions and to measure the kinetics of folding, unfolding, and refolding. The main
difference is in data analysis, which has to consider the association step that is
lacking in monomeric proteins.
27.2.1
Equilibrium Methods
molecules interact with each other is measured and the reaction enthalpy is calcu-
lated from the integrated heat change. Isothermal titration calorimetry can only be
applied to heterodimeric proteins whose subunits do not form homomeric struc-
tures. The procedure yields the change of the enthalpy, the Gibbs free energy and
the subunit stoichiometry of the oligomeric structure [12–14].
It should be noted that all the methods based on dissociation under benign
buffer conditions work only if the association between the subunits is relatively
weak and association/dissociation occurs in the micromolar to nanomolar concen-
tration range. If an oligomeric protein assembles at less than nanomolar concen-
tration, spectroscopic signal changes and calorimetric heat changes may be too
weak to monitor by dilution.
27.2.2
Kinetic Methods
The same kinetic methods used for monomeric proteins can be applied to study
the time course of folding and association or unfolding and dissociation of oligo-
meric proteins. Time-resolved methods include conventional stopped-flow and
quenched-flow techniques for the time range of milliseconds to seconds, and fast
relaxation methods like temperature jump to follow more rapid reactions (see
Chapter 14). Variation of the observation signal allows different aspects of the
reaction to be investigated. For example, far-UV CD spectroscopy, pulse-labeling
NMR and mass spectroscopy are used to follow secondary structure formation
and formation of hydrogen bonds; intrinsic and extrinsic fluorescence measure-
ments monitor molecular packing; fluorescence anisotropy and small angle X-ray
estimate changes in molecule size; ligand binding reports the gain or loss of bio-
logical activity. (See Chapter 2 and Ref. [15] for spectroscopic methods used to
study protein folding.)
As for monomeric proteins, unfolding and refolding is often studied by rapid
dilution into denaturant and out of denaturant, respectively. This necessitates ex-
trapolation to standard conditions in water. To avoid extrapolation, unfolding under
benign buffer conditions may be studied from relaxation after a rapid change of
temperature, pH or simply after rapid dilution (see Refs [16] and [17] and Appen-
dix: Section A3). If an oligomeric protein is composed of different subunits, fold-
ing can be studied in a very direct way by rapid mixing of the different subunits.
This means there is no need to first unfold the heterooligomeric protein and there-
after follow its refolding by returning to folding conditions.
The rates of dissociation of an oligomeric protein can also be measured by sub-
unit exchange. To this end, the protein has to be labeled with a fluorescence tag
or another appropriate marker. Folded protein with the marker is rapidly diluted
with a large excess of folded protein lacking the marker. The change of the signal
of the marker is then proportional to the rate of dissociation of the marked protein
provided that reassociation is much more rapid than dissociation, which can be
achieved by increasing the protein concentration [17, 18].
27.3 Dimeric Proteins 969
27.3
Dimeric Proteins
Here and elsewhere, we use the notation M for monomer and D for dimer. The
subscripts indicate unfolded and folded, respectively. For a monomeric protein
both k f and k u are monomolecular rate constants (s1 ). In the case of a dimeric
protein, k f is a bimolecular rate constant (M1 s1 ) and only k u is unimolecular
(s1 ). Square brackets indicate molar concentrations. Also, in contrast to a mono-
meric protein, the equilibrium dissociation constant K d has the unit of a con-
centration. It is related to the thermodynamic parameters of unfolding by the
well-known thermodynamic equation:
Equation (1) says that the folding and association reactions are completely linked.
In practice, this means that the association of the monomeric subunits and their
folding to the native conformation of the final dimer cannot be experimentally sep-
arated. However, this does not exclude the fleeting existence of intermediates. In-
deed, there are several instances where folding is thermodynamically two-state but
kinetics reveal transitory intermediates (see the examples in Table 27.3). Hence, a
more physically meaningful mechanism for the formation of a dimeric protein is
described by Eq. (3).
k1 k2 k3
2Mu T 2Mi T Di T Df
k1 k2 k3
ð3Þ
½Mu k1 ½Mi 2 k2 ½Di k3
K d1 ¼ ¼ K d2 ¼ ¼ K d3 ¼ ¼
½Mi k1 ½Di k2 ½D k3
under equilibrium conditions, but the kinetics of folding and unfolding may ex-
hibit different phases indicative of transitory formation of intermediates.
27.3.1
Two-state Folding of Dimeric Proteins
Several dimeric proteins fold according to Eq. (1), exhibiting two-state transition
between two unfolded monomers and a folded dimer. For monomeric proteins it
has been shown that a two-state folding model is justified if the protein rapidly
equilibrates between many unfolded conformations prior to complete folding [19].
By the same reasoning, two-state folding seems justified if intermediates Mi and Di
of Eq. (3) are ensembles of barely populated conformations at rapid equilibrium
with Mu and Df , respectively. One has to keep in mind that two-state folding is an
operational concept: The sole observation of a folded dimer and an unfolded mono-
mer under a given set of experimental conditions does not rule out the existence of
folding intermediates. Table 27.1 lists operational criteria that typically apply to
two-state folding. At least one of criteria i–iv referring to equilibrium experiments,
and one of criteria v–vii referring to kinetic experiments, as well as criterion viii
should be met to demonstrate two-state folding. The calorimetric and van’t Hoff
equality criterion ix is more difficult to test because thermal unfolding has to be
over 90% reversible, which is not always the case.
The thermodynamic and kinetic formalism for two-state folding is relatively sim-
ple and has often been described (see the Appendix for details). As for monomeric
proteins, the equilibrium dissociation constant K d is obtained from the fraction
of monomeric protein fM , or the fraction of dimeric protein fD ¼ 1 fM . For a
(i) Equilibrium unfolding followed with different physical probes such as light absorption,
fluorescence emission, circular dichroism yield superimposable curves
(ii) Equilibrium measurements are dependent on protein concentration
(iii) The change of the free energy of unfolding with denaturant concentration is linear
(iv) The reciprocal midpoint temperature of unfolding (1/Tm ) is linearly proportional to the
logarithm of the total protein concentration
(v) Refolding kinetics is concentration dependent, unfolding kinetics is concentration
independent
(vi) Kinetics of folding and unfolding exhibit single phases, and these account for the entire
amplitude of the signal changes of folding and unfolding, respectively
(vii) The change of the unfolding and refolding rate constants with denaturant is linear
(viii) Equilibrium constants determined experimentally are the same within error as those
calculated from the ratio of the association and dissociation rate constants
(ix) The enthalpy change of the dimer to monomer transition obtained by van’t Hoff analysis
of thermal unfolding curves is the same within error as the calorimetrically measured
enthalpy change
27.3 Dimeric Proteins 971
2½Pt fM 2
Kd ¼ ð4Þ
1 fM
½Pt fM 2
Kd ¼ ð5Þ
2ð1 fM Þ
In the above equations, [Pt ] is the total protein concentration expressed as the total
monomer concentration: [Pt ] ¼ [M] þ 2[D]. It is important to note that the protein
concentration has to be accounted for in analyzing the thermodynamic and kinetic
data. For example, the well-known linear extrapolation of chemical unfolding data
to zero denaturant concentration has to consider protein concentration because
DGU is not zero at the transition midpoint of chemical unfolding (see Appendix:
Section A2.2).
kf
Tab. 27.2. Examples of dimeric proteins folding through a simple two-state mechanism: 2Mu T D f .
ku
Fig. 27.1. Equilibrium unfolding of dimeric measured with different total peptide
Arc repressor according to a two-state concentrations (different symbols) are linearly
mechanism. A) Guanidinium chloride (GdmCl) dependent on GdmCl concentration (lower
denaturation curves monitored by fluorescence data) and urea concentration (upper data).
(squares) and CD (circles) are superimposable Linear extrapolation of urea and GdmCl
and the solid line is described by Eq. (4) for according to Eq. (A12) yields the same free
two-state folding. B) Urea denaturation is energy of unfolding of DGU (H2 O) ¼ 46 kJ
concentration dependent as predicted by Eq. mol1 , supporting the validity of the linear
(4). Experiments were performed with 1.6 mM extrapolation method. Figure adapted from
(squares) and 16 mM (circles) total protein Ref. [10] with permission.
concentration. C) Free energies of unfolding
974 27 Concurrent Association and Folding of Small Oligomeric Proteins
Fig. 27.2. Urea dependence of the rate analysis including both unfolding and refolding
constants of refolding (squares, left ordinate) as described by Eqs (A27) and (A29).
and unfolding (circles, right ordinate) of Arc Experiments were performed at pH 7.5 and
repressor. Empty symbols refer to data analysis 20 C. Figure adapted from Ref. [20] with
neglecting unfolding (Eq. (A22)) or refolding permission.
(Eq. (A25)). Dotted symbols refer to data
crossover point of the unfolding and refolding lines in Figure 27.2 do not yield the
midpoint concentration of the denaturant at which [M] ¼ [D]/2 and DGU ¼ 0. This
is because the refolding reaction and hence the midpoint of denaturation depends
on the protein concentration.
Extrapolation of the folding rate constant of the Arc repressor to zero denaturant
yields k f of about 9 10 6 M1 s1 , which is about two orders of magnitude below
the diffusion limit. Moreover, the folding rate is nearly independent of solvent vis-
cosity. It seems that only a small fraction of collisions of unfolded monomers are
productive, otherwise the rate should approach the diffusion limit and depend on
solvent viscosity. This is supported by a dimeric transition state of folding exhibit-
ing regions of native-like structure. In particular, there are native-like backbone in-
teractions in the transition state of folding of the Arc repressor [21–23]. Mutating a
cluster of buried charged residues forming a complex salt bridge network into hy-
drophobic residues speeds up folding, makes the folding rate viscosity dependent
and brings the structure of the dimeric transition state closer to the unfolded state
[24].
The leucine zipper domain of the yeast transcriptional activator protein GCN4
forms a coiled-coil structure in which two a-helices are wound around each other
to form a superhelix [25]. This leucine zipper domain of GCN4 folds by a two-state
mechanism. However, calorimetric analysis indicates two monomolecular pretran-
sitions at 20 C and 50 C before the main concentration-dependent, bimolecular
transition takes place [26]. Another coiled coil whose folding has been analyzed in
much detail is the heterodimeric leucine zipper AB [27, 28]. This short, designed
leucine zipper consists of two different 30 residue peptide chains whose sequences
27.3 Dimeric Proteins 975
are related to the GCN4 leucine zipper [25]. One chain is acidic (A-chain) and the
other is basic (B-chain) so that inter-subunit electrostatic interactions are maxi-
mized [27, 29]. Figure 27.3 shows thermal unfolding of leucine zipper AB. The
sigmoidal shape is described by Eq. (5) for two-state unfolding of a heterodimeric
protein. The midpoint of the thermal transition yields the transition temperature
Tm where half of the protein is dimeric and half is monomeric. The enthalpy of
unfolding at Tm ; DHU; m , is obtained from the data in Figure 27.3 by plotting ln K
versus T according to the van’t Hoff equation (Eq. (A14)). A further test for two-
state unfolding is provided by a plot of 1/Tm against the logarithm of the total pro-
tein concentration, as shown in the inset of Figure 27.3 (Eq. (A18)).
To obtain the unfolding free energy, DGU ðTÞ, at any temperature T, the data
from the thermal unfolding curve of Figure 27.3 need to be extrapolated by the
Gibbs-Helmholtz equation describing the change of DGU with temperature.
T T
DGU ðTÞ ¼ DHU; m 1 þ DCP T Tm T ln
Tm Tm
RT ln½K d ðTm Þ ð6Þ
The last term of Eq. (6) accounts for the concentration dependence of DGU . This
term is missing in the equation used for monomeric proteins. Figure 27.4 shows
976 27 Concurrent Association and Folding of Small Oligomeric Proteins
the thermal stability curve of the heterodimeric leucine zipper AB. The curve was
constructed using data sets from thermal unfolding in the range 340–350 K (Tm
was varied with protein concentration) and from isothermal titration calorimetry
performed between 283 and 313 K. Since neither peptide chain alone associates to
a homodimer, the formation of the heterodimeric ‘‘AB-zipper’’ can be followed by
titrating one chain to the other by isothermal titration calorimeter. The experiment
yields the enthalpy and free energy changes of the coupled association and folding
reaction [27]. In this way, the data from thermal unfolding can be complemented
with thermodynamic data pertaining to a temperature range in which the folded
protein dominates. The stability of the leucine zipper peaks at 309 K (36 C) and
decreases above and below this temperature. This is an example of cold denatura-
tion visible already at ambient temperature. Extrapolation of the stability curve
yields the temperatures of heat and cold denaturation, Tg and T 0 g , at which
DGU ¼ 0.
The shape of the stability curve is governed by the heat capacity change, DCP , of
the unfolding reaction. DCP of 4.3 kJ M1 K1 is obtained from a fit of the data in
Figure 27.4 using Eq. (6). The positive value of DCP implies that the unfolded
monomeric peptide chains can take up more heat per degree K than the folded leu-
cine zipper. The specific DCP normalized per g of protein is 0.6 J g1 K1 . This is
27.3 Dimeric Proteins 977
of the same magnitude as the average specific DCP for the unfolding of monomeric
globular proteins [30] and points to similar types of inter- and intramolecular
forces stabilizing dimeric and monomeric proteins. There have been many at-
tempts to calculate DCP from structural data based on the amount of polar and
nonpolar surface buried at the interface of a dimeric protein [31, 32].
The kinetics of folding and unfolding of the leucine zipper AB are presented
in Figure 27.5. This dimeric leucine zipper is characterized by several interheli-
Fig. 27.5. Dependence of the rate constants expressed as the logarithm of the mean
and equilibrium dissociation constants of rational activity coefficient of NaCl [33]. Rate
folding and association of the leucine zipper constants were determined under benign
AB on ionic strength. A) Rate constant k f of buffer conditions by rapid mixing of the
coupled association and folding. B) Rate unfolded peptide chains. Kinetic traces were
constant k u of coupled dissociation and analyzed with the help of Eq. (A29). Figure
unfolding. C) Equilibrium dissociation constant adapted from Ref. [28] with permission.
calculated as K d ¼ k u /k f . The ionic strength is
978 27 Concurrent Association and Folding of Small Oligomeric Proteins
27.3.2
Folding of Dimeric Proteins through Intermediate States
As noted above, two-state folding is an operational concept since the lack of ob-
servable intermediates (or ensembles of intermediates) may have experimental rea-
sons. Detection of intermediates depends on their stability as well as on the sensi-
tivity and time resolution of thermodynamic and kinetic detection methods. Table
27.3 lists representative examples of dimeric proteins folding by a multistate mech-
anism, that means through at least one detectable intermediate. A good indication
for a thermodynamic intermediate is the disparity between the free energy of un-
folding determined by different methods. For example, the dimeric cAMP receptor
27.3 Dimeric Proteins 979
Fig. 27.6. Guanidinium chloride-induced circles, right ordinate). The latter midpoint
unfolding of the homodimeric cAMP receptor is assigned to an equilibrium between folded
protein (CPR) followed by different spectro- dimer and (partly folded) monomeric interme-
scopic methods shows a sharp transition at diate. The midpoint at 3 M GdmCl is assigned
3 M GdmCl (filled circles, left ordinate). In to the equilibrium between the fully folded
contrast, the monomer/dimer ratio followed dimer and the fully unfolded monomers.
by sedimentation equilibrium analysis has a Figure redrawn from Ref. [51] with permission.
transition midpoint at 2 M GdmCl (empty
The first four proteins of Table 27.3 fold through thermodynamically stable inter-
mediates. In many cases, however, the equilibrium distribution between folded
dimer and unfolded monomer is well described by a two-state mechanism yet tran-
sitory intermediates are detectable in the kinetic analysis. Representative examples
in Table 27.3 are the ketosteroid isomerase, the glutathione transferase A1-1 and
the tryptophan repressor protein (TRP). Equilibrium analysis of all these proteins
conforms to two-state folding and fulfills at least one of criteria i–iv but none of
criteria v–viii listed in Table 27.1. Thus, dimeric ketosteroid isomerase is stabi-
lized by DGU ¼ 90 kJ mol1 under equilibrium conditions [53]. However, the time
course of unfolding exhibits three kinetic phases, a slow monomolecular phase
Df ! Di , a bimolecular phase Di ! 2Mi , and a rapid monomolecular phase
Mi ! Mu . Thus, two additional states are seen in kinetics. One may say that the
simple Eq. (1) applies to the enzyme at equilibrium yet the more realistic Eq. (3)
to the time-dependent nonequilibrium situation of folding and unfolding.
Folding of TRP has been studied in much detail [54–58]. Matthews and co-
workers have formulated the folding scheme shown in Figure 27.7 based on a large
Fig. 27.7. Complex folding scheme for refer to the rate of the final conformational
tryptophan repressor protein (TRP) from rearrangement from the dimeric intermediate
Escherichia coli through three ‘‘folding populations Di fast ; Di slow , and Di medium to the
channels.’’ Following the formation of two unique folded dimer Df . Di slow is a native-like
ensembles of partly folded monomeric intermediate and Di fast and Di medium are
intermediates, Mi fast and Mi slow , folding to the nonnative intermediates. Only the slow folding
dimer is controlled by three kinetic phases reaction through native-like intermediate Di slow
(‘‘channels’’). These are due to the association is reversible. Figure adapted from Ref. [58] with
of two molecules of Mi fast or Mi slow or of permission.
Mi fast þ Mi slow . ‘‘Fast,’’ ‘‘slow,’’ and ‘‘medium’’
982 27 Concurrent Association and Folding of Small Oligomeric Proteins
Fig. 27.8.Complex folding scheme for the meric (Mbx ) and dimeric (Dbx ) dead-end
heterodimeric bacterial luciferase composed of species formed from the b-subunit inter-
an a-subunit and a b-subunit. Folding proceeds mediate Mbi slow the folding and reduce the
through the monomeric intermediates Mai and folding yield. Figure adapted from Ref. [59]
Mbi and the dimeric intermediate Di . Mono- with permission.
27.4 Trimeric and Tetrameric Proteins 983
27.4
Trimeric and Tetrameric Proteins
We only briefly discuss a few examples of small trimeric and tetrameric proteins
and refer to Chapter 2 in Part II presenting the folding and assembly in vivo of
large, higher order multisubunit proteins. For statistical reasons, any reaction
higher than second order is very slow. Therefore, simultaneous association of three
polypeptide chains to a trimer is statistically unlikely, and simultaneous association
of four chains to a tetramer is impossible. Thus, trimers (Tr) and tetramers (Te) are
most likely to form by consecutive bimolecular reactions. For a trimer one can
write:
k1 k2
3M T D þ M T Tr
k1 k2
two-step folding of trimer ð7Þ
½M 2 k1 ½M½D k2
K d1 ¼ ¼ K d2 ¼ ¼
½D k1 ½Tr k2
Subscripts u; f , and i have been omitted in Eq. (7) to leave open the conformational
state of the three species. Of course, there can be monomeric, dimeric, and tri-
meric intermediates, making the folding mechanism very complex.
Three-state or two-step folding described by Eq. (7) has been reported for the de-
signed homotrimeric coiled-coil protein LZ16A composed of three 29-residue pep-
tide chains [60]. In the range of 1–100 mM total protein concentration, monomer,
dimer, and trimer are populated in a concentration-dependent equilibrium [61].
Figure 27.9 shows the decrease of the CD spectral minimum as a function of total
protein concentration. The data are best fit by two equilibria with K d1 ¼ 7:7 107
M and K d2 ¼ 2:9 106 M, where K d1 refers to the monomer/dimer equilibrium
and K d2 to the dimer/trimer equilibrium. K d1 < K d2 means that the dimer is a sta-
ble intermediate dominating at lower protein concentration. For example, at 10 mM
total protein concentration, there is 50% dimer, 35% trimer, and 15% monomer.
From kinetic refolding experiments the four rate constants defined by Eq. (7) have
been determined and were found to be in agreement with the equilibrium data
(see the legend to Figure 27.9).
So it may seem surprising that a trimeric protein can apparently fold in a single-
step reaction, which means by a two-state folding mechanism according to Eq. (8).
kf ½Mu 3 k u
3Mu T Trf ; Kd ¼ ¼ one-step folding of trimer ð8Þ
ku ½Trf kf
Fig. 27.9. Concentration dependence of the the rate constants k 1 ¼ 7:8 10 4 M1 s1 ,
CD signal at 222 nm of the trimeric coiled k1 ¼ 1:5 102 s1 , k2 ¼ 6:5 10 5 M1 s1 ,
coil LZ16A. The solid line is a best fit for a and k2 ¼ 1:1 s1 . Inset: backbone structure of
monomer–dimer–trimer equilibrium according dimer and trimer; ribbon models are from
to Eq. (7) with K d1 ¼ 7:7 107 M and 2ZTA.pdb (dimer) and 1gcm.pdb (trimer).
K d2 ¼ 2:9 106 M. These values are in good Figure adapted from Ref. [60] with permission.
agreement with the K1 and K2 calculated from
Fig. 27.10. Crystal structure of a six-helix to Eq. (8). DGU (H2 O) determined by
bundle composed of three ‘‘hairpins.’’ Each denaturant unfolding is 116 kJ mol1 , and
‘‘hairpin’’ comprises two helices linked by a 114 kJ mol1 when calculated from kinetic rate
short loop; the loop is not visible in this crystal constants. The trimolecular rate constant of
structure from Ref. [63]. Left: Axial view looking association is 1:3 10 15 M2 s1 and the
down the threefold axis of the helix bundle. unimolecular rate constant of dissociation is
Right: Lateral view. The trimer folds from 1:08 105 s1 [62].
unfolded monomers in a single step according
associate through at least two steps to the folded trimeric enzyme [64]. The folding
scheme is 3Mu ! 3Mi !! Trf .
Finally, we consider two tetrameric proteins: the C-terminal 30-residue domain
of the tumor suppressor protein p53 [65] and a small bacterial dihydrofolate reduc-
tase composed of four 78-residue polypeptide chains [66]. The C-terminal p53 do-
main folds into a homotetramer in a thermodynamically fully reversible reaction.
Interestingly, under equilibrium conditions, only folded tetramer and unfolded
monomer are observed [67]. However, the time course of folding shows two
consecutive bimolecular reactions with two kinetic, dimeric intermediates (Figure
27.11) [65]. Folding can be described as an association of dimers, in accord with
the crystal structure, which has been likened to a dimer of dimers. The two inter-
mediates shown in Figure 27.11 were deduced from extensive F-value analysis
[65]. The first bimolecular association reaction is between highly unstructured
monomers and can be described by a ‘‘nucleation-condensation’’ mechanism [68].
The first dimeric intermediate Di; 1 is formed through an early transition state that
has little similarity to the native structural organization of the dimer. The second
dimeric intermediate Di; 2 is highly structured and already includes most of the
structural features of the native protein. The transition Di; 2 ! Tef is an example
of a ‘‘framework mechanism’’ in which stable preformed structural elements ‘‘dif-
fuse and collide’’ [68].
A final example of a tetrameric protein is the bacterial R67 dihydrofolate reduc-
986 27 Concurrent Association and Folding of Small Oligomeric Proteins
Fig. 27.11. Qualitative reaction coordinate single bimolecular association reaction and a
diagram for the folding of the tetrameric single monomolecular dissociation reaction,
protein domain from tumor suppressor p53. which could be assigned as indicated [65].
Under equilibrium conditions, only a single Ribbon model of tetrameric C-terminal domain
transition between unfolded monomer (Mu ) of tumor suppressor p53 from sak.pdb. See the
and folded tetramer (Tef ) is seen [67]. Kinetic text for a more detailed discussion. Figure
refolding and unfolding experiments show a adapted from Ref. [65] with permission.
tase [66]. Here, two monomeric intermediates (the formation of one being due to
slow isomerization of peptidyl-proline bonds) precede dimer formation. Thereafter,
two dimers associate to the folded tetramer. The folding reaction can be summa-
rized as 4Mu ! 4Mi; 1 ! 4Mi; 2 ! 2Di ! Te.
27.5
Concluding Remarks
exposed in the transitory intermediates. This surface becomes buried only in the
final folded oligomer. Indeed, intermolecular or intersubunit forces dominate the
stability of many oligomeric proteins. However, the interface between the subunits
of the folded oligomeric proteins is highly variable. Shape complementarity be-
tween the contacting interfaces is a common feature and a main contributor to
the specificity of intersubunit interactions. In a survey of 136 homodimeric pro-
teins, a mixture of hydrophobic and intersubunit contacts was identified [69].
Small hydrophobic patches, polar interactions and water molecules are scattered
over the intersubunit contact area in two-thirds of the proteins studied. Only in
one-third of the dimers there was a single, large and contiguous hydrophobic core
between the subunits. Though the hydrophobic effect plays an important role in
subunit interactions, it seems to be less strong than that observed in the interior
of monomeric proteins [70]. Only in a few cases, oligomerization conforms to the
association of preformed stable subunits. The case of the trimeric coiled-coil
LZ16A at equilibrium with a stable dimeric coiled coil is probably an exception.
From studies addressing the nature of the transition states of folding, it fol-
lows that the same principles deduced for monomeric proteins [71] apply also to
oligomers. Folding of the subunits and their concurrent association to the oligo-
meric protein occurs by diffusion and collision of preformed structural frameworks
(‘‘framework model’’) as well as by simultaneous nucleation and condensation.
This is not surprising since the folded structure is always based on an extensive
interplay of secondary and tertiary interactions in monomeric proteins and of sec-
ondary, tertiary, and quaternary interactions in oligomeric proteins. Because there
is no intrinsic difference between tertiary and quaternary interactions there is no
principal difference of folding between monomers and oligomers. The main dif-
ference is in the association step. Whereas the folding of a monomeric protein
can be described by one or several consecutive or parallel monomolecular (single
exponential) reactions, the folding of an oligomeric protein exhibits at least one
concentration-dependent reaction step. On the one hand this feature complicates
the thermodynamic and kinetic analysis. On the other hand the concentration de-
pendence adds a useful variable to assist in the measurement of folding since the
monomer/oligomer equilibrium can be studied by simple dilution under benign
buffer conditions. Comparison of the data from dilution and from chemical or
heat denaturation can give information about the contribution of intermolecular
and intramolecular forces to the stability of the oligomeric protein. Also, determi-
nation of thermodynamic parameters such as the free energy of unfolding or the
heat capacity change of unfolding is facilitated when the protein concentration
can be adduced as an additional variable. Finally, in the case of heterooligomeric
proteins, folding can be probed by simple mixing of the different subunits.
In this Appendix we have put together a set of equations for the analysis of thermo-
dynamic and kinetic data of two-state folding. Many equations are equivalent to
988 27 Concurrent Association and Folding of Small Oligomeric Proteins
those for the folding of monomeric proteins except for an additional term account-
ing for protein concentration. Derivation of equilibrium association constants for
two-state folding of oligomeric systems and the relationship between equilibrium
constants and the thermodynamic parameters DG; DH, and DS have been de-
scribed in more detail by Marky and Breslauer [78]. The folded (native) state is
taken as the reference state so that the thermodynamic parameters refer to the un-
folding reaction (DGU ; DHU , etc.). A special form of Eq. (A29) describing reversible
two-state folding of a homodimer has been published by Milla and Sauer [20]. The
method of deducing kinetic rate constants from the shift of a pre-existing equilib-
rium is briefly summarized (method of Bernasconi [16]).
The kinetic and equilibrium equations presented here are useful for the analysis
of simple reaction steps, or steps that are kinetically or thermodynamically well
separated. If the reaction steps are linked, numerical integration can be used to
analyze kinetic and equilibrium data. There are good computer programs to aid
kinetic analysis, some of the programs are freely available on the Web (for example
DynaFit [79]). Depending on the scatter of experimental data, numerical fitting of
linked reactions may result in large errors and distinguishing between different
models can be very difficult.
Numbering of equations Equations appearing only in this Appendix have the pre-
fix A. Other equations are numbered as in the main text.
A1
Equilibrium Constants for Two-state Folding
nM T N ðA1Þ
½M n
Kd ¼ ðA2Þ
½N
Appendix – Concurrent Association and Folding of Small Oligomeric Proteins 989
½M
fM ¼ ð0 a fM a 1Þ ðA3Þ
½Pt
we can write K d as
n½Pt n1 fM n
Kd ¼ ðA4Þ
1 fM
where [Pt ] ¼ [M] þ n[N] is the total protein concentration expressed as total mono-
mer concentration. At fM ¼ 0:5 we have
M1 þ M2 þ þ Mn T N ðA6Þ
½Pt n1 fM n
Kd ¼ ðA8Þ
n n1 ð1 fM Þ
and at fM ¼ 0:5
n1
½Pt
K d; fM ¼0:5 ¼ ðA9Þ
2n
In Eqs (A8) and (A9), [Pt ] ¼ [M1 ] þ [M2 ] þ þ [Mn ] þ n[N]. The difference be-
tween Eqs (A4), (A5) and (A8), (A9) reflects the statistical difference between homo-
oligomeric and heterooligomeric equilibria. Equations (4) and (5) of the main text
correspond to Eqs (A4) and (A8), respectively.
990 27 Concurrent Association and Folding of Small Oligomeric Proteins
A2
Calculation of Thermodynamic Parameters from Equilibrium Constants
The superscript indicating standard conditions is omitted for simplicity. The van’t
Hoff enthalpy of unfolding is
d ln K d
DHUvH ¼ RT 2 ðA10Þ
dT
The term on the left hand side of Eq. (A12) follows from Eqs (2) and (A4), that on
the left hand side of Eq. (A13) from Eqs (2) and (A8). The slope m is the change of
DGU with denaturant concentration (J mol1 M1 ).
A2.3 Calculation of the van’t Hoff Enthalpy Change from Thermal Unfolding Data
From the change of K d with T, the van’t Hoff enthalpy is calculated according
to Eq. (A10). In practice, one calculates DHUvH at Tm from a plot of ln K d against
1/T. With Eq. (2) one obtains the Arrhenius equation
Appendix – Concurrent Association and Folding of Small Oligomeric Proteins 991
DHU; m DSU
ln K d ¼ þ ðA14Þ
RT R
where DHU; m is the van’t Hoff enthalpy at Tm . Only data points in a narrow inter-
val around Tm should be used to calculate DHU; m from the slope DHU; m /R since
Eq. (A14) neglects the change of the enthalpy with T (heat capacity change as-
sumed to be zero).
A2.4 Calculation of the van’t Hoff Enthalpy Change from the Concentration-
dependence of Tm
Since association of an oligomeric protein is concentration dependent, Tm changes
with concentration. Measuring Tm for varying [Pt ] is another way to calculate
DHU; m . From Eqs (A5) and (A9) it follows that at Tm ; K d is described by
From Eqs (2), (A15), and (A16) one obtains after rearranging [78]:
1 ðn 1ÞR DS ðn 1ÞR ln 2 þ R ln n
¼ ln½Pt þ homooligomer ðA17Þ
Tm DHm DHm
1 ðn 1ÞR DS ðn 1ÞR ln 2n
¼ ln½Pt þ heterooligomer ðA18Þ
Tm DHm DHm
The term RT ln½K d ðTm Þ on the right hand side is necessary to account for the
concentration dependence of DGU . K d ðTm Þ corresponds to K d at fM ¼ 0:5. Depend-
ing on the number of subunits and on whether the protein is a homo- or a hetero-
oligomer, the appropriate form of K d from Eq. (A5) or (A9) is inserted in Eq. (6).
For example, the correction term for a homodimeric protein is RT lnð½Pt Þ. The
concentration-independent form of the Gibbs-Helmholtz equation is obtained by
using the reference temperature Tg at which DGU ¼ 0
T T
DGU ðTÞ ¼ DHðTg Þ 1 þ DCP T Tg T ln ðA19Þ
Tg Tg
992 27 Concurrent Association and Folding of Small Oligomeric Proteins
A3
Kinetics of Reversible Two-state Folding and Unfolding: Integrated Rate Equations
The following integrated rate equations for two-state folding of a dimeric protein
are used to calculate the folding rate constant k f and the unfolding rate con-
stant k u from kinetic traces. Depending on experimental conditions, only folding
(A3.1), only unfolding (A3.2), or both (A3.3) have to be taken into account.
d½M
¼ k f ½M 2 ðA21Þ
dt
After integration, the time course of disappearance of the monomer fraction be-
comes
½M0
fM ðtÞ ¼ ðA22Þ
½Pt ðk f t þ 1Þ
ku
D ! 2M ðA23Þ
d½D
¼ k u ½D ðA24Þ
dt
2½D0
fD ðtÞ ¼ expðk u tÞ ðA25Þ
½Pt
where [D0 ] is the initial dimer concentration at t ¼ 0. [D0 ] ¼ [Pt ]/2 if [M] ¼ 0 at
t ¼ 0. Equation (A25) is used to obtain k u from an unfolding trace expressed as
Appendix – Concurrent Association and Folding of Small Oligomeric Proteins 993
d½M
¼ k f ½M 2 þ 2k u ½D ðA26Þ
dt
d½M
¼ dt ðA27Þ
a þ b½M þ c½M 2
where
a ¼ k u ½M0 ; b ¼ k u ; c ¼ k f ðA28Þ
ðb þ sÞðZ 1Þ
fM ðtÞ ¼
½Pt 2cð1 ZÞ
ðA29Þ
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
2c½M0 þ b s
where Z ¼ expðstÞ and s ¼ b 2 4ac
2c½M0 þ b þ s
Equation (A29) is used to obtain k f and k u from a folding trace expressed as fM ðtÞ
under conditions where both folding and unfolding have to be taken into account
(for example at intermediate denaturant concentration).
d½M1
¼ k f ½M1 ½M2 þ k u ½D ðA31Þ
dt
Eq. (A27) except that the constants a; b, and c are different. If [M1 ] ¼ [M2 ], a; b,
and c are
If [M1 ] 0 [M2 ]
or
Equation (A33) is valid for fM1 ðtÞ and (A34) for fM2 ðtÞ; subscript 0 indicates con-
centration at t ¼ 0.
A4
Kinetics of Reversible Two-state Folding: Relaxation after Disturbance of a Pre-
existing Equilibrium (Method of Bernasconi)
kf 1
2M T D ¼ 4k f ½M þ k u (A35)
ku t
kf 1
nM T N ¼ n 2 k f ½M n1 þ k u (A36)
ku t
kf 1
M 1 þ M2 T D ¼ k f ð½M1 þ ½M2 Þ þ k u (A37)
ku t
k1 k2
2M T Di T D t1 < t2 : bimolecular association is faster than (A38)
k1 k2
monomolecular rearrangement
1 1 4k 1 k2 ½M
¼ 4k 1 ð½MÞ þ k1 , ¼ þ k2
t1 t2 4k 1 ð½MÞ þ k1
k1 k2
2M T Di T D t1 > t2 : bimolecular association is slower than (A39)
k1 k2
monomolecular rearrangement
1 1 k1 k2
¼ k 1 þ k1 , ¼ 4k 1 ½M þ
t1 t2 k2 þ k2
a Concentrations with overbars are equilibrium concentrations after
relaxation to the new equilibrium.
References 995
librium between folded and unfolded protein is rapidly disturbed, for example by
rapid dilution or rapid increase of temperature (T-jump), and the subsequent relax-
ation to the new equilibrium is observed. This relaxation is characterized by the re-
laxation time t ¼ 1/kapp , where kapp (s1 ) is the apparent rate constant of the expo-
nential decay to the new equilibrium. The number of relaxation times equals the
number of reaction steps. Table 27.4 lists a few equations for different reversible
folding reactions. They describe the relationship between the experimentally mea-
sured relaxation time t and the individual rate constants of the reversible folding–
unfolding reaction. The main difficulty with this procedure is that the equilibrium
concentrations of the reactants at the new equilibrium, that is after relaxation, have
to be known (concentrations with overbars in Table 27.4). These concentrations can
be calculated if the equilibrium constants are known from independent experi-
ments. Alternatively, a fitting procedure can be used in which initial estimates of
the equilibrium constants are iteratively adapted until a best fit is obtained.
Acknowledgments
I thank Ilian Jelesarov and Daniel Marti for fruitful discussions and helpful com-
ments on the manuscript. Work from my own laboratory has been supported in
part by the Swiss National Science Foundation.
References
28
Folding of Membrane Proteins
Lukas K. Tamm and Heedeok Hong
28.1
Introduction
The major forces that govern the folding and thermodynamic stability of water-
soluble proteins have been studied in significant detail for over half a century [1,
2]. Even though the reliable prediction of protein structure ab initio from amino
acid sequence is still an elusive goal, a huge amount of information on the mech-
anisms by which soluble proteins fold into well-defined three-dimensional struc-
tures has been obtained over this long time period [3]. In marked contrast, our
current knowledge on thermodynamic and mechanistic aspects of the folding of
membrane proteins is several orders of magnitude less advanced than that of solu-
ble proteins, although integral membrane proteins comprise about 30% of open
reading frames in prokaryotic and eukaryotic organisms [4]. There are several rea-
sons why the research on membrane protein folding lags far behind similar re-
search on soluble proteins.
First, the first structure of a soluble protein (myoglobin) was solved in 1958, but
the first membrane protein structure (photosynthetic reaction center) was only de-
termined 27 years later in 1985. Even today, of the approximately 25 000 protein
structures in the Protein Structure Databank (PDB), only about 140 (i.e., less than
0.6%) are structures of integral membrane proteins.
Second, membrane proteins are more difficult to express, purify, and handle in
large batches than soluble proteins. They require the presence of nondenaturing
detergents or lipid bilayers to maintain their native structure, which adds an addi-
tional layer of complexity to these systems compared with soluble protein systems.
The requirement for detergents or membranes also explains why it is more diffi-
cult to crystallize membrane proteins or subject them to nuclear magnetic reso-
nance (NMR) for high-resolution structural studies.
Third, it is very difficult to completely denature membrane proteins [5]. Solvent
denaturation with urea or guanidinium chloride (GdmCl) is generally not appli-
cable to membrane proteins because secondary structures that are buried in the
membrane are often not accessible to these denaturants. Membrane proteins are
also quite resistant to heat denaturation, and even if they can be (partially) dena-
tured, they are prone to irreversible aggregation. These properties have not helped
to make membrane proteins favorable targets for folding studies. However, as will
be shown in this chapter, new and different techniques are being developed to
study the folding of membrane proteins. If this is done with sufficient care, mem-
brane proteins can be brought back ‘‘into the fold’’ and very useful and much
needed information can be gathered on the molecular forces that determine their
thermodynamic stability and mechanisms that lead to their native structures. Even
though most helical membrane proteins are inserted into biological membranes by
means of the translocon [6], knowledge of the energetics and kinetics of mem-
brane protein structure formation are important from a basic science standpoint
and for structure prediction and protein engineering of this still neglected class of
proteins, which constitutes the largest fraction of all targets of currently available
drugs.
To understand the folding of membrane proteins, we first need to understand
the dynamic structure of the lipid bilayer (i.e., the ‘‘solvent’’) into which these
proteins fold. Fluid lipid bilayers are highly dynamic, yet well-ordered two-
dimensional arrays of two layers of lipids with their polar headgroups exposed to-
wards water and their apolar fatty acyl chains shielded from water and facing the
center of the bilayer. This structure [7, 8] is maintained almost entirely by the hy-
drophobic effect [9]. The hydrocarbon chains in the core of the bilayer are quite
well ordered, but their order degrades towards the midplane of the bilayer. This
region comprises a width of about 30 Å in a ‘‘typical’’ bilayer of dioleoylphosphati-
dylcholine. The two interfaces of the bilayer are chemically and structurally quite
complex and comprise a width of about 15 Å each. These regions contain the phos-
phate and choline headgroups, the glycerol backbone and about 25 molecules of
bound, but disordered water for each phosphatidylcholine (PC) headgroup. The lip-
ids are free to diffuse laterally (@1 mm 2 s1 ) and rotationally (@1 ms1 ), but basi-
cally do not flip-flop across the membrane.
The molecular packing of lipids in a fluid lipid bilayer is maintained mechani-
cally by a combination of three types of opposing forces: lateral chain repul-
sions in the core region, headgroup repulsions, and surface tension at the polar–
nonpolar interface [10]. These forces create a lateral pressure profile along the
membrane normal that cannot be directly measured, but that has been calculated
based on the known magnitudes of the relevant forces [11]. This packing and lat-
eral pressure profile has profound consequences on the lipid hydrocarbon chain
dynamics [12] and the folding of membrane proteins as will be summarized in
later sections of this chapter. Therefore, the mechanical and dynamical properties
as well as the chemical properties of membranes change dramatically as the com-
position of chemical groupings changes along the bilayer.
To dehydrate and bury a single peptide bond in the lipid bilayer is energetically
unfavorable and costs 1.15 kcal mol1 [13]. The energetic gain of folding a poly-
peptide into an internally hydrogen-bonded secondary structure in membranes is
much smaller (i.e., of the order of 0.1 to 0.4 kcal per residue) [14–17]. This
shows that partitioning of the bare backbone (e.g., polyglycine) into membranes is
energetically unfavorable irrespective of whether the peptide is folded or not. Pro-
1000 28 Folding of Membrane Proteins
28.2
Thermodyamics of Residue Partitioning into Lipid Bilayers
The partitioning of apolar amino acid side chains into the membrane provides the
major driving force for folding of a-helical and b-barrel membrane proteins. Fold-
ing of membrane proteins is always coupled to partitioning of the polypeptide
from water into the membrane environment. A large number of hydropathy scales
have been developed over the years. The different scales have been derived based
on very different physical or statistical principles and, therefore, differ for different
residues sometimes quite significantly. It is beyond the scope of this chapter to re-
view the different scales that are in use. The only scales that are based on compre-
hensive partitioning measurements of peptides into membranes and membrane-
mimetic environments are the two Wimley & White (WW) scales, namely the
WW interface [19] and the WW octanol scale [13, 20].
The WW interface scale refers to partitioning into the interface region of phos-
phatidylcholine bilayers and the octanol scale refers to partitioning into octanol,
which is thought to be a good mimetic of the core of the lipid bilayer. As has been
demonstrated for the hydrophobic core of soluble proteins [21], the dielectric con-
stant in the core of a lipid bilayer may be as high as 10 due to water penetration
28.3 Stability of b-Barrel Proteins 1001
(i.e., considerably larger than 2 in pure hydrocarbon and closer to that of wet octa-
nol). Both scales are whole-residue scales, which means that they take into account
the thermodynamic penalty of also partitioning the peptide backbone into the re-
spective membrane locations. Based on these partition data, one may group the
amino acid residues into three groups, namely those that do not favor membranes
(Ala, Ser, Asn, Gly, Glu , Asp , Asp 0 , Lysþ , Argþ , Hisþ ), those that favor the inter-
face (Trp, Tyr, Cys, Thr, Gln, His 0 , Glu 0 ), and those that favor the hydrocarbon core
(Phe, Leu, Ile, Met, Val, Pro). Sliding a 19-residue window of summed WW hydro-
pathy values over the amino acid sequences of membrane proteins and using the
thermodynamic DG ¼ 0 cutoff predicts TM helices with high accuracy [13]. The
residues that fall in the first and third group are not surprising. However, the resi-
dues that prefer the interface deserve a few comments. Statistical analyses and
model studies of membrane proteins have indicated for quite some time that the
aromatic residues Trp and Tyr residues have a strong preference for membrane
interfaces [22] (see also Figure 28.1). It has been suggested that their bulkiness,
aromaticity and slightly polar character are the major factors that determine their
preferred location in the interface [23].
Hydrogen bonding is probably not important for this localization. The moder-
ately polar residues in this group are not surprising. However, the uncharged
form of glutamic acid interestingly is apolar enough to localize into the interface.
The differential partitioning of glutamate depending on its charge state likely has
profound consequences on the pK a value of this residue at membrane surface be-
cause the pK a is now coupled to partitioning. A manifestation of this shifted pK a is
seen, for example, when the binding and membrane penetration of the fusion pep-
tide of influenza hemagglutinin is examined [16]. This peptide contains two Glu
residues in its N-terminal portion. A shift of the pH from 7 to 5 is sufficient to
bind the peptide more tightly and more deeply (i.e., deep enough to cause mem-
brane fusion at pH 5, but not at pH 7).
28.3
Stability of b-Barrel Proteins
All b-barrel membrane proteins whose structures are known at high resolution are
outer membrane proteins of gram-negative bacteria [24]. However, several mito-
chondrial and chloroplast outer membrane proteins, namely those that are part of
the protein translocation machinery of these membranes also appear to be b-barrel
membrane proteins [25, 26]. Bacterial outer membrane proteins are synthesized
with an N-terminal signal sequence and their translocation through the inner
membrane is mediated by the translocon consisting of the SecY/E/G complex
with the assistance of the signal recognition particle and the SecA ATPase, which
together constitute the prokaryotic protein export machinery [27]. Helical mem-
brane proteins that reside in the inner membrane have contiguous stretches of
about 20 hydrophobic residues that form TM helices, which presumably can exit
the translocon laterally into the lipid bilayer [6]. The TM strands of outer mem-
1002 28 Folding of Membrane Proteins
brane proteins on the other hand are composed of alternating hydrophilic and hy-
drophobic residues [28]. Therefore, they lack the signal for lateral exit from the
translocon into the inner membrane and become secreted into the periplasmic
space, where they are presumably greeted by periplasmic chaperones such as Skp
and SurA [29, 30]. Skp binds unfolded outer membrane proteins, prevents them
from aggregation, but does not appear to accelerate their insertion into lipid model
membranes (D. Rinehart, A. Arora, and L. Tamm, unpublished results).
The outer membrane proteins of gram-negative bacteria may be grouped into six
families according to their functions [31]. Structures of members of each of these
families have been solved by X-ray crystallography or NMR spectroscopy. The six
families with representative solved structures are: (i) the general porins such as
OmpF and PhoE [32, 33], (ii) passive sugar transporters such as LamB and ScrY
[34, 35], (iii) active transporters of siderophores such as FepA, FecA, and FhuA
[36–39] and of vitamin B12 such as BtuB [40], (iv) enzymes such as the phospholi-
pase OmpLA [41], (v) defensive proteins such as OmpX [42], and (vi) structural
proteins such OmpA [43]. All outer membrane proteins form b-barrels. The b-
barrels of these proteins consist of even numbers of b-strands ranging from 8 to
22. The average length of the TM b-strands is 11 amino acid residues in trimeric
porins and 13–14 residues in monomeric b-barrels. Since the strands are usually
inclined at about 40 from the membrane normal, they span about 27–35 Å of the
outer membrane, respectively.
Most outer membrane proteins have long extracellular loops and short periplam-
sic turns that alternatingly connect the TM b-strands in a meandering pattern. The
loops exhibit the largest sequence variability within each outer membrane protein
family [31]. The largest and smallest outer membrane proteins are monomeric, but
others such as OmpLA are dimers and the porins are trimers of b-barrels.
Two general approaches have been taken to examine the thermodynamic stabil-
ity of b-barrel membrane proteins: solvent denaturation by urea and guanidinium
chloride and differential scanning calorimetry (DSC). In the following, we will dis-
cuss the thermodynamic stability of the simple monomeric b-barrel protein OmpA
and then extend the discussion to complexities that arise by the presence of the
central plug domain of FepA and FhuA and trimer formation in the case of porins.
The most extensive thermodynamic stability studies have been carried out with the
outer membrane protein A (OmpA) of Escherichia coli. OmpA is an abundant struc-
tural protein of the outer membrane of gram-negative bacteria. Its main function
is to anchor the outer membrane to the peptidoglycan layer in the periplasmic
space and thus to maintain the structural integrity of the outer cell envelope [44].
The 325-residue protein is a two-domain protein whose N-terminal 171 residues
constitute an eight-stranded b-barrel membrane-anchoring domain and whose C-
terminal 154 residues form the periplasmic domain that interacts with the peptido-
glycan. The three-dimensional structure of the TM domain has been solved by X-
ray crystallography [43] and, more recently, by solution NMR spectroscopy [45].
The structure resembles a reverse micelle with most polar residues facing the inte-
rior, where they form numerous hydrogen bonds and salt bridges, and the apolar
residues facing the lipid bilayer [43]. Two girdles of aromatic side chains, trypto-
28.3 Stability of b-Barrel Proteins 1003
Fig. 28.1. Three-dimensional structures of (1QJP) from Escherichia coli. The aromatic side
representative a-helical and b-barrel membrane chains (Trp, Tyr, Phe) are concentrated at the
proteins. Left: Bacteriorhodopsin from bilayer–water interfaces as shown in both
Halobacterium salinarium (PDB code: 1C3W). structures.
Right: Transmembrane domain of OmpA
phans and tyrosines, delineate the rims of the barrel where they interact with the
two membrane interfaces (Figure 28.1).
Extensive mutagenesis studies show that OmpA is quite robust against many
mutations especially in the loop, turn, and lipid bilayer-facing regions [46]. OmpA
can even be circularly permutated without impairing its assembly and function in
outer membranes [47]. Dornmair et al. [48] showed that OmpA could be extracted
from the outer membrane and denatured and solubilized in 6–8 M urea and that
the protein spontaneously refolded into detergent micelles by rapid dilution of
the denaturant. They subsequently showed that urea-unfolded OmpA can also be
quantitatively refolded into preformed lipid bilayers by rapid dilution of urea [49].
Recently, we could achieve the complete and reversible refolding of OmpA in lip-
id bilayers [50]. The reaction was shown to be a coupled two-state membrane par-
tition folding reaction with the unfolded state in urea being completely dissociated
from the membrane and the folded state completely integrated into the membrane
(Figure 28.2). This represents the first example of an integral membrane protein,
for which the thermodynamic stability could be quantitatively accessed. To achieve
this goal, a urea-induced equilibrium folding system was developed for OmpA.
The system consisted of small unilamellar vesicles composed of 92.5 mol% phos-
phatidylcholine and 7.5 mol% phosphatidylglycerol in weakly basic (pH 7.0–10)
and low ionic strength conditions. These conditions facilitate both the folding and
unfolding reactions and dissociation of the unfolded form from the membrane sur-
face. Folding of OmpA is monitored by tryptophan fluorescence or CD spectros-
copy, or by sodium dodecylsulfate polyacrylamide gel electrophoresis (SDS-PAGE)
(Figure 28.3).
1004 28 Folding of Membrane Proteins
Bending stress
Lateral pressure
Each of these techniques monitors a different aspect of the folding reaction. Trp
fluorescence reports on the insertion of tryptophans into the lipid bilayer, far-UV
CD on the formation secondary structure, and SDS-PAGE on the completion of
the tertiary b-barrel structure. If samples are not boiled prior to loading onto the
SDS gels, they run at an apparent molecular mass of 30 kDa if the protein is com-
pletely folded, but at 35 kDa if it is unfolded or incompletely folded.
This shift on SDS gels has proven to be a very useful assay for tertiary structure
formation of OmpA and other outer membrane proteins [51]. Complete refolding
as measured by the SDS-PAGE shift correlates with the reacquisition of the ion
channel activity of OmpA [52]. Since unfolding/refolding curves as a function of
denaturant concentration measured by three techniques that report on vastly differ-
ent kinetic processes superimpose, it is very likely that folding/unfolding is a ther-
28.3 Stability of b-Barrel Proteins 1005
Unfolded fraction
Fig. 28.3. Unfolded fraction of OmpA as a binding, secondary structure, and tertiary
function of urea concentration obtained from structure, respectively, superimpose in
the average fluorescence emission wavelength equilibrium measurements although they
hli, far-UV CD spectroscopy, and the SDS- develop at different times in kinetic
PAGE shift assay. These measures of lipid experiments. Adapted from Ref. [50].
Fig. 28.4. Dependence of A) the free energy of small lateral bilayer pressure. Open circles:
folding DG H2O and B) the m-value reflecting double-unsaturated acyl chain series with
the cooperativity of folding on the hydrophobic increasing lateral bilayer pressure as
thickness of phosphatidylcholine bilayers. hydrophobic thickness decreases. Adapted
Filled circles: saturated acyl chain series with from Ref. [50].
were investigated. Increasing the thickness of the lipid bilayer increases the stabil-
ity (i.e., decreases DG H2O ) of OmpA. It also increases the cooperativity (m-value) of
folding (Figure 28.4). The decrease in DG H2O is 0.34 kcal mol1 per Å of
increased bilayer thickness, which converts to 4 cal mol1 per Å 2 of increased
hydrophobic contact area. This is only about 20% of the standard value for the hy-
drophobic effect [9]. We conclude that elastic energy due to lipid deformation
counteracts the energy gain that would be expected for a full development of the
hydrophobic effect.
Including cone-shaped lipids with a smaller cross-sectional headgroup than
acyl chain area increases the lateral pressure within lipid bilayers [53]. Increasing
the lateral pressure in membranes by including lipids with increasing relative
28.3 Stability of b-Barrel Proteins 1007
Fig. 28.5. Effect of increasing mol fraction of fluorescence. B) Dependence of DG H2O and
phosphatidylethanolamine (C16:0 C18:1 PE) in m-value of OmpA folding on the cone-shaped
phosphatidylcholine (C16:0 C18:1 PC) bilayers and lateral pressure-inducing lipid C16:0 C18:1 PE.
on the thermodynamic stability of OmpA. Adapted from Ref. [50].
A) Unfolding curves measured by Trp
amounts of cis double bonds in the hydrocarbon region increased the stability (i.e.,
decreased DG H2O ) of OmpA from 3.4 to 7 kcal mol1 and more negative val-
ues when phosphatidylethanolamines were included as cone-shape lipids (Figure
28.5). The m-value exhibited an increase that was correlated with the DG H2O de-
crease, in the case of bilayer thickening, but a decrease when the bilayer pressure
was increased with cis double bonds. The various bilayer forces that act on OmpA
and modulate its thermodynamic stability in membranes are illustrated in Figure
28.2.
Gram-negative bacteria possess active transport systems for the uptake of iron–
siderophore complexes, vitamin B12 , and other essential nutrients. The systems
consist of active ATP-requiring transporters in the inner membrane, the TonB
linker protein that spans the periplasm, and a family of proteins called TonB-
dependent transporters in the outer membrane. The TonB-dependent trans-
porters share common structural features. They are monomeric 22-stranded
b-barrels. The lumina of these large-diameter barrels are filled with a N-terminal
1008 28 Folding of Membrane Proteins
in a hydrophobic contact with a few residues inside the barrel. The loop forms a
constriction in the channel and its acidic residues together with a cluster of basic
residues on the opposite channel wall are thought to exert a transverse electric field
across the channel opening. The shorter loop L2 forms a ‘‘latch’’ that reaches over
from one subunit to a neighboring subunit and thereby further stabilizes inter-
subunit interactions. Porins are extremely stable towards heat denaturation, pro-
tease digestion, and chemical denaturation with urea and GdmCl [31]. For exam-
ple, extraction of OmpF from the E. coli outer membrane requires the heating of
outer membranes in mixtures of isopropanol and 6 M GdmCl at 75 C for more
than 30 min [58]. The trimer structure itself is maintained up to 70 C in 1%
SDS [59].
A mutagenesis study combined with DSC and SDS-PAGE shift assays demon-
strates that the inter-subunit salt bridge and hydrogen bonding interactions involv-
ing loop L2 contribute significantly to the trimer stability [59]. For example, muta-
tions breaking the salt bridge between Glu71 and Arg100 decrease the trimer–
monomer transition temperature from 72 to 47–60 C and DHcal from 430 to
200–350 kcal mol1 . A similar behavior was observed when residues 69–77 of L2
were deleted. The energetic contribution of the nonpolar contact between neighbor-
ing barrel walls has not yet been investigated in porins, but would be interesting to
examine and compare with lateral associations of interacting TM helices (see below).
28.4
Stability of Helical Membrane Proteins
In 1990, Popot and Engelman proposed a two-stage model for the folding of a-
helical membrane proteins [60]. In the framework of this model, TM helices are
thought to be independently folded units (stage I) that may laterally associate into
TM helix bundles (stage II). The model was inspired by the finding of the same
group that the seven-TM-helix protein bacteriorhodopsin (BR) can be split into he-
lical fragments that would still integrate as TM helices and then laterally assemble
into a seven-helix protein that was capable of binding to cofactor retinal just as the
wild-type protein [61]. Since then, it has been realized that more stages are needed
to describe the folding of helical proteins.
The interface is a membrane compartment that may be important in early stages
of folding [62]. In addition, cofactors and polypeptide hinges and loops can be ad-
ditional factors that sometimes play important roles in late stages of helical mem-
brane protein folding [63]. Although the real world of membrane proteins is clearly
more complex, the two-stage model has some merits because it is conceptually
simple and because it appears to correctly describe the hierarchy of folding of at
least a few simple membrane proteins. Therefore, we describe in this section how
individual TM helices are established and then proceed in the following section
with lateral helix associations and higher order structural principles of membrane
protein folding.
As indicated above, open hydrogen bonds are energetically unfavorable in mem-
1010 28 Folding of Membrane Proteins
branes. Sequence segments with a sufficient number of apolar residues are driven
into membranes by the hydrophobic effect where they fold into TM or interfacial a-
helices. The thermodynamic gain of folding a 20-residue peptide into a TM helix is
about 5 kcal mol1 , if we take 0.25 kcal mol1 as the per-residue increment of
forming an helix in membranes [16]. Generally and in contrast to soluble protein
a-helices, TM helices are quite rich in glycines. Prolines are also not uncommon in
TM helices, where they often induce bends, but do not completely break the helix.
Finally, serines, threonines, and cysteines are frequently interspersed into other-
wise apolar TM sequences. These latter residues often form hydrogen bonds with
backbone carbonyls and may therefore have a more apolar character in TM helices
than in aqueous environments.
It is generally not possible to obtain interpretable results on the stability of heli-
cal integral membrane proteins from thermal denaturation studies because ther-
mal denaturation of these proteins usually leads to irreversibly denatured states
[5]. However, the thermodynamic stabilities of a few polytopic helical membrane
proteins have been studied by partial denaturation with strong ionic detergents.
Early refolding studies by Khorana and coworkers showed that BR could be quan-
titatively refolded by step-wise transfer from organic solvent to SDS to renaturing
detergents or lipids [64, 65]. These protocols were later refined to measure the ki-
netics of refolding of BR upon transfer from the ionic detergent SDS into mixed
lipid/detergent micelles [66, 67]. It should be kept in mind that although tertiary
contacts are lost, most secondary structure is retained or perhaps even partially re-
folded in SDS. Therefore, the reactions observed with these protocols report only
on the last stages of membrane protein folding. In 1997, Lau and Bowie developed
a system to reversibly unfold native diacylglycerol kinase (DAGK) in n-decyl-b-
maltoside (DM) micelles by the addition of SDS [68]. In these carefully controlled
studies, the ratios of the denaturing and native structure-supporting detergents
were systematically varied, so that reversibility of the partial unfolding reaction
could be demonstrated. By analyzing their data with the two-state model of protein
folding, these authors determined the stability of the three-helix TM domain of
DAGK in DM relative to SDS to be 16 kcal mol1 . This large (negative) value is
somewhat surprising and indicates an extremely high thermodynamic stability of
DAGK. A similar study has been carried out recently with BR [69].
28.5
Helix and Other Lateral Interactions in Membrane Proteins
The packing density and the hydrophobicity of amino acid residues involved in the
intramolecular contact of a-helical membrane proteins are similar to those in the
hydrophobic cores of water-soluble proteins. The basic principles of the hydropho-
bic organization of residues in water-soluble proteins appear to hold true also for
membrane proteins. However, the energy gain of the hydrophobic effect has al-
ready been expended on inserting individual a-helices into membranes and cannot
be a driving force for helix association. From comprehensive mutagenesis work
28.5 Helix and Other Lateral Interactions in Membrane Proteins 1011
number, the 26 cal mol1 per Å 2 , packing energy may be more universal than pre-
viously thought and may reflect van der Waals, hydrophobic, and lipophobic inter-
actions. Alternatively, water may access helical surfaces in partially denaturing SDS
micelles better than in native structure-supporting DMPC/CHAPSO micelles, in
which case the result may still conform to the classical hydrophobic effect.
An additional mechanism of helix interaction is thought to arise from inter-helix
hydrogen bonds. Side chain-to-side chain (Asn, Gln, Asp, Glu, His) hydrogen
bonds have been introduced into engineered TM helical bundles and were found
to stabilize homodimers and homotrimers [76–79]. Ser and Thr side chains were
also examined in these host–guest model sequences, but were insufficient to sup-
port helix association. It has also been suggested that backbone Ca-H to backbone
carbonyl hydrogen bonds might stabilize membrane proteins [80]. However, al-
though such hydrogen bonds are observed quite frequently in crystal structures of
membrane (and soluble) proteins, they may merely allow close packing and may
not make significant energetic contributions to the association of neighboring hel-
ices as has been demonstrated for one such bond in BR [81]. Pore loops are found
to intercalate into open angled helical membrane protein structures as exemplified
by the KcsA potassium channel structure [82]. The loops are generally also fixed to
the helical framework of the protein by side chain-to-side chain hydrogen bonds,
namely a hydrogen bond from a loop Tyr to a neighboring helix Trp and Thr in
the case of KcsA. Finally, many membrane protein structures contain cofactors
and pigments that likely contribute significantly to the stability and interaction spe-
cificity of these structures. Intersubunit interactions that determine quaternary
structures of membrane proteins generally follow the same helix packing rules as
those determining their tertiary structures [83]. In case of BR, van der Waals pack-
ing between nonpolar surfaces, hydrogen bonding, and aromatic ring stacking of
tyrosines in the bilayer interface all contribute to trimer formation.
28.6
The Membrane Interface as an Important Contributor to Membrane Protein Folding
In this and the following sections we turn to pathways and possible intermediates
of membrane protein folding. Jacobs and White [84] proposed a three-step thermo-
dynamic model of membrane protein folding based on structural and partitioning
data of small hydrophobic peptides. In this model, the first three steps of mem-
brane protein folding are thought include partitioning into the interface, folding
in the interface, and translocation across the membrane to establish a TM helix.
The model was later extended to a four-step model, which added helix–helix inter-
actions as the fourth step [62]. The model is rather a thermodynamic concept than
an actual pathway of folding of constitutive helical membrane proteins because
their TM helices are inserted with the assistance of the translocon in biological
membranes. However, the concept is relevant for the insertion of helical toxins
into membranes, which occurs without the assistance of other proteins, and for
the placement of interfacial helices into membranes. Separation of a partition and
28.7 Membrane Toxins as Models for Helical Membrane Protein Insertion 1013
folding step in the interface is probably also mostly semantic because in all observ-
able reactions these two processes appear to be coupled. Nevertheless, to think of
protein folding as a coupled process of two components is useful for separating
the thermodynamic contributions of each component [14–16]. Partition–folding
coupling was observed more than 20 years ago for toxic peptides [85], peptide
hormones [86], signal peptides [87, 88], and membrane fusion peptides [89].
Beta-sheet forming proteins and peptides may also insert and fold in the mem-
brane interface before they become completely inserted into and translocated
across the membrane [90–93]. An example of the respective contributions of
DG; DH, and DS to partitioning and folding of an a-helical fusion peptide in lipid
bilayers is shown in Figure 28.6.
28.7
Membrane Toxins as Models for Helical Membrane Protein Insertion
There are several proteins that assume globular structures in solution, but upon
interaction with a membrane receptor or at low pH insert into and sometimes
translocate across membranes. Among them are several colicins, which are trans-
located across outer bacterial membranes via porins and subsequently inserted into
the inner membrane where they form toxic pores. Other toxins such as diphtheria,
tetanus, and botulinum toxins spontaneously insert their translocation domains
into mammalian cell membranes and thereby permit the translocation of linked
catalytic domains into the cells. Since these proteins form helical structures in so-
lution and in membranes, they are good models to study the refolding of soluble
helical proteins into membranes. The channel-forming colicins are perhaps in this
regard the best-studied bacterial toxins. The structures of the soluble forms of the
channel domains of several colicins are known [94]. They consist of a central hy-
drophobic helical hairpin that is surrounded by eight amphipathic helices.
Refolding at the membrane interface has been implicated in many models of
colicin insertion into lipid bilayers. An early study found a molten globule interme-
diate at an early stage of colicin A insertion into membranes [95]. The helices were
formed, but tertiary contacts were not. Subsequent fluorescence studies indicated
that the helices of colicin E were located in the interface, but widely dispersed in
this state [96, 97]. The central hydrophobic helical hairpin of colicin A appears to
be more deeply inserted, but does not assume a complete TM topology [98]. How
the voltage-gated ion channel is opened from this stage is still unclear. Apparently,
long stretches of sequence translocate across the membrane upon the application
of a membrane potential by a process that is still poorly understood structurally
and mechanistically [99, 100].
The process of membrane insertion of the translocation domain of diphtheria
toxin likely resembles that of the colicins. The structure of the soluble form
features a hydrophobic helical hairpin similar to that of the colicins [101]. It is
thought that this hairpin inserts first into the lipid bilayer. The membrane-bound
protein can exist in two conformations with an either shallow or deeply inserted
1014 28 Folding of Membrane Proteins
0
0.2
–9
P8 P13 P 16 P20 G 1S G 1V
0
DDH (kcal mol-1)
8.1
–12 9.1 7.7
–16
P8 P13 P 16 P20 G 1S G 1V
9
-TDDS (kcal mol-1)
6 6.5
6.2
7.4
3.7
3
4.0 3.4
2.7 3.1 2.9
1.5
0
P8 P13 P16 P20 G1S G1V
Fig. 28.6. Thermodynamics of partition- bilayers. P8 through P20 are 8- to 20-residue
folding of the helical fusion peptide from peptides of the same sequence, and G1S and
influenza hemagglutinin. The lighter bars G1V are single amino acid mutations in
represent contributions from partitioning and position 1 of P20 that cause hemi-fusion or are
the darker bars represent contributions from defective in causing fusion, respectively. The
the coil ! a-helix transition to DG; DH, and numbers in the bars refer to the respective
TDS upon peptide insertion into mixed energies in kcal mol1 . Adapted from Ref. [16].
phosphatidylcholine/phosphatidylglycerol
helical hairpin [102]. Similar to the colicins, diphtheria toxin switches between two
very different topologies in membranes and their relative populations depend on
the applied membrane potential [103]. Again, the translocation mechanism is
only poorly understood at the present time. However, despite many still unex-
plored details, the bacterial toxins appear to be an interesting class of facultative
membrane proteins, from which we expect to learn a great deal about the mecha-
nisms and energetics of refolding helical proteins in lipid bilayers.
28.8 Mechanisms of b-Barrel Membrane Protein Folding 1015
28.8
Mechanisms of b-Barrel Membrane Protein Folding
nificantly as the bilayer thickness decreased. When the bilayers were sufficiently
thin, folding of OmpA into large unilamellar vesicles was observed, whereas in
average or thick bilayers complete folding and insertion occurs only in small unila-
mellar vesicles. Small vesicles are more strained and therefore exhibit more defects
than large vesicles, which permits the quantitative refolding of OmpA in small, but
not in large unilamellar vesicles if the hydrocarbon thickness of the bilayer is more
than 20 Å (i.e., that of diC12 -phosphatidylcholine). Interestingly, the hydrophobic
thickness of outer bacterial membranes is thought to be thinner than that of the
inner membranes [109].
28.9
Experimental Protocols
28.9.1
SDS Gel Shift Assay for Heat-modifiable Membrane Proteins
1. Dilute OmpA in 8 M urea more than 100-fold into sonicated small unilamellar
vesicles (SUV) composed of the desired lipids to give a final OmpA concentra-
tion of 12 mM and lipid-to-protein molar ratio of 800.
2. Incubate for 3 h at 37 C for refolding.
3. Divide the refolded protein-lipid complex into 10 aliquots.
4. Add appropriate amounts of a freshly made 10 M urea and buffer (10 mM gly-
cine, pH 10, 2 mM EDTA) solutions to the aliquots to reach a 2.5-fold dilution
of OmpA and urea concentrations in the range of interest.
5. Incubate reactions overnight at 37 C.
6. Terminate reactions by mixing the samples with an equal volume of 0.125 M
Tris buffer, pH 6.8, 4% SDS, 20% glycerol, and 10% 2-mercaptoethanol at
room temperature.
28.9 Experimental Protocols 1017
The fit parameters are the free energy of unfolding, DG H2O , and the m-value.
(A)
35 kDa
30 kDa
35 kDa
30 kDa
(B)
Unfolded fraction
Fig. 28.7. Reversible urea-induced unfolding/ 30-kDa form represents the native state, and
refolding of OmpA in lipid bilayers composed the 35-kDa form represents denatured states.
of 92.5% phosphatidylcholine (C16:0 C18:1 PC) B) Unfolded fractions as a function of urea
and 7.5% phosphatidylglycerol (C16:0 C18:1 PG) concentration estimated from A. The data are
at pH 10 and 37 C monitored by the SDS- fitted with the two-state equilibrium folding
PAGE shift assay. A) Unfolding (upper panel) equation Eq. (1).
and refolding (lower panel) of OmpA. The
1018 28 Folding of Membrane Proteins
28.9.2
Tryptophan Fluorescence and Time-resolved Distance Determination by Tryptophan
Fluorescence Quenching
Fig. 28.8. Urea-induced unfolding of OmpA and lipid concentrations were 1.2 mM and 10
in lipid bilayers composed of 92.5% mM, respectively, and the urea concentration
phosphatidylcholine (C16:0 C18:1 PC) and 7.5% ranged from 0 to 8 M (left to right). B)
phosphatidylglycerol (C16:0 C18:1 PG) at pH 10 Unfolding transition monitored by the average
and 37 C monitored by Trp fluorescence. A) emission wavelength hli (Eq. (2)) calculated
Fluorescence emission spectra of OmpA as a from the spectra in A and fitted with the two-
function of urea concentration. The protein state equilibrium folding equation Eq. (3).
28.9 Experimental Protocols 1019
X
ðFi l i Þ
i
hli ¼ X ð2Þ
ðl i Þ
i
1
hliF þ hliU exp½mð½denaturant Cm Þ=RT
QR
hli ¼ ð3Þ
1
1þ exp½mð½denaturant Cm Þ=RT
QR
1. Mix solutions of DOPC and one each of DOPC plus 4,5-, 6,7-, 9,10-, and 11,12-
diBrPC, respectively, at molar ratios of 7:3 (DOPC:m,n-diBrPC) in chloroform.
2. Dry the mixtures on the bottom of test tubes under a stream of nitrogen and
further in a desiccator under a high vacuum overnight.
3. Disperse dried lipids in 1.0 mL buffer (10 mM glycine, pH 8.5, 1 mM EDTA,
150 mM NaCl) to a lipid concentration of 2 mg mL1 .
4. Prepare SUVs by sonication of the lipid dispersions for 50 min using the mi-
crotip of a Branson ultrasonifier at 50% duty cycle in an ice/water bath.
1020 28 Folding of Membrane Proteins
( ( ))
Fðd Q Þ S 1 d Q d Trp 2
¼ exp pffiffiffiffiffi exp ð4Þ
F0 s 2p 2 s
F0 and Fðd Q Þ are the fluorescence intensities in the absence and presence of a
particular m,n-diBrPC quencher. d Q are the known distances of the quenchers
from the bilayer center, and d Trp is the distance of the fluorophor from the cen-
ter. The dispersion s is a measure of the width of the distribution of the fluo-
rophor in the membrane, which in turn depends on the sizes and thermal
fluctuations of the fluorophors and quenchers. S is a function of the quench-
ing efficiency and quencher concentration in the membrane. The fits yield
d Trp ; s, and S.
11. Plot d Trp and s as a function of time.
28.9.3
Circular Dichroism Spectroscopy
Circular dichroism (CD) refers to the differential absorption of the right- and left-
handed circular polarized lights by a sample that exhibits molecular asymmetry.
Far-UV CD spectroscopy has been used to characterize the secondary structures of
28.9 Experimental Protocols 1021
Fig. 28.9. Measurement of the translocation the cis side and does not cross the bilayer.
rate of single trypophans of OmpA into and B) Time course of the movement of Trp143
across lipid bilayers composed of phos- across the lipid bilayer at 30 C. This Trp
phatidylcholine (diC18:1 PC) by time-resolved translocates across the bilayer with a time
fluorescence quenching (TDFQ). A) Time constant of 3.3 min at 30 C. The solid lines
course of the movement of Trp7 of OmpA are fits of the data to exponential functions.
into lipid bilayer at 2 C. This Trp stays on Adapted from Ref. [107].
ing rule of thumb values for 100% secondary structure: y222 ¼ 30 000 deg cm 2
dmol1 for a-helix and y216 ¼ 15 000 deg cm 2 dmol1 for b-sheet.
28.9.4
Fourier Transform Infrared Spectroscopy
metric and antisymmetric methylene stretches at 2850 cm1 and 2920 cm1 , re-
spectively, and ester carbonyl stretch at 1730 cm1 .
1. Fill liquid ATR cell containing germanium plate with D2 O buffer (e.g., 5 mM
HEPES/10 mM MES pH 7.3, 135 mM NaCl) and record FTIR spectra at verti-
cal and horizontal polarizations with a spectral resolution of 2 cm1 to obtain
reference spectra.
2. Deposit lipid monolayer on germanium plate by the Langmuir-Blogett tech-
nique at 32 mN m1 .
3. Add proteoliposomes (typically about 0.5 mM lipid and 20 mg mL1 protein in
same H2 O buffer) to supported monolayer in assembled ATR cell and incubate
with gentle shaking for 1 h. The proteoliposomes will fuse with the monolayer
during this time and form a complete planar bilayer on the surface of the ger-
manium plate.
4. Flush excess proteoliposomes out of measuring cell with 10 volumes of D2 O
buffer. This step also exchanges H2 O for D2 O.
5. Incubate exchanged sample for 1 h at room temperature to equilibrate labile
amide protons with deuteriums. This is done in the FTIR spectrometer and
the spectrometer compartment is purged with dry air or nitrogen at the same
time to remove water vapor.
6. Collect the sample spectra using the same experimental settings as reference
spectra at vertical and horizontal polarizations.
7. Ratio sample and reference spectra and calculate absorption spectra.
The following steps describe a recipe to extract secondary structure information from
complex (overlapped) amide I 0 bands:
8. Calculate second-derivative spectra by differentiation with 11-point Savitsky-
Golay smoothing. The minima in the second derivatives indicate possible com-
ponent peaks.
9. Using only prominent and clearly identified component peaks as starting
points, carry out spectral decomposition by least-squares fitting with Gaussian
component line shapes. Use as few components as possible to reproduce the
experimental line shape. Accept only fits that keep the Gaussian peak compo-
nents centered close to the component frequencies that were identified in the
second derivative spectra.
10. Integrate spectral components, assign to secondary structures, and calculate
area fractions to indicate respective secondary structure fractions.
The following steps describe how to obtain orientations of secondary structure ele-
ments of proteins and orientations (order parameters) of lipid chemical groups:
11. Calculate ATR dichroic ratio for the amide I 0 bands and symmetric or antisym-
metric methylene stretching band intensities according to
1024 28 Folding of Membrane Proteins
ð
A== ðnÞ dn
ATR
R amideI ¼ð ð5Þ
A? ðnÞ dn
and
where y is the angle between the axis of rotational symmetry of an axially sym-
metric molecule (long axis of the lipid, a-helix, b-barrel, etc.) and the normal to
the germanium plate and the angular brackets denote a time and space aver-
age over all angles in a given sample.
13. Calculate the order parameter for an amide I 0 band component as:
Ex2 R amideI
ATR
Ey2 þ Ez2
SamideI ¼ ATR E 2 2E 2
ð8Þ
Ex2 R amideI y z
Ex2 R amideI
ATR
Ey2 þ Ez2
SH ¼ 2:46 ATR E 2 2E 2
ð9Þ
Ex2 R amideI y z
SH is the order parameter that describes the helix orientation relative to the
germanium plate and membrane normal. For b-sheets, there is no helical sym-
metry and Eq. (8) describes the order parameter of the transition dipole mo-
ment of the amide I 0 band, which roughly coincides with the order parameter
of the amide carbonyl bond orientations.
14. Lipid order parameters are obtained from the experimental dichroic ratios of
the symmetric and/or antisymmetric methylene stretch vibrations:
Ex2 RATR 2 2
L Ey þ Ez
SL ¼ 2 ð10Þ
Ex2 RATR 2 2
L Ey 2Ez
Lipid order parameters should always be recorded along with those of em-
bedded proteins because they provide essential controls of the bilayer quality.
References 1025
Results on protein orientations are only meaningful if lipid orientations (to the
germanium plate) are satisfactory.
15. Use the following form of Beer-Lambert’s law modified for polarized ATR
spectroscopy to calculate the lipid/protein ratio in supported bilayers [116]:
ð 2980
A? ðn L Þ dn
1 SamideI 2800
L=P ¼ 0:208ðn res 1Þ ð 1690 ð11Þ
1 þ SL =2
A? ðnamideI Þ dn
1600
where n res is the number of residues in the polypeptide and n L and namideI are
the wavenumbers of the lipid methylene stretch and the protein amide I 0 vibra-
tions, respectively.
Acknowledgments
We thank members of the Tamm laboratory, past and present, for their contribu-
tions and many discussions. The work from our laboratory was supported by
grants from the National Institutes of Health.
References
29
Protein Folding Catalysis by Pro-domains
Philip N. Bryan
29.1
Introduction
This review will focus on what general inferences about protein folding can be
made from pro-domain catalyzed folding. General questions include whether na-
tive protein conformation is always thermodynamically determined, what causes
high kinetic barriers between unfolded and native states, and what do pro-domains
do to reduce these kinetic barriers. The proteases that have provided the most
mechanistic information on catalyzed folding are subtilisin (SBT) and a-lytic pro-
tease (ALP). Both are serine protease secreted from bacteria but are not evolutio-
narily related [17, 18]. Each has eukaryotic homologs. ALP is a structural homolog
of chymotrypsin and trypsin and SBT is homologous in structure to prohormone
convertases [19–21].
29.2
Bimolecular Folding Mechanisms
k1 k2 koff
P þ U () P-I () P-N () N þ P
k1 k2 kon
29.3
Structures of Reactants and Products
29.3.1
Structure of Free SBT
Subtilisin BPN 0 is the canonical member of the subtilisin family of proteases [22].
It is a heart-shaped protein consisting of a seven-stranded parallel b-sheet packed
between two clusters of a-helices (Figure 29.1). Six a-helices are packed against
one face of the central b-sheet and two a-helices are packed against the other [23–
1034 29 Protein Folding Catalysis by Pro-domains
25]. The catalytic triad is distributed as follows: The active-site Ser221 is on the N-
terminal end of a long, central helix (helix F); Asp32 is on an interior strand of the
b-sheet (b1); His64 is in the N-terminal end of the adjacent helix (helix C). Subtili-
sin contains a number of notable structural features which are highly conserved
across the family. A cis-peptide bond occurs between Tyr167 and Pro168 and cre-
ates part of the substrate binding pocket. A rare left-handed cross-over occurs be-
tween strands b2 and b3. A loop of nine amino acids (75–83) interrupts the last
turn of helix C [25]. The loop is part of the left-handed cross-over and forms the
central part of a high affinity calcium-binding site (site A). Four carbonyl oxygen
ligands to the calcium are provided by the loop (Figure 29.2). The other calcium
ligands are a bidentate carboxylate (Asp41) which occurs on an o-loop (36–45)
and the side chain Gln2. Three hydrogen bonds link the N-terminal segment (1–
4) to residues 78–82 in parallel-b arrangement. The 75–83 loop also has extensive
interactions with a b-hairpin (202–219) so that the calcium ion is buried within the
protein structure.
A second ion-binding site (site B) is located 32 Å from site A in a shallow crevice
between two segments of polypeptide chain near the surface of the molecule. Evi-
29.3 Structures of Reactants and Products 1035
site A
L75
D41
Q2 N77
I79
dence that site B binds calcium comes from determining the occupancy of the site
in a series of X-ray structures from crystals grown in 50 mM NaCl with calcium
concentrations ranging from 1 to 40 mM [26]. In the absence of excess calcium,
this locus was found to bind a sodium ion. The binding of these two ions appears
to be mutually exclusive so that as the calcium concentration increases, the sodium
ion is displaced, and a water molecule appears in its place directly coordinated to
the bound calcium [26]. When calcium is bound, the coordination geometry of this
site closely resembles a distorted pentagonal-bipyramid. Three of the ligands are
derived from the protein and include the carbonyl oxygen atom of Glu195 and the
two side-chain carboxylate oxygens of Asp197. Four water molecules complete the
coordination sphere. When sodium is bound, its position is removed by 2.7 Å from
that of calcium and the ligation pattern changes. The sodium ligands derived from
the protein include a side-chain carboxylate oxygen of Asp197 and the carbonyl ox-
ygen atoms of Gly169, Tyr171, Val174, and Glu195. Two water molecules complete
the coordination sphere.
It is not understood how any of these structural features affect folding rate, ex-
cept for the ion-binding sites. The architecture of site A is the major impediment
to folding. Deleting amino acids 75–83 accelerates folding rate by at least 10 7 -fold
[27–29]. The X-ray structure of the deletion mutant has shown that except for the
region of the deleted calcium-binding loop and N-terminal amino acids 1–4, the
structure of the mutant and wild-type protein are remarkably similar (Figure
29.3). The structures of SBT with and without the deletion superimpose with an
rms difference between 261 Ca positions of 0.17 Å with helix C exhibiting normal
helical geometry over its entire length. The deletion also abolishes the calcium
binding at site A. As will be discussed below, the formation of calcium site A ap-
pears to have a high activation barrier which may contribute to the inability of wild-
type SBT to fold.
1036 29 Protein Folding Catalysis by Pro-domains
85
64
Site A
Fig. 29.3. Calcium-binding loop of subtilisin. Region of
deletion (75–83) is shown by the dashed line. From 1SUC in
the Protein Data Bank [29].
29.3.2
Structure of SBT/Pro-domain Complex
The structure of the native SBT is almost identical in free and complexed states. In
the complex, the pro-domain is a single compact domain with an antiparallel four-
stranded b-sheet and two three turn a-helices (Figure 29.1) [30, 31]. The b-sheet of
the pro-domain packs tightly against the two parallel surface a-helices Asp and Glu
(residues 104–116 and 133–144). The pro-domain residues Glu69 and Asp71 form
helix caps for the N-termini of the two SBT helices. In another charge-dipole inter-
action, the carboxylate of Glu112 of SBT accepts H-bonds from the peptide nitro-
gens of pro residues 42, 43 and 44. The C-terminal residues 72–77 extend out
from the central part of the pro-domain and bind in a substrate-like manner along
SBT’s active-site cleft. Residues Tyr77, Ala76, His75, and Ala74 of the pro-domain
occupy subsites S1–S4 of SBT, respectively. If Tyr77 occupies the same position in
unprocessed pro-SBT then residues 1–10 of mature SBT would be displaced from
their native positions. This hypothetical conformer can be modeled by minor tor-
sional reorganizations involving residues 1–10 with Ala1 and Gln2 of mature SBT
occupying the S1 0 and S2 0 substrate sites. A major consequence for this ‘‘active
site’’ conformer, is that the calcium site A cannot fully form until after processing
because one of the calcium ligands (Gln2) cannot simultaneously bind in the S2 0
subsite and bind to calcium in the A-site.
29.3 Structures of Reactants and Products 1037
29.3.3
Structure of Free ALP
The overall fold of ALP is of the chymotrypsin type [17]. The fold is typified by the
association of two similar domains each consisting of a six-stranded antiparallel b-
barrel [32]. The active-site triad occurs in the interface between the domains with
His57 and Asp102 in domain 1 and Ser221 in domain 2 (Figure 29.4). Numbering
of ALP residues will be given according to the homologous position in chymotry-
sin. Proteases in this family typically contain multiple intradomain disulfide bonds
and ALP contains three. One of these (137–159) is not found in the mammalian
enzymes. The structural basis for the inability of ALP to fold independently is not
apparent from its basic fold. The activation pathways of trypsin and chymotrypsin
do not involve pro-domains but rather small N-terminal activation peptides. In fact
active trypsin can be produced in E. coli from a gene lacking the hexapeptide acti-
vation region [33–35]. Hence the basic fold is not intrinsically problematic
for folding. Unique structural features found in ALP and bacterial relatives with
pro-domains include a high glycine content (16%) and the extended b-hairpin
(167–179). Both features have been suggested as possible impediments to folding
[36].
An additional feature which should be noted is a buried salt bridge between
Arg138 and Asp194 (Figure 29.4). Asp194 is a key residue in zymogen activation
of trypsinogen and chymotrypsinogen. The proteolytic event which creates the
mature N-terminus (e.g., Ile16 in chymotrypsin) causes amino acids 189–194 to
rearrange into the active conformation [1]. The rearrangement results from a salt
bridge formed between the new N-terminus (Ile16) and Asp194. Thus a buried
salt bridge occurs in both ALP and mammalian relatives but the burial of the salt
bridge in trypsin and chymotrysin follows folding of the zymogen.
29.3.4
Structure of the ALP/Pro-domain Complex
The 166-amino-acid ALP pro-region wraps around the C-terminal half of mature
ALP [37]. The contact interface is more than 4000 Å 2 , almost four-times that of
SBT and its pro-domain (Figure 29.4). Most of the contacts to ALP are made with
the C-terminal half of the pro-domain (amino acids 65–166) [38], although both N-
and C-terminal portions of the pro-domain are required for folding of ALP [39].
Amino acids 160–166 of the pro-domain are inserted like a substrate into the active
site, reminiscent of the SBT complex. The edges of a three-stranded b-sheet in the
C-terminal part of pro and a b-hairpin (167–179) of ALP abut and form a five-
stranded b-sheet in the complex.
As with SBT, the C-terminus of the pro-domain binds tightly in the active site of
ALP in the bimolecular complex [37] (Figure 29.4). Thus in the unprocessed mole-
cule, the N-terminus of the active-site conformer would be displaced by 24 Å from
1038 29 Protein Folding Catalysis by Pro-domains
its position in native ALP. Unlike SBT, ALP does not have the complications of
metal binding, but it is possible that pro-domain binding organizes Asp194 so
that it can form the buried salt bridge with Arg138.
There are some general features of the C-terminal half of the ALP pro-domain
that are topologically similar to the SBT pro-domain [38]. Specifically, b-strands 2,
3, 4 and the a-helices connecting strands 1 to 3 and 2 to 4 in the SBT pro-domain
have structural counterparts in the ALP pro-domain. Apart from the general simi-
larity of the substrate-binding cleft interactions outlined above, however, there is
little similarity in the interactions of pro-SBT and pro-ALP at their binding inter-
faces with the mature proteases.
ALP pro-domain has no similarity to the small activation peptides of its eukary-
otic homologs trypsin and chymotrypsin [1]. The SBT pro-domain on the other
hand has high structural homology to the pro-hormone convertase pro-domain, al-
though little sequence similarity [40, 41]. Surprisingly, the SBT pro-domain also
has the same basic fold as the pro-domains of eukaryotic carboxypeptidases [42,
43]. This is in spite of the lack of any structural similarity between the mature re-
gions of SBT and the carboxypeptidases.
29.4 Stability of the Mature Protease 1039
29.4
Stability of the Mature Protease
29.4.1
Stability of ALP
The most basic folding experiment is determining the free energy of the folding
reaction. This is difficult with ALP and SBT because of the problem of establishing
equilibrium under conditions in which the ratio of folded and unfolded states can
be accurately measured. Denatured ALP diluted into native conditions rapidly col-
lapses into a ensemble of conformations with an average hydrodynamic radius
about midway between the native and guanidium chloride (GdmCl)-denatured
forms and with almost the same amount of regular secondary structure as native
ALP [44]. This collapsed state is stable for weeks at neutral pH without detectable
conversion to the native state [44]. The free energy of a two-state reaction can
sometimes be determined, however, by (1) measuring a folding rate at a standard
state; (2) measuring unfolding rates as a function of denaturant; or (3) extrapolat-
ing the unfolding rate to the standard state. From the ratio of the rates of folding
and unfolding at the standard state, the free energy of folding is determined. This
was the approach taken with ALP [45]. To minimize the influence of autodigestion
during the folding reaction, the measurements were made at pH 5.0. As the pH is
decreased below the pK a of the active-site histidine, the proteolytic activity of serine
proteases decreases. A pH around 5 is commonly used as a compromise between
low activity and preserving some conformational stability [46]. The refolding rate of
ALP was determined by incubating denatured protein in native conditions and re-
moving aliquots at intervals. The amount of active ALP was determined by measur-
ing peptidase activity using a synthetic substrate. Over the course of days small but
linear increases in activity were detected. Quantitation in the experiment is tricky
because the accumulation of active ALP causes digestion of the unfolded protein,
but the initial rate of the folding reaction is 6 1012 s1 at 25 C. This was com-
pared with the unfolding rate which was determined as a function of [GdmCl].
The rate of unfolding without GdmCl was estimated by linear extrapolation to be
1 107 s1 at 25 C. Since DGunfolding ¼ RT lnðk unfolding =k folding Þ in a two-state
system, the simplest interpretation of the unfolding and refolding rates would
mean that DGunfolding is about 6 kcal mol1 at 25 C.
Even given the uncertainties in determining the folding and unfolding rates, it is
convincing that the unfolded state of ALP is more stable than the folded state at
pH 5.0. The effect of pH 5.0 on the protonation of Asp194 also may bear consider-
ation. If the formation of the buried salt bridge between Asp194 and Arg138 is in-
volved in the transition state for folding, then partial protonation of Asp194 at pH
5.0 could destabilize its interaction with Arg138 and slow folding significantly.
The stability measurements on ALP were made with the three disulfide bonds
intact in all forms, U, I and N. An unstrained disulfide cross-link should stabilize
a protein by decreasing the entropic cost of folding. The loss of conformational en-
1040 29 Protein Folding Catalysis by Pro-domains
tropy in a polymer due to a cross-link has been estimated by calculating the proba-
bility that the ends of a polymer will simultaneously occur in the same volume ele-
ment. According to statistical mechanics, the three disulfide bonds in ALP should
destabilize the unfolded state by @10 kcal mol1 at 25 C [47, 48]. Since the life-
span of a protease depends upon the kinetics rather than the thermodynamics of
unfolding, however, disulfide bonds are a surprising feature. The effects of cross-
links on the stability of the unfolded state would not generally be manifested in
the activation energy of the unfolding reaction, because the transition states for un-
folding reactions usually are compact, with a slightly larger heat capacity than the
native state. In folding experiments on ALP, the disulfide bonds are present in
both the I and N states of the protein and thus do not contribute to the thermody-
namic measurements [45, 49].
29.4.2
Stability of Subtilisin
Ion binding has a dominant role in stabilizing SBT. The active-site mutant Ala221
of SBT can be unfolded at 25 C by incubation in 30 mM Tris-HCl, pH 7.5, 0.1 mM
EDTA. The rate of unfolding under these conditions is @0.5 h [1]. (The S221A mu-
tation reduces proteolytic activity of SBT by about 10 6 -fold [50] and eliminates the
problem of autoproteolysis during the unfolding process.) Addition of micromolar
concentrations of free calcium or millimolar concentrations of sodium will reduce
the unfolding rate to >days1 . The role of calcium site A in the kinetic stability of
SBT is well-documented [51]. SBT with the calcium site A occupied (e.g., 100 mM
CaCl2 , 100 mM NaCl) unfolds 1000-times slower than the apo-form of SBT (e.g., 1
mM EDTA, 100 mM NaCl) [52] and the activation energy for calcium dissociation
from the folded state is 23 kcal mol1 . The binding parameters of calcium at
site A have been determined previously by titration calorimetry: DHbinding ¼ 11
kcal mol1 and DG binding ¼ 9:3 kcal mol1 at 25 C [28]. Thus the binding of
calcium is primarily enthalpically driven with only a small net loss in entropy
(DSbinding ¼ 6:7 cal deg1 mol1 ). This is surprising since transfer of calcium
into water results in a loss of entropy of 60 cal deg1 mol1 . Therefore the free-
ing of water upon calcium binding to the protein will make a major contribution
to the overall DS of the process. The gain in solvent entropy upon binding must
be compensated for by a loss in entropy of the protein. Loop amino acids 75–83,
N-terminal residues 1–5, the o-loop (36–45) and the b-hairpin (202–219) have
increased mobility when calcium is absent from the A-site [27].
Ion site B also influences stability but to a lesser extent. The K d of site B for so-
dium is @1 mM and 15 mM for calcium [52]. By binding at specific sites in the ter-
tiary structure of subtilisin, cations contribute their binding energy to the stability
of the native state and should have predictable effects on thermodynamic stability.
If cation binding is exclusively to the native state then the contribution of cation
binding to the free energy of unfolding would be:
3.1 10 4 800
3.0 10 4 600
Relative fluorescence
Residuals
2.9 10 4 400
2.8 10 4 200
2.7 10 4 0
2.6 10 4 -200
0 400 800 1200 1600
Time (s)
Fig. 29.5. Kinetics of refolding Ala221 D75–83 is the average of three experiments. The
SBT. Refolding in 0.2 M KCl, 30 mM Tris-HCl, residuals after subtracting the data from a
pH 7.5, and 25 C is followed by the increase single exponential fit (k ¼ 0:0027 s1 ) are
in tryptophan fluorescence. The curve plotted plotted on a five-times expanded scale.
a ligand which binds tightly and preferentially to the native state. For example, the
streptomyces SBT inhibitor protein (SSI) has no detectable influence on the fold-
ing rate of SBT even though its binds to SBT more tightly than the pro-domain
does [56]. Thus pro-domains must be capable of stablizing some structure in the
folding pathway whose formation is limiting in its absence.
29.5
Analysis of Pro-domain Binding to the Folded Protease
kon
PþN ! P-N
koff
where K f is the equilibrium constant for folding the pro-domain and KP is the as-
sociation constant of folded pro-domain for SBT. The observed binding constant is
expected to be: KðPþPuÞ ¼ ½K f =ð1 þ K f ÞKP . The data show that as the fraction of
folded pro-domain approaches one, the observed association constant approaches
its maximum, @10 10 M1 [63].
The kinetics of the pro-domain binding to SBT also have been determined by
measuring the rate of fluorescence change that occurs upon binding. If the reac-
tion is carried out with a 10-fold or greater excess of P, then one observes a pseudo
first-order kinetic process with a rate constant equal to kon ½P þ koff . The kon for
binding the pro-domain to folded SBT is @10 6 M1 s1 [69]. This value is fairly
insensitive to mutations that change the stability of the pro-domain.
29.6
Analysis of Folding Steps
k1 k2
P þ U T P-I T P-N
k1 k2
1044 29 Protein Folding Catalysis by Pro-domains
1.5
5 µM proR9
Relative fluorescence
1.4
1.3
1.2
1.1 5 µM proWT
1
0 5 10 15 20 25 30 35 40
Time (s)
Fig. 29.6. Folding rate of Ala221, D75–83 SBT reaction was followed by the increase in
in the presence of the wild-type pro-domain tryptophan fluorescence which occurs upon
and proR9. Pro-domain (5 mM) and 1 mM folding of SBT into the pro-domain SBT
denatured SBT mutant were mixed in 5 mM complex.
KPO4 , 30 mM Tris pH 7.5 at 25 C. The
>10-fold for each lost amino acid. Effects of K1 are not detectable until three or
more amino acids are deleted. Recall that these amino acids bind to ALP in a sub-
strate-like manner (Figure 29.4). It is tempting to speculate that the reduction of k2
is due to the diminished capacity of the deletion mutants to organize the 189–194
loop.
The catalyzed folding reaction of SBT has also been analyzed using an A221 mu-
tant which lacks the calcium binding loop of site A and which folds more rapidly
than wild-type SBT. Using wild-type pro-domain as catalyst, the rate of SBT folding
increases linearly as a function of [P] (½P a 100 mM) [56]. The absence of curva-
ture in the plot implies that the formation of the initial complex, P-I, is the limit-
ing step in the reaction up to [P] ¼ 100 mM. These results indicated that initial as-
sociation of wild-type pro-domain with D75–83 SBT is weak. As the pro-domain
is stabilized, the folding reaction becomes faster and distinctly biphasic (Figure
29.6). The fast phase of the reaction corresponds to the second-order binding step
(1 10 5 M1 s1 ) and the slow phase to the isomerization step (0.15 s1 ). With the
folded pro-domain proR9 the maximum isomerization rates for SBT and D75–83
SBT are similar (@0.1 s1 ).
It was possible to demonstrate that proline isomerization is rate limiting in fold-
ing D75–83 SBT. There are 13 proline residues in SBT. All but one (Pro168) exist as
trans isomers in the native structure [25]. The peptide bond between proline and
its preceding amino acid (Xaa-Pro bonds) exist as a mixture of cis and trans isomers
in solution unless structural constraints, such as in folded proteins, stabilize one of
the two isomers. In the absence of ordered structure, the trans isomer is favored
slightly over the cis isomer. The isomerization trans $ cis is an intrinsically slow
reaction with rates at 0.1–0.01 s1 at 25 C. Upon denaturation of SBT, the struc-
tural constraints are removed and the trans and cis isomers of all 13 prolyl peptide
1046 29 Protein Folding Catalysis by Pro-domains
where U is denatured SBT with all prolyl peptide bond isomers the same as in
the native structure. According to the mechanism proposed above, the pseudo
first-order rate constant would reach its maximum when [P] is high enough that
the value k 1 ½P > ðk2 þ k2 Þ. That point has not been reached at [P] ¼ 20 mM.
Thus, in the presence of an independently folded pro-domain mutant, the rate of
SBT folding into a complex becomes typical of in vitro folding rates for many small
globular proteins. That is, independent of proline isomerization, folding is largely
complete within a few seconds [71].
In comparison, the rate of ALP folding plateaus at 0.03 s1 , which is in the range
of proline isomerization. ALP has three trans and one cis proline. It is not known
how proline isomerization is involved in the kinetics of the P-I to P-N transition,
however.
29.7
Why are Pro-domains Required for Folding?
Fig. 29.7. Schematic representation of the exchanging protons reside throughout the
subtilisin topology. The positions of the 49 central seven-stranded parallel b-sheet and in
slowest exchanging amide protons in SBT all the five major a-helical segments. Adapted
are shown by black dots. The slowest 49 from Siezen et al. [85].
strands, b8 and b9 (Figure 29.7). The large cooperative folding core may create a
folding problem, because most of the tertiary structure must be acquired before a
free energy well is encountered in conformational space. The enormous loss of
conformational entropy before that energy well occurs would result in a large tran-
sition state barrier to folding [66, 75]. In this model, the transition state of uncata-
lyzed folding is of high energy because of its low conformational entropy. The pro-
domain decreases the entropic barrier by pushing the transition state back toward
a less folded form of the proteinase. Thus much less conformational entropy is lost
in the transition state. Once over the barrier, folding of P-I to P-N is rapid.
29.8
What is the Origin of High Cooperativity?
Formation of the calcium A-site may be responsible for the cooperative folding re-
action of SBT. SBT is unstable in the absence of calcium and at low ionic strength.
1048 29 Protein Folding Catalysis by Pro-domains
Addition of calcium increases stability by 8 kcal mol1 per mM but does accelerate
the folding reaction to an observable level. The reason that the calcium does not
affect folding rate may be that site A is not fully formed until considerable tertiary
structure has accrued. The structure elements involved in forming the A-site are
the N-terminus, the o-loop, the 75–83 loop and the 202–219 b-hairpin. Thus cal-
cium binding may not contribute to the stability of intermediately folded states
but only to a substantially folded molecule. This situation creates a highly coopera-
tive folding reaction. In addition, the burial of the charged calcium and the require-
ment for its desolvation prior to burial within the protein creates an additional
kinetic barrier to folding. This suggestion is supported by the observation that the
D75–83 mutant SBT folds b 10 7 -times faster than SBT with the calcium A-site
intact.
The origins of high cooperativity in ALP may also be related to a buried ionic
interaction: the formation of the buried salt bridge between Arg138 and Asp194.
Formation of a structurally homologous ionic interaction is involved in zymogen
activation in trypsin and chymotrypsin.
Buried ionic interactions have been implicated previously in raising the activa-
tion barrier to folding because of unfavorable protein–solvent rearrangements.
For example, replacing a buried salt bridge in ARC repressor dimer with a hydro-
phobic cluster has little effect on thermodynamic stability but increases the fold-
ing rate of the dimer by up to 1000-fold [76, 77]. Desolvating these charged res-
idues in ALP may raise kinetic but not necessarily the thermodynamic barriers to
unfolding.
29.9
How Does the Pro-domain Accelerate Folding?
The simplest model of catalyzed folding is one in which the pro-domain acceler-
ates folding by stabilizing a substructure on the folding pathway which does not
form frequently in the absence of pro-domain binding. Once the substructure
forms, it acts as a folding nucleus with subsequent folding propagating into other
regions. The nature of this substructure can be guessed at based on the structures
of the folded complexes. In the bimolecular SBT complex, the pro-domain binds to
an a–b–a structure (amino acids 100–144) which includes the parallel surface a-
helices D and E. It also supplies caps to the N-termini of the two helices (Figure
29.1). In the intermediate P-I, the pro-domain may stabilize the a–b–a structure
relative to other unfolded states. Subsequent folding may propagate from this fold-
ing nucleus into N- and C-terminal regions of SBT. In the absence of the pro-
domain, the a–b–a structure may not have sufficient independent stability to initi-
ate folding very frequently. It should be noted that unimolecular folding of proSBT
may circumuvent the problem of calcium A-site formation. The covalent attach-
ment of the pro-domain to SBT prevents the final folding of the calcium A-site re-
gion, until after the active site is sufficiently formed to cleave the pro-domain from
the mature [30, 78].
In ALP, the pro-domain may stabilize the formation of a long b-hairpin (118–
29.11 Experimental Protocols for Studying SBT Folding 1049
130) in the C-terminal half of ALP [36, 37] but may also be involved in organizing
the 189–194 loop through the interaction of the C-terminus of the pro-domain
with the active-site Ser195.
29.10
Are High Kinetic Stability and Facile Folding Mutually Exclusive?
29.11
Experimental Protocols for Studying SBT Folding
29.11.1
Fermentation and Purification of Active Subtilisin
Subtilisin BPN 0 and active mutants were expressed in B. subtilis in a 1.5-L New
Brunswick fermenter at a level of @500 mg L1 . After fermentation 16 g of Tris–
1050 29 Protein Folding Catalysis by Pro-domains
base and 22 g of CaCl2 (dihydrate) were added to @1400 mL of broth. Cells and
precipitate were pelleted into 250-mL bottles by centrifugation at 12 000 g for 30
min, 4 C. Acetone (70% final volume) was added to the supernatant. The 70% ace-
tone mixture was then centrifuged in 500-mL bottles at 12 000 g for 30 min, 4 C.
The pellet was resuspended in @150 mL 20 mM HEPES, pH 7.0, 1 mM CaCl2 .
Resuspended material was centrifuged at 12 000 g for 10 min, 4 C to remove in-
soluble material. Using a vacuum funnel, the sample was passed over 150 g DE52,
equilibrated in 20 mM HEPES, pH 7.0. The DE52 was washed twice with 150 mL
20 mM HEPES, 1 mM CaCl2 buffer and the all washes pooled. Solid NH4 SO4 was
added to the sample to a final concentration of 1.8 M. Final purification was carried
out using a 2 30 cm Poros HP 20 column on a Biocad Sprint. The sample was
loaded and washed in 1.8 M NH4 SO4 , 20 mM HEPES, pH 7.0 and then eluted
with a linear gradient (1.8–0 M NH4 SO4 in 20 mM HEPES, pH 7.0). Subtilisin
BPN 0 eluted at a conductivity of 108 mS. Assays of peptidase activity were per-
formed by monitoring the hydrolysis of sAAPF-pNA [82]. The [subtilisin] was de-
termined using 1 mg mL1 ¼ 1:17 at 280 nm.
29.11.2
Fermentation and Purification of Facile-folding Ala221 Subtilisin from E. coli
Mutagenesis and protein expression of inactive D75–83 SBT mutants were per-
formed using a vector called pJ1. Vector pJ1 is identical to pG5 [83] except that
the ClaI site of pG5 has been removed. The SBT gene was inserted between NdeI
and HindIII sites in the polycloning site of pJ1 and transformed into E. coli produc-
tion strain BL21(DE3).
The transformed production strain BL21(DE3) was plated out on selection plates.
Ten milliliters of 1LB broth (10 g of tryptone, 5 g of yeast extract, 10 g of NaCl L1
media) supplemented with 100 mg mL1 ampicillin was inoculated with 4–6 ampi-
cillin-resistant colonies in a 250-mL baffled flask. The culture was grown at 37 C,
300 rpm, until mid-log phase. This culture was used to inoculate 1.5 L of L Broth
buffered with 2.3 g of KH2 PO4 , 12.5 g of K2 HPO4 L1 media supplemented with
100 mg mL1 ampicillin. The culture was grown at 37 C in a BioFlo Model C30
fermenter (New Brunswick Scientific Co., Edison, NJ, USA) until an A600 1–1.2
was attained, upon which 1 mM IPTG was added to induce the production of T7
RNA polymerase that directs synthesis of target DNA message. Three hours after
induction, the cells were harversted by centrifugation for 30 min at 4 C, 10 000
rpm in a J2–21 centrifuge (Beckman Instruments, Fullerton, CA, USA) with a JA-
14 rotor.
E. coli paste from a 1.5-L fermentation (@5 g) was resuspended in 50 mL lysis
buffer (50 mM Tris-HCl pH 8.0/100 mM NaCl/1 mM EDTA) and PMSF was added
to a final concentration of 1 mM and DNase I (in 40 mM Tris-HCl, 1 M MgCl2 ) to
20 mg mL1 . The cells were lysed by two passes through a French Press and addi-
tional DNase I was added to a final concentration of 25 mg mL1 . Inclusion bodies
containing mutant SBT were recovered by centrifugation at 10 000 g, 4 C with a
JA-17 rotor for 20 min in a Beckman J2-21 centrifuge. The resulting pellet was
29.11 Experimental Protocols for Studying SBT Folding 1051
29.11.3
Mutagenesis and Protein Expression of Pro-domain Mutants
The gene fragment encoding the 77-amino-acid pro-domain was cloned into
vector pJ1. When superinfected with helper phage M13K07, the M13 origin
enables the DNA replication in single-stranded form for mutagenesis and se-
quencing. Mutagenesis of the cloned pro-domain gene was performed according to
the oligonucleotide-directed in vitro mutagenesis system, version 2, Sculptor sys-
tem (Amersham International plc., Bucks, UK) or Kunkel dUTP incorporation
method [84, 85]. Escherichia coli TG1 [K12, D(lac-pro), supE, thi, hsdD5/F 0 traD36,
proA þ Bþ , lacI q , lacZDM15] was obtained from the oligonucleotide-directed in
vitro mutagenesis system version 2 (Amersham International plc.). Competent E.
coli TG1 cells were transformed with mutagenesis product and single-stranded
phagemid DNA was then prepared from TG1 cells superinfected with helper
phage, while double-stranded phagemid DNA was isolated using Wizard Plus
Minipreps DNA Purification System. Phagemid DNA was sequenced as described
in Sequenase Protocol (United States Biochemical, Cleveland, OH, USA), using ei-
ther single-stranded or double-stranded phagemid DNA. Upon confirmation of the
mutations, the mutant double-stranded phagemid DNA was used to transform the
E. coli production strain BL21(DE3)[F , hsdS, gal] [86], BL21(DE3) cells contain
1052 29 Protein Folding Catalysis by Pro-domains
29.11.4
Purification of Pro-domain
Fermentation of of pro-domain was carried out in E. coli as described for SBT. Pro-
domain was purified using a freeze-thaw method to release soluble protein from
the cells [84]. E. coli paste from a 1.5-L fermentation (@5 g) was suspended in 10
mL of cold phosphate-buffered saline (PBS) and 20 mL water. PMSF was added to a
final concentration of 1 mM, DNase I to 10 mg mL1 , and the oxidized form of glu-
tathione (100 mg in 1 mL total volume of 100 mM KPi, pH 7.0) to 1.6 mg mL1 .
The resuspension was frozen by submerging in a dry-ice/ethanol bath until it was
completely frozen, and then thawed in a room temperature water bath. This cycle
was repeated two additional times, with addition of 100 mL glutathione stock solu-
tion before each cycle. The final mixture was centrifuged at 4 C for 10 min at
10 000 rpm in a J2-21 centrifuge (Beckman Instruments) with a JA-14 rotor. The
supernatant containing the soluble protein was diluted four-fold with 20 mM
HEPES, pH 7.0, and further purified as described [56]. The concentration of
proR9 was determined by UV absorbance using 1 mg mL1 ¼ A275 of 0.67.
29.11.5
Kinetics of Pro-domain Binding to Native SBT
The rate of binding of the pro-domain to folded SBT was monitored by fluores-
cence (excitation l ¼ 300 nm, emission l, 340 nm cutoff filter) using a KinTek
Stopped-Flow Model SF2001. The reaction was followed by the 1.2-fold increase
in the tryptophan fluorescence of SBT upon its binding of the pro-domain [56].
Pro-domain solutions of 10–160 mM in 30 mM Tris-HCl, pH, 5 mM KPi, pH 7.5
were mixed with an equal volume of 2 mM subtilisin, 30 mM Tris-HCl, pH, 5 mM
KPi, pH 7.5 in a single mixing step. Typically 10–15 kinetics traces were collected
for each [P]. Final [SBT] was 1 mM and final [P] were 5–80 mM.
29.11.6
Kinetic Analysis of Pro-domain Facilitated Subtilisin Folding
pared for refolding studies. Subtilisin was denatured by diluting 20 mL of the stock
solution into 1 mL of 50 mM HCl. The samples were neutralized by mixing the
subtilisin and HCl solution into an equal volume of 60 mM Tris-base, pH 9.4, KPi
and pro-domain in the KinTek Stopped-flow (final buffer concentrations were 30
mM Tris-HCl, pH, 5 mM KPi, pH 7.5). The final concentration of subtilisin was 1
mM, while the pro-domain concentration was varied from 5 to 20 mM. Typically 10–
15 kinetics traces were collected for each [P].
References
one and two intact disulfide bonds. J The alpha-lytic protease pro-region
Biol Chem 263, 11820–5. does not require a physical linkage to
49 Derman, A. I. & Agard, D. A. (2000). activate the protease domain in vivo.
Two energetically disparate folding Nature 341, 462–4.
pathways of alpha-lytic protease share 60 Peters, R. J., Shiau, A. K., Sohl,
a single transition state. Nat Struct J. L. et al. (1998). Pro region C-
Biol 7, 394–7. terminus:protease active site
50 Carter, P. & Wells, J. A. (1988). interactions are critical in catalyzing
Dissecting the catalytic triad of a the folding of alpha-lytic protease.
serine protease. Nature 332, 564–8. Biochemistry 37, 12058–67.
51 Voordouw, G., Milo, C. & Roche, 61 Bryan, P., Wang, L., Hoskins, J. et
R. S. (1976). Role of bound calcium in al. (1995). Catalysis of a protein
thermostable, proteolytic enzymes. folding reaction: Mechanistic
Separation of intrinsic and calcium implications of the 2.0 Å structure of
ion contributions to the kinetic the subtilisin-prodomain complex.
thermal stability. Biochemistry 15, Biochemistry 34, 10310–18.
3716–24. 62 Li, Y., Hu, Z., Jordan, F. & Inouye,
52 Alexander, P. A., Ruan, B. & Bryan, M. (1995). Functional analysis of the
P. N. (2001). Cation-dependent propeptide of subtilisin E as an
stability of subtilisin. Biochemistry 40, intramolecular chaperone for protein
10634–9. folding. Refolding and inhibitory
53 Eder, J., Rheinnecker, M. & Fersht, abilities of propeptide mutants. J Biol
A. R. (1993). Folding of subtilisin Chem 270, 25127–32.
BPN 0 : Characterization of a folding 63 Ruvinov, S., Wang, L., Ruan, B. et al.
intermediate. Biochemistry 32, 18–26. (1997). Engineering the independent
54 Eder, J., Rheinnecker, M. & Fersht, folding of the subtilisin BPN 0
A. R. (1993). Folding of subtilisin prodomain: analysis of two-state
BPN 0 : role of the pro-sequence. J Mol folding vs. protein stability.
Biol 233, 293–304. Biochemistry 36, 10414–21.
55 Hayashi, T., Matsubara, M., 64 Kojima, S., Minagawa, T. & Miura,
Nohara, D., Kojima, S., Miura, K. & K. (1997). The propeptide of subtilisin
Sakai, T. (1994). Renaturation of the BPN 0 as a temporary inhibitor and
mature subtilisin BPN 0 immobilized effect of an amino acid replacement
on agarose beads. FEBS Lett 350, 109– on its inhibitory activity. FEBS Lett
12. 411, 128–32.
56 Strausberg, S., Alexander, P., 65 Kojima, S., Minagawa, T. & Miura,
Wang, L., Schwarz, F. & Bryan, P. K. (1998). Tertiary structure formation
(1993). Catalysis of a protein folding in the propeptide of subtilisin BPN 0
reaction: Thermodynamic and kinetic by successive amino acid replacements
analysis of subtilisin BPN 0 and its close relation to function.
interactions with its propeptide J Mol Biol 277, 1007–13.
fragment. Biochemistry 32, 8112–19. 66 Ruan, B., Hoskins, J. & Bryan, P. N.
57 Ohta, Y., Hojo, H., Aimoto, S., (1999). Rapid folding of calcium-free
Kobayashi, T., Zhu, X., Jordan, F. subtilisin by a stabilized pro-domain
& Inouye, M. (1991). Pro-peptide as mutant. Biochemistry 38, 8562–71.
an intramolecular chaperone: 67 Kojima, S., Yanai, H. & Miura, K.
renaturation of denatured subtilisin E (2001). Accelerated refolding of
with a synthetic pro-peptide subtilisin BPN 0 by tertiary-structure-
[corrected]. Mol Microbiol 5, 1507–10. forming mutants of its propeptide.
58 Baker, D., Silen, J. L. & Agard, D. A. J Biochem (Tokyo) 130, 471–4.
(1992). Protease pro region required 68 Ruan, B., Hoskins, J., Wang, L. &
for folding is a potent inhibitor of the Bryan, P. N. (1998). Stabilizing the
mature enzyme. Proteins 12, 339–44. subtilisin BPN 0 pro-domain by phage
59 Silen, J. L. & Agard, D. A. (1989). display selection: how restrictive is the
1058 29 Protein Folding Catalysis by Pro-domains
amino acid code for maximum protein protein stability and conformational
stability? Protein Sci 7, 2345–53. specificity? Nat Struct Biol 2, 122–8.
69 Wang, L., Ruan, B., Ruvinov, S. & 78 Yabuta, Y., Subbian, E., Takagi, H.,
Bryan, P. N. (1998). Engineering the Shinde, U. & Inouye, M. (2002).
independent folding of the subtilisin Folding pathway mediated by an
BPN 0 pro-domain: correlation of pro- intramolecular chaperone: dissecting
domain stability with the rate of conformational changes coincident
subtilisin folding. Biochemistry 37, with autoprocessing and the role of
3165–71. Ca 2þ in subtilisin maturation.
70 Ruan, B. (1998). Folding of subtilisin: J Biochem (Tokyo) 131, 31–7.
Study of independent folding and 79 Baker, D., Shiau, A. K. & Agard,
pro-domain catalyzed folding. PhD D. A. (1993). The role of pro regions
Dissertation, University of Maryland, in protein folding. Curr Opin Cell
College Park. Biol 5, 966–70.
71 Brandts, J. F., Halvorson, H. R. & 80 Strausberg, S., Alexander, P.,
Brennan, M. (1975). Consideration of Gallagher, D. T., Gilliland, G.,
the Possibility that the slow step in Barnett, B. L. & Bryan, P. (1995).
protein denaturation reactions is due Directed evolution of a subtilisin with
to cis-trans isomerism of proline calcium-independent stability. Bio/
residues. Biochemistry 14, 4953–63. technology 13, 669–73.
72 Bai, Y., Sosnick, T. R., Mayne, L. & 81 Alexander, P. A., Ruan, B.,
Englander, S. W. (1995). Protein Strausberg, S. L. & Bryan, P. N.
folding intermediates: native-state (2001). Stabilizing mutations and
hydrogen exchange. Science 269, 192– calcium-dependent stability of
7. subtilisin. Biochemistry 40, 10640–4.
73 Chamberlain, A. K., Handel, T. M. 82 DelMar, E., Largman, C., Brodrick,
& Marqusee, S. (1996). Detection of J. & Geokas, M. (1979). A sensitive
rare partially folded molecules in new substrate for chymotrypsin. Anal
equilibrium with the native Biochem 99, 316–20.
conformation of RNaseH. Nat Struct 83 Alexander, P., Fahnestock, S., Lee,
Biol 3, 782–7. T., Orban, J. & Bryan, P. (1992).
74 Chung, E. W., Nettleton, E. J., Thermodynamic analysis of the
Morgan, C. J. et al. (1997). Hydrogen folding of the streptococcal protein G
exchange properties of proteins in IgG-binding domains B1 and B2: why
native and denatured states monitored small proteins tend to have high
by mass spectrometry and NMR. denaturation temperatures.
Protein Sci 6, 1316–24. Biochemistry 31, 3597–603.
75 Jaswal, S. S., Sohl, J. L., Davis, J. H. 84 Kunkel, T. A. (1985). Rapid and
& Agard, D. A. (2002). Energetic efficient site-specific mutagenesis
landscape of alpha-lytic protease without phenotypic selection. Proc
optimizes longevity through kinetic Natl Acad Sci USA 82, 488–92.
stability. Nature 415, 343–6. 85 Kunkel, T. A., Sabatino, R. D. &
76 Waldburger, C. D., Jonsson, T. & Bambara, R. A. (1987). Exonucleolytic
Sauer, R. T. (1996). Barriers to proofreading by calf thymus DNA
protein folding: formation of buried polymerase delta. Proc Natl Acad Sci
polar interactions is a slow step in USA 84, 4865–9.
acquisition of structure. Proc Natl 86 Studier, F. W. & Moffatt, B. A.
Acad Sci USA 93, 2629–34. (1986). Use of bacteriophage T7 RNA
77 Waldburger, C. D., Schildbach, polymerase to direct selective high-
J. F. & Sauer, R. T. (1995). Are level expression of cloned genes. J Mol
buried salt bridges important for Biol 189, 113–30.
1059
30
The Thermodynamics and Kinetics
of Collagen Folding
Hans Peter Bächinger and Jürgen Engel
30.1
Introduction
30.1.1
The Collagen Family
The human collagen family of proteins now consists of 27 types. The individual
members are numbered with roman numerals. The family is subdivided in dif-
ferent classes: The fibrillar collagens (types I*, II, III, V*, XI*, XXIV, and XXVII),
basement membrane collagens (type IV*), fibril-associated collagens with inter-
rupted triple helices (FACIT collagens, types IX*, XII, XIV, XVI, XIX, XX and
XXII), short chain collagens (types VIII* and X), anchoring fibril collagen (type
VII), multiplexins (types XV and XVIII), membrane-associated collagens with in-
terrupted triple helices (MACIT collagens, types XIII, XVII, XXIII, and XXV), and
collagen type VI*. The types indicated by an asterisk are heterotrimers, consisting
of two or three different polypeptide chains. All others consist of three identical
chains or the chain composition is unknown. For type IV collagens six different
polypeptide chains are known that form at least three distinct molecules and type
V collagens contain three polypeptide chains in probably three molecules. The
common feature of all collagens is the occurrence of triple helical domains with
the repeated sequence -Gly-Xaa-Yaa- in the primary structure and the high content
of proline and hydroxyproline residues. This sequence allows for the formation of a
tertiary structure consisting of three polypeptide chains in a left-handed polypro-
line II helix. The three chains form a right-handed supercoiled triple helix, which
is stabilized by hydrogen bonds. The glycine residues in every third position are
packed tightly in the center of the triple helix. The residues in the Xaa and Yaa
positions are exposed to the solvent and are often proline and hydroxyproline,
respectively. All collagens are extracellular matrix proteins responsible for the
architecture of connective tissues, such as bone, tendon, cartilage, skin, and base-
ment membranes. Besides their structural roles, collagens interact with numerous
other molecules and are crucial for development and homeostasis of connective
tissue.
30.1.2
Biosynthesis of Collagens
The biosynthesis of procollagens is now fairly well understood at least for the most
abundant fibrillar collagens [1, 2]. The current hypothesis of fibrillar procollagen
biosynthesis involves the following steps:
1. Translocation of the emerging N-terminal end of the pro-a-chains into the rER.
2. Soon after the synthesis of the N-terminal propeptide, this domain folds and
forms intrachain disulfide bonds. HSP47, a collagen-specific heat shock protein,
may specifically bind to the N-terminal propeptide, but additionally has binding
sites along the helical portion of the molecule.
3. The synthesis continues and some of the proline residues become hydroxylated
by prolyl 4-hydroxylase (EC 1.14.11.2) and prolyl 3-hydroxylase (EC 1.14.11.7).
During the continued synthesis of the major helical sequences, some interac-
tions with molecular chaperones take place to prevent premature triple helix
formation. Some lysine residues are hydroxylated by lysyl hydroxylase (EC
1.14.11.4).
4. After completion of the synthesis of the C-terminal propeptide, this part of the
molecule folds, forms intrachain disulfide bonds, and interacts directly or indi-
rectly (with a docking molecule) with the lipid bilayer of the rER.
5. Selection and association of the correct chains occur by diffusion of the C-
terminal propeptides attached to the rER membrane.
6. A nucleus for triple helix formation is formed that aligns the chains in the right
order. This nucleus initiates triple helix formation and is between propeptides
stabilized by interchain disulfide bonds being formed, a reaction probably cata-
lyzed by protein disulfide isomerase (PDI). In mutations of the C-terminal pro-
peptides, an association with GRP78/BiP is found, indicating a potential role for
GRP78/BiP in the assembly of procollagen chains.
7. Hydroxylation of proline residues and some lysine residues continues and triple
helix formation proceeds from the C-terminal end towards the N-terminal end.
The fast propagation of the triple helix formation is followed by a slower folding
determined by cis peptide bond isomerization at proline residues. These peptide
bonds need to be isomerized into trans conformation to allow triple helix forma-
tion to continue. This step is catalyzed by peptidyl-prolyl cis-trans isomerases
(PPIases).
8. After completion of the folding of the major helix, the N-terminal propeptides
associate and form the small triple helix within this domain. Further modifica-
tions occur during the transport through the Golgi stack by cisternal matura-
tion. The long-standing problem of how a 300-nm-long procollagen molecule
is transported to the Golgi in 50-nm transport vesicles has recently been inves-
tigated [3, 4]. Procollagen traverses the Golgi stack without leaving the lumen
of the cisternae, but rather is transported by cisternal maturation. Nothing is
known about what proteins direct that process.
The rate of synthesis of procollagen chains was measured for type I and II colla-
gen, and found to be 209 residues per minute [5, 6]. For a fibrillar procollagen mol-
30.1 Introduction 1061
ecule, the synthesis time then is about 7 min. The time for secretion was 30 min
when analyzed in bone [7] and biphasic secretion kinetics with half-times of 14
and 115 min were found in fibroblasts [8].
30.1.3
The Triple Helical Domain in Collagens and Other Proteins
The length of the triple helix in fibrillar collagens is about 300 nm and consists of
330–340 Gly-Xaa-Yaa tripeptide units. These molecules can be described as semi-
rigid rods, well suited for the formation of fibrils. The fibrils have an axial 67-nm
periodicity, which results from electrostatic and hydrophobic sidechain interactions
[9]. In other collagens, stretches of tripeptide units are interspersed by interrup-
tions, making these molecules more flexible. A large variation in the length of
triple helices can be found in nature, the shortest collagen described is 14 nm
long (mini-collagen of hydra) and the longest one 2400 nm (cuticle collagen of an-
nelids) [10]. Gly-Xaa-Yaa repeats that form triple helices are also found in a num-
ber of other proteins: complement protein C1q, lung surfactant proteins A and D,
mannose-binding protein, macrophage scavenger receptors A (types I, II, and III)
and MARCO, ectodysplasin-A, scavenger receptor with C-type lectin (SRCL), the fi-
colins (L, M and H), the asymmetric form of acetylcholinesterase, adiponectin, and
hibernation proteins HP-20, 25, and 27.
30.1.4
N- and C-Propeptide, Telopeptides, Flanking Coiled-Coil Domains
All collagens are synthesized with noncollagenous domains. The fibrillar colla-
gens contain an N-terminal propeptide, an N-terminal telopeptide, the major triple
helix, a C-terminal telopeptide and a C-terminal propeptide. The propeptides pre-
vent premature fibril formation within the cell and are cleaved by specific pro-
teases within both the N- and C-terminal telopeptides. The C-terminal propeptide
is responsible for chain selection and association. The N-terminal end of the C-
terminal propeptide contains three to four heptad repeats indicative of a coiled-coil
structure that might facilitate its trimerization and the nucleation of the triple helix
[11]. After cleavage, the remaining telopeptides play important roles in the forma-
tion of fibrils. The telopeptides also contain the lysine residues for the formation
of covalent cross-links between molecules. The N-terminal propeptide consists of
a globular domain and a minor triple helix. The globular domain potentially inter-
acts with growth factors. Coiled-coil domains are prevelant in nonfibrillar collagens
as well. The transmembrane domain-containing molecules with triple helices show
a preference for a coiled-coil domain at the N-terminal end of the ectodomain.
30.1.5
Why is the Folding of the Triple Helix of Interest?
Folding studies of collagens are of interest because of the unusual structure of the
collagen triple helix, which is an obligatory trimer. The three chains have to com-
1062 30 The Thermodynamics and Kinetics of Collagen Folding
bine to the native state and do not have a defined structure if dissociated. Collagen
belongs to the class of filamentous proteins and the mechanism of the folding
process is distinctly different from folding pathways in globular proteins. Because
of the repeating sequence and the uniform structure, wrong alignments of chains
are likely and these are prevented by oligomerization domains.
There are also a number of practical aspects, which render folding studies inter-
esting. Inherited and spontaneous mutations decrease triple helix stability and
slow down the folding kinetics. Folding studies of wild-type and mutated collagens
are needed for an understanding of theses diseases, which cause severe tissue dis-
orders. The present review surveys the current knowledge of collagen folding. An
earlier recent review on collagen folding is by Baum and Brodsky [12] and a useful
general chapter on collagens was written by Bateman et al. [13].
30.2
Thermodynamics of Collagen Folding
30.2.1
Stability of the Triple Helix
The collagen molecules undergo a highly cooperative transition from a triple helix
to an unfolded state upon heating. The temperature at the midpoint of this transi-
tion is termed Tm and is specific for different collagens from different species. In-
terestingly, the Tm of the collagens is around the body temperature of the organism
from which it is isolated [14]. A linear correlation between the Tm and the number
of imino acid residues was noticed early on [15]. Later, a better correlation was
found between the Tm and the 4(R)-hydroxyproline content [16, 17]. Hydroxylation
of proline residues is therefore important for the stability of the collagen triple
helix and a summary of the posttranslational modifications of collagen molecules
is needed. Further stabilization of the collagen molecules is achieved by the forma-
tion of supramolecular structures such as fibrils. In this review, we will only deal
with individual collagen molecules in solution. The unfolding transition of triple
helices occurs in a very narrow temperature interval, indicating a highly coopera-
tive process. The Tm was found to be dependent on the rate of the increase in tem-
perature and a hysteresis or apparent irreversibility is observed for the thermal
transition of most extracted collagens. These difficulties in measuring such transi-
tion curves have led to the proposal that the collagen molecule is only kinetically
stable [18, 19]. True equilibrium transition curves have been measured only re-
cently for such collagens [14, 20]. From these measurements and from transition
curves of short fragments of collagen or collagen-like synthetic peptides, it is now
established that the collagen triple helix is a thermodynamically stable structure
but establishment of equilibrium is slow. Synthetic collagen-like peptides have
been extensively used in the characterization of the triple helical structure and its
stability.
30.2 Thermodynamics of Collagen Folding 1063
30.2.2
The Role of Posttranslational Modifications
30.2.3
Energies Involved in the Stability of the Triple Helix
The enthalpy change of the coil T triple helix transition for collagens was deter-
mined to be DH (15–18) kJ mol1 tripeptide units [15]. The enthalpy change
increases with increasing imino acid content. On a per residue basis, the change
of negative enthalpy for structure formation of triple helices is significantly larger
than that of globular proteins. The source of this large change in enthalpy is still
controversial.
Another unique feature is the temperature independence of DH . The specific
heat of the reaction is zero within the error limits of the measurements [15]. Of
the classical energies that determine protein stability, the electrostatic effect is the
easiest to deal with in collagen. The stability of the collagen triple helix is only min-
imally affected by pH and salt concentrations, therefore electrostatic interactions
can only play a minor role in the stability of the triple helix. The hydrophobic effect
probably plays a role in stabilizing the triple helix, but its effect is not as dominant
as in the stability of globular proteins. From the structure of the triple helix, it is
evident that an interchain hydrogen bond between the NH group of glycine and
the CO group of the amino acid in the Xaa position of a neighboring chain is pres-
1064 30 The Thermodynamics and Kinetics of Collagen Folding
ent. Hydrogen exchange studies indicate that two classes of hydrogen bonds exist.
The first class consists of 1:0 G 0:1 very slowly exchanging hydrogens per tripep-
tide unit and a second class of 0:7 G 0:1 slowly exchanging hydrogens per tripep-
tide unit [166]. It is likely that the first class comprises the above-mentioned hydro-
gen bond. If the Xaa and Yaa positions are not occupied by proline residues, it was
suggested that an additional interchain hydrogen bond can form between the car-
bonyl group of glycine and the amino group of the Xaa residue, probably involving
one or two water molecules.
The particular effectiveness of 4(R)-hydroxyproline in stabilizing the triple helix
has led to suggestions that it may form additional hydrogen bonds through an ex-
tended water structure [23]. This notion became weaker, when it was shown that
4(R)-fluoroproline formed more stable triple helices than 4(R)-hydroxyproline [24,
25]. The replacement of the hydroxyl group by fluorine has several effects. The in-
ductive effect of the fluorine group influences the pucker of the pyrrolidine ring.
This is a stereoelectronic effect as it depends on the configuration of the substitu-
ent. 4(R) substituents stabilize the Cg-exo pucker, while 4(S) substituents stabilize
the Cg-endo pucker. The X-ray structure of (Pro-Pro-Gly)10 showed a preference of
proline residues in the Xaa position in Cg-endo pucker, while the Yaa position pro-
lines preferred Cg-exo puckering [26, 27]. This puckering influences the range of
the main chain dihedral angles j and c of proline, which are required for optimal
packing in the triple helix. In addition, the trans/cis ratio of the peptide bond is
influenced by the substituent [24]. Because all peptide bonds in the triple helix
have to be trans, the helix is stabilized, if the amount of trans peptide bonds in
the unfolded state is increased. None of these individual effects can fully explain
the observed stability of the triple helix and it is difficult to quantitate the contribu-
tion of each of them.
A thermodynamic analysis of the triple coil T triple helix transition of fibrillar
collagens shows that the calorimetric enthalpy and the van’t Hoff enthalpy are sig-
nificantly different [15]. The ratio of the van’t Hoff enthalpy and the calorimetric
enthalpy determines the cooperative length. For fibrillar collagens type I, II, and
III a cooperative length of about 80–100 tripeptide units was determined [28].
This corresponds to about one-tenth of the total length of the triple helix. There-
fore, the all-or-none model should be a good approximation for the shorter syn-
thetic peptides. The cooperative length is an average parameter that applies to the
whole triple helix and does not take into account the sequence variations that occur
within a given molecule.
Because the rate of unfolding of the collagen triple helix is very slow in the tran-
sition region (see Section 30.3.4.4), great care must be given to the experimental
rate of heating. Figures 30.1 and 30.2 show the dependence of the Tm of type III
collagen as a function of the rate of heating. The Tm does not vary linearly with
the heating rate and true equilibrium curves are only obtained by extrapolation of
the heating rate to 0 (Figure 30.2). It should also be noted that most collagens
show an irreversible denaturation curve, because the molecules were proteolytically
cleaved and lack important domains for folding.
No equilibrium intermediates were observed during the unfolding transition for
30.2 Thermodynamics of Collagen Folding 1065
Fig. 30.1. Thermal unfolding of bovine pN triple helix formation was measured by ORD
type III collagen at different heating rates (thin at 365 nm. The endpoints of the unfolding
line ¼ 10 C h1 , thick line ¼ 2 C h1 , and (squares) and refolding (circles) kinetics at the
dashed line ¼ 0.5 C h1 ). The fraction of indicated temperature are also shown.
T m (°C)
the triple helices of type I, II, and III collagen, despite the heterogeneity of the
amino acid sequences along the triple helix. Several attempts have been made to
calculate differences in stabilities of triple helical regions with the collagen triple
helices. Sequence-dependent stabilities were calculated and these can explain the
occurrence of unfolding intermediates in the transition of types V and XI collagen
[29].
30.2.4
Model Peptides Forming the Collagen Triple Helix
bs x s s s s
Hx !
3C ! Hi !
! !
H3n2 ð1Þ
½H3n2 F
K¼ ¼ bs 3n2 ¼ ð2Þ
½C 3 3c02 ð1 FÞ 3
3½H3n2
F¼ ð3Þ
c0
From
and from Eq. (2) it follows that at the midpoint of the transition (F ¼ 0:5 and
T ¼ Tm ),
DH
Tm ¼ ð5Þ
DS þ R lnð0:75c02 Þ
where DG is the standard Gibbs free energy, DH is the standard free enthalpy,
and DS is the standard entropy. It is important to note that the Tm is concentra-
tion dependant and stability values should only be compared at identical molar
chain concentrations or concentrations which were corrected for concentration dif-
ferences with Eq. (5).
DH can be obtained from the slope of the transition curves. A first approxima-
tion of DH can be obtained from the van’t Hoff equation
1068 30 The Thermodynamics and Kinetics of Collagen Folding
dF
DH ¼ 8RTm2 ð6Þ
dT F ¼0:5
DH T
K ¼ exp 1 ln 0:75c02 ð7Þ
RT Tm
which is obtained by solving Eq. (5) for DS and substituting for DS in Eq. (4).
In this equation the specific heat c p is assumed to be zero, as measured for several
collagens by Privalov (see Section 30.2.3). In this regard, it is important to calculate
F as a function of temperature. In the helical region and especially in the coiled
region the circular dichroism (CD) signal shows a linear decrease with increasing
temperature.
For a direct fit of F by K and co Eq. (2) is rewritten as the cubic equation:
F 3 3F 2 þ bF 1 ¼ 0 ð8Þ
with:
9Kc02 þ 1
b¼ ð9Þ
3Kc02
By the method of Cardano [55] the real solution of this equation was obtained as:
F ¼uþvþ1 ð10Þ
with:
vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
u
u 2 ffi
t
3 q q p3
u¼ þ þ ð11Þ
2 4 27
and:
vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
u
u 2 ffi
t
3 q q p3
v¼ þ ð12Þ
2 4 27
1
in which p ¼ q ¼ .
3Kc02
When measuring CD the molar ellipticity is given by
where ½Yh and ½Yc are the ellipticities for the peptide in either the completely tri-
ple helical conformation or in the unfolded conformation. The linear temperature
dependencies of ½Yh and ½Yc can be described by
Here the ellipticities at the midpoint temperature Tm are arbitrarily used as refer-
ence points. For a best fit of the transition curves, initially the Tm is kept constant
and DH ; p and q are varied. After an initial fit, the Tm is then also varied.
For peptides that contain a cross-link, the triple helix T coil transition becomes
concentration independent and the equations for the all-or-none model simplify to:
F
K¼
1F
then F becomes
K
F¼
1þK
and
DH
Tm ¼ :
DS
Tab. 30.1. Thermodynamic data of the triple helix–coil transition of model peptides in aqueous
solution determined by circular dichroism (DH vH ; DS ) and differential scanning calorimetry
(DH cal ). The values are expressed per mol of tripeptide units in a triple helix. The number of
tripeptide units in the triple helix was calculated as 3n 2 for the peptide (Gly-Xaa-Yaa)n .
cleavage site.
f Heterotrimeric type IV collagen peptide containing the a1b1 integrin
binding site.
g Heterotrimeric (Gly-Pro-4(R)Hyp) with cysteine bridge.
5
h Dilysine bridged peptide with a collagen type III sequence and a Gly-
Flp, fluoroproline.
namic data also arise from the facts that the transition curves monitored by CD
are measured at a much lower concentration than the calorimetric measurements
and that the concentration dependence is unknown in many studies. All mea-
surements of single-chain peptides include the uncertainty of misaligned chains
and errors in the fitting procedures. Data from different sources are summarized
in Tables 30.1 and 30.2. In view of the many sources of error it is highly recom-
mended to consult the individual publications in order to gain information on
reliability.
For the reasons mentioned above, the published thermodynamic data for the tri-
ple helix coil transitions of synthetic peptides vary significantly. The commercially
available peptides (Pro-Pro-Gly)10 and (Pro-4(R)Hyp-Gly)10 have been measured in
several laboratories. The van’t Hoff enthalpy change varies from 7.91 to 18.8 kJ
mol1 tripeptide units for (Pro-Pro-Gly)10 and from 13.4 to 25.25 kJ mol1 tri-
peptide units for (Pro-4(R)Hyp-Gly)10 . The published values for the calorimetric
enthalpies of these peptides are more consistent.
1072 30 The Thermodynamics and Kinetics of Collagen Folding
Tab. 30.2. Thermodynamic data of the triple helix formation of host-guest peptides in PBS
determined by circular dichroism (DH vH ; DS ) and differential scanning calorimetry (DH cal ).
The values are expressed per mol of triple helix.
Generally, the triple helix T coil transitions of short peptides can be described
by the all-or-none model discussed in the previous section. Most peptides for which
calorimetric data are available show that the ratio of the van’t Hoff enthalpy and the
calorimetric enthalpy is close to 1, indicating that the all-or-none mechanism is a
good approximation, which is expected for peptides smaller than the cooperative
length. However, exceptions have been reported, especially in heterotrimeric pep-
tides [56].
Figure 30.3 shows the triple helix coil transition of Ac-(Gly-Pro-4(R)Hyp)10 -NH2
measured by CD in water. The heating/cooling rate was 10 C h1 and the unfold-
ing and refolding curves are shown for three different concentrations.
The experimental data can be described by the all-or-none model (Mizuno and
Bächinger, unpublished). The Tm increases with increasing concentration, but
a concentration-dependent hysteresis is observed during refolding (see Section
30.3.4.4).
The studies with synthetic model peptides confirm the stabilizing role of
4(R)Hyp in the Yaa position of the triple helix. The influence of 4(S)Hyp,
3(S)Hyp, 4(S)Flp, and 4(R)Flp on the stability of the triple helix is entirely based
on measurements with synthetic peptides (see Section 30.2.4.7).
The peptides in the host–guest system carry no charges at the ends and the val-
ues for the enthalpy and entropy changes are measured over a (Gly-Pro-4(R)Hyp)7
background. The stability of (Pro-4(R)Hyp-Gly)10 was shown to be pH dependent.
At acidic and basic pH the peptide carries charges at only one end and is more sta-
ble than at neutral pH, where charges are present at both ends [57]. The peptides
have the general sequence Ac-(Gly-Pro-4(R)Hyp)3 -Gly-Xaa-Yaa-(Gly-Pro-4(R)Hyp)4 -
Gly-Gly-NH2 . The thermodynamic data for the guest peptides can be used to pre-
dict the stability of triple helices with any sequence [58, 59]. The limitations of this
approach are discussed in Section 30.2.4.4. Table 30.2 lists the thermodynamic
parameters determined for the host–guest peptides. In contrast to other synthetic
peptides and CNBr peptides of type I collagen [60], where the enthalpy change in-
creases with an increasing amount of imino acids, the Gly-Pro-4(R)Hyp contain-
ing host peptide has one of the lowest enthalpy change observed. The host–guest
studies clearly show that hydrophobic residues do not play a major role in the sta-
bilization of the triple helix, rather they perform important tasks in the assembly
of collagen molecules into fibrils [36]. Gly-Gly containing peptides show a strong
influence in destabilizing the triple helix, which may play an important role in
specific functions by modulating the local triple helical stability [61]. These studies
also clarify the dependence of the triple helical stability on the identity, position
and ionization state of charged residues [62].
average to simulate cooperativity [66]. While empirical, this method was able to
qualitatively explain the occurrence of unfolding intermediates in the triple helix
of type V and XI collagens. In contrast to other fibrillar collagens, collagens type V
and XI show unfolding intermediates in the triple helix coil transition [29]. A com-
parison of the sequence-dependent stability profiles indicates that type V and XI
collagens lack a very stable region at the N-terminal end of the triple helix, that
is present in other fibrillar collagens. The N-terminal region of the triple helix
unfolds at lower temperatures in collagen types V and XI, and this leads to stable
unfolding intermediates, that consist of the more C-terminal regions of the triple
helix.
Brodsky’s group used a host–guest system to quantitate the influence of
naturally occurring tripeptide units to the triple helical stability [36, 58, 59, 61, 62,
67, 68, 69, 70]. The host–guest peptide system Ac-(Gly-Pro-Hyp)3 -Gly-Xaa-Yaa-(Gly-
Pro-Hyp)4 -Gly-Gly-NH2 is used to study the influence of the Xaa and Yaa residues
on the thermal stability of the triple helix. While there are over 400 possible combi-
nations of host–guest peptides, only about 80 appear with a significant frequency in
known collagen sequences and only 24 tripeptides have a frequency of higher than
1% [71]. These peptides can be divided into three groups: the Gly-Xaa-Hyp group,
the Gly-Pro-Yaa group, and the Gly-Xaa-Yaa group. Not surprisingly, the most
stable of the 20 peptides of group 1 is Gly-Pro-Hyp, followed by Gly-Glu-Hyp and
Gly-Ala-Hyp. The Tm values range from 47.3 C for Gly-Pro-Hyp to 31.9 C for Gly-
Trp-Hyp. For group 2, the most stable of the 20 peptides is again Gly-Pro-Hyp, fol-
lowed by Gly-Pro-Arg and Gly-Pro-Met. The Tm values range from 47.3 C for Gly-
Pro-Hyp to 26.1 C for Gly-Pro-Trp. A reasonable correlation was found between
the observed stabilities and the frequencies of occurrence in fibrillar collagen se-
quences [58]. Forty-one peptides of group 3 that contained no imino acids were
synthesized and their stabilities measured [69]. Theoretical calculations, assuming
no interaction between the Xaa and Yaa residues, allow the prediction of the stabil-
ity of all 361 possible peptides. Sixteen of the 41 peptides show a reasonable agree-
ment between the predicted and the measured stability (DTm a 2 C). Seven pep-
tides had a lower Tm than predicted and 18 had a higher Tm, most likely indicating
side-chain interactions in the triple helix. These studies should now allow the
prediction of the thermal stability of the triple helix in any Gly-Xaa-Yaa sequence.
Unfortunately, this is not the case. The tripeptide units Gly-Hyp-Pro and Gly-
3Hyp-Hyp show Tm values in the host-guest system that are comparable to Gly-
Pro-Hyp, but neither forms a triple helix as a homopolymer [72]. A similar result
was also obtained for the guest tripeptide unit Gly-Pro-fluoroproline [59]. The con-
clusion one has to draw is that these stabilities are dominated by the very stable
tripeptide units at either end of the guest tripeptide unit, and that this system has
little predictive power for the stability of Gly-Xaa-Yaa in homopolymers.
the synthetic model peptide (Pro-Pro-Gly)10 showed a 7/2 symmetry, giving rise to
3.5 residues per turn [77]. In 1994, a true single crystal was grown from a peptide
that consisted of 10 repeats of Pro-Hyp-Gly with a single substitution of a glycine
residue by an alanine in the middle [78]. This peptide was a triple helical molecule
with a length of 87 Å and a diameter of 10 Å. The structure was solved to 1.9 Å and
was consistent with the general parameters derived from the fiber diffraction
model, but showed a 7/2 symmetry. It was hypothesized that imino acid-rich pep-
tides have a 7/2 symmetry, but imino acid-poor regions adopt a 10/3 helix [78].
This was later confirmed with a synthetic peptide that contained an imino acid
poor region [79, 80]. The structure also confirmed the expected hydrogen bond
between the GlyNH and CbO of the proline in the Xaa position. In addition, an
extensive water network forms hydrogen bonds with all available carbonyl and
hydroxyl groups [23, 81].
The structure of the peptides (Pro-Pro-Gly)10 [82, 83] and (Pro-Hyp-Gly)10
[84] was also determined. The structure of (Pro-Pro-Gly)10 obtained from crystals
grown in microgravity, which diffracted up to a resolution of 1.3 Å, showed a pref-
erential distribution of proline backbone and side-chain conformations, depending
on the position [85, 26]. Proline residues in the Xaa position exhibit an average
main chain torsion angle of 75 and a positive side-chain w1 (down puckering),
while proline residues in the Yaa position were characterized by a significantly
smaller main-chain torsion angle of 60 and a negative w1 (up puckering). These
results were used to explain the stabilizing effect of hydroxyproline in the Yaa
position, because hydroxyproline has a strong preference for up puckering. It also
would explain the destabilizing effect of 4(R)-hydroxyproline in the Xaa position
(see also Section 30.2.4.7).
The structure of (Gly-Pro-Pro)10 was also solved in the context of the attached
oligomerization domain foldon [86].
NMR studies with synthetic model peptides of the triple helix are difficult be-
cause of overlapping resonances of the repetitive sequence and by peak broadening
from the shape [87–89]. Isotopic labeling was used to observe specific residues us-
ing heteronuclear NMR techniques. Hydrogen exchange studies were used to show
that the NH groups of glycine exchanged faster in the imino acid-poor region of
a synthetic peptide compared with the Gly-Pro-Hyp region [87]. The NH of threo-
nine and the NH of glycine showed a very slow exchange with deuterium in the
galactosyl-containing peptide Ac-(Gly-Pro-Thr(b-Gal))10 -NH2 [90]. Heteronuclear
NMR was used to monitor the folding kinetics of the triple helix [91, 92]. The hy-
dration of the triple helix was also studied by NMR [93]. The hydration shell was
found to be kinetically labile with upper limits for water molecule residence times
in the nanosecond to subnanosecond range.
requires all peptide bonds to be in the trans conformation, so changes in the cis
to trans ratio of peptide bonds in the unfolded state have a direct influence on the
stability (see Section 30.2.4.8). Another example comes from studies with synthetic
peptides, in which galactosyl threonine was incorporated [94]. The deep-sea hydro-
thermal vent worm Riftia pachyptila has a cuticle collagen that has a very low con-
tent of proline and hydroxyproline residues, but a high Tm of 37 C [95, 42]. The
Yaa position of the triple helical sequence in this cuticle collagen is frequently oc-
cupied by threonine. Threonines in the Yaa position were frequently galactosylated
with one to three galactosyl units. The peptide Ac-(Gly-Pro-Thr)10 -NH2 was unable
to form a triple helix, whereas the galactosylated version Ac-(Gly-Pro-Thr(b-Gal))10 -
NH2 showed a Tm of 39 C [94]. The stabilizing effect of galactosyl threonine has
been explained by the occlusion of water molecules and by hydrogen bonding [90].
Additionally, galactosylation of the threonine residues also restricts the available
conformations in the unfolded state. The end-to-end distance, determined by fluo-
rescence energy transfer, is increased in the peptide Dabcyl-(Gly-Pro-Thr(b-Gal))5 -
Gly-EDANS when compared with the peptide Dabcyl-(Gly-Pro-Thr)5 -Gly-EDANS
(Bächinger, unpublished). However, it is difficult to quantitate the contribution of
these different effects to the overall stability of the triple helix.
The importance of the restriction of imino acids in the unfolded state for the ef-
ficient folding of the triple helix has recently been described [96].
Hydroxyproline is also found in other types of collagens, such as type I [100], type
V [101, 102], type X [103], and interstitial and cuticle collagens of annelids [105].
The only reported sequences containing 3(S)Hyp are in Gly-3(S)Hyp-4(R)Hyp tri-
peptide units. Synthetic peptides with 3(S)Hyp in a host–guest system decrease
the stability of the triple helix in either the Xaa or the Yaa position [38]. A more
severe decrease was observed for 3(S)Hyp in the Yaa position. It was concluded
that the inductive effect of the 3-hydroxyl group of 3(S)Hyp decreases the strength
of the GlyNH OC3(S)Hyp hydrogen bond, when 3(S)Hyp is in the Xaa position,
and that, when 3(S)Hyp is in the Yaa position, the pyrrolidine ring pucker leads to
inappropriate mainchain dihedral angles and steric clashes [38]. When the influence
of 3(S)Hyp was studied in homopolymers, the peptide Ac-(Gly-Pro-3(S)Hyp)10 -
NH2 did not form a triple helix [104]. Surprisingly, the peptide Ac-(Gly-3(S)Hyp-
4(R)Hyp)10 -NH2 was also unable to form a triple helix as a homopolymer. Even
when foldon was attached to (Gly-3(S)Hyp-4(R)Hyp)10 , no triple helix formation
was observed, ruling out a kinetic difficulty for this peptide to form such a struc-
ture [104].
While fluoroproline was used in biosynthetic experiments in the sixties [105,
106, 107], its importance to the stability of the triple helix was established by incor-
porating 4(R)-fluoroproline (4(R)Flp) into collagen model peptides [25]. Raines and
coworkers showed that the substitution of 4(R)Hyp by 4(R)Flp in the peptide (Gly-
Pro-4(R)Hyp)10 leads to a significant increase in the stability of the triple helix. The
authors initially cited an ‘‘unappreciated inductive effect’’ as the reason for this
increase in stability, but more importantly, this result showed that an electron with-
drawing substituent in the 4(R) position of proline can stabilize the triple helix
without an extended network of water molecules [108]. Since this initial discovery,
the inductive effect has been characterized in great detail [109, 110, 111]. The
gauche effect determines the proline ring puckering (up puckering, Cg-exo) and
therefore predetermines the main-chain dihedral angles. In addition, the trans to
cis ratio of the peptide bond preceeding 4(R)Flp increases. Both of these effects in-
crease the stability of the triple helix. Peptides with 4(R)Flp in the Xaa position
do not form a triple helix [112, 113]. Peptides with 4(S)Flp in the Yaa position also
do not form a triple helix [44, 110]. In contrast to the peptide (4(S)Hyp-Pro-Gly)10 ,
which does not form a triple helix, the peptides (4(S)Flp-Pro-Gly)n form a stable
triple helix (n ¼ 7 [113]; n ¼ 10 [112]). This is an unexpected result, because the
trans to cis ratio of the peptide bond is decreased by 4(S)Flp. On the other hand,
the preorganization of the ring puckering should promote triple helix formation.
Why then do peptides with 4(S)Hyp in the Xaa position not form a triple helix? It
was hypothesized that unfavorable steric interactions occur with 4(S)Hyp that are
absent with 4(S)Flp [113].
The substitution of 4(R)Hyp by 4(R)-aminoproline in the peptide (Gly-Pro-
4(R)Hyp)6 also increases the stability of the triple helix. The extent of this increase
in stability was strongly pH dependent [114].
CaN bond. Consequently, the peptide bond is planar and the flanking Ca atoms
can be either in trans (o ¼ 180 ) or cis (o ¼ 0 ) conformation. For peptide bonds
preceding residues other than proline, only a small fraction (0.11–0.48%) was
found to be in the cis conformation [115]. Peptide bonds preceding proline are
much more frequently in the cis conformation, because the energy difference be-
tween the two conformations is rather small. Cis contents from 10 to 30% were
found in short unstructured peptides [116, 117]. The activation energy for the cis-
trans isomerization is high (80 kJ mol1 ) and the rate of isomerization is slow (see
Section 30.3.2.4).
The collagen triple helix can accommodate only trans peptide bonds, so the ratio
of cis to trans peptide bonds in the unfolded state has a direct influence on the
stability of the triple helix. As mentioned above, the stereoelectronic effect of 4-
substituents of proline residues influences the cis/trans ratio of the peptide bond
preceding proline. It was proposed that the conformational stability of the triple
helix relies on the change in the cis/trans ratio [110, 118]. These authors measured
the K cis=trans for a number of 4-substituted proline residues. Table 30.3 summerizes
these results.
There is indeed a good correlation of the observed cis/trans ratios and the ther-
mal stability of homopolymers containing these residues in the Yaa position. How-
ever, a quantitation of this effect can be calculated [119]. The observed stability dif-
ferences in the homopolymeric peptides (Pro-Pro-Gly)10 and (Pro-4(R)Hyp-Gly)10
are too large to be accounted for solely by the change in the cis/trans ratio of pep-
tide bonds in the unfolded state (Bächinger, unpublished).
Tab. 30.3. Kcis/trans in model compounds and unfolded type I collagen measured by NMR.
contributions from the unfolded and folded state lead ultimately to the stable triple
helix.
30.3
Kinetics of Triple Helix Formation
30.3.1
Properties of Collagen Triple Helices that Influence Kinetics
The complexity of the kinetics largely depends on the length of the triple helix to
be formed. For short collagens and for short model peptides an all-or-none mecha-
nism is a good approximation. The kinetics of systems with up to 45 tripeptide
units per molecule (15 units per chain) have been successfully treated using this
approximation [35, 120, 121]. Triple helices of most collagens are however much
longer (1000–7200 amino acids) and the extent of cooperativity in triple helix for-
mation is not sufficient for all-or-none transitions in systems of this size. The coop-
erative length of the triple helix in interstitial collagens with about 1000 tripeptide
units was estimated to be about 100 tripeptide units from equilibrium studies [28].
This value was determined for the main triple helix of collagens I, II, and III and
may apply to other collagens only in a first approximation because of sequence var-
iations. The value is however consistent with the experimentally observed all-or-
none nature of transitions with less than 45 tripeptide units. As expected the kinet-
ics of long triple helices are more complex than those of short naturally occurring
or designed model systems.
A second feature that largely influences the kinetics is the presence or absence
of a cross-link between the three chains in a collagen molecule. As described in
the Introduction most collagens contain nucleation domains (also called registra-
tion or oligomerization domains), which are located N-terminal or C-terminal of
the triple helical domain. As explained in the first part of this review these do-
mains serve to register the three chains at a side at which triple helix formation
is nucleated. They are also involved in the selection of different chains in cases in
which the collagen is a heterotrimer. The presence of trimerizing noncollagenous
domains (designated as NC-domains or N-propeptides and C-propeptides) was
found to be essential for proper folding of the triple helical domain. In addition,
many collagen triple helical domains contain disulfide cross-links between their
chains. The best-studied example is collagen III in which a disulfide knot of six
cysteines (two per chain) connects the three chains at the C-terminus. During the
physiological folding process, formation of this knot is most likely dependent on
earlier noncovalent interactions between the adjacent NC-domains. After its forma-
tion it serves as an ideal registration and nucleation site, thus replacing the action
of the NC-domain [122].
We shall first deal with the kinetics of folding from noncross-linked single
chains and then turn to collagen triple helices, which are trimerized by either the
disulfide knot of collagen III or by NC1-domains with strong trimerization poten-
1082 30 The Thermodynamics and Kinetics of Collagen Folding
tial. Natural proteins or fragment but also designed collagen-like proteins were em-
ployed in these studies.
30.3.2
Folding of Triple Helices from Single Chains
Fig. 30.4. Kinetics of triple helix formation calculated by F ¼ ð½Y225 ½Yu Þ=ð½Yf ½Yu Þ,
of A) (ProProGly)10 ((PPG)10 )) and B) where ½Yf and ½Yu are the ellipticities of the
(ProHypGly)10 ((POG)10 )) at different peptide unfolded and folded state. The time courses
concentrations. The conversion was monitored for 60, 180, and 1000 mM concentrations of
at 7 C after a temperature jump (dead time (PPG)10 demonstrate a high dependence of the
2 min) from 60 C in (A) and 70 C in (B), folding rate on concentration. For (POG)10 the
temperatures at which the peptides are fully concentration dependence is much smaller
unfolded. The degree of conversion F was and the rates are higher.
1084 30 The Thermodynamics and Kinetics of Collagen Folding
Fig. 30.5. Plot of the logarithm of initial rate nonlinear dependencies predicted by
of triple helix formation so ¼ ðdF=dtÞt¼0 as a mechanism 13. The triangular point indicates
function of the logarithm of total chain the rate constant for (ProProGly)10 in
concentration co for (ProProGly)10 and (GlyProPro)10 –foldon. This point was placed at
(ProHypGly)10 . Broken lines show the best a peptide concentration of 1 M (log co ¼ 0),
linear fits according to Eq. (2) with the which is the estimated intrinsic chain
apparent reaction orders of n ¼ 2:5 and 1.5 for concentration at the junction of the
(ProProGly)10 and (ProHypGly)10 , respectively. (GlyProPro)10 and foldon domain in
Continuous curves show better fits with (GlyProPro)10 –foldon.
Satisfactory fits of the time courses with theoretical integrated equations for even
reaction orders could not be achieved, but fits with reaction orders of 2 were better
than for 1 or 3. In agreement with similar observation with host–guest peptides
[128], the data suggested that the reaction order changed with the progress of
the kinetics. Therefore, initial slopes so were determined by careful extrapolations
of the initial phases of the time courses (Figure 30.5).
Plotting log so as a function of log co yielded reaction orders that were 3 at very
low concentrations but dropped with increasing concentration. Here co is the total
concentration of single chains. Interestingly the concentration dependence of so for
single chains of (PPG)10 converged to the value measured for the first-order folding
of the peptide cross-linked by foldon or the Cys-knot (see (GPP)10 –foldon and
(GPP)10 –Cys2 below).
Activation energies were determined from the temperature dependence of the
folding kinetics of (PPG)10 and (POG)10 . Values were much lower than for the
cross-linked polypeptides (Table 30.4).
30.3 Kinetics of Triple Helix Formation 1085
Tab. 30.4. Rate constants and activation energies of triple helix formation from single chains
and trimerized chains at 20 C.
(ProProGly)10 900a 7
(ProHypGly)10 about 10 6 * 8
(GlyProPro)10 –foldon 0.00197 54.5
(GlyProPro)10 –Cys2 0.00033 52.5
a Experimental values at 7 C were 800 s1 and 875 000 s1 ,
respectively.
C
þ ð13Þ
k1
! k3 k4
2C k2 D ! H ! H
D is present only in very small amounts and therefore a steady state equilibrium
(d[D ]/dt ¼ 0) is achieved shortly after the start of the reaction. Furthermore, dis-
sociation into monomers is much faster than formation of H (k2 g k3 [C]) and the
last term in Eq. (14) can be neglected. It follows that
It follows that the apparent third order rate constant ka is the product of the equi-
librium constant K and a true second-order rate constant k3 .
It is safe to assume that the nuclei D and H do not significantly contribute to
the CD signal. This assumption is justified by the high instability and lack of triple
helical structure of D and a very short segment of H (compared with the final
length of H), which acts as a nucleus. Mechanism 13 predicts two limiting cases
in which either nucleation or propagation is rate limiting.
At very low concentrations formation of H is the rate-limiting step and the rate
of triple helix formation d[H]/dt equals the rate of nucleus formation as predicted
by Eq. (16). In this case, the rate of helix propagation is faster than the rate of
nucleation k4 ½H g ka ½C 3 . The experimentally observed degree of conversion F is
related to [H] by F ¼ 3½H=½C0 in which [C0 ] is the total concentration of chains.
Consequently the initial rate
Theoretical curves were calculated for mechanism 13 also for intermediate situa-
tions by numerical integration [121] and these curves are included in Figure
30.5. The dependencies were calculated with ka ¼ 800 and 875 000 M2 s1 and
k4 ¼ 0:0007 s1 and 0.002 s1 for (PPG)10 and (POG)10 , respectively. It can be
seen that limiting case 1 is nearly reached for (PPG)10 , whose reaction order is
2.8–3 in the lowest concentration range. The average reaction order for all data
30.3 Kinetics of Triple Helix Formation 1087
narrow range and differences of no more than a factor of 10 in rate constants are
expected.
An additional feature that may influence the kinetics is the equilibrium ratio of
cis to trans, which again depends to some extent on the amino acid side chains
preceding prolines or hydroxyprolines [117]. The approximate value for the equilib-
rium constant K cis-trans is 0.2 for the peptide bond between glycine and prolines. A
rate constant kcis-trans ¼ 0:003 s1 at 20 C was determined for the isomerization of
this bond in short peptides [116, 117]. Furthermore, variations of the prolines-- and
hydroxyprolines-- content in the sequence will lead to differences in the number of
cis–trans isomerization steps required for folding of a collagen segment. The last
point does not apply to the comparison of (PPG)10 and (POG)10 , but is relevant
for the folding of natural collagens.
kct k rf
3Ccis T 3 Ctrans T H
k tc k fr
in which kct and k tc are the rate constants of cis–trans isomerization and k rf and k fr
are the rate constants of folding of residues in trans conformation. To improve the
fits the mechanism was expanded by a branching reaction at Ctrans . Fitting results
supported a nucleation domain composed mainly of the (GPO)4 moiety, which
must be in trans form before the monomer is competent to initiate triple helix
formation. Contrary to the highly positive activation energy of cis–trans isomeriza-
tion, a negative activation energy was found, which was explained by a fast pre-
equilibrium. Contrary to mechanism 13 the mechanism does not contain a dimer
precursor and assumes a third-order reaction at any concentration in contrast to
experimental findings with other model peptides.
30.3 Kinetics of Triple Helix Formation 1089
Fig. 30.7.Sequences of the short N-terminal defined by amino acid sequencing of bovine
and the central triple helix in collagen III. In collagen III. In the other sequences proposed
major parts of the GXY repeats, the residues in by cDNA sequencing, proline residues in the Y
the X and Y positions are not indicated. position are probably also hydroxylated, and
Hydroxyprolines residues (O) were only are indicated by O.
The folding of the designed peptide T1-892 was also investigated by NMR spec-
troscopy [12, 92]. By this technique cis–trans isomerization steps of Gly-Pro and
Pro-Hyp bonds were monitored. The study identified individual residues involved
in cis–trans isomerization as the rate-limiting step in triple helix propagation. Inter-
estingly, the authors were able to define the direction of growth. It started in the
(Gly-Pro-Hyp)4 part of the peptide, which was defined as a nucleation domain.
This finding was incorporated in the above mechanism [132]. Furthermore, a
zipper-like folding (see below) was confirmed by the NMR study.
30.3.3
Triple Helix Formation from Linked Chains
Pioneering kinetic work was performed with the short N-terminal and the long
central triple helical domains of collagen III, which consists of three identical
chains [122, 130, 131]. Both domains are terminated by a disulfide knot. Sequences
from three species are compared in Figure 30.7 with the N-terminal parts of the
triple helices schematically indicated by GXY repeats [133]. It should be recalled
that prolines in Y-positions are hydroxylated to hydroxyproline probably in all
chains (Figure 30.7).
Note that the cysteine-containing sequences are different in the two domains. It
was however established by molar mass determinations that the three chains are
interlinked to trimers in both cases. The three-dimensional structure of the disul-
fide knots has not been solved yet but two likely models were proposed for the knot
terminating the central helix [45, 131].
30.3.3.1 The Short N-terminal Triple Helix of Collagen III in Fragment Col1–3
Fragment Col1–3 consists of the N-terminal propeptide, the short triple helix and
a noncollagenous telopeptide. Trimerization by the knot led to an increase in ther-
mal stability and the fragment melts reversibly in an all-or-none type transition
1090 30 The Thermodynamics and Kinetics of Collagen Folding
Fig. 30.8. Dodecyl sulfate slab gel electro- sulfate was added and the sample was run on
phoresis of chain fragments which were pro- 10% polyacrylamide gel under reducing
tected against trypsin digestion by refolding. conditions. Eleven trypsin-resistant chain
Type III collagen was denatured at 45 C for fragments designated by the letters a–k can be
20 min and renatured at 25 C for the time clearly distinguished. For comparison, native
interval indicated. After incubation with untreated collagen (C), native trypsin-treated
trypsin at 20 C for 2 min, sodium dodecyl collagen (CT) and trypsin (T) were also run.
case intermediates are not as well resolved because of their three times larger size
but the same conclusions can be drawn. Furthermore, identical results were ob-
tained with pN-collagen III, demonstrating that the N-terminal propeptides are
not involved in the folding process.
For the zipper-like folding from the C-terminus zero-order kinetics is expected
for a large initial fraction of the conversion (see next section). This behavior was
experimentally observed for time courses followed by CD (Figure 30.9) [122].
Comparing the kinetics of the long central triple helix with those of the Col1–3
domain and the so-called one-quarter fragment obtained by selective proteolytic
cleavage [136], a direct proportionality of the half-times to chain length was ob-
served (Figure 30.9). This observation provides strong additional support to the
zipper-like folding.
From the temperature dependence of refolding, activation energies of 85 kJ
mol1 were derived for the folding of the central triple helical domains, supporting
that the propagation of the zipper is rate limited by cis–trans isomerization steps
[130]. With the help of a model mechanism, it was concluded that on average, 30
amino acid residues occur in uninterrupted stretches without cis peptide bonds.
They convert in a fast reaction after each isomerization step. The average rate con-
stant of the isomerization steps was found to be 0.015 s1 at 20 C.
1092 30 The Thermodynamics and Kinetics of Collagen Folding
Fig. 30.10. Notation of the collagen-like part conformation is called h. In the coiled state,
of Col1–3 as a linear sequence of letters. The tripeptide units (v) with all peptide bonds in
three disulfide bridges are called h 0 and a the trans configuration are distinguished from
tripeptide unit while the correct helical units (w) that contain a cis peptide bond.
30.3 Kinetics of Triple Helix Formation 1093
bond. Tripeptide units are numbered in the way triple helix formation is believed
to occur in a sequential way: to an already formed triple helix a tripeptide unit of
chain A is added, followed by addition of the closest unit from chain B and then
from C. A, B, and C are the designations for the three different chains in collagens.
Each addition of a tripeptide unit is defined as a propagation step.
With the above notation, any state during folding can be written as a linear
sequence of letters h, v, and w. For example, the partially folded state of Figure
30.10 reads h 0 hhhhhhvwvvv . . . , with h 0 acting as a permanent nucleus. It was
shown by designed models for the triple helix (see following sections) that nuclea-
tion is mainly achieved by enforcement of a close neighborhood of residues near to
the disulfide knot. The local concentration near the disulfide knot is close to 1 M as
estimated from the model of the disulfide knot [131] and from the stability of
(GPP)10 –foldon (see equilibrium part). The studies with designed proteins demon-
strated that the disulfide knot may be replaced by any other trimerizing agent with-
out affecting the nucleation potential. It should also be recalled that the disulfide
knot (or other trimerizing cross-links) serve the important function of keeping the
chains in the correct register and preventing misalignments.
Only the stretch of uninterrupted v units, which starts from h 0 and is terminated
by the first w, can convert to the triple helix without cis–trans isomerization. Under
conditions in which back reactions can be neglected, the relative amplitude of the
fast phase (dA f ) or slow phase (dA s ) becomes
dA f dA s hi v i
¼1 ¼ ð19Þ
dA 0 max dA 0 n
with
X
n1
p v ð1 pvn Þ
hi v i ¼ ipvi p w þ npvn ¼ ð20Þ
i¼1
1 pv
K
pv ¼ ð21Þ
1þK
It can be shown for large n that hi V i may be approximated by K. For peptide Col1–
3 with n ¼ 43 (Figure 30.9) the relative amplitude of the fast phase was 0.5 and a
value of K ¼ 25–30 is derived. Note that this equilibrium constant is an average
over regions of variable proline and hydroxyproline content and is much larger
than the equilibrium constant for the isomerization of a single X-Pro or X-Hyp
bond. Assuming that K is identical for the central triple helix (n ¼ 1026) and the
1094 30 The Thermodynamics and Kinetics of Collagen Folding
From kinetic experiments, (Figure 30.9) it was only possible to determine the ap-
parent rate constant kapp from the initial rates of the curves. Assuming a rate con-
stant of k ¼ 0:015 s1 as measured for the isomerization of a Gly-Pro bond [137],
other data in [116, 117, 130] follow K ¼ 30 in close agreement with the value ob-
tained from the amplitude of the fast phase. This implies that on average about
every thirtieth tripeptide unit in the coiled chains of collagen III contains a cis pep-
tide bond. This value fits well with qualitative estimates from the distribution of
proline and hydroxyproline residues.
It was also possible to derive the time course of the entire slow phase (Figure
30.11).
The probability of finding i w segments in a molecule of n coiled segments is
given by a binominal distribution with the probabilities for trans and cis segments
defined in Eq. (14)
Fig. 30.11. Comparison of the degree of calculated from the length of the trypsin-
conversion F and length of the chains in triple restistant fragments of Figure 30.8. The
helical conformation nh during refolding of experimental time dependencies of nh and F
type III pN collagen at 25 C. The experimental are compared with theoretical refolding
curve for the recovery of F with time at 25 C kinetics (drawn-out curve). This was calculated
was taken from Figure 30.9. It is compared by Eqs (24) and (28) with k ¼ 0:015 s1 and
with the number of residues in helical K ¼ 30.
conformation nh (open circles), which was
30.3 Kinetics of Triple Helix Formation 1095
n
p^in ¼ pvni pwi ð23Þ
i
dF kð1 þ KÞ
¼ f hw ðtÞ ð24Þ
dt n
k k k k k
M1 ! M2 ! ! Mr ! ! Miþ1
i i1 irþ1 0 ð25Þ
df1
¼ kf1 ð26aÞ
dt
dfr
¼ kfr þ kfr1 ð26bÞ
dt
dfrþ1
¼ kfrþ1 þ kfr ð26cÞ
dt
Here f1 ; f2 . . . fr ; frþ1 . . . fiþ1 are the fractions of species in reaction (25). A frac-
tion is defined by fr ¼ cr =c10 where c10 is the initial concentration of M1 . To find a
solution of this set of first-order linear differential equations we introduce
it follows that
dfrþ1 dC kt
¼ kfrþ1 þ e
dt dt
dC
¼ kFðtÞ
dt
The initial conditions at t ¼ 0 are f10 ¼ 1 and all other fr ¼ 0. Integration of the
first differential Eq. (26a) with this conditions yields
It follows that
In order to obtain the fraction of molecules which contain at least one w segment
we have to take the sum of fr from r ¼ 1 to r ¼ i. Since these are fractions of the
species present in the initial distribution we have to multiply each sum over fr by
corresponding p^in and to take the sum over this distribution in order to obtain the
total fraction of species with at least one w segment and with one h–w junction
X
i¼n X
r ¼i
ðktÞ r1 kt
f hw ¼ p^in e ð28Þ
i¼0 r ¼1
ðr 1Þ!
The rate of helix formation can now be calculated by Eqs (24) and (28) and the time
dependence of F can be obtained by numerical integration of Eq. (24) into which
f hw Eq. (28) was substituted. A good fit to the experimental time course was ob-
tained with the parameters k ¼ 0:015 s1 and K ¼ 30 (see Figure 30.11).
30.3 Kinetics of Triple Helix Formation 1097
30.3.4
Designed Collagen Models with Chains Connected by a Disulfide Knot or by
Trimerizing Domains
Natural collagens contain triple helices whose chemical and physical properties
are different at different positions of the helix. This property originates from varia-
tions of residues in X and Y positions. As described in the equilibrium part stabil-
ities largely depend on the nature of X and Y in short peptides and differences may
also result in long triple helices although cooperative influences of neighboring
segments may reduce the effect.
Kinetic measurements of designed model proteins with GPP-, GPO-, or other
collagen-like repeats, avoid complication caused by sequence heterogeneity. In ad-
dition, it is possible to study the influences of cross-linking in a quantitative way.
Two methods of cross-linking were applied. In the first one, two Cys residues were
added to the ends of the chains either by peptide synthesis or by fusion of the di-
sulfide knot of type III collagen by recombinant technology. The chains were then
joined by controlled oxidative coupling. In the second method, noncollagenous do-
mains were fused to either the N- or C-terminus of the model peptide.
constants are included in Table 30.1 for comparison with the apparent third-order
constants for the free uncross-linked chains. For reasons discussed in Ref. [121]
activation energies (Table 30.4) were somewhat lower than measured for the cis–
trans isomerzation of dipeptides [116, 117]. There is no doubt that the slow phase
of the refolding reflects the propagation step, whose rate is determined by cis–trans
isomerization of peptide bonds. The rate constants for the isomerization of individ-
ual prolines containing bonds in (GPP)10 –Cys2 are unknown and it should be re-
called that the overall rate constants are composites of these values.
(GlyProPro)n –foldon [86] several bad contacts were observed in the contact region
between the triple helix and the foldon domain. It was concluded that unfavorable
enthalpic interactions may in part counteract the entropic stabilization by cross-
linking. The structure of foldon–(GlyProPro)n was not solved yet but different
and probably more unfavorable interactions are anticipated in its contact region.
It was also noticed that the insertion of Gly-Ser spacers between the oligomerza-
tion domain and the triple helix is essential. Most importantly, however, the kinet-
ics of triple helix formation was identical for all four model systems (GlyProPro)n –
foldon, foldon–(GlyProPro)n , (GlyProPro)n –Cys2 and Cys2 –(GlyProPro)n . Also, the
activation energies of folding were identical for all four peptides. Rate constants
of folding were about 103 s1 at 20 C and the activation energy was 50 kJ mol1
[165]. In summary, triple helix formation proceeds with closely similar rates in
both directions. For the alignment of strands and nucleation of triple helical fold-
ing, oligomerization domains are needed but these may be placed at both ends of
the triple helix. Their more frequent occurrence at the C-end is not explained by an
easier folding from this side.
1
k1
app ¼ ð29Þ
kf þ k u
in which k f and k u are the rate constants of folding and unfolding, respectively.
Rate constant k f is dependent on temperature according to the Arrhenius relation
with the high activation energy Ea; f of cis–trans isomerization (see Section 30.3.3).
The temperature dependence of k u is much lower and its activation energy Ea; v can
be calculated from Ea; f and the enthalpy of the reaction by Ea; v ¼ Ea; f DH . A
satisfactory fit of the experimentally observed change of helicity F during heating
or cooling is obtained when the rate law
dF
¼ k f ð1 FÞ k u F ð30Þ
dt
is integrated with the starting conditions (1) F ¼ 1 at t ¼ 0 for heating and (2)
F ¼ 0 at t ¼ 0 for cooling. Case 1 yields the time course
K 1
F¼ þ ekapp t ð31Þ
1þK 1þK
and case 2
K
F¼ ð1 ekapp t Þ ð32Þ
1þK
30.3.5
Influence of cis–trans Isomerase and Chaperones
tion in vivo. FKBP65 had only a small effect on the rate of refolding on type III
collagen [148]. PPIases are widely distributed and the Escherichia coli variant was
also investigated with collagen [149]. It was speculated that the relatively small ac-
celerations may originate from the use of enzymes that are not specific for colla-
gen. A PPIase with an acceleration factor of 100 or more as observed for other pro-
teins has not yet been observed. Such an enzyme may, however, be a necessity for
animals living in a cold environment at which cis–trans isomerization is very slow,
because of its high activation energy.
Numerous other helper proteins have been identified in the context of collagens
(for reviews see Refs [150, 151, 152]). A collagen-specific chaperone is HSP47,
whose physiological importance is manifested by a severe phenotype of transgenic
mice lacking HSP47 [153]. Several very different mechanisms of action were dis-
cussed for HSP47, but with the exception of a specific binding to collagen, molec-
ular explanations remain to be explored.
30.3.6
Mutations in Collagen Triple Helices Affect Proper Folding
References
Gly ! Ala substitution. Connect Tissue 91 Liu, X., Siegel, D. L., Fan, P.,
Res 35, 401–6. Brodsky, B. & Baum, J. (1996). Direct
82 Kramer, R. Z., Vitagliano, L., Bella, NMR measurement of folding kinetics
J. et al. (1998). X-ray crystallographic of a trimeric peptide. Biochemistry 35,
determination of a collagen-like 4306–13.
peptide with the repeating sequence 92 Buevich, A. V., Dai, Q. H., Liu, X.,
(Pro-Pro-Gly). J Mol Biol 280, 623–38. Brodsky, B. & Baum, J. (2000). Site-
83 Nagarajan, V., Kamitori, S. & specific NMR monitoring of cis-trans
Okuyama, K. (1998). Crystal structure isomerization in the folding of the
analysis of collagen model peptide proline-rich collagen triple helix.
(Pro-pro-Gly)10. J Biochem (Tokyo) Biochemistry 39, 4299–308.
124, 1117–23. 93 Melacini, G., Bonvin, A. M.,
84 Nagarajan, V., Kamitori, S. & Goodman, M., Boelens, R. &
Okuyama, K. (1999). Structure Kaptein, R. (2000). Hydration
analysis of a collagen-model peptide dynamics of the collagen triple helix
with a (Pro-Hyp-Gly) sequence repeat. by NMR. J Mol Biol 300, 1041–9.
J Biochem (Tokyo) 125, 310–18. 94 Bann, J. G., Peyton, D. H. &
85 Berisio, R., Vitagliano, L., Bächinger, H. P. (2000). Sweet is
Mazzarella, L. & Zagari, A. (2002). stable: glycosylation stabilizes
Crystal structure of the collagen triple collagen. FEBS Lett 473, 237–40.
helix model [(Pro-Pro-Gly)(10)](3). 95 Gaill, F., Mann, K., Wiedemann,
Protein Sci 11, 262–70. H., Engel, J. & Timpl, R. (1995).
86 Stetefeld, J., Frank, S., Jenny, M. Structural comparison of cuticle and
et al. (2003). Collagen stabilization at interstitial collagens from annelids
atomic level: crystal structure of living in shallow sea-water and at
designed (GlyProPro)10 –foldon. deep-sea hydrothermal vents. J Mol
Structure (Camb) 11, 339–46. Biol 246, 284–94.
87 Fan, P., Li, M. H., Brodsky, B. & 96 Xu, Y., Hyde, T., Wang, X., Bhate,
Baum, J. (1993). Backbone dynamics M., Brodsky, B. & Baum, J. (2003).
of (Pro-Hyp-Gly)10 and a designed NMR and CD spectroscopy show that
collagen-like triple-helical peptide by imino acid restriction of the unfolded
15N NMR relaxation and hydrogen- state leads to efficient folding.
exchange measurements. Biochemistry Biochemistry 42, 8696–703.
32, 13299–309. 97 Inouye, K., Sakakibara, S. &
88 Li, M. H., Fan, P., Brodsky, B. & Prockop, D. J. (1976). Effects of the
Baum, J. (1993). Two-dimensional stereo-configuration of the hydroxyl
NMR assignments and conformation group in 4- hydroxyproline on the
of (Pro-Hyp-Gly)10 and a designed triple-helical structures formed by
collagen triple-helical peptide. homogenous peptides resembling
Biochemistry 32, 7377–87. collagen. Biochim Biophys Acta 420,
89 Melacini, G., Feng, Y. & Goodman, 133–41.
M. (1996) Acetyl-terminated and 98 Bann, J. G. & Bächinger, H. P.
template-assembled collagen-based (2000). Glycosylation/hydroxylation-
polypeptides composed of Gly-Pro- induced stabilization of the collagen
Hyp sequences. 3. Conformational triple helix. 4-trans-hydroxyproline in
analysis by 1 H-NMR and molecular the Xaa position can stabilize the
modeling studies. J Am Chem Soc 118, triple helix. J Biol Chem 275, 24466–9.
10359–64. 99 Kresina, T. F. & Miller, E. J. (1979).
90 Bann, J. G., Bächinger, H. P. & Isolation and characterization of
Peyton, D. H. (2003). Role of basement membrane collagen from
carbohydrate in stabilizing the triple- human placental tissue. Evidence for
helix in a model for a deep-sea the presence of two genetically distinct
hydrothermal vent worm collagen. collagen chains. Biochemistry 18,
Biochemistry 42, 4042–8. 3089–97.
References 1107
100 Kefalides, N. A. (1973). Structure and 110 Bretscher, L. E., Jenkins, C. L.,
biosynthesis of basement membranes. Taylor, K. M., DeRider, M. L. &
Int Rev Connect Tissue Res 6, 63–104. Raines, R. T. (2001). Conformational
101 Burgeson, R. E., El Adli, F. A., stability of collagen relies on a
Kaitila, II & Hollister, D. W. (1976). stereoelectronic effect. J Am Chem Soc
Fetal membrane collagens: identifi- 123, 777–8.
cation of two new collagen alpha 111 Jenkins, C. L. & Raines, R. T. (2002).
chains. Proc Natl Acad Sci USA 73, Insights on the conformational
2579–83. stability of collagen. Nat Prod Rep 19,
102 Rhodes, R. K. & Miller, E. J. (1978). 49–59.
Physicochemical characterization and 112 Doi, M., Nishi, Y., Uchiyama, S.,
molecular organization of the collagen Nishiuchi, Y., Nakazawa, T.,
A and B chains. Biochemistry 17, Ohkubo, T. & Kobayashi, Y. (2003).
3442–8. Characterization of collagen model
103 Bos, K. J., Rucklidge, G. J., Dunbar, peptides containing 4-fluoroproline;
B. & Robins, S. P. (1999). Primary (4(S)-fluoroproline-pro-gly)10 forms a
structure of the helical domain of triple helix, but (4(R)-fluoroproline-
porcine collagen X. Matrix Biol 18, pro-gly)10 does not. J Am Chem Soc
149–53. 125, 9922–3.
104 Mizuno, K., Hayashi, T., Peyton, 113 Hodges, J. A. & Raines, R. T. (2003).
D. H. & Bächinger, H. P. (2004) Stereoelectronic effects on collagen
The peptides acetyl-(Gly-3(S)Hyp- stability: the dichotomy of 4-
4(R)Hyp)10 -NH2 and acetyl-(Gly-Pro- fluoroproline diastereomers. J Am
3(S)Hyp)10 -NH2 do not form a Chem Soc 125, 9262–3.
collagen triple helix. J Biol Chem 279, 114 Babu, I. R. & Ganesh, K. N. (2001).
282–7. Enhanced triple helix stability of
105 Bakerman, S., Martin, R. L., collagen peptides with 4R-aminoprolyl
Burgstahler, A. W. & Hayden, J. W. (Amp) residues: relative roles of
(1966). In vitro studies with electrostatic and hydrogen bonding
fluoroprolines. Nature 212, 849–50. effects. J Am Chem Soc 123, 2079–80.
106 Takeuchi, T. & Prockop, D. J. (1969). 115 Scherer, G., Kramer, M. L.,
Biosynthesis of abnormal collagens Schutkowski, M. U. R. & Fischer,
with amino acid analogues. I. G. (1998). Barriers to rotaion of
Incorporation of L-azetidine-2- secondary amide peptide bonds. J Am
carboxylic acid and cis-4-fluoro-L- Chem Soc 120, 5568–74.
proline into protocollagen and 116 Gratwohl, C. & Wüthrich, K.
collagen. Biochim Biophys Acta 175, (1976). The X-Pro peptide bond as an
142–55. NMR probe for conformational studies
107 Takeuchi, T., Rosenbloom, J. & of flexible linear peptides. Biopolymers
Prockop, D. J. (1969). Biosynthesis of 15, 2025–41.
abnormal collagens with amino acid 117 Reimer, U., Scherer, G., Drewello,
analogues. II. Inability of cartilage M., Kruber, S., Schutkowski, M. &
cells to extrude collagen polypeptides Fischer, G. (1998). Side-chain effects
containing L-azetidine-2-carboxylic on peptidyl-prolyl cis/trans
acid or cis-4-fluoro-L-proline. Biochim isomerisation. J Mol Biol 279, 449–60.
Biophys Acta 175, 156–64. 118 Raines, R. T., Bretscher, L. E.,
108 Engel, J. & Prockop, D. J. (1998). Holmgren, S. K. & Taylor, K. M.
Does bound water contribute to the (1999). The stereoelectronic basis of
stability of collagen? [letter]. Matrix collagen stability. In: Peptides for the
Biol 17, 679–80. New Millennium: Proceedings of the
109 Holmgren, S. K., Bretscher, L. E., 16th American Peptide Symposium
Taylor, K. M. & Raines, R. T. (1999). (Fields, G. B., Tam, J. P., & Barany,
A hyperstable collagen mimic. Chem G., eds). Kluwer Academic, Norwell,
Biol 6, 63–70. MA.
1108 30 The Thermodynamics and Kinetics of Collagen Folding
31
Unfolding Induced by Mechanical Force
Jane Clarke and Phil M. Williams
31.1
Introduction
Force spectroscopy has, as yet, been used in very few protein folding laboratories.
One has to ask why this is so. There are several possible explanations. First, until
relatively recently, only specialized instrumental laboratories had the technical ex-
pertise to build instruments. This has been overcome with the advent of commer-
cially available instruments that are relatively simple to use, menu driven, and in
one case, where the programming software (Igor) can be adapted for the investiga-
tors’ convenience – facilitating data collection, collation and analysis. Second, pre-
paring a protein sample is time consuming. With current molecular biology tech-
niques it is possible to clone and express many single domain mutant proteins in
the matter of a couple of weeks. But despite the development of versatile cloning
systems, the cloning of a polyprotein substrate can take months (and this has to
be repeated for each mutant protein you wish to analyze), and after this, a number
of simple-to-express, single domain proteins have turned out to be insoluble as pol-
yproteins. Third, the data are complex and time consuming to collect and analyze.
Ironically many repeats of the single molecule experiments are required to collect
enough data to analyze, and even the very best data can be noisy and are intrinsi-
cally unsatisfactory for those of us who are used to collecting many kinetic data
points with exquisite accuracy. Worse still, the investigator has to select which
data to analyze and which to discard – an anathema to careful experimentalists.
Fourth, the data cannot be analyzed by fitting to a simple model, rather a Monte-
Carlo or analytical approach has to be used to extract kinetic data.
So, why bother with these experiments? First, there is a biological imperative. It
is increasingly apparent that many proteins experience significant force in vivo. In
fact, response to mechanical stress has been implicated in a number of signaling
pathways. This means that proteins have evolved to resist unfolding when subject
to an external force. It has been shown in at least one case that the barrier to un-
folding under mechanical stress is not that investigated by traditional unfolding
experiments. Only dynamic force experiments can reveal these details of the pro-
tein folding landscape. Second, there is an interesting structural biology problem
of comparison. Why are certain protein folds stronger than others? How does se-
quence variation affect mechanical stability? Can the effect of mutation be pre-
dicted? How are domains assembled in some of the large multidomain natural
load-bearing proteins? Finally, forced unfolding experiments can easily be directly
compared with computer simulations. The reaction coordinate (NaC length) is
known and measurable.
In this chapter we first describe the experiments and summarize the theoretical
background for the analysis of these experiments. We describe how the analysis is
performed and what kinetic parameters can be obtained and the confidence limits
of these parameters. (We are concentrating on atomic force spectroscopy, as tech-
niques that exploit lower loading rates, such as optical tweezers, have yet to be
used on single protein domains and the instrumentation is more complex.) We
then show how complementary techniques can be used to help our understanding
of the data obtained. We end with a case study of a single protein, illustrating how
combination of a number of techniques has enabled the forced unfolding mecha-
nism to be examined in detail.
31.2
Experimental Basics
31.2.1
Instrumentation
A number of commercial instruments are available for use in protein folding labo-
ratories. Figure 31.1 shows the basic components of such instruments. The princi-
ple components are a stage, to which is attached the protein substrate, a microfab-
ricated cantilever, and a piezoelectric positioner which adjusts the relative position
of the cantilever and stage with subnanometer accuracy.
The protein sample is placed onto the stage. In most experimental studies pub-
lished to date the protein substrate is a long, multimodular molecule. This may
be either a natural multimodular protein, such as a portion of a long extracellular
matrix or muscle protein or an engineered construct of multiple repeats of single
domains (see Section 31.2.2). Attachment of the protein to the stage may be
achieved by specific attachment, for example a gold-sulfur linkage with cysteines
at the terminus of the protein, or by nonspecific adsorption onto a glass surface.
The protein and cantilever are surrounded by solvent, either as a droplet or in an
enclosed cell.
Likewise, attachment to the cantilever is generally through nonspecific adsorp-
tion to the surface, although cantilevers can be modified to allow specific attach-
ment. The commercial cantilevers used are usually made of silicon nitride, and
are @20–300 mm in length. The key component is an unsharpened tip (radius @ 50
nm) to which the protein adsorbs. (Sharper tips are available, but this larger,
blunter tip gives a better surface for adsorption of the protein.) Cantilevers used
in protein unfolding experiments typically have spring constants in the range 10–
31.2 Experimental Basics 1113
Piezo
Laser
Split
Photo diode
Cantilever
Polyprotein
AFM Stage
Fig. 31.1. Diagram of an atomic force cantilever is controlled by the piezo which is
microscope. The protein molecule is deposited capable of repeated cycles of extension and
onto the stage and adsorbed to the retraction. The photodiode measures the laser
microfabricated silicon nitride cantilever. The light reflected off the back of the cantilever.
deflection of the cantilever is used to The deflection of the cantilever is determined
determine the force exerted upon the protein, from the difference in the output voltages from
once the spring constant of the cantilever is the two halves.
known. The separation between stage and
100 pN nm1 . The actual spring constant has to be measured at the start of each
experiment, using methods that are integrated into the software of the commercial
instruments. The backs of the cantilevers are reflective and the position of the can-
tilever is determined by use of a laser reflected onto a split photodiode. Following
calibration of the instrument to determine the relationship between photodiode
output and bending of the cantilever, the force applied on the protein is deter-
mined directly from the deflection of the cantilever. A piezoelectric positioner ad-
justs the separation of the cantilever tip and the surface.
31.2.2
Sample Preparation
of holding the protein at a significant distance from the surface before unfolding
occurs, so that tip/surface or protein/surface interactions are minimized.
A number of approaches for constructing such repeats have been reported [4–6],
but the most versatile are those that use different restriction enzyme sites at the
start and end of each domain. One such system is available on request from our
laboratory [6]. The first protein to be multimerized in this way was titin I27. It
has a poly-His tag at the N-terminus to facilitate affinity purification, and two cys-
teines at the C-terminus to attach the protein to an AFM stage by a gold-sulfur
linkage. This protein can be produced in large amounts as a soluble polyprotein
in Escherichia coli. The protein can be used after a one-step purification, but we
tend to get better results following a second, size exclusion step. I27 is a very ver-
satile protein – it has proved to be a useful ‘‘handle’’ for attaching other proteins,
which may not polymerize so easily or where only a single domain of a protein of
interest is required [7, 8]. In these circumstances it acts as an internal control.
A few studies have used novel methods to produce multimodular proteins. Yang
et al. exploited crystal packing in T4 lysozyme [9]. Cysteines were engineered at
contact points and the polyprotein was formed in the crystal. In studies to investi-
gate the effect of attachment point on the effect of force on a protein Fernandez
and coworkers used the only known natural polyprotein ubiquitin, linked either
via the N- and C-terminus or via specific surface lysine–C-terminus linkages [10].
Brockwell et al. used an elegant, novel lipoic acid–lysine linkage to study the same
problem in the protein E2Lip3 [8].
When working with multidomain constructs it is important to know whether
the protein is folded in, and how far its properties are changed by, inclusion in a
polyprotein. Again, I27 is a ‘‘model’’ protein in this respect; it has the same ther-
modynamic stability and the same kinetic properties in the polyprotein as does an
isolated domain [4]. However, the same is not true of all proteins. Some are stabi-
lized and some destabilized by inclusion in the protein. Remarkably, although tra-
ditional equilibrium denaturation and stopped flow kinetic experiments can be per-
formed as easily on the polyprotein as on small individual domains these simple
controls are very rarely carried out. We have also been able to show that NMR ex-
periments can be easily undertaken on three-module constructs, so structural in-
tegrity can also be verified [7].
31.2.3
Collecting Data
Although there are recent reports of constant force experiments (see, for example,
Ref. [11]), most experiments reported to date have used a ramp of force induced by
retracting the piezo at constant speed. The cantilever is lowered to the surface re-
peatedly, picking a protein molecule up at random. Since this is a blind ‘‘fishing’’
step the experimentalist has little control over whether a protein is picked up and
no control over the position on the polyprotein where the protein is attached to the
cantilever tip. If the protein is adsorbed at too high a concentration onto the sub-
31.2 Experimental Basics 1115
strate then more than one protein molecule is likely to be attached to the tip. Such
traces have to be discarded, as they cannot be interpreted. If the protein is too di-
lute there will be too few pick-ups. It is estimated that where one approach in 10
picks up a protein we can be confident that most data will be from a single mole-
cule [12]. The concentration of protein that results in such a success rate has to be
determined empirically.
The tip is retracted at constant speed and when a protein molecule is picked up a
force trace is observed. A large number of such traces have to be collected at a
number of pulling speeds. The range of experimentally useful pulling speeds al-
lowed by the commercial instruments currently available is on the order of 10
nm s1 to 10 000 nm s1 but in practice, most useful data are collected in the pull-
ing speed range of @300 to @3000 nm s1 . Instrumental drift is a problem at low
pulling speeds and at very high pulling speeds, viscous drag and cantilever re-
sponse time introduce error.
31.2.4
Anatomy of a Force Trace
A ‘‘typical’’ force trace is shown and described in Figure 31.2. At the start of the
trace there is usually a peak of unpredictable height that reflects tip:surface inter-
actions, deadsorption of the protein from the surface and other nonspecific effects
(see arrow on Figure 31.2a). As the cantilever retracts force is applied to the pro-
tein. As the force increases unfolded parts of the chain are stretched. At some force
one of the protein domains will unfold and this results in a sharp drop in the force
trace. The trace does not drop to the baseline, however, as the entropic elasticity of
the unfolded protein maintains a force on the cantilever. The force increases
again until another domain unfolds, resulting in the signature saw-tooth pattern.
The base of subsequent peaks is higher that that of the preceding peak. The spac-
ing between the peaks is regular, reflecting the all-or-none nature of the unfolding
events, so that a number of traces can be overlaid. Finally the cantilever is retracted
so far that the protein becomes detached. A final ‘‘pull off ’’ peak is observed, typi-
cally much higher than all preceding peaks and the trace returns to the baseline.
31.2.5
Detecting Intermediates in a Force Trace
If the protein domains unfold without populating intermediates the trace should
be simple with a single unfolding event per domain, and the unfolded protein
should extend as a featureless smooth curve, fitting to a simple model of an elastic
polymer (see Section 31.3.4). However, intermediates have been detected in a few
forced unfolding experiments by careful examination of forced unfolding data. The
intermediates have been detected either by the presence of two consecutive unfold-
ing events per domain (e.g., Ref. [13]), or from the presence of a ‘‘hump’’ in the
unfolding traces [14] (described in more detail in Section 31.7).
1116 31 Unfolding Induced by Mechanical Force
(a)
5 (b)
300
4
200
Force (pN)
4
100 F 3
3
2
0
1
2
-100 1 Extension
(c)
1 2 3 4
Fig. 31.2. Anatomy of a force trace. Force to the cantilever tip. 3) Tip is retracted at a
trace (a) and diagram (b) showing the fixed pulling speed from the surface. An entropic
response of the cantilever during different restoring force of the protein is generated
phases of the AFM approach-retraction cycle. when it is extended. The cantilever deflects in
The cartoon in (c) describes the response of response to this force. 4) A domain unfolds
the protein and cantilever during the different and extends. The system relaxes. The arrow in
phases shown in (a). 1) Tip makes contact (a) shows non-specific adsorption between the
with surface and is deflected. 2) Protein adsorbs tip and/or the protein and the surface.
31.2.6
Analyzing the Force Trace
The unfolding forces can be determined by analysis of the force traces, however,
one of the most difficult features of forced protein unfolding experiments is that
one has to choose which traces to analyze. As described above, most approach–
retract cycles result in no protein being attached to the cantilever at all. Further-
more, a number of the traces where a protein is attached may have large peaks of
‘‘noise’’ at the start of the trace, possibly because the protein was attached to other
(possibly unfolded) protein molecules on the surface or there may be more than
one molecule attached to the cantilever. It is an essential assumption of all the
31.3 Analysis of Force Data 1117
analysis that the data are collected from a single molecule attached to the tip. It is
therefore essential that consistent and rational criteria are applied in choosing
which peaks to analyze. This has been discussed elsewhere and a set of criteria
have been proposed [15]:
1. Force peaks must be equally spaced and the distance between them must be
consistent with the expected contour length of the protein.
2. The trace must include three or more peaks, the last of which is assumed to be
the detachment of the protein from the AFM tip.
3. The base of each successive peak should be higher than base of the previous
peak.
4. The approach and retraction baselines should overlay, indicating that there was
no drift during the course of the pulling experiment.
5. The final baseline should be straight, indicating complete relaxation of the
cantilever.
We find that analyzing the trace from the right, at the pull-off peak, towards
the left, until the base of the peak touches the baseline, gives us most consistent
analysis between investigators and from day to day. Note that all peaks must be
counted. The temptation to discard peaks that meet all other criteria on the basis
that they are ‘‘too high’’ or ‘‘too low’’ must be resisted – these will not contribute to
analysis of the modal forces.
To determine the height of a peak, the full-length final baseline should be fitted
to a line and the height of the peak above this line determined. This allows for drift
in the instrument to be accounted for. Once these peak heights have been col-
lected, the modal unfolding force can be determined (see Section 31.3.7).
We have found that it is important to collect data with different cantilevers and
on different days to minimize error. We estimate that it is necessary to collect at
least three full data sets, where each data set contains at least 40–50 force peaks
at each pulling speed [15]. A full range of pulling speeds should be used to mini-
mize errors in analysis. However, for some proteins we have found that we need to
collect significantly more data to get reliable estimates of the modal unfolding
force.
31.3
Analysis of Force Data
31.3.1
Basic Theory behind Dynamic Force Spectroscopy
-f.x
G
-f.x
x
Fig. 31.3. Unfolding energy landscape and the the force and the displacement, x TS , of the
effect of force. a) The rate of unfolding from N transition state. Here, force can be considered
is dictated by the magnitude of the free energy as having a stabilizing effect on the transition
relative to the transition state DG TS and the and denatured states. c) Force can be
shape of the energy potential (encompassed in considered as having a destabilizing effect on
Eq. (1) as oN and o TS ). b) A force f acts on the native state and unfolding transition states.
the potential to drop the transition state The effect of force on the unfolding kinetics in
relative to the native state by the product of (b) and (c) is the same.
the time scale for motions in condensed liquids, which commences at the nano-
second for small ligands and slows with increased molecular size. As shown by
Kramers in the 1940s [17], the rate of thermally activated escape from an unfolding
potential energy well (Figure 31.3a) can be written as
! !
ðoN o TS Þ 1=2 DG TS
no ¼ exp ð1Þ
2pz kB T
the native state. The amount that the transition state energy has been lowered by is
the product of the force applied and the displacement of the state in the direction
of force (x TS ), and we can re-write Eq. (1) to show this effect:
!
fx TS ðoN o TS Þ 1=2 ðDG TS fx TS Þ
nu ð f Þ ¼ h exp ð2Þ
kB T 2pz kB T
where hð Þ describes the movement of the transition state under force [18]. Since
the dominant effect of force is the lowering of the transition state, hð Þ can usually
be ignored, which gives the rate equation
fx TS f
nu ð f Þ ¼ no exp ¼ no exp ð3Þ
kB T fb
where no is the unfolding rate over this TS barrier in the absence of force. Impor-
tantly this equation introduces fb ð¼ kB T=x TS Þ, the force scale, commonly used in
dynamic force spectroscopy and x TS (or x u as it is usually referred to in protein
unfolding studies) representing the displacement between N and TS. At room tem-
perature, thermal energy kB T is 4:11 1021 J and over the nanometer length
scale of proteins is equal to forces of only a few piconewtons (4.11 pN nm). The
catalytic effect of a 100 pN force over this nanometer distance is equivalent to over
14 kcal mol1 (1 kcal mol1 ¼ 6:9 pN nm).
31.3.2
The Ramp of Force Experiment
Virtually all experiments of protein unfolding measured by the atomic force micro-
scope are of the kind where force is increased in time and the maximum force that
the protein can withstand recorded. The tip is approached to the surface, protein
adheres to the tip, and the tip is then withdrawn from the surface at a constant ve-
locity. The simplest case to consider is where force increases steadily with time and
the loading rate on the protein, rf ¼ df =dt, is constant. As we show later, this is
usually not true for these experiments but this first approximation provides the in-
troduction to dynamic force spectroscopy. The rate at which force increases is the
product of the stiffness of the system (in this first approximation the this is the
stiffness of the cantilever alone) and the speed at which the cantilever is with-
drawn. Lever stiffness kc is of the order of 10–100 pN nm1 , and retract rates vr
can vary from around 10 nm s1 to approaching 10 000 nm s1 (thermal drift and
operator patience limiting the low speed and hydrodynamic drag restricting high
speeds). AFM loading rates can theoretically range, therefore, between 100 and
1 000 000 pN s1 , although practically achieving more than two to three orders of
loading rate is problematic.
As an illustration, we consider the effect of increasing force on a transition state
located x u ¼ 0:5 nm along the projection of force. Since at room temperature ther-
mal energy, kB T, is 4.11 pN nm at a loading rate of 1 000 000 pN s1 it takes nearly
1120 31 Unfolding Induced by Mechanical Force
10 ms (kB T=x u :rf ¼ 4:11=0:5=1 000 000) to drop the barrier by 1 kB T. On the time
scale of a thermal attempt frequency (see above), therefore, force and the landscape
are quasi-static. Experiments of strength adopting a ramp of force can thus nor-
mally be considered as a series of many tests under an equally large set of different
forces. However, were the rate at which energy is added to the transition state by
force (x u :rf =kB T) to be comparable to, or exceed, the kinetic prefactor then the dy-
namics would deviate from that described here.
As unfolding is a kinetic process driven by random fluctuations one must make
many measurements to sample the process. In this respect, the measurement of
the behavior of a single molecule many times is identical to the measurement of
an ensemble of molecules all at once. We can consider each measurement as a
sample of this population, the numbers in which follow a simple rate equation.
Commencing with all folded proteins, the fraction of the population that are folded
under force decays in time as
dS
¼ nu ðtÞSðtÞ þ nf ðtÞ½1 SðtÞ ð4Þ
dt
where nu ðtÞ is the rate at which unfolding occurs at time t, and nf ðtÞ is the rate of
refolding. Since time and force are related through the loading rate rf , Eq. (4) can
be written for an increasing force as
dS 1
¼ fnu ð f ÞSð f Þ þ nf ð f Þ½1 Sð f Þg ð5Þ
df rf
Until now, we have ignored the refolding term. In fact, since force causes an expo-
nential decrease in folding in the same manner in which it exponentiates unfold-
ing, at forces above a few 10 s of piconewtons refolding can safely be neglected.
The master rate equation is simply
dS 1
¼ fnu ð f ÞSð f Þg ð6Þ
df rf
Equation (7) gives the fraction of the ensemble (the probability) that remain folded
as the force increases from 0 to f . The probability that a protein will unfold at this
force is the product of this, the survivability to force f , and the unfolding rate at
this force. The probability of unfolding at f is therefore the product
ðf !
1
pð f Þ ¼ nu ð f Þ exp ðnu ðgÞ=rf Þ dg ð8Þ
rf 0
31.3 Analysis of Force Data 1121
The integral of an exponential is unchanged, so taking the limits of force the inte-
gral can be solved as
1 f 1 f
pð f Þ ¼ no exp exp fb no fb no exp ð10Þ
rf fb rf fb
A series of tests of strength using a ramp of force at a constant loading rate will
form a distribution given by Eqs (8), (9), and (10). There is not a single force at
which a protein will unfold. The most probable unfolding force is that where this
distribution is at a maximum, and therefore has a zero gradient. This value is
found by finding the force where the first derivative is zero. To solve this, we rely
on the property that we can scale the distribution and not affect the location of the
maximum, and a useful scaling is the natural logarithm. The logarithm of Eq. (10),
expanded for clarity, is
f 1 f
ln½ pð f Þ ¼ ln½no þ ln exp ln½rf þ ln exp fb no fb no exp
fb rf fb
ð11Þ
The maximum of this distribution at the most probable unfolding force is found by
differentiation with respect to force. The terms that do not depend of force disap-
pear, leaving the maximum as
1 f
q ln½ pð f Þ ¼ no exp rf ¼0 ð13Þ
fb fb f¼f
31.3.3
The Golden Equation of DFS
Using Eq. (13) we can find the most probable unfolding force f (see Ref. [18])
kB T rf x u
f ¼ fb ½lnðrf Þ lnð fb no Þ ¼ ln ð14Þ
xu kB Tv0
Equation (14), the ‘‘golden equation’’ of DFS, has been derived from numerous as-
sumptions (single transition state, stationary in location under force, linear force
1122 31 Unfolding Induced by Mechanical Force
loading, x u :rf =kB T f 1=tD , etc.) but shows elegantly the features of DFS. A plot of
f against the logarithm of the loading rate has slope fb , revealing the location (x u )
of the transition state along the force axis. Similar to the m-value of chemical un-
folding measurements, fb is the susceptibility of the unfolding kinetics to force.
But here this value has a direct physical relationship to geometric coordinates and
structure (x u ). Further, the plot cuts the loading rate axis, where f is zero, when
rf ¼ fb no . Since fb is known one can determine through this extrapolation the
force-free unfolding rate across the transition state (no ).
Equation 14 is derived for the most probable rupture force; the mode of the dis-
tribution. The equation does not hold true for the mean force. In fact, the mean of
the force distribution (Eq. (8)) is complex to find, involving exponential integrals
with no exact solution. Thus it is important to find the mode of the distribution of
forces measured and analyze these, and not their mean. Additionally, the mode of
the distribution is less affected by outlying forces from nonspecific tip/sample in-
teractions, multiple unfolding events and instrumental noise. As a rule-of-thumb,
however, at high forces the mean unfolding force can be crudely approximated by
[12]:
and with force scales fb for proteins approaching 20 pN, mean forces are 10 pN or
so lower than the modes.
31.3.4
Nonlinear Loading
The approximation used above, that the stiffness of the system is dominated by the
cantilever, is not valid for most protein unfolding measurements. Typically, tandem
repeats of the protein are stressed and multiple copies are unfolded in series re-
vealing the characteristic saw-toothed force extension curve. In this arrangement,
the mechanical stiffness of the protein polymer may well be less than the cantile-
ver. The stiffness of the system ks , with two springs stressed in series (the cantile-
ver with stiffness kc and molecular system km ) is
1 1 1
¼ þ ð16Þ
ks kc km
Clearly the molecule has to be stiffer than the lever (km > kc ) for the approxima-
tion that lever stiffness dominates loading. With soft levers of 10 pN nm1 or so
this can be true. However, the saw-tooth pattern measured in the AFM experi-
ments derives from the stretching of the unfolded polypeptide chain after each re-
peat has unfolded. This means that force loading is not constant during protein
unfolding experiments. This has two consequences. First data have to be repre-
sented as unfolding force vs. retract velocity (vr ) and second, we need to consider
the mechanical stiffness of such random-coil polypeptide chains.
31.3 Analysis of Force Data 1123
kB T x
f WLC ðxÞ A ð17Þ
b L
As the extension approaches the contour length (the asymptote as the molecule is
inextensible), force increases anharmonically as
kB T x 2
f WLC ðxÞ A 1 ð18Þ
b L
A convenient equation to show the behaviour of a WLC over all extensions, and
one that matches that seen in AFM experiments, is an interpolation between Eqs
(17) and (18), as
!
kB T 1 1 x
f WLC ðxÞ A 2 þ ð19Þ
b 4 1x 4 L
L
A polymer exerts a force of 150 pN (a typical titin I27 unfolding force) when the
chain is extended to around 90% of its contour length. The stiffness of the chain
kWLC ð f Þ at this extension is approximately 10 pN nm1 , and for all but the softest
of AFM cantilevers it is the polymer that dominates the loading rate (Eq. (16)). So
instead of a steady loading rate, rf depends on the force and increases nonlinearly
in time as rf ¼ vr kWLC ð f Þ. Substitution of this loading rate term into Eq. (9) and
subsequent solving for f gives a good approximation of the unfolding kinetics
under WLC polymer loading, as [20]:
vr f 3 1 f
f A fb ln þ ln þ ln
vb fb 2 2 fb
ð20Þ
Lno x u 1=2
vb ¼
4 b
1124 31 Unfolding Induced by Mechanical Force
kB T bf 3=2
kWLC ð f Þ A 4 ð21Þ
bL kB T
The rate of loading at the point of unfolding can be estimated, therefore, by calcu-
lating the stiffness of the unfolded protein at the unfolding force (Eq. (21)), deter-
mining the stiffness of the system (Eq. (16)) and multiplying this by the retract ve-
locity vr. The modal unfolding force can then be plotted against this loading rate
estimate and Eq. (14) used to find a reasonable estimate of the force-free unfolding
rate and transition state displacement.
31.3.5
Experiments under Constant Force
1 f
t uð f Þ ¼ exp ð22Þ
no fb
The constant force experiment has the advantage that it should be independent of
the system dynamics, such as the nonlinear loading rate introduced by the polymer
stiffness, but suffers from two shortcomings. First, analysis assumes that force is
kept constant and that the force-feedback system can respond faster than the un-
folding rate of the protein under the force applied. Fortunately, this is often the
case. However, a greater limitation of this class of experimentation is that control
of force relates to exponential changes in lifetime, as seen in Eq. (22). Therefore, as
force is lowered, the experiment lasts for exponentially increasing periods of time.
Conversely, at high forces, it becomes increasingly difficult to determine lifetime
with the necessary accuracy. The advantage of the ramp of force measurement is
31.3 Analysis of Force Data 1125
that the force is a direct measurement of lifetime since these are related through
the loading rate. However, since the ramp of force measurement is a continuous
collection of constant force measurements, these two regimes can be considered
identical. The experiment of constant loading rate can be considered as the ideal
and relatively simple technological developments are required to realize this meth-
odology. The potential of the constant force experiment is to measure the kinetics
of unfolding under high forces. To measure this, however, requires the ability to
apply the force instantaneously (through the use of magnetism or electrostatic
attraction for instance) and measure the diminishing lifetime with increasing
temporal resolution. Current AFM technology, depending on peizo displacement
and at best having kilohertz temporal resolution, is incapable of exploiting this
potential.
31.3.6
Effect of Tandem Repeats on Kinetics
In addition to the unfolding rate decreasing with the event number, the loading
rate also decreases as the length of unfolded protein increases with each event.
The drop in unfolding rate leads to an increase in the force, whilst the drop in
loading rate causes a competing decrease in force. Between the first and second
event the fractional change in rate is small (from Nno to (N 1Þno ) whereas the
loading rate changes considerably as the unfolded polymer has increased consider-
ably in length (Eq. (21)). The forces drop over the first events. Conversely, between
the second-to-last and last event the unfolding rate halves whereas the fractional
change in polymer length is small. The forces increase towards the last events.
On average, the net effect is a decrease and then increase in the average rupture
force for event number [22].
1126 31 Unfolding Induced by Mechanical Force
To analyze accurately the unfolding forces for tandem repeats requires knowl-
edge of the length of repeat being pulled, which can change from test to test, and
analysis of individual events based on the each total length, i.e., the 1st, 2nd, and
3rd events of pulls of three repeats, the 1st, 2nd, 3rd, and 4th of pulls of four re-
peats, etc. A useful observation is that the sixth event in a chain of eight repeats
behaves similarly to the mode of all measurements [23]. Fortunately, the effect of
changing unfolding rate and polymer length with event number shift the most
probable rupture force by only 20 pN or so, and so this effect is usually ignored
in all protein folding studies. Titin I27 has been studied in different laboratories
using polyproteins with 12, 8, and 5 repeats and the results are essentially the
same [4, 5, 24].
31.3.7
Determining the Modal Force
and a cantilever of typical stiffness (30 pN nm1 ) fluctuates around 0.3 nm, show-
ing that the standard deviation of the force noise for these soft levers is over 10 pN.
The force noise for an AFM is typically between 10 and 20 pN, and we can assume
that each force measured is drawn from a Gaussian 10 or 20 pN wide. The sum of
all these Gaussians, centered on each force recorded, is calculated and the mode
force, where the sum is greatest, found. It is often worthwhile seeing how this es-
timate of the mode depends on the level of noise chosen, which ideally should be
invariant.
The distribution of forces for unfolding over a single transition state has a well-
defined form given by Eq. (10). Since the full distribution of forces at a particular
rate of loading is given by the two parameters measured, the unfolding rate no and
31.3 Analysis of Force Data 1127
the transition state location x u , the distribution serves to validate the experimental
measurements. Having found the mode, the histogram of recorded forces (and not
the artificial Gaussian sum used above) should be plotted and the distribution pre-
dicted using Eq. (10) overlaid.
31.3.8
Comparing Behavior
1 DG TS1
n1 ¼ exp
tD kB T
1 DG TS2
n2 ¼ exp ð25Þ
tD kB T
n1 DG TS2 DG TS1
¼ exp
n2 kB T
31.3.9
Fitting the Data
The data need to be analyzed to extract the two parameters requires, x u and the
unfolding rate at 0 force no . Probably the most obvious method to predict the be-
1128 31 Unfolding Induced by Mechanical Force
(a) (b)
Fig. 31.4. Monte-Carlo simulations. Monte- unfolding intermediates fine detail, such as the
Carlo simulations (a) of the AFM experiment presence of the force plateau around 100 pN
reveal the full force vs. distance trace of the attributed to the population of the force-
experiment (b). By incorporating details of stabilized unfolding intermediate of I27, can be
refolding and unfolding rate constants and resolved.
31.4 Use of Complementary Techniques 1129
31.4
Use of Complementary Techniques
simulation. Both these techniques can also be exploited in the study of forced un-
folding mechanisms.
31.4.1
Protein Engineering
One of the best ways to determine the regions of the protein structure involved in
the resistance of a protein domain to force is to make site-specific mutants and to
investigate the response of these mutants to force. However, unless we combine
force spectroscopy with biophysical data, any information gained is purely qualita-
tive. If mutations in two different regions of the protein have different effects on
force it is not possible to say that one region is more structured in the transition
state than the other unless the effect of the protein on the stability of the native
state has been quantified.
Fortunately, the analysis of mutant proteins as a means of investigating protein
folding (and unfolding) mechanisms is well established [31]. Essentially, the effect
of the mutation on the stability of the protein and on the stability of the transition
state for unfolding is compared:
where DDGDaN is the change in free energy on mutation (determined from equilib-
rium denaturation experiments) and DDGDaTS and DDG TSaN are changes in free
energy of the transition state (TS), relative to the denatured (D) and native (N)
states, respectively, determined from ratios of folding and unfolding rate constants:
kfwt kuwt
DDGDaTS ¼ RT ln and DDGzaN ¼ RT ln ð27Þ
kfmut kumut
where k f and k u are the folding and unfolding rate constants, and the super-
scripts wt and mut refer to the rate constants for wild-type and mutant proteins,
respectively.
We have shown that this F-value analysis can be applied to force experiments,
however, they are somewhat more complex, due to the nature of the data collected
[30].
1. The mutations should not alter the structure of the native state significantly: To
this end, the mutations should be conservative, nondisruptive mutations. They
should not distort the structure or add new interactions. The most suitable mu-
tations are simple deletion mutants such as Ile to Val or Val to Ala.
31.4 Use of Complementary Techniques 1131
2. The mutations should not alter the stability of the denatured state significantly,
or at least the effect on D should be significantly less than on N. For this reason
mutations of or to Pro or Gly should be avoided where possible, as should mu-
tants that change the polar/nonpolar nature of the sidechain.
3. The change in stability (DDGDaN ) should be significant. Where DDGDaN is small
F cannot be determined with confidence. In AFM experiments this is particu-
larly important since there is much more error in the experimental data. We
do not choose to analyze mutants with a DDGDaN < 2 kcal mol1 . Where the ef-
fect of mutation is less than this it is unlikely that differences between wild type
and mutant can be quantified with any confidence.
1. Analyze the data for wild type and mutant as described above (Section 31.3.9)
but assuming a mean, fixed x u .
2. Fit the force spectra to a fixed mean slope and then compare the pulling speeds
(vr ) of wild type (wt) and mutant (mut) at a fixed force in the range of the exper-
imental data:
vr wt
DDG TSaN ¼ RT ln ð28Þ
vr mut
3. Fit the force spectra (F vs. ln vr ) to a fixed mean slope (m) and then compare the
unfolding force at a fixed pulling speed in the range of the experimental data:
We have shown that all three methods give the same F-value [30].
It is important to note that F-value analysis only applies where the unfolding
transition that is being measured is the same in wt and mutant proteins. If a mu-
tation changes the folding behaviour so that a different barrier is being explored
then this method cannot be used. As an indication of this, the x u should always
be the same for wt and mutant proteins.
mean that the protein unfolded by two pathways, one where the F-value was 1
and a second where F was 0. In these single molecule experiments, however,
these two possibilities can be distinguished. If there are two different pathways
then the forced unfolding should show a bi-modal distribution of forces. This
has not been seen to date, and so the simpler interpretation of partial loss of con-
tacts can be made.
Given the nature of the AFM experiment the error on F is larger than that from
traditional unfolding experiments. It is probably not possible to say more than that
a region is ‘‘fully, or almost fully structured’’ (F close to 1), ‘‘fully or almost fully
unstructured’’ (F close to 0) or ‘‘partly structured’’ (F less than 1 but more than
0). This is one reason why mutants being studied should be significantly less stable
than wild type, to allow F-values to be assigned with confidence. In traditional F-
value analysis F-values are considered to be too prone to error if DDGDaN < 0:75
kcal mol1 (it has recently suggested that this limit should be set significantly
higher). We suggest that for mechanical F-value analysis DDGDaN should be as
large as possible and at least >1.5 kcal mol1 .
31.4.2
Computer Simulation
Since even the most detailed protein engineering experiment can only provide
‘‘snapshots’’ of the unfolding process, simulations are important to understand
the effect of force at the atomistic level. Mechanical unfolding experiments are ap-
parently ideal for comparison with simulation. Unlike addition of temperature or
denaturant, the perturbation is directional, suggesting that the reaction coordinate
(NaC distance) is comparable in simulation and experiment. Furthermore, it is
easy to identify in a forced unfolding trajectory, both metastable unfolding inter-
mediates and the position of the transition state barrier. The problem of time scale
is, however, just as difficult in simulations of forced unfolding as it is for other un-
folding simulations. To increase the rate of unfolding the system has to be per-
turbed using conditions that are far from experiment. In simulations of thermal
unfolding this requires the use of very high temperatures in a simulation. In
forced unfolding simulations the rate of application of force (the loading rate) is
significantly higher in simulation than experiment. Since theory suggests that dif-
ferent barriers to unfolding will be exposed at different loading rates, it is not pos-
sible to assume that the same barrier will be investigated in theory and experiment.
Thus it is important that simulation be bench-marked by experiment.
A number of increasingly computationally expensive techniques have been ap-
plied to the study of mechanical unfolding and where these can be directly com-
pared to experiment, the simulations seem to perform well. Lattice and off-lattice
simulations have been used to describe the effect of force, and other perturbants
(denaturant, temperature) on energy landscapes (e.g., [33–36]). The phase dia-
grams suggest that interesting data may be obtained if force is applied at higher
temperatures and/or in the presence of denaturant. Steered molecular dynamics
1134 31 Unfolding Induced by Mechanical Force
simulations, with and without explicit solvent, have been successful in reproducing
a number of experimental features. They have also been successful in describing
the order of unfolding events in titin I27 (e.g., [37, 38]). The hierarchy of unfolding
of different protein domains has been predicted [39]. Simulations have observed
unfolding intermediates in experimental systems where they have later been de-
tected in experiment [13, 14, 37, 40, 41]. (But again, a note of caution is required.
In only one case has the intermediate been shown to have experimental properties
consistent with the intermediate observed in the simulations [14, 24]. In the other
cases, the correct experiments have not yet been performed.) Finally, simulations
have been successful in predicting the effect of mutation on the unfolding force
[42]. Where simulation and experiment agree, the simulations can be used to un-
derstand details of the transitions between experimentally observable states.
Recent advances have reduced the unfolding forces observed in simulations to
within an order of magnitude of those observed in experiment. Perhaps the most
exciting phase is before us. Unlike previous studies in the field of protein folding
where simulations have lagged behind experiment, the opposite is true of simula-
tions of mechanical unfolding experiment. There are far more protein unfolding
simulations in the literature than detailed experimental analyzes. The good simu-
lation papers suggest experiments to be done.
31.5
Titin I27: A Case Study
31.5.1
The Protein System
Only one protein has, at the time of writing, been investigated in depth. This study
shows how a combination of force spectroscopy, biophysical techniques, protein
engineering, and simulation can be used to describe the mechanical unfolding of
a single protein in detail. Titin I27 was the first protein made into a polyprotein
and thus, the first where domain-specific force characteristics were elucidated [4].
It is one of the most mechanically stable proteins yet studied and remains a para-
digm for forced unfolding experiments. The studies described here represent the
work of a number of different experimental and theoretical groups.
To understand these experiments the structure of the protein has to be consid-
ered [43]. Titin I27 has an I-type immunoglobulin (Ig) fold. It has a beta sandwich
structure, with the first sheet comprised of antiparallel strands A, B, E, and D and
the second of antiparallel strands C, F, and G with, importantly, the A0 strand run-
ning parallel to strand G. In the polyprotein, the N and C termini (A and G
strands) are connected by two-residue linkers (encoded by the restriction sites
used in the cloning process). The N- and C-termini are found at opposite ends of
the protein (Figure 31.5).
The first experiments revealed that the no was @104 s1 and the x u @ 3 Å [4].
Since the unfolding rate was close to that observed in standard stopped-flow experi-
31.5 Titin I27: A Case Study 1135
ments it was suggested that the transition state for unfolding under force was the
same as that observed in the absence of force. This idea, however, did not hold for
long [5, 24].
31.5.2
The Unfolding Intermediate
The first step observed in the simulations of forced unfolding of I27 was the de-
tachment of the A-strand, to form a metastable unfolding intermediate, which lived
longer than the native state in the simulation conditions [37, 40]. Careful examina-
tion of the experimental force traces revealed the presence of a ‘‘hump’’ in the
AFM traces, at around 100 pN, which was attributed to a lengthening due to the
detachment of the A-strand in all the domains of the polyprotein [14]. A mutant
that disrupted the hydrogen bonding between the A- and B-strands did not show
these ‘‘humps.’’
Thus, apparently, at forces around 100 pN, the native state becomes less stable
than this intermediate, I. If this explanation were correct, then the observed un-
folding force reflects the unfolding of this intermediate, and not the denatured
state. Studies of an I27 polyprotein with a conservative mutation V4 to A in the A-
strand confirmed this explanation [24]. This mutation destabilized the native state
1136 31 Unfolding Induced by Mechanical Force
of the protein by @2.5 kcal mol1 , yet had no effect of the unfolding force, consis-
tent with the intermediate hypothesis, since, if the A-strand is detached in I the
V4A mutant would simply lead to I being populated at lower forces as N is desta-
bilized, but I is not. Thus, the experiment and theory agree and the first process,
N ! I, in the unfolding pathway can be understood, and the unfolding rate con-
stant at 0 force, no , and the unfolding distance, x u , can be assigned to the unfold-
ing of this intermediate. In a traditional protein folding study of I27, no such
folding intermediate is observed [44]. A model of this intermediate (I27 with the
A-strand deleted) was purified, shown to be stable and, by NMR spectroscopy, to
be structurally very similar to the native state. In MD simulations this intermediate
had all the properties of the intermediate induced by force [24].
31.5.3
The Transition State
Possibly the most important state along the unfolding pathway is the transition
state, the barrier to forced unfolding. Proteins that can withstand significant force
have a free energy barrier to unfolding that remains significant under applied
force. In order to probe the nature of this barrier protein engineering F-value anal-
ysis was undertaken [30]. Mutations were made in the A0 , B, C 0 , E, F, and G
strands. Most of these mutations, although strongly destabilizing, had no effect
on the unfolding force, i.e., had a F-value of 1 (Figure 31.6) [42]. Only mutations
in the A0 and G strand lowered the unfolding force. The effect of the mutation in
the A0-strand (V13A) was to lower the unfolding force by @25 pN without changing
the x u giving a F-value of @0.5. The mutation in the G-strand (V86A) changed the
slope of the force spectrum (Figure 31.6) so that a F-value could not be determined
(see Section 31.3.8). Since mechanical F-values are much harder to determine than
in traditional protein engineering studies (the molecular biology is more laborious,
the data are harder to collect, and the DDGDaN of each mutant must be signifi-
cantly larger than traditionally) only eight mutants were made. This can give only
a general picture of the transition state. However, on the contrary, the transition
state in MD simulations of forced unfolding experiments is easier to identify than
in high temperature simulations of unfolding. The transition state is represented
by the structures just at the point where extension proceeds rapidly to a fully un-
folded structure. Direct comparison was made between the experimental F-values
and the F-values determined from the simulations. The agreement was very good.
This means that the structure of the transition state can be directly inferred from
the simulation [42].
The transition state structure is essentially the same as the native structure ex-
cept in the region of the A, A0 , and G strands (Figure 31.7). The A-strand is, of
course, completely detached, as it is in I. The hydrogen bonding and side-chain
interactions between the A0 and G strands are lost, as are interactions between the
A0-strand and the EaF loop and between the G-strand and the A0 aB loop. Other
interactions of the A0 and G strands with the rest of the protein are, however, main-
tained in this structure. The mechanical strength of the I27 domain lies in the
31.5 Titin I27: A Case Study 1137
250
wild type
L41A
V13A
V86A
200
Force (pN)
150
100
50
10 100 1000 104
Pulling Speed (nm s-1)
Fig. 31.6. Force spectra of titin I27. Most lowers the unfolding force and changes the
mutants (such as L41A) have no effect on the dependence of the unfolding force on the
unfolding force, nor on the dependence of the puling speed. The slope of the force spectrum
unfolding force on the retract speed. V13A is fb ks ¼ kB Tks =x u (i.e., the decrerase in slope
lowers the unfolding force but does not affect for the mutant V86A reflects an increase in x u ).
the pulling speed dependence. V86A both
strength of interactions between the A0 and G strands, and in the strength of inter-
actions of these strands with other regions of the protein. (A mutant I27 studied
elsewhere had shown that a substitution that made contacts in the BC loop, also
lowered the unfolding force, consistent with these results [5].) Importantly, these
experiments show that the interactions of the side chains contribute significantly
to mechanical strength, and it is not just the hydrogen bonding pattern that deter-
mines mechanical strength, as had been suggested in some theoretical studies
[38]. Whether it is simply packing interactions, or a more complex effect of shield-
ing of the hydrogen bonds from attack by solvent has yet to be determined. This
side chain dependence does, however, seem to be at the root of the relative me-
chanical stability of different I-band domains of titin [45, 46].
31.5.4
The Relationship Between the Native and Transition States
Thus far theory and experiment had elucidated the structure of the principle states
along the pathway and the relative height of the barrier between I and TS (corre-
sponding to no @ 104 s1 ) and the distance between I and TS (x u A 3 Å). However,
1138 31 Unfolding Induced by Mechanical Force
Transition
Native state Intermediate Unfolded
state
N I D
TS
N G
G G G C
C G
N C N
A A’
A’
TSF
TSF
TSF
ku≈10-4
ku ≈ 10-6
I
N
N N I I
6Å 3Å
0 pN ≈100 pN >100 pN
Increasing force
Fig. 31.7.The unfolding of TI I27 in the on the unfolding landscape and the height and
presence of force. Experiment and simulation relative position of the energy barriers and the
combined have allowed the structure of the intermediate have been determined. Structures
important states on the forced unfolding Courtesy of E. Paci.
pathway to be elucidated. The effect of force
the relative height of the barrier between N and TS and the distance between these
two states was unknown. It is not possible, using AFM, to collect data at pulling
speeds low enough to unfold the protein at forces below 100 pN, i.e., when I is
not yet populated. However, here again mutant studies were instructive. One of
the mutants (V86A) changed the slope of the force spectrum (x u A 6 Å) [23]. The
same behavior had been seen in a separate study for mutants with prolines in-
serted into the A0-strand [26]. The anomalous result was that these highly destabi-
lized mutants had, apparently a lower unfolding rate at 0 force (no ) than wild type.
The simplest explanation is that these mutations lowered the stability of I and/or
lowered the TS energy to such an extent that the protein unfolded directly from N
before I was populated. In this case, no is the unfolding rate directly from N and x u
is the unfolding distance from N to the transition state. By using this as a model
and modeling the data for wild type and mutants it was possible to estimate the
rate constant between N and TS at 0 force for wild-type I27 (no A 106 s1 ) [23].
References 1139
31.5.5
The Energy Landscape under Force
Using all these results a pathway for the forced unfolding of titin I27 could be con-
structed (Figure 31.7). It is was also possible to compare the transition state in the
absence of force (TS 0 , as determined by traditional F-value analysis) with that ob-
served in the force experiments (TSF ). The height of the energy barrier between N
and TS 0 (no ¼ 5 104 s1 ) is >3 kcal mol1 lower than that between N and TSF
(no ¼ 106 s1 ). There is also a difference in structure. TSF is more structured,
with higher F-values than TS 0 . However, the region of the protein that is responsi-
ble for mechanical stability (the region that breaks at the transition state) is appar-
ently the same as the region that unfolds early in the denaturant induced unfold-
ing studies, the A0 and G strands. When force is applied to the protein energy
landscape, the unfolding pathway changes, and the protein unfolds along a path-
way occupied by a high-energy barrier, TSF , that resists unfolding.
31.6
Conclusions – the Future
References
32
Molecular Dynamics Simulations to Study
Protein Folding and Unfolding
Amedeo Caflisch and Emanuele Paci
32.1
Introduction
32.2
Molecular Dynamics Simulations of Peptides and Proteins
32.2.1
Folding of Structured Peptides
16000
14000
12000
Number of clusters
10000
8000
6000
4
<E> [kcal/mol]
2
4000 0
-2
-4
-6
2000 -8
-10
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Q
0
0 2 4 6 8 10 12
Simulation time [microseconds]
Fig. 32.1. Number of clusters as a function of average number of conformers sampled during
time. The ‘‘leader’’ clustering procedure was the folding time which is defined as the
used with a total of 120 000 snapshots saved average time interval between successive
every 0.1 ns (thick line and square symbols). unfolding and refolding events. The insets
The clustering algorithm which uses the Ca show a backbone representation of the folded
RMSD values between all pairs of structures state of beta3s with main chain hydrogen
was used only for the first 8 ms (80 000 bonds in dashed lines, and the average
snapshots) because of the computational effective energy as a function of the fraction of
requirements (thin line and circles). The native contacts Q which are defined in [11].
diamond in the bottom left corner shows the Figure from Ref. [33].
taneous folding at 360 K) both peptides satisfy most of the NOE distance restraints
(3/26 and 4/44 upper distance violations for beta3s and D PG, respectively). At a
temperature value of 360 K which is above the melting temperature of the model
(330 K), a statistically significant sampling of the conformational space was ob-
tained by means of around 50 folding and unfolding events for each peptide [11,
12]. Average effective energy and free energy landscape are similar for both pepti-
des, despite the sequence dissimilarity. Since the average effective energy has a
downhill profile at the melting temperature and above it, the free energy barriers
1146 32 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
a-Helices Richardson et al. [18] have analyzed the structure and stability of
the synthetic peptide Y(MEARA)6 by circular dichroism (CD) and differential
scanning calorimetry (DSC). This repetitive sequence was ‘‘extracted’’ from a 60-
amino-acid domain of the human CstF-64 polyadenylation factor which contains
12 nearly identical repeats of the consensus motif MEAR(A/G). The CD and DSC
data were insensitive to concentration indicating that Y(MEARA)6 is monomeric in
solution at concentrations up to 2 mM. The far-UV CD spectrum indicates that the
peptide has a helical content of about 65% at 1 C. The DSC profiles were used to
determine an enthalpy difference for helix formation of 0.8 kcal mol1 per amino
acid. The length of Y(MEARA)6 makes it difficult to study helix formation by MD
simulations with explicit water molecules. Therefore, multiple MD runs were per-
formed with the same implicit solvation model used for the b-sheet peptides [13].
32.2 Molecular Dynamics Simulations of Peptides and Proteins 1147
The simulation results indicate that the synthetic peptide Y(MEARA)6 assumes a
mainly a-helical structure with a nonnegligible content of p-helix [149]. This is
not inconsistent with the currently available experimental evidence [18]. A signifi-
cant p-helical content was found previously by explicit solvent molecular dynamics
simulations of the peptides (AAQAA)3 and (AAKAA)3 [19], which provides further
evidence that the p-helical content of Y(MEARA)6 is not an artifact of the approxi-
mations inherent to the solvation model.
An exponential decay of the unfolded population is common to both Y(MEARA)6
[149] and the 20-residue three-stranded antiparallel b-sheet [14] previously investi-
gated by MD at the same temperature (360 K) [11]. The free energy surfaces of
Y(MEARA)6 and the antiparallel b-sheet peptide differ mainly in the height and lo-
cation of the folding barrier, which in Y(MEARA)6 is much lower and closer to the
fully unfolded state. The main difference between the two types of secondary struc-
ture formation consists of the presence of multiple pathways in the a-helix and
only two predominant pathways in the three-stranded b-sheet. The helix can nucle-
ate everywhere, with a preference for the C-terminal third of the sequence in
Y(MEARA)6 . Furthermore, two concomitant nucleation sites far apart in the se-
quence are possible. Folding of the three-stranded antiparallel b-sheet peptide
beta3s started with the formation of most of the side chain contacts and hydrogen
bonds between strands 2 and 3, followed by the 1–2 interstrand contacts. The in-
verse sequence of events, i.e., first formation of 1–2 and then 2–3 contacts was also
observed, but less frequently [11].
The free energy barrier seems to have an important entropic component in both
helical peptides and antiparallel b-sheets. In an a-helix, it originates from con-
straining the backbone conformation of three consecutive amino acids before the
first helical hydrogen bond can form, while in the antiparallel b-sheet it is due to
the constraining of a b-hairpin onto which a third strand can coalesce [11]. There-
fore, the folding of the two most common types of secondary structure seems to
have similarities (a mainly entropic nucleation barrier and an exponential folding
rate) as well as important differences (location of the barrier and multiple vs. two
pathways). The similarities are in accord with a plethora of experimental and theo-
retical evidence [20] while the differences might be a consequence of the fact that
Y(MEARA)6 has about 7–9 helical turns whereas the three-stranded antiparallel b-
sheet consists of only two ‘‘minimal blocks’’, i.e., two b-hairpins.
shows that the length of each simulation is much larger than the relaxation time of
the slowest conformational change. Interestingly, in the average folding time of
about 85 ns beta3s visits less than 400 clusters (diamond in Figure 32.1). This is
only a small fraction of the total amount of conformers in the denatured state.
It is possible to reconcile the fast folding with the large conformational space by
analyzing the effective energy, which includes all of the contributions to the free
energy except for the configurational entropy of the protein [11, 34]. Fast folding
of beta3s is consistent with the monotonically decreasing profile of the effective en-
ergy (inset in Figure 32.1). Despite the large number of conformers in the dena-
tured state ensemble, the protein chain efficiently finds its way to the folded state
because native-like interactions are on average more stable than nonnative ones.
In conclusion, the unfolded state ensemble at the melting temperature is a large
collection of conformers differing among each other, in agreement with previous
high temperature molecular dynamics simulations [8, 35]. The energy ‘‘bias’’
which makes fast folding possible does not imply that the unfolded state ensemble
is made up of a small number of statistically relevant conformations. The simula-
tions provide further evidence that the number of denatured state conformations
is orders of magnitudes larger than the conformers sampled during a folding
event. This result also suggests that measurements which imply an average
over the unfolded state do not necessarily provide information on the folding
mechanism.
sampled by MD and the NMR conformers. Only in two of the four MD studies
NOE distance restraints were measured along the trajectories and about 20% were
found to be violated [40, 41]. Moreover, as explicitly stated by the authors, the na-
tive state sampled by distributed computing contains a p-helix (instead of the a-
helix) and the Trp is not packed correctly in the core [39]. These discrepancies
are significant because the Trp-cage has a very small core and a rather rigid C-
terminal segment.
32.2.2
Unfolding Simulations of Proteins
this is even more the case when the perturbation is strong and the reaction is in-
duced too quickly for the system to relax along the pathway.
A perturbation which is particularly ‘‘gentle’’ since it exploits the intrinsic ther-
mal fluctuations of the system and produces the acceleration by selecting the fluc-
tuations that correspond to the motion along the reaction coordinate has also been
used to unfold proteins [55, 56]. This perturbation has been employed, in particu-
lar, to expand a-lactalbumin by increasing its radius of gyration starting from the
native state, and generate a large number of low-energy conformers that differ in
terms of their root mean square deviation, for a given radius of gyration. The re-
sulting structures were relaxed by unbiased simulations and used as models of
the molten globule (see Chapter 23) and more unfolded denatured states of a-
lactalbumin based on measured radii of gyration obtained from nuclear magnetic
resonance experiments [57]. The ensemble of compact nonnative structures agree
in their overall properties with experimental data available for the a-lactalbumin
molten globule, showing that the native-like fold of the a-domain is preserved and
that a considerable proportion of the antiparallel b-sheet in the b-domain is pres-
ent. This indicated that the lack of hydrogen exchange protection found experi-
mentally for the b-domain [58] is due to rearrangement of the b-sheet involving
transient populations of nonnative b-structures in agreement with more recent in-
frared spectroscopy measurements [59]. The simulations also provide details con-
cerning the ensemble of structures that contribute as the molten globule unfolds
and shows, in accord with experimental data [60], that the unfolding is not cooper-
ative, i.e., the various structural elements do not unfold simultaneously.
when the protein was pulled beyond a certain length, i.e., a simple two-state behav-
ior. Simulations showed a more complex behavior [63] with possible intermediates
on the forced unfolding pathways for certain proteins.1
Simulation has been used [65] to compare forced unfolding of two protein
classes (all-b-sandwich proteins and all-a-helix bundle proteins). In particular, sim-
ulations suggested that different proteins should show a significantly different
forced unfolding behavior, both within a protein class and for the different classes
and dramatic differences between the unfolding induced by high temperature and
by external pulling forces. The result was shown to be correlated to the type of per-
turbation, the folding topologies, the nature of the secondary and tertiary interac-
tions and the relative stability of the various structural elements [65]. Improve-
ments in the AFM technique combined with protein engineering methods have
now confirmed (see Chapter 31) that chemical (or thermal) and forced unfolding
occur through different pathways and that forced unfolding is related to crossing
of a free energy barrier which might not be unique, but might change with force
magnitude or upon specific mutations [66].
It should be borne in mind, however, that the forced unfolding of proteins is a
nonequilibrium phenomenon strongly dependent on the pulling speed, and, since
time scales in simulations and experiments are very different, the respective path-
ways need not to be the same. Recently, through a combination of experimental
analysis and molecular dynamics simulations it has been shown that, in the case
of mechanical unfolding, pathways might effectively be the same in a large range
of pulling speeds or forces [67, 68], thus providing another demonstration of the
robustness of the energy landscape (i.e., the funnel-like shape of the free energy
surface sculpted by evolution is not affected by the application of even strong per-
turbations [69, 70]). In two recent papers [71, 72] it has been shown that proteins
resist differently when pulled in different directions. In both cases the experiments
have been complemented with simulations, with either explicit or implicit solvents.
In both cases the behavior observed experimentally is qualitatively reproduced.
This fact strongly suggests, although does not prove it, that in this particular case
the forced unfolding mechanisms explored in the simulations is the same as that
which determines the experimentally measured force.
Difference between solvation models is discussed in detail in Section 32.3.4 in
the context of forced unfolding simulations, the disadvantage of an explicit solva-
tion model [62, 72, 73] relative to an implicit [63, 65, 67, 68, 71] is not only that of
being much slower, but also to provide an environment which relaxes slowly rela-
tive to the fast unraveling of the protein under force. Moreover, properly hydrating
with explicit water a partially extended protein requires a large quantity of water,
thus requiring a very large amount of CPU time for a single simulation. Implicit
solvent models, on the other hand, allow unfolding to be performed at much lower
1) The presence of ‘‘late’’ intermediates on the extension profile was predicted to arise from
forced unfolding pathway was first observed the presence of a kinetically metastable
[63] in the 10th domain of fibronectin type III state. Most recent experimental results (J.
from fibronectin (FNfn3). A more complex Fernandez, personal communication) confirm
pattern than equally spaced peaks in the force the behavior predicted by the simulation.
32.2 Molecular Dynamics Simulations of Peptides and Proteins 1153
forces (or pulling speeds) and multiple simulations to be used to study the depen-
dence of the results on the initial conditions and/or on the applied force.
32.2.3
Determination of the Transition State Ensemble
The understanding of the folding mechanism has crucially advanced since the
development of a method which provide information on the transition state [74]
(see Chapter 13). The method allows the structure of the transition state at the
level of single residue to be probed by measuring the change in folding and unfold-
ing rates upon mutations. The method provides a so-called f-value for each of the
mutated residues which is a measure of the formation of native structure around
the residue: a f-value of 1 suggests that the residue is in a native environment at
the transition state while a f-value of 0 can be interpreted as a loss of the interac-
tions of the residue at the transition state. Fractional f-values are more difficult to
interpret, but have been shown to arise from weakened interactions [75] and not
from a mixture of species, some with fully formed and some with fully broken
interaction.
As we discussed in Section 32.2.2.1, the use of high temperature makes it possi-
ble to observe the unfolding of a protein by MD on a time scale which can be simu-
lated on current computers. Valerie Daggett and collaborators [76] first had the
idea of performing a very high-temperature simulation and looking for a sudden
change in the structure of the protein along the trajectory, indicating the escape
from the native minimum of the free energy surface. The collections of structures
around the ‘‘jump’’ were assumed to constitute a sample of the putative transition
state. Assuming that the experimental f-values correspond in microscopic terms to
fraction of native contacts, they found a good agreement between calculated and
experimental values. This approach was initially applied to the protein CI2, a small
two-state proteins which has been probably the most thoroughly studied by experi-
mental f-value analysis; it has been subsequently improved and extended to the
study of several proteins for which experimental f-values were available (see Ref.
[77] for review and other references).
Another related method has been used recently [78] to unfold a protein by high-
temperature simulation (srcSH3 in the specific case) and determine a putative
transition state by looking for conformations where the difference between calcu-
lated and experimental f-values was smallest.
Both methods presented above have the advantage of providing structures ex-
tracted from an unfolding trajectory and thus the fast refolding or complete un-
folding from these structures (a property of transition states) has been reported
[78, 79]. But both approaches only provide few transition state structures, because
a long simulation is required to generate each member of the transition state en-
semble, while the transition state can be a quite broad ensemble for some proteins
[80].
In a recent development, it has been shown that the amount of information that
can be obtained from experimental measurements can be expanded further by
1154 32 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
Fig. 32.2. Comparison between the native structure elements are show in color (the two
state structure (left) and the most representa- a-helices are plotted in red and the b-sheet in
tive structures of the transition state ensemble green). The three key residues for folding are
of AcP, determined by all-atom molecular shown as gold spheres [81, 82]. Figure from
dynamics simulations [82]. Native secondary Ref. [69].
measured. The method can be also used to find key residues for folding (i.e., those
that are most important for the formation of the TSE).
The study of the transition state represents probably the most interesting exam-
ple of how experiment and molecular dynamics simulations complement each
other in understanding and visualizing the folding mechanisms in terms of rele-
vant structures involved. Simulations are performed with approximate force fields
and unfolding induced using artificial means (such as high temperature or other
perturbations). At this stage in the development of MD simulations, the experi-
ment provides evidence that what is observed in silico is consistent with what hap-
pens in vitro. On the other hand, and particularly in the case in which the full en-
semble of conformations compatible with the experimental results is generated
[82, 83], the simulation suggests further mutations to increase the resolution of
the picture of the transition state, and allows detailed hypothesis of the mecha-
nisms, such as the identity and structure of the residues involved in the folding
nucleus [150].
32.3
MD Techniques and Protocols
32.3.1
Techniques to Improve Sampling
enkephalin in vacuo [96]. The basic idea of REMD is to simulate different copies
(replicas) of the system at the same time but at different temperatures values.
We recently applied a REMD protocol to implicit solvent simulations of a 20-
residue three-stranded antiparallel b-sheet peptide (beta3s) [97]. Each replica
evolves independently by MD and every 1000 MD steps (2 ps), states i; j with
neighbor temperatures are swapped (by velocity rescaling) with a probability wij ¼
expðDÞ [96], where D 1 ðb i b j ÞðEj Ei Þ, b ¼ 1=kT and E is the potential energy.
During the 1000 MD steps the Berendsen thermostat [98] is used to keep the tem-
perature close to a given value. This rather tight coupling and the length of each
MD segment (2 ps) allow the kinetic and potential energy of the system to relax.
High temperature simulation segments facilitate the crossing of the energy bar-
riers while the low-temperature ones explore in detail the conformations present
in the minimum energy basins. The result of this swapping between different tem-
peratures is that high-temperature replicas help the low-temperature ones to jump
across the energy barriers of the system. In the beta3s study eight replicas were
used with temperatures between 275 and 465 K [97].
The higher the number of degrees of freedom in the system the more replicas
should be used. It is not clear how many replicas should be used if a peptide or
protein is simulated with explicit water. The transition probability between two
temperatures depends on the overlap of the energy histograms. The histograms’
pffiffiffiffi
width depends on 1= N (where N is the size of the system). Hence, the number
of replicas required to cover a given temperature range increases with the size.
Moreover, in order to have a random walk in temperature space (and then a ran-
dom walk in energy space which enhances the sampling), all the temperature
exchanges should occur with the same probability. This probability should be at
least of 20–30%. To optimize the efficiency of the method, one should find the
best compromise between the number of replicas to be used, the temperature
space to cover and the acceptance ratios for temperature exchanges. In the litera-
ture there is no clear indication about the selection of temperatures and empirical
methods are usually applied (weak point of the method). The choice of the bound-
ary temperatures depends on the system under study. The highest temperature has
to be chosen in order to overcome the highest energy barriers (probably higher in
explicit water) separating different basins; the lowest temperature to investigate the
details of the different basins.
Sanbonmatsu and Garcia have applied REMD to investigate the structure of Met-
enkephalin in explicit water [99] and the a-helical stabilization by the arginine side
chain which was found to originate from the shielding of main-chain hydrogen
bonds [100]. Furthermore, the energy landscape of the C-terminal b-hairpin of
protein G in explicit water has been investigated by REMD [101, 102]. Recently, a
multiplexed approach with multiple replicas for each temperature level has been
applied to large-scale distributed computing of the folding of a 23-residue minipro-
tein [103]. Starting from a completely extended chain, conformations close to the
NMR structures were reached in about 100 trajectories (out of a total of 4000) but
no evidence of reversible folding (i.e., several folding and unfolding events in the
same trajectory) was presented [103].
32.3 MD Techniques and Protocols 1157
32.3.2
MD with Restraints
1 X exp
r¼ ðf fi Þ 2 ð1Þ
Nf i A E i
exp
where E is the list of the Nf available experimental f-values, fi . The f i -value of
amino acid i in the conformation at time t is defined as
Ni ðtÞ
fi ðtÞ ¼ ð2Þ
Ninat
1158 32 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
where Ni ðtÞ is the number of native contacts of i at time t and N inat the number of
native contacts of i in the native state.
Molecular dynamics simulations are then performed to sample all the possible
structures compatible with the restraints. The structures thus generated are not
necessarily at the transition state for folding for the potential used. They provide
instead a structural model of the experimental transition state, including all possi-
ble structures compatible with the restraints derived from the experiment. The ex-
perimental information provided by the f-values might not be enough to restrain
the sampling to meaningful structures (e.g., when only few mutations have been
performed). In such circumstance, other experimentally measured quantities,
such as the m-value, which is related to the solvent accessible surface, must be
used to restrain the sampling or to a posteriori select meaningful structures.
This type of computational approach relies on the assumption implicit in Eq. (2).
This consists in approximating a f-value, measured as a ratio of free energy varia-
tions upon mutation, as a ratio of side-chain contacts. A definition based on side-
chains is appropriate since experimental f-values are primarily a measure of the
loss of side-chain contacts at the transition state, relative to the native state. Al-
though simply counting contacts, rather than calculating their energies, is a crude
approximation [113], it has been shown that there is a good correlation between
loss of stability and loss of side-chain contacts within about 6 Å on mutation
[114]. Also, Shea et al. [115] have found in their model calculations that this ap-
proximation for estimating f-values from structures is a good one under certain
conditions. A more detailed relation between experimental f-values and atomic
contacts could in principle be established by using the energies of the all-atom con-
tacts made by the side chain of the mutated amino acid.
The same approach can be extended to generate the structures corresponding to
other unfolded or intermediate states as the site-specific information provided by
the experiment is steadily increasing (see Chapters 20 and 21).
32.3.3
Distributed Computing Approach
large systems with explicit water and a relatively small number of processors (be-
tween 2 and 100, depending on the program and the problem studied). One
approach has been proposed that allows the scalability of a MD simulation to be
pushed to the level of being able to use efficiently a network of heterogeneous and
loosely connected computer [117]. The approach (called distributed computing) ex-
ploits the stochastic nature of the folding process. In general protein folding in-
volves the crossing of free energy barriers. The approach is most easily understood
assuming that the proteins have a single barrier and a single exponential kinetic
(which is the case for a large number of small proteins [118]). The probability that
a protein is folded after a time t is PðtÞ ¼ 1 expðktÞ, where k is the folding rate.
Thus, for short times, and considering M proteins or independent simulations, the
probability of observing a folding event is Mkt. So, if M is large, there is a sizable
probability of observing a folding event on simulations much shorter than the time
constant of the folding process [119]. The folding rate could then in principle be
estimated by running M independent simulations (starting from the completely
extended conformation with different random velocities) for a time t and counting
the number N of simulations which end up in the folded state as k ¼ N=ðMtÞ.
Simulations have been reported where the folding rate estimated in this way
(assuming that partial refolding counts as folding) is in good agreement with the
experimental one (see, for example, Ref. [39]).
However, it has been argued [120] that even for simple two-state proteins, fold-
ing has a series of early conformational steps that lead to lag phases at the begin-
ning of the kinetics. Their presence can bias short simulations toward selecting
minor pathways that have fewer or faster lag steps and so miss the major folding
pathways. This fact has been clearly observed by comparing equilibrium and fast
folding trajectories simulations [121] for a 20-residue three-stranded antiparallel
b-sheet peptide (beta3s). It was found that the folding rate is estimated correctly
by the distributed computing approach when trajectories longer than a fraction of
the equilibrium folding time are considered; in the case of the 20-residue peptide
studied within the frictionless implicit solvation model used for the simulations,
this time is about 1% of the average folding time at equilibrium. However, careful
analysis of the folding trajectories showed that the fastest folding events occur
through high-energy pathways, which are unlikely under equilibrium conditions
(see Section 32.2.1.1). Along these very fast folding pathways the peptide does not
relax within the equilibrium denatured state which is stabilized by the transient
presence of both native and nonnative interactions. Instead, collapse and formation
of native interactions coincides and, unlike at equilibrium, the formation of the
two b-hairpins is nearly simultaneous.
These results demonstrate that the ability to predict the folding rate does not
imply that the folding mechanisms are correctly characterized: the fast folding
events occur through a pathway that is very unlikely at equilibrium. However, ex-
tending the time scale of the short simulations to 10% of the equilibrium folding
time, the folding mechanism of the fast folding events becomes almost indistin-
guishable from equilibrium folding events. It must be stressed that this result is
not general but concerns the specific peptide studied; the explicit presence of sol-
1160 32 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
vent molecule (and the consequent friction), might decrease the differences be-
tween equilibrium and shortest folding events. Unfortunately, this kind of valida-
tion of the distributed computing approach is not possible for a generic protein in
a realistic solvent, as equilibrium simulations are not feasible.
An alternative method to use many processors simultaneously to access time
scales relevant in the folding process by MD simulations has recently been pro-
posed by Settanni et al. [122]. The method is based on parallel MD simulations
that are started from the denatured state; trajectories are periodically interrupted,
and are restarted only if they approach the transition (or some other target) state.
In other words, the method choses trajectories along which a cost function de-
creases. The effectiveness of such an approach was shown by determining the
transition state for folding an SH3 domain using as cost function the deviation be-
tween experimental and computed f-values (Eq. (1) in Section 32.3.2). The method
can efficiently use a large number of computers simultaneously because simula-
tions are loosely coupled (i.e., only the comparison between final conformations,
needed periodically to choose which trajectory to restart, involve communications
between CPUs). This method can also be extended to complex nondifferentiable
cost functions.
32.3.4
Implicit Solvent Models versus Explicit Water
Incorporating solvent effects in MD and Monte-Carlo simulations is of key impor-
tance in quantitatively understanding the chemical and physical properties of
biomolecular processes. Accurate electrostatic energies of proteins in an aqueous
environment are needed in order to discriminate between native and nonnative
conformations. An exact evaluation of electrostatic energies considers the interac-
tions among all possible solute–solute, solute–solvent, and solvent–solvent pairs
of charges. However, this is computationally expensive for macromolecules. Con-
tinuum dielectric approximations offer a more tractable approach [123–127]. The
essential concept in continuum models is to represent the solvent by a high dielec-
tric medium, which eliminates the solvent degrees of freedom, and to describe the
macromolecule as a region with a low dielectric constant and a spatial charge dis-
tribution. The Poisson equation provides an exact description of such a system.
The increase in computation speed for a finite difference solution of the Poisson
equation [128–131] with respect to an explicit treatment of the solvent is remark-
able but still not enough for effective utilization in computer simulations of macro-
molecules. The generalized Born (GB) model was introduced to facilitate an effi-
cient evaluation of continuum electrostatic energies [42]. It provides accurate
energetics and the most efficient implementations are between five and ten times
slower than in vacuo simulations [132–134]. The essential element of the GB
approach is the calculation of an effective Born radius for each atom in the system
which is a measure of how deeply the atom is buried inside the protein. This infor-
mation is combined in a heuristic way to obtain a correction to the Coulomb law
for each atom pair [42]. For the integration of energy density, necessary to obtain
the effective Born radii, both numerical [42, 132, 135] and analytical [134, 136, 137]
32.3 MD Techniques and Protocols 1161
implementations exist. The former are more accurate but slower than the latter
[135]. Moreover, analytical derivatives that are required for MD simulations are
not given by numerical implementations.
For efficiency reasons empirical dielectric screening functions are the most com-
mon choice in MD simulations with implicit solvent. One kind of solvation model
is based on the use of a dielectric function that depends linearly on the distance r
between two charges ðeðrÞ ¼ arÞ [138, 139] or has a sigmoidal shape [140, 141]. Al-
though very fast, these options suffer from their inability to discriminate between
buried and solvent exposed regions of a macromolecule and are therefore rather
inaccurate. A distance and exposure dependent dielectric function has been pro-
posed [142]. Recently, an approach based on the distribution of solute atomic vol-
umes around pairs of charges in a macromolecule has been proposed to calculate
the effective dielectric function of proteins in aqueous solution [143].
The simulation results presented in Section 32.2.1 were obtained using an im-
plicit solvent model based on a fast analytical approximation of the solvent accessi-
ble surface (SAS) [13] and the CHARMM force field [110]. The former drastically
reduces the computational cost with respect to an explicit solvent simulation. The
SAS model is based on the approximation proposed by Lazaridis and Karplus [112]
for dielectric shielding due to the solvent, and the surface area model for the hydro-
phobic effect introduced by Eisenberg and McLachlan [144]. Electrostatic screening
effects are approximated by a distance-dependent dielectric function and a set of
partial charges with neutralized ionic groups [112]. An approximate analytical ex-
pression [145] is employed to calculate the SAS because an exact analytical or nu-
merical computation of the SAS is too slow to compete with simulations in explicit
solvent. The SAS model is based on the assumptions that most of the solvation en-
ergy arises from the first water shell around the protein [144] and that two atomic
solvation parameters are sufficient to describe these effects at a qualitative level
of accuracy. Within these assumptions, the SAS energy term approximates the
solute–solvent interactions (i.e., it should account for the energy of cavity forma-
tion, solute–solvent dispersion interactions, and the direct (or Born) solvation of
polar groups). The two atomic solvation parameters were optimized by performing
1 ns MD simulations at 300 K on six small proteins [13]. It is important to under-
line that the structured peptides discussed in Section 32.2.1 were not used for the
calibration of the SAS atomic solvation parameters. The SAS model is a good ap-
proximation for investigating the folded and denatured state (large ensemble of
conformers) of structured peptides. Its limitations, in particular for highly charged
peptides and large proteins, have been discussed [13].
The most detailed and physically sound approaches (e.g., explicit solvent and
particle mesh Ewald treatment of the long-range electrostatic interactions [146])
are still approximations and might introduces artifacts (see, for example, Ref.
[147]). All solvation models, even those computationally most expensive, are ap-
proximations and their range of validity is difficult to explore. It is likely that most
proteins will unfold fast relative to the experimental time scale if one could afford
long (e.g., 100 ns) explicit water MD simulations even at room temperature. Some
evidence of this instability has been recently published [148].
1162 32 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
32.4
Conclusion
It is a very exciting time for studying protein folding using multidisciplinary ap-
proaches rooted in physics, chemistry, and computer science. The time scale gap
between folding in vitro and in silico is being continuously reduced and this will
bring interesting surprises. We expect an increasing role of MD simulations in
the elucidation of protein folding thanks to further improvements in force fields
and solvation models.
References
33
Molecular Dynamics Simulations of Proteins
and Peptides: Problems, Achievements,
and Perspectives
Paul Tavan, Heiko Carstens, and Gerald Mathias
33.1
Introduction
When viewed from the standpoint of theoretical physics, proteins are extremely
complex materials. They are inhomogeneous, non-isotropic, and exhibit dynamic
processes covering many spatial and temporal scales (instead of only one or two),
that is, they do not offer any of the nice features that make other materials readily
accessible to simplified or coarse-grained descriptions. Therefore, theoretical ap-
proaches towards protein and peptide dynamics have to be based on microscopic
descriptions. Here, atomistic molecular dynamics (MD) simulations, which are
based on molecular mechanics (MM) force fields, currently represent the standard
[1]. It should be noted, however, that these MM methods represent a first level of
coarse graining, because they try to account for the electronic degrees of freedom
mediating the interactions between the atoms by simplified parametric descrip-
tions coded into standard MM force fields (for a recent review see Ponder and
Case [2]).
In this article we want to address the question, to what extent can MM-MD sim-
ulations currently or in the near future contribute to the understanding of protein
folding? Although simulations have been published (for a review see Ref. [3]) that
allegedly describe folding processes of peptides or small proteins, there are also
serious arguments (e.g., in Ref. [2]) that the available MM force fields may not be
accurate enough for qualitatively correct descriptions. This would imply that the
published simulations, instead of providing a ‘‘virtual reality’’ of folding processes,
actually represent ‘‘real artifacts.’’ Thus, it is worthwhile addressing the issue of
accuracy.
Because there are many excellent textbooks and review articles on MD simula-
tion methods for liquids [4] and proteins (Ref. [2] and references therein), there is
no point in duplicating this material. Instead, we have chosen to outline our own
perspective of the issue, which has been and still is guiding our longstanding ef-
forts to develop and improve the required computational methods. We start with a
sketch of the basic physics of protein structure and dynamics. This sketch will
serve to identify key challenges that have to be mastered by any attempt aiming at
a theoretical description of these complicated processes.
33.2
Basic Physics of Protein Structure and Dynamics
O H
H R H
Ca N C' Ca N
C' Ca N C' Ca
R
H H
O O
H O
H R H
N Ca C' N Ca
Ca C' N Ca C'
H H R
O O
Fig. 33.1. Organization of the dipole–dipole equatorial ("#). The structural stability of b-
interactions within a b-sheet. As explained by sheets is caused by a dipolar organization
the resonance structures shown in Figure 33.3, integrating both arrangements in an optimized
the peptide groups form rigid platelets fashion. Here, in contrast to a-helices, which
(indicated by the dashed lines) and exhibit exclusively exhibit the (!!)-arrangement, the
strong electric dipole moments (arrows). Such local dipoles do not add up to a macroscopic
dipoles attract each other, if their arrangement dipole.
is parallel and axial (!!) or antiparallel and
1172 33 Molecular Dynamics Simulations of Proteins and Peptides
33.2.1
Protein Electrostatics
When attempts are made at a theoretical description, a first difficulty becomes ap-
parent here. The dipole–dipole interactions (commonly called hydrogen bonds),
which provide the main contribution to the folding of the polypeptide chains into
specific three-dimensional structures, provide binding energies not much larger
than the thermal energy kB T and it is only the combination of many different
dipole–dipole interactions that generates the stability of a specific structure. There-
fore, the precise description of the many weak electrostatic interactions within a
protein is of key importance, since these interactions determine whether particular
solid structures are formed or not. This instance is only a specific manifestation of
the general fact that electrostatic interactions are dominant in shaping protein
structure and dynamics [5]. In contrast, the weaker van der Waals interactions are
less important as they are nonspecifically acting between all chemically nonbonded
atoms in a similar way.
To obtain a quantitative estimate, consider a hypothetical protein with N ¼ 100
residues, whose native conformation lies the typical energy value of DE ¼ 20
kJ mol1 below either a compact misfolded conformation or the huge variety of
unfolded ones. Then the acceptable error per residue allowing us still to identify
the correct conformation is approximately DE=N 1=2 or 2 kJ mol1 (see Ref. [2]
and references therein), where the errors are assumed to be randomly distributed.
Systematic errors in the description of the electrostatics will reduce the acceptable
error down to DE=N or 0.2 kJ mol1 . Thus, great care has to be taken in the de-
scription of intraprotein electrostatics if one wants to generate a ‘‘virtual reality.’’
Its quantitatively correct description constitutes the first key problem for theoreti-
cal approaches.
33.2.2
Relaxation Times and Spatial Scales
33.2.3
Solvent Environment
Up to this point we have looked at the problem of describing the native structure
and dynamics of proteins as if they could be understood as isolated objects. As-
suming this hypothesis we have identified two key problems with which any at-
tempt of a realistic description is confronted. The first is posed by the high accu-
racy required for the modeling of the electrostatics within a protein and the
second by the huge span of time scales that have to be covered. But the issue is
even more complicated, since proteins cannot be isolated from their surroundings.
Proteins acquire their native structure and functional dynamics solely within
their physiological environment, which in the case of soluble proteins is character-
ized by certain ranges of temperature, pressure, pH, and concentrations of ions
and of other molecular components within the aqueous solvent. Although these
ranges may vary from protein to protein, there are specific physiological ranges
for each of them. Therefore, any attempt to provide a quantitative description for
the dynamic processes of protein folding and function has to properly include the
environment at its physiological thermodynamic state.
1174 33 Molecular Dynamics Simulations of Proteins and Peptides
33.2.4
Water
The key component of the environment is liquid water, whose adequate represen-
tation in computer simulations represents a challenge on its own (see Ref. [7] for a
critical review and Ref. [6] for recent results). A water molecule is nearly spherical
and very small (approximately the size of an oxygen atom), it carries large electrical
dipole and quadrupole moments, and is strongly polarizable by electric fields
generated, for example, by other water molecules, ions, or polar groups in its
surroundings.
The large dipole moment causes the unusually large dielectric constant of 78, by
which liquid water scales down electrostatic interactions. For proteins and peptides
this dielectric mean-field property of water is important, because they are usually
polyvalent ions and therefore generate a reaction field by polarization of the aque-
ous environment, which modifies the electrostatic interactions within the protein.
The proper description of these very long-ranged effects thus represents the first
problem specifically posed by MD simulations of protein–solvent systems.
The sizable quadrupole moment of the water molecule shapes the detailed struc-
tures of the hydration shells of solute molecules and, in particular, also those of
proteins. If, as suggested by Brooks [8], water molecules actively support secondary
structure formation, for example, by bridging nascent b-sheets, then an accurate
modeling of the quadrupole moment is required for the understanding of the de-
tailed folding pathways.
The strong electronic polarizability of the water molecules poses yet another
challenge. As illustrated by the computational result in Figure 33.2, due to polar-
Relative frequency
= 2.77
= 0.18
0 1 2 3
(Debye)
Fig. 33.2. Distribution of the dipole moment very close to the experimental value of 1.85 D.
of a water molecule dissolved in pure water Thus, the induced dipole moment determined
as obtained from a hybrid MD simulation by the calculations is 0.85 D, that is about 40%
combining density functional theory (DFT) of the gas-phase value. In liquid water the
for the given water molecule with an MM fluctuations of the dipole moment generated
description of the aqueous environment. The by the electronic polarization are 6.5% as
DFT description of an isolated water molecule measured by sm =hmi. For technical details and
yields a dipole moment of 1.92 D, which is explanations see Ref. [9].
33.2 Basic Physics of Protein Structure and Dynamics 1175
ization the dipole moment increases from a value of 1.85 D for a water molecule
isolated in the gas-phase to a value of about 2.6–3.0 D in the liquid phase [7, 9,
10]. Correspondingly the dipole moment of a water molecule will sizably change
when it moves from a hydration shell of an ion through bulk water towards a non-
polar surface of a protein.
33.2.5
Polarizability of the Peptide Groups and of Other Protein Components
H H
+
Ca N Ca N
C' Ca C' Ca
–
O O
Fig. 33.3. p-electron resonance structures of these resonance structures contribute to the
a peptide group explaining its strong dipole electronic wave function. Correspondingly, it
moment and polarizability. An external electric will sizably change the dipole moment and the
field can modify the relative weights, by which force constants.
structure content (see Ref. [13] for further information). Peaks of the so-called
amide-I band, for instance, which belongs to the CbO stretching vibrations of the
backbone, are found in the spectral range between 1620 and 1700 cm1 , indicating
that the polarization can change the associated CbO force constants by 5%. Con-
versely, (i) the fact that the large and specific frequency shifts of the amide modes
can indicate the secondary structure environment of the corresponding peptide
groups, and (ii) the resonance argument in Figure 33.3 uniquely demonstrate that
the electronic polarization will entail sizably different strengths (b5%) of the dipo-
lar interactions (‘‘H-bonds’’) within the various secondary structure motifs. This
proves our above conjecture that the electronic polarizability provides a differential
contribution to the relative stabilities of secondary structure motifs.
As a result, theoretical descriptions of protein dynamics have to include the elec-
tronic polarizability. In addition to this general insight, the above considerations
have led us also to quantitative estimates as to how much the polarizability con-
tributes to the electrostatics. We found numbers ranging between 50% (charge on
top of an aromatic ring) and 5% (peptide groups in different secondary structure
motifs). These contributions systematically and differentially change the electro-
static interactions within protein–solvent systems and do not represent random
variations. This instance is obvious if one compares the interaction of a dipolar
group with a charged and a nonpolar group, respectively. Furthermore, it has
been demonstrated above for the dipolar interactions within secondary structure
motifs. Below we will refer to the thus established 5% lower bound of the polariz-
ability contribution to protein electrostatics for purposes of accuracy estimates.
Finally, we would like to add that enzymatic catalysis in proteins, which is a
prominent feature of these materials, is essentially an effect of polarizability.
Here, a substrate is noncovalently attached in a specific orientation to a binding
pocket and a highly structured electric field, which is generated by a particular ar-
rangement of charged, polar and nonpolar residues, exerts a polarizing strain on
its charge distribution such that a particular chemical bond either breaks or is
formed. This example underlines the decisive contribution of the electronic polar-
izability to protein structure, dynamics and function.
33.3 State of the Art 1177
33.3
State of the Art
33.3.1
Control of Thermodynamic Conditions
In the theory of liquids and long before the history of protein simulations began,
MD simulation systems representing a well-defined thermodynamic ensemble had
been set up by filling periodic boxes with MM models of the material of interest
(cf. the first simulation of liquid water by Rahman and Stillinger [17]). Whereas
the temperature (or internal energy) is trivially accessible in any MD approach, up
to now the pressure (or volume) can be controlled solely (cf. Mathias et al. [18] for
a discussion) through the use of periodic boundary conditions (PBCs). This control
can be executed safely and poses no problems.
33.3.2
Long-range Electrostatics
PBCs generate an unbounded simulation system of finite size with the topology
of a three-dimensional torus. Therefore, they create the problem of artificial self-
interactions whenever the interactions among the simulated particles are long
ranged as is the case for electrostatics. To avoid such self-interactions one has to
follow the minimum image convention (MIC), which dictates that all interactions
beyond the MIC-distance RMIC ¼ L=2 should be neglected, where L is the size of
the periodic simulation box [4]. On the other hand, a corresponding truncation
of the long-ranged electrostatic interactions in the so-called straight-cutoff (SC)
approach leads to serious structural and energetic artifacts, even if the simulated
system consists solely of pure water [18–20]. For solutions of polyvalent ions, like
proteins or nucleic acids, the SC-artifacts become even worse.
1178 33 Molecular Dynamics Simulations of Proteins and Peptides
Rc
d
Fig. 33.4. Concept of the TBC/RF approach to dielectric outside of that sphere generates a
the computation of long-range electrostatics in reaction field acting on the central charge,
MD simulations of proteins in solution [18]. whose interactions with the charges within
Each atom of the periodic simulation system the sphere are rapidly calculated by fast,
of size L containing a protein of diameter d in hierarchical, and structure-adapted multipole
solution is surrounded by a sphere of radius expansions. Here, the applied SAMM
R C ¼ RMIC ¼ L=2. The polarization of the algorithm [25, 26] scales linearly with N.
33.3 State of the Art 1179
To provide an example for such error bounds, we consider the PBC/LS, TBC/RF,
and SC simulations on large water systems (L ¼ 8–12 nm, N ¼ 34 000–120 000),
which have been executed with two different programs (GROMACS [27], EGO-
MMII [28]) under otherwise identical conditions (see Refs [6, 18]). The enthalpies
of vaporization DH per water molecule calculated by PBC/LS and TBC/RF showed
relative deviations of at most 3/1000, that is 0.1 kJ mol1 . This deviation is smaller
by a factor of 2 than the acceptable systematic error per residue estimated in Sec-
tion 33.1.1 and by a factor of 20 below the acceptable random error. In contrast, the
DH value calculated by SC using a cutoff radius R C A 4 nm (system size L ¼ 8
nm) deviated by 1.2 kJ mol1 from the PBC/LS and TBC/RF values, which is larger
by a factor of 6 than the acceptable systematic error introduced in Section 33.1.1.
This example quantitatively proves that SC simulations grossly miss the accuracy
required for protein structure prediction in purely dipolar systems, whereas here
the two other methods are acceptable. For solutions containing ions and particu-
larly proteins similar comparisons of PBC/LS and TBC/RF treatments will soon
provide clues as to which measures can be used to avoid computational artifacts
and to achieve the required accuracy in these more complicated cases. As a result,
the longstanding problem of how the long-range electrostatics of protein–solvent
systems can be treated in MD simulations with sufficient accuracy and manageable
computational effort will find a final answer in the near future. Furthermore, we
can safely state that the remaining accuracy problems connected with long-range
electrostatics computation are much smaller than those associated with the MM
force fields to which we will now turn our attention.
33.3.3
Polarizability
librated system as the starting point for a 100 ps simulation and evaluated the
average total electrostatic energy E e; tot of the protein atoms. Less than 1% of all
electrostatic interactions between the protein charges occur within a given residue.
Due to the so-called 1–3 exclusion commonly used in MM force fields [33] they act
at comparable distances like the interactions with charges residing at neighboring
residues or solvent molecules. Therefore, we estimate the intra-residue contri-
bution to E e; tot to be sizably smaller than at most 20% of E e; tot . As a result, the
expression
should represent a lower limit for the average electrostatic energy E e; res per resi-
due. Inserting the computed number, we find E e; res A 160 kJ Mol1 . With the 5%
lower bound for the systematic error, which is expected to be caused by the ne-
glect of polarizability, we thus find a lower limit to the ‘‘polarizability error’’ of 8
kJ mol1 per residue. This polarizability error is by a factor 4 larger than the ac-
ceptable random error and exceeds the acceptable systematic error, which should
be more relevant here, by a factor of 40.
Therefore, the neglect of the electronic polarizability is the key drawback of the
available MM force fields and, up to now, prevents quantitative MD descriptions of
protein structure. Furthermore, it is the reason why the parameterization of such
force fields is confronted with insurmountable difficulties. As outlined in Section
33.1.5, the electronic polarizability of the peptide groups provides a differential
contribution to the relative stabilities of secondary structure motifs. As a result,
the common nonpolarizable MM force fields can either describe the stability of a-
helices or that of b-sheets at an acceptable accuracy, but never both. Correspond-
ingly and as illustrated in Ponder and Case [2] by sample simulations, such force
fields either favor a-helical structures (e.g., AMBER ff94 [34]) or b-sheets (e.g.,
OPLS-AA [35]), or else fail concerning intermediate structures (e.g., CHARMM22
[33]).
33.3.4
Higher Multipole Moments of the Molecular Components
For computational reasons, MM force fields apply the partial charge approxima-
tion, according to which the charge distribution within the molecular components
of a simulation system is modeled by charges located at the positions of the nuclei.
Although this approximation should be sufficiently accurate in most instances (if
the problem of the missing polarizability is put aside), it will cause problems con-
cerning the description of local structures, whenever ‘‘electron lone pairs’’ occur at
oxygen atoms [29]. Important examples for this chemical motif are the peptide
CbO groups (cf. Figure 33.3) and the water molecule. At the small oxygen atoms
the lone pairs cause large higher multipole moments, whose directional signature
cannot be properly represented within the partial charge approximation. Although
the electric field generated by these multipole moments rapidly decays beyond the
33.3 State of the Art 1181
A B C
H
r
q q/2
s O
O F
l l
H
Fig. 33.5. Geometries of three different charges. The large circle indicates the van der
nonpolarizable MM models for water. Black Waals radius. A) TIP3P, B) TIP4P (both
circles indicate the positions of negative and Jorgensen et al. [33]), C) TIP5P [36].
open circles those of the positive partial
first shell of neighboring atoms, it steers the directions of the H-bonds within that
shell and, thus, shapes the corresponding local structures. The consequences of
this subtle effect are yet to be explored for proteins. For water, however, whose ex-
perimental and theoretical characterization has always been and still is at a much
higher stage than that of proteins, the importance of this effect is well known.
33.3.5
MM Models of Water
In an excellent review, Guillot [7] has recently given ‘‘a reappraisal of what we have
learnt during three decades of computer simulations on water.’’ Within the partial
charge approximation, MM models of water comprise three charges at the posi-
tions of its three atoms (‘‘three point models’’). By symmetry and neutrality of the
molecule, in this case three parameters (one charge, bond angle, and bond length)
suffice to specify the charge distribution (cf. Figure 33.5A). However, these simple
three-point models, which are commonly used in MD simulations of protein–
water systems, do not correctly account for the higher multipole moments and,
therefore, miss essential features of the correlation functions in pure water (see,
for example, Ref. [6]). Because of this defect and of the missing polarizability,
they cannot be expected to provide an adequate description of the hydration shell
structures surrounding proteins.
For an improved description of the higher multipole moments, four- and five-
point models have been suggested positioning partial charges away from the posi-
tions of the nuclei. Figure 33.5 compares the geometries of correspondingly ex-
tended models with that of a simple three-point model. In addition, a series of
polarizable models using ‘‘fluctuating charges’’ or inducible dipoles have been pro-
posed. Guillot gives an impressive table listing about 50 different MM models of
water and, subsequently, classifies the performance of these models concerning
the reproduction of experimentally established properties in simulations. Although
Guillot mainly discusses pure water, he also shortly comments on the topic of
transferability, which is the unsolved problem of how to construct an MM water
1182 33 Molecular Dynamics Simulations of Proteins and Peptides
model performing equally well for all kinds of solute molecules. Summarizing his
review he arrives at the conclusion that ‘‘one has a taste of incompletion, if one
considers that not a single water model available in the literature is able to repro-
duce with a great accuracy all the water properties. Despite many efforts to im-
prove this situation very few significant progress can be asserted.’’ In his view this
is due to ‘‘the extreme complexity of the water force field in the details, and their
influence on the macroscopic properties of the condensed phase.’’ Nevertheless,
being a committed scientist he does not suggest that we should quit attempts at
an MM description but instead adds several suggestions as to how one should ac-
count for the polarizability and for other details in a better way.
33.3.6
Complexity of Protein–Solvent Systems and Consequences for MM-MD
The state of the MM-MD descriptions of pure water sketched above is sober-
ing when one turns to protein–solvent systems, whose complexity exceeds that of
pure water by orders of magnitude. This enormous increase of complexity is
caused only in part by the much larger spatial and temporal scales characterizing
protein dynamics (cf. Section 33.1.2). More important is the combinatorial com-
plexity, which is posed by the fact that one has to parameterize the interactions be-
tween a large variety of chemically different components (water, peptide groups,
amino acid residues, ions, etc.) into a single effective energy function in a balanced
way. If already in the case of a one-component liquid like water subtle details of the
MM models determine the macroscopic properties, a difficulty which, up to now,
has prevented the construction of a model capable of explaining all known proper-
ties, then one may ask whether the MM approach will ever live up to its promises
concerning proteins at all. This critical question gains support by our above esti-
mate concerning the size of the errors introduced by the neglect of the electronic
polarizability characteristic for the common force fields. Our conservative estimate
suggests that nowadays not even a single and most basic property of soluble
proteins, namely their structure, can be reliably predicted by extended MM-MD
simulations.
33.3.7
What about Successes of MD Methods?
A B
1680 1680
v (cm–1)
v~ (cm–1)
1620 2a 1g 1620
~
1600 1600 C4=O
1580 b2u 1580
C2=C3
1560 1560
DFT exp DFT/MM exp
Fig. 33.6. Comparison of DFT (A) and DFT/ Rb. sphaeroides [49]. Both computational results
MM (B) computations with experimental data agree very well with the observations. In
for the vibrational spectra of the CbO (solid particular, the DFT/MM treatment correctly
lines) and CbC (dashed lines) modes of predicts the sizable 60 cm1 red shift of one of
quinone molecules in the gas-phase and in a the CbO modes, which is caused by the
protein, respectively. The DFT results pertain to protein environment through electrostatic
p-benzoquinone [48] and the DFT/MM results polarization of the quinone.
to the ubiquinone QA in the reaction center of
lar structure is much less sensitive to inaccuracies of the QM treatment than the
intramolecular force field.
In condensed phase, the computation of vibrational spectra [49, 50] has recently
been enabled by the development of a DFT/MM hybrid method [9]. The pertur-
bation argument given further above entails the expectation that DFT/MM can
describe condensed-phase vibrational spectra at an accuracy comparable to the
one previously achieved by DFT for the gas-phase spectra. This expectation is
confirmed by the results shown in Figure 33.6 and by the infrared spectra of
p-bezoquinone on water, which have been calculated from a QM/MM hybrid MD
trajectory [50]. Thus, although the common nonpolarizable MM force fields are
not accurate enough for unrestricted MD simulations of biomolecules, they are
valuable in more restricted settings.
33.3.8
Accessible Time Scales and Accuracy Issues
In this respect it is, in a sense, fortunate that computational limitations have pre-
viously restricted MM-MD simulations of protein–solvent systems to the time scale
of a few nanoseconds. Because simulations usually start from a structure deter-
mined by X-ray crystallography or by more-dimensional NMR, and because in pro-
teins small-scale conformational transitions only begin to occur at this time scale,
the actual metastability of a MM-MD protein model is likely to become apparent
only in much more extended simulations than in the ones that are currently feasi-
ble. At the accessible nanosecond time scale, the large-scale and slow relaxation
processes changing a metastable structure are simply ‘‘frozen out,’’ and protein
simulations are strongly restricted by the accessible time scales. Structural stabil-
ities observed in short-time simulations are due to this restriction and cannot be
attributed to the particular quality of the force field. Nevertheless, the apparent sta-
bility of protein structures even in grossly erroneous MM-MD simulations (like in
those applying the SC truncation to the long-range electrostatics) may have led sci-
entists to believe in the validity of MD results. For the field of MD simulations on
protein dynamics this was fortunate, because it has created a certain public accep-
tance (cf. e.g., Ref. [41]) and has attracted researchers. On the other hand it was
also quite unfortunate, because the urgent necessity to improve the MD methods
to meet the challenges outlined in Section 33.1 remained an issue of concern only
in a small community of methods developers.
However, the speed of computation has been increasing by a factor of 2 each
year during the past decades and, according to the plans of the computer industry,
will continue to exhibit this type of exponential growth within the coming decade.
For the available MM force fields this progress would imply that MD simulations
spanning about 0.5 ms become feasible, because the current technology already al-
lows series of MD simulations spanning 10 ns each for protein–solvent systems
comprising 10 4 –10 5 atoms. Therefore, the deficiencies of the current MM force
fields are becoming increasingly apparent in unrestricted long-time simulations
through changes of supposedly stable protein or peptide conformations. This expe-
rience will drive the efforts to construct better force fields, which must be more
accurate by at least one order of magnitude, although such an improvement neces-
sarily will imply a larger computational effort and, therefore, will impede the trac-
tability of large proteins at extended time scales. Here, the accurate and computa-
tionally efficient inclusion of the electronic polarizability will be of key importance.
33.3.9
Continuum Solvent Models
The above discussion on the accessible time scales was based on the assumption
that a microscopic model of the aqueous environment is employed to account for
the important solvent–protein interactions. This approach represents the state of
the art and, at the same time, an obstacle in accessing more extended time scales,
because in the corresponding simulation systems 90% of the atoms must belong to
the solvent (for an explanation see Section 33.2.2 and Mathias et al. [18]). Thus,
one simulates mainly a liquid solvent slightly polluted by protein atoms, and most
1186 33 Molecular Dynamics Simulations of Proteins and Peptides
of the computer time is spent on the calculation of the forces acting on the solvent
molecules.
It is tempting to get rid of this enormous computational effort by applying so-
called implicit solvent models. Here, one tries to replace the microscopic modeling
of the solvent by mean field representations of its interactions with the solute pro-
tein, which are mainly the electrostatic free energy of solvation and the hydropho-
bic effect caused by the solvent. The hydrophobic effect is usually included by an
energy term proportional to the surface of the solute. Much more complicated is
the computation of the electrostatic free energy of solvation. Here one has to find
solutions to the electrostatics problem of an irregularly shaped cavity filled with
point charges (the solute) and embedded in a dielectric and possibly ionic contin-
uum (the solvent). Standard numerical methods to solve the corresponding Pois-
son and Poisson-Boltzmann equations are much too expensive to be used in
MD simulations and have many other drawbacks (see Egwolf and Tavan [31] for a
discussion).
To circumvent the problem of actually having to solve a partial differential equa-
tion (PDE) at each MD time step, the so-called generalized Born methods (GB)
have been suggested (for a review see Bashford and Case [51]). GB methods intro-
duce screening functions that are supposed to describe the solvent-induced shield-
ing of the electrostatic interactions within the solute and are empirically parameter-
ized using sets of sample molecules. The required parameters then add to the
combinatorial complexity of the force field parameterization.
The GB approximation enables the computation of impressive MD trajectories
covering several hundred microseconds [52]. However, not surprisingly, the result-
ing free energy landscapes drastically differ from those obtained with explicit sol-
vent [53, 54]. Thus, the GB methods apparently oversimplify the complicated elec-
trostatics problem for the sake of computational efficiency and, therefore, fail in
structure prediction.
A physically correct method to compute the free energy of solvation and, in
particular, the forces on the solute atoms in implicit solvent MD simulations has
recently been developed by Egwolf and Tavan [31, 55]. In this analytical ‘‘one-
parameter’’ approach the reaction field arising from the polarization of the implicit
solvent continuum is represented by Gaussian dipole densities localized at the
atoms of the solute. These dipole densities are determined by a set of coupled
equations, which can be solved numerically by a self-consistent iterative procedure.
The required computational effort is comparable to that of introducing the elec-
tronic polarizability into MM force fields. This effort is by many orders of magni-
tude smaller than that of standard numerical methods for the solution of PDEs.
The resulting reaction field forces agree very well with those of explicit solvent
simulations, and the solvation free energies match closely those obtained by the
standard numerical methods [31]. However, this progress is only a first step to-
wards MD simulations with implicit solvent. For its use in MD, one has to correct
the violations of the reaction principle resulting from the reaction field forces. In
addition, the surface term covering the hydrophobic forces has to be parameterized
in order to match the free energy landscapes found in explicit solvent simulations.
33.4 Conformational Dynamics of a Light-switchable Model Peptide 1187
33.3.10
Are there Further Problems beyond Electrostatics and Structure Prediction?
Up to this point, our analysis has been focused on the complex electrostatics in
protein–solvent systems and the accuracy problems that are caused by the electro-
statics in the construction and numerical solution of MM-MD models. In this dis-
cussion we have taken the task of protein structure prediction as our yardstick to
judge the accuracy of the available force fields and computational techniques. If
MD is supposed to provide a contribution to protein folding or to the related prob-
lem of structure-based drug design, this yardstick is certainly relevant. However,
there are many more possible fields of application for MD simulation than just
these two. Each of them has its own accuracy requirements pertaining to different
parts of the force field, as has been illustrated by the various examples of restricted
MD-based methods given in Section 33.2.7.
If MD is taken seriously as a method for generating a virtual reality of the con-
formational dynamics that enables, for instance, energy-driven processes of protein
function, then additional aspects of the MM force fields become relevant. Because
this conformational dynamics involves torsions around chemical bonds and re-
quires evasive actions to circumvent steric hindrances in these strongly constrained
macromolecules, the potential functions mapping the mechanical elasticity and
flexibility must be adequately represented for correct descriptions of the kinetics.
Here, these so-called ‘‘bonded’’ potential functions must be balanced with the van
der Waals potentials of the various atoms and with the modeling of the electro-
statics. To illustrate the delicate dependence of simulation descriptions on these de-
tails of the force field parameterization, we will now return to one of the examples
mentioned in Section 33.2.7.
33.4
Conformational Dynamics of a Light-switchable Model Peptide
cis trans
N N
8,0 Å 11,3 Å
~80º ~180º O
hv N
O
HN HN N Ala
Phe Ala
Gly Cys Phe – Gly – Asp – Cys – Thr – Ala – Cys
Asp Ala
Cys Thr
Fig. 33.7. Chemical structures of a small chromophore and the peptide from about 80
cyclic peptide (cAPB) and scheme of the light- to 180 and elongates the chromophore by
induced photoisomerization, which changes 3.3 Å. The strain exerted by the chromophore
the isomeric state of the azobenzene dye after photoisomerization forces the peptide to
integrated into the backbone of the peptide relax from an ensemble of helical and loop-like
from cis to trans. The isomerization widens conformations into a stretched b-sheet (see
the angle of the stiff linkage between the Figure 33.9 for structural details).
33.4.1
Computational Methods
20
E (kcal mol–1) E1
15 E2 φ O
N
10
HN N
HN
5
O
0
150 160 170 180 190 200 210
φ (deg)
Fig. 33.8. Two different MM potentials vicinity of the trans-configuration (F ¼ 180 )
obtained from fits to DFT potential curves for and the softer potential E2 from a fit of a
the torsion of the chromophore around the Fourier expansion aiming at an improved
central NbN double bond by a dihedral angle description of the DFT potential curve in a
F [57]. The stiff potential E1 results from a fit wider range of torsion angles F A ½70 ; 300 .
of a cosine to the DFT potential curve in the
model the photoexcitation of the dye, its relaxation on the excited-state surface, and
its ballistic crossing back to the potential energy surface of the electronic ground
state through a conical intersection. To describe the subsequent dissipation of the
absorbed energy into the solvent and the relaxation of the peptide from the initial
cis-ensemble of conformations to the target trans-conformation, MM parameters
for the chromophore, the peptide and the surrounding solvent were required.
Apart from a slight modification [57], the peptide parameters were adopted from
the CHARMM22 force field [33]. Like in the experimental set-up, dimethyl-sulfoxid
(DMSO) was used as the solvent. For this purpose an all-atom, fully flexible DMSO
model was designed by DFT/MM techniques [9], was characterized by TBC/RF-
MD simulations, and was compared with other MM models known in the litera-
ture [57]. Likewise, the MM parameters of the chromophore were determined by
series of DFT calculations. Figure 33.8 illustrates the subtle differences that may
result from such a procedure of MM parameter calculation, taking the stiffness of
the central NbN bond with respect to torsions as a typical example. The stiffer po-
tential E1 has been employed in Spörlein et al. [39] and the consequences of choos-
ing the slightly softer potential E2 will be discussed further below.
A larger number of MD simulations, each spanning 1–10 ns (some even up to
40 ns), have been carried out to characterize the equilibrium conformational en-
sembles in the cis and trans states as well as the kinetics and pathways of their
photo-induced conversion [57]. The simulation system was periodic, shaped as a
rhombic dodecahedron (inner diameter of 54 Å), and was initially filled with 960
DMSO molecules. The best 10 NMR solutions [58] for the structure of the cAPB
peptide were taken as starting structures for the simulations of the photoisomeri-
zation. They were placed into the simulation system, overlapping DMSO mole-
cules were removed, and the solvent was allowed to adjust in a 100 ps MD simula-
tion at 300 K to the presence of the rigid peptide. Then the geometry restraints
initially imposed on the peptide were slowly removed during 50 ps, the system
was equilibrated for another 100 ps at 300 K and 1 bar, it was heated to 500 K for
1190 33 Molecular Dynamics Simulations of Proteins and Peptides
50 ps, and cooled within 200 ps to 300 K. In this way 10 different starting struc-
tures were obtained for the 1 ns simulations of the cis–trans isomerization, which
was initiated by suddenly turning on the model potential driving the inversion re-
action. To obtain estimates for the equilibrium conformational ensembles in the cis
and trans states, 10 ns MD simulations were carried out at 500 K starting from the
best NMR structures modified by the equilibration procedure outlined above. At
this elevated temperature a large number of conformational transitions occur in
the peptide during the simulated 10 ns. In all simulations the TBC/RF approach
(cf. Section 33.2.2) was used with a dielectric constant of 45.8 modeling the
DMSO continuum at large distances (for further details see Ref. [57]).
Due to the lack of space, the methods applied for the statistical analysis of the
simulation data cannot be presented in detail here. Instead a few remarks must
suffice: The 10 ns trajectories at 500 K have been analyzed by a new clustering
tool, which is based on the computation of an analytical model of the configuration
space density sampled by the simulation. The model is a Gaussian mixture [59],
enables the construction of the free energy landscape and allows the identification
of a hierarchy of conformational substates. The associated minima of the free
energy are then characterized by prototypical conformations [57]. Concerning the
characterization of the photo-induced relaxation dynamics, we will restrict our
analysis to the temporal evolution of the total energy left in the cAPB molecule
after photoisomerization and will omit structural aspects.
33.4.2
Results and Discussion
Figure 33.9 shows the prototypical conformations of cis- and trans-cAPB in DMSO
at 500 K and 1 bar, which have been obtained from the 10 ns simulations by the
procedures sketched in the preceding section. Whereas cis-cAPB exhibits a series
hv
cis-cAPB trans-cAPB
Fig. 33.9. Conformational ensembles in the The gray-scale indicates the depth of the
cis and trans states of the cAPB peptide as respective minimum. Whereas there is only
obtained from 10 ns simulations at 500 K. one such minimum in trans, there are many in
Each backbone structure represents a different cis (see Ref. [57] for details).
local minimum of the free energy landscape.
33.4 Conformational Dynamics of a Light-switchable Model Peptide 1191
75.5
60
MD total energy (kcal mol )
–1
40 76.0
20
76.5
77.0
-20
-1 0 1 10 100 1000
t (ps)
Fig. 33.10. Comparison of the energy and as calculated by MD (black pixels and fit
relaxation kinetics following cis–trans to the MD trajectories, monitoring the total
isomerization as observed by femtosecond energy of the cAPB molecule) on a linear-
spectroscopy (black dots, monitoring the logarithmic time scale [39].
spectral position of the main absorption band)
1192 33 Molecular Dynamics Simulations of Proteins and Peptides
From a global fit of all spectroscopic data, covering a wide spectral range, four
time constants have been obtained for the relaxation processes, and the analysis
of the spectra has allowed to assign elementary processes to these kinetics [39].
Thus, according to spectroscopy, the ballistic isomerization proceeds within 230
fs, a second relaxation channel from the excited state surface to the electronic
ground state (not included in the MM model by construction) takes 3 ps, the dis-
sipation of heat from the chromophore into the solvent and the peptide chain
proceeds within 15 ps, and a 50 ps kinetics is assigned to conformational transi-
tions in the peptide. As shown in Ref. [60] further conformational transitions
in the peptide occur above the time scale of 1 ns until the relaxation process is
complete.
The limited statistics of the MD data shown in Figure 33.10 allows us to deter-
mine two time constants within the first nanosecond. These time constants are
280 fs for the isomerization of the chromophore and 45 ps for the conformational
relaxation of the peptide form the cis-ensemble towards the trans-conformation.
The corresponding two-exponential decay is depicted by the solid fit curve in Fig-
ure 33.10. Due to the proximity of the cooling kinetics (15 ps) to that of the confor-
mational dynamics (50 ps) these processes cannot be distinguished in the MD data
on cAPB. According to the simulations, after 1 ns the trans-conformation has not
yet been reached in most trajectories. Thus, within the first nanosecond the relax-
ation kinetics of the simulated ensemble quantitatively agrees with the observa-
tions to the extent, to which the limited MD statistics allows the identification of
kinetic constants.
Further above we have stated that this result has been somewhat frustrating for
us. Representing the theory group that had been involved in the design of the
model system and of its experimental characterization from the very beginning
about a decade ago, we had hoped that our integrated approach [39] would provide
hints as to how one has to improve the MD methods and MM force fields. This
hope had been a key driving force in the original set-up of this interdisciplinary
approach. Instead, the results have merely validated our MD methods, raising the
question whether (i) the integrated approach is sensitive enough to the details of
the MM force field or (ii) our skepticism with respect to the quality of MD descrip-
tions is justified.
To check these questions we have conducted a series of simulations in which cer-
tain details of the applied MM force fields were changed one by one [57]. From the
huge pile of results collected in this way we have selected a typical example which
provides enough evidence to answer both questions. In this example we have
changed a single model potential of the MM force field, that is, we have replaced
the stiff torsion potential E1 by the slightly weaker potential E2 (see Figure 33.8 for
plots of these function and Section 33.3.1 for further explanation). One can expect
that this potential is important for transmitting the mechanical strain from the
chromophore to the peptide after cis–trans isomerization. If it is stiff (E1 ), then
the strain is strong and the relaxation should be fast, and if it is weaker (E2 ) the
relaxation should become slowed down.
33.4 Conformational Dynamics of a Light-switchable Model Peptide 1193
E2
40
20
-20
-1 0 1 10 100 1000
t (ps)
Fig. 33.11. Comparison of the energy to the 10 MD trajectories (pixels) calculated
relaxation kinetics calculated by MD for two with E2 exhibits three kinetic constants.
slightly different force fields: The original Besides showing a 190 fs ballistic isomeriza-
model potential E1 is replaced by E2 (cf. Figure tion (exp: 230 fs), it separates an 11 ps heat
33.8). The fit curve belonging to E1 has been dissipation (exp: 15 ps) from a 190 ps con-
taken from Figure 33.10. The dashed fit curve formational relaxation (exp: 50 ps).
The above expectation is confirmed by the MD results for the softer potential E2
shown in Figure 33.11 and by the kinetic data listed in the associated caption. In
the case of E2 , the prolonged time scale (190 ps) of the conformational relaxation,
which is caused by the slightly softer spring, has enabled us to additionally identify
the kinetics of heat dissipation (11 ps) in the data. The results demonstrate that a
small change of a single model potential can change the kinetic time constant cal-
culated for the conformational relaxation from 45 ps to 190 ps, that is, by a factor
of 4.
This example answers the two questions voiced above, because it provides evi-
dence that (i) our integrated approach is sensitive enough to detect small changes
in a force field and (ii) that our skepticism with respect to the quality of MM force
fields is well-justified. The latter is apparent from the fact that both fit procedures
for obtaining a torsion potential from DFT calculations, the one that has led to E1 ,
and the one yielding E2 , represent a priori valid choices. It was actually a matter of
luck that we happened to choose the right one for the first few simulations pub-
lished in Ref. [39]. As we now definitely know [57], only a series of further lucky
choices in the force field used for our original simulations enabled the surprisingly
perfect match. On the other hand, the perfect match obtained in our first attempt
to describes the light-induced relaxation in the cAPB peptide also demonstrates
that, in principle, MD simulations can give quantitative descriptions. In this sense
it gives credit to the method and justifies further efforts aiming at the improve-
ment of the force fields and the extension of time scales.
1194 33 Molecular Dynamics Simulations of Proteins and Peptides
Summary
As of today, MM force fields are not yet accurate enough to enable a reliable predic-
tion of protein and peptide structures by MD simulation. Here the key deficiency is
the neglect of the electronic polarizability in the MM force fields. However, MM-
MD is valid in restricted settings, for instance, when complemented by experimen-
tal constraints, when applied to systems of reduced complexity, or in the restricted
context of QM/MM hybrid calculations. In particular, the development of the DFT/
MM methods has generated key tools that are required for the improvement of the
MM force fields, because they allow highly accurate computations of intramolecu-
lar force fields of molecules polarized by a condensed phase environment. The lat-
ter has been demonstrated by computations of IR spectra in condensed phase,
whose accurate description critically depends on the quality of the computed force
field. For the validation of future MM force fields more specific experimental char-
acterizations of the materials are necessary. Thus, the improvement of the compu-
tational methods has to rely on close cooperation between scientists working in
experimental and theoretical biophysics, chemistry, biochemistry, and beyond.
Acknowledgments
Financial support by the Volkswagen-Stiftung (Project I/73 224) and by the Deut-
sche Forschungsgemeinschaft (SFB533/C1) is gratefully acknowledged. The au-
thors would like to thank W. Zinth, F. Siebert, L. Moroder, C. Renner, and J. Wacht-
veitl for fruitful discussions.
References
Berendsen, J. Mol. Biol. 1984, 176, 37 X. Daura et al., J. Am. Chem. Soc.
559–564. 2001, 123, 2393–2404.
16 M. Levitt, R. Sharon, Proc. Natl 38 H. Grubmüller, B. Heymann, P.
Acad. Sci. USA 1988, 85, 7557– Tavan, Science 1996, 271, 997–999.
7561. 39 S. Spörlein et al., Proc. Natl Acad. Sci.
17 A. Rahman, F. H. Stillinger, USA 2002, 99, 7998–8002.
J. Chem. Phys. 1971, 55, 3336–3359. 40 W. F. van Gunsteren, A. E. Mark,
18 G. Mathias, B. Egwolf, M. Nonella, J. Chem. Phys. 1998, 108, 6109–6116.
P. Tavan, J. Chem. Phys. 2003, 118, 41 H. J. C. Berendsen, Science 1996, 271,
10847–10860. 954–955.
19 I. G. Tironi, R. Sperb, P. E. Smith, 42 A. T. Brünger et al., Science 1987,
W. F. van Gunsteren, J. Chem. Phys. 235, 1049–1053.
1995, 102, 5451–5459. 43 A. Warshel and M. Levitt, J. Mol.
20 P. H. Hünenberger, W. F. van Biol 1976, 103, 227–249.
Gunsteren, J. Chem. Phys. 1998, 108, 44 P. Hohenberg, W. Kohn, Phys. Rev
6117–6134. B 1964, 136, 864–870.
21 T. A. Darden, D. York, L. Pedersen, 45 W. Kohn, L. J. Sham, Phys. Rev. A
J. Chem. Phys. 1993, 98, 10089–10092. 1965, 140, 1133–1138.
22 U. Essmann et al., J. Chem. Phys. 46 M. J. Frisch et al., Gaussian98,
1995, 103, 8577–8593. Gaussian, Inc., Pittsburgh PA, 1998.
23 B. A. Luty, I. G. Tironi, W. F. van 47 J. Neugebauer, B. A. Hess, J. Chem.
Gunsteren, J. Chem. Phys. 1995, 103, Phys. 2003, 118, 7215–7225.
3014–3021. 48 M. Nonella, P. Tavan, Chem. Phys.
24 P. H. Hünenberger, J. A. 1995, 199, 19–32.
McCammon, Biophys. Chem. 1999, 49 M. Nonella, G. Mathias, M.
78, 69–88. Eichinger, P. Tavan, J. Phys. Chem.
25 C. Niedermeier, P. Tavan, J. Chem. B 2003, 107, 316–322.
Phys. 1994, 101, 734–748. 50 M. Nonella, G. Mathias, P. Tavan,
26 C. Niedermeier, P. Tavan, Mol. J. Phys. Chem. A 2003, 107, 8638–8647.
Simul. 1996, 17, 57–66. 51 D. Bashford, D. A. Case, Annu. Rev.
27 E. Lindahl, B. Hess, D. van der Phys. Chem. 2000, 51, 129–152.
Spoel, J. Mol. Mod. 2001, 7, 306–317. 52 C. D. Snow, N. Nguyen, V. S. Pande,
28 G. Mathias et al., EGO-MMII Users M. Gruebele, Nature 2002, 420, 102–
Guide. Lehrstuhl für BioMolekulare 106.
Optik, LMU München, Oettingen- 53 R. H. Zhou, B. J. Berne, Proc. Natl
strasse 67, D-80538 München, Acad. Sci. USA 2002, 99, 12777–
unpublished. 12782.
29 G. A. Kaminski et al., J. Comput. 54 H. Nymeyer, A. E. Garcia, Proc. Natl
Chem. 2002, 23, 1515–1531. Acad. Sci. USA 2003, 100, 13934–
30 M. Marquart et al., Acta Crystallogr. 13939.
Sec. 1983, 39, 480–490. 55 B. Egwolf, P. Tavan, J. Chem. Phys.
31 B. Egwolf, P. Tavan, J. Chem. Phys. 2004, 120, 2056–2068.
2003, 118, 2039–2056. 56 S. Woutersen, P. Hamm, J. Phys.
32 L. Jorgensen et al., J. Chem. Phys. Chem. B 2000, 104, 11316–11320.
1983, 79, 926–935. 57 H. Carstens, Dissertation, Ludwig-
33 A. D. MacKerell et al., J. Phys. Chem. Maximilians-Universität München,
B 1998, 102, 3586–3616. 2004.
34 W. D. Cornell et al., J. Am. Chem. 58 C. Renner et al., Biopolymers 2000, 54,
Soc. 1995, 115, 9620–9631. 489–500.
35 G. A. Kaminski, R. A. Friesner, J. 59 M. Kloppenburg, P. Tavan, Phys. Rev.
Tirado-Rives, W. L. Jorgensen, J. E 1997, 55, 2089–2092.
Phys. Chem. B 2001, 105, 6474–6487. 60 S. Spörlein, Dissertation, Ludwig-
36 M. W. Mahoney, W. L. Jorgensen, Maximilians-Universität München,
J. Chem. Phys. 2000, 112, 8910–8922. 2001.
Part II
3
1
Paradigm Changes from ‘‘Unboiling an Egg’’
to ‘‘Synthesizing a Rabbit’’
Rainer Jaenicke
1.1
Protein Structure, Stability, and Self-organization
Max Perutz, in reviewing Anson and Mirsky’s discovery [1–3] that hemoglobin
fully recovers its functional properties after reversible acid denaturation, felt that
‘‘making a protein alive again is as impossible as inventing the perpetuum mobile’’;
this was in 1940 [4]. As we know today from 75 years practice since his first experi-
ments, Anson (Figure 1.1) was right, despite Kailin’s protest that ‘‘boiling an egg
does not denature its protein, it kills it!’’ [5]: proteins do have the capacity to spon-
taneously and autonomously organize their three-dimensional structure. At An-
son’s time this was a model, because the antivitalist creed that the composition of
proteins is revealed by analytical chemistry and their behavior, by physical chemis-
try (which in the last analysis depends on their structure [6]) was no more than a
guiding principle for chemists interested in biological material [7]. Svedberg had
just published the determination of the molecular weight of the first protein,
equine hemoglobin, making use of his newly developed analytical ultracentrifuge
[8]. With this data, it became clear that proteins are monodisperse entities with
well-defined molecular masses. To determine the detailed 3D structure of hemo-
globin at atomic resolution still took more than a generation [9]. Thus, the first
folding experiments were based on the assumption that an isolated homogeneous
component of mammalian blood, obtained by fractional crystallization, not only
had the intrinsic property of self-organization but also allowed one to quantify its
physiological activity in vivo in terms of cooperative ligand binding, allostery, Bohr
effect etc., following the reductionist’s creed and methodology. With this paradigm
in mind, the alchemy of protein folding addressed a wide range of fundamental
biological problems, tacitly assuming that the results of in vitro studies are biolog-
ically relevant. As sometimes happens, a few results of this bold hypothesis turned
out to be practicable in various ways, so that, in the long run, the scope switched
from pure physical to biological chemistry. After a generation or two, research and
development in biotechnology shifted the dimensions of in vitro folding experi-
ments from microliter pipettes and Eppendorf vials to pipelines and 5–10-m 3
tanks, nowadays used in the production of pharmaceutically or technologically im-
portant proteins. Frequently the final products are modified in one way or the
other to alter their long-term stability, their specificity, or other properties.
In this context, Anfinsen’s (Figure 1.1) classical experiments influenced the phi-
losophy of his followers right from the beginning by the fact that he started from
reduced ribonuclease A (RNaseA) rather than the natural enzyme with its cystine
cross-bridges intact. In spite of the chemical modification, the reshuffling process
led to the correctly cross-linked native state, thus proving that all the information
required for folding and stabilizing the native three-dimensional conformation of
proteins is encoded in their amino acid sequence [10]. Extending this conclusion
to the entire structural hierarchy of proteins, it was postulated that the primary
structure directs not only the formation of multiple noncovalent (‘‘weak’’) and
covalent (‘‘strong’’) bonds within a single polypeptide chain but also the interac-
tions between the subunits of oligomeric proteins, driving the sequential folding-
association reaction and stabilizing the final tertiary and quaternary structure.
In distinguishing between weak and strong interactions, it is important to note
that globular proteins in their natural aqueous environment show only mar-
ginal Gibbs free energies of stabilization. With numerical data on the order of 50
kJ mol1 , they are the equivalent of a small number of hydrogen bonds or hydro-
phobic interactions, or perhaps just one or two ion pairs. This holds despite the
fact that thousands of atoms may be involved in large numbers of attractive and
repulsive interactions, in total adding up to molecular energies a million times
larger than the above average value [11]. The reason for this apparent discrepancy
is that proteins are multifunctional: they need to fold, they serve specific modes of
action, and they provide amino acid pools in both catabolic and anabolic processes.
1.1 Protein Structure, Stability, and Self-organization 5
With this in mind, the point is that, given one single device, it is impossible to
simultaneously maximize the efficiency of all these ‘‘functions’’; nature has to com-
promise, and since stability, biological activity, and turnover all require a certain de-
gree of flexibility, stability (equivalent to rigidity) cannot win.
Recent advances in our understanding of the different types of weak interactions
involved in the self-organization and stabilization of proteins were derived from
theoretical studies on the one hand ([12–15], cf. R. L. Baldwin, Vol. I/1) and from
the analysis of complete genomes of phylogenetically related mesophiles and
(hyper-)thermophiles, as well as systematic mutant studies, on the other [16–18].
Apart from the latter ‘‘quasi-Linnaean approach,’’ protein engineering provides us
with a wealth of alternatives to crosscheck the genomix results [19–23]; in combin-
ing and applying these approaches to technologically relevant systems, increases in
thermal stability of up to 50 C have been accomplished [24]. In contrast to such
success stories, Kauzmann’s catalog of weak intermolecular interactions [25] has
not changed since 1959, and a 2004 ‘‘balance sheet of DGstab contributions to fold-
ing’’ would be as tentative or postdictive as J. L. Finney and N. C. Pace’s classical
attempts to predict DGstab for RNase and lysozyme in 1982 and 1996 [26, 27]. What
has become clear in recent years is that increases in overall packing density, helic-
ity, and proline content (i.e., destabilization of the unfolded state) and improved
formation of networks of H-bonds and ion-pairs are important increments of pro-
tein stabilization. In the long-standing controversy regarding the prevalence of the
various contributions, evidently all components are significant. In the case of
hydrophobic interactions, entropic (water-release) and enthalpic (London-van der
Waals) effects seem to be equally important [28, 29]. That hydrogen bonding and
ion pairs are essential is now generally accepted [27, 30, 31]; surprisingly, recent
evidence seems to indicate that polar-group burial also makes a positive contribu-
tion to the free energy of stabilization [32].
As has been mentioned, the balance of the attractive and repulsive forces yields
only marginal free energies of stabilization. They may be affected by mutations or
changes in the solvent, the latter often highly significant due to their effect on the
unfolding kinetics. The reason is that by adding specific ions or other ligands, the
free energy of activation ðDGB stab Þ of the N ! U transition, i.e., the free-energy
difference between the native and the transition states, may be drastically in-
creased. The resulting ‘‘kinetic stabilization’’ has been shown to be essential for
the extrinsic stabilization or longevity of the native state as well as for the folding
competence of conjugated and ultrastable proteins [16, 33, 34].
From the thermodynamic point of view, protein folding is a hierarchical process,
driven by the accumulation of increments of free energy from local interactions
among neighboring residues, secondary structural elements, domains, and sub-
units (Scheme 1.1). Domains represent independent folding units. Correspond-
ingly, the folding kinetics divide into the collapse of subdomains and domains
and their merging to form the compact tertiary fold. Establishing the particular
steps allows the folding pathway for a given protein to be elucidated. In proceeding
to oligomeric proteins, docking of structured monomers is the final step. In agree-
ment with this mechanism, in vitro experiments have shown that the overall fold-
6 1 Paradigm Changes from ‘‘Unboiling an Egg’’ to ‘‘Synthesizing a Rabbit’’
1.2
Autonomous and Assisted Folding and Association
The contributions to the present volume deal with the above side reactions in the
self-organization of proteins as well as the helper proteins involved in either avoid-
ance of misfolding or the catalysis of rate-determining steps along the folding path-
way. Reading the seminal papers, and having in mind the procedures that were ap-
plied in early protein denaturation-renaturation experiments, it becomes clear that
the pioneers in the field were aware of the possible role of accessory components
long before they were discovered: templates assisting the nascent polypeptide chain
1.2 Autonomous and Assisted Folding and Association 7
Fig. 1.2. Kinetic competition between folding corresponding native enzyme (data from Ref.
and aggregation. (A) Effect of protein concen- [38]). (B) With increasing protein concentration,
tration on the renaturation of a-glucosidase the yield of reactivation of lactate dehydro-
after urea denaturation, illustrating the parti- genase after acid denaturation in 0.1 M H3 PO4 ,
tioning between ‘‘renativation’’ and aggregation. pH 2, and subsequent renaturation in
Denaturation in 8 M urea, 10 mM K-phosphate phosphate buffer pH 7.0 decreases (b) and
pH 7.7, 1 h incubation at 20 C; reactivation in aggregation increases (s) in a complementary
10 mM Na-phosphate pH 7.6, 10 C, 30 mM way [37]. As indicated by the dotted line, to a
NaCl, and 8% ethylene glycol, measured first approximation, the kinetic competition
after b24 h renaturation. (j) refers to wild- can be modeled by Eq. (5), i.e., by sequential
type a-glucosidase and (b) to the fusion pro- unimolecular folding (k 1 ¼ 0:01 s1 ) and
tein containing a C-terminal hexa-arginine tail, (diffusion-controlled) bimolecular aggregation
which allows the stabilization of the enzyme on (k2 ¼ 10 5 M1 s1 ), with molar concentrations
heparin-Sepharose. The yield of reactivation is based on a subunit molecular mass of 35 kDa
expressed as a percentage of the activity of the [39].
on its pathway toward the native conformation, or shuffling enzymes and foldases as
catalysts were postulated or even isolated early in the game [49, 50]. However, they
were ignored for the simple reason that there seemed to be no real need for help-
ers, because the spontaneous reaction was autonomous and fast enough to proceed
without accessory components. It took more than two decades to change the para-
digm from Cohn and Edsall’s purely physicochemical approach [7] to considera-
tions comparing optimal in vitro conditions with the situation in the cell [51, 52].
That catalysts or chaperones might be biologically significant was suggested by
three observations (Figure 1.3). First, experiments with mouse microsomes de-
pleted of Anfinsen’s ‘‘shuffling enzyme’’ proved protein disulfide isomerase (PDI) to
be essential for protein folding in the endoplasmic reticulum (ER) [55]. Second,
the unexpected bimodal N $ I $ U kinetics of the two-state N $ U equilibrium
transition of ribonuclease (RNaseA) and its physicochemical characteristics sug-
gested proline isomerization to be involved in protein folding [56, 57]; in fact, the
enzyme peptidyl-prolyl cis-trans isomerase (PPI) was found to catalyze rate-limiting
steps on the folding pathway of proteins [58–60]. Third, increasing numbers of
8 1 Paradigm Changes from ‘‘Unboiling an Egg’’ to ‘‘Synthesizing a Rabbit’’
Fig. 1.3. Folding catalysts and chaperones. signal is caused by aggregation of the reduced
(A) Acceleration of the oxidative refolding of and unfolded protein. Curves calculated for
ribonuclease T1 by PPI and PDI, monitored by single first-order reactions with time constants
tryptophan fluorescence at 320 nm: 2.5 mM t ¼ 4300 s (f), 2270 s (b), 1500 s (C), and
RNaseT1 in 0.1 M Tris-HCl pH 7.8, 0.2 M 650 s (s) [53]. (B) Reactivation of citrate
guanidinium chloride, 4 mM reduced/0.4 mM synthase (CS) after preceding denaturation in
oxidized glutathione, 25 C. Re-oxidation in the 6 M guanidinium chloride in the presence of
absence of PPI and PDI (f), in the presence of GroEL without GroES and ATP (n), with GroES
1.4 mM PPI (b), in the presence of 1.6 mM PDI and ATP (f), and with GroES and ATP added
(C), in the presence of 1.4 mM PPI plus 1.6 after 55 min (C). Parallel light-scattering
mM PDI (s), and in the presence of 10 mM measurements show that in the absence of the
dithioerythritol (to block disulfide bond components of the GroE system, CS forms
formation) (j); the slight decrease in the latter inactive aggregates [54].
were developed, and mathematical models for understanding the system’s behavior
were formulated. Crowning the reductionist approach, the data were finally put to-
gether, ending up with the sequences and cycles of reactions that constitute the
awesome networks of metabolic pathways and information transfer, including
their regulation and energetics.
Faced with these Herculean efforts on the one hand, and the pressure in present-
day paper production on the other, we reach the latest paradigm change: from the
reductionist detail to the simplistic concept that by replacing the atomic complexity
of molecules with the information pool encoded in genomes, proteomes and their
corollaries at the metabolic, regulatory, and developmental levels might help to
solve the riddle of life just by simplifying proteins and their interactions to spheres,
squares, triangles, and arrows (as symbols for interactions).1
In the minds of some of the above ‘‘heroes,’’ this change started long before the
-omix age: in 1948, at the end of his lectures on The Nature of Life, the alternative to
mere speculation or getting your hands dirty with experiments led Albert Szent-
Györgyi to ironically promise his audience that next time he would pull a synthetic
rabbit out of his pocket [64].
In the present context of protein folding vs. protein misfolding, it seems appro-
priate to start with a critical discussion of the physicochemical principles of self-
organization, with special emphasis on the kinetic competition of folding and
aggregation that led nature to evolve folding catalysts and chaperones. Combining
the structural hierarchy of proteins with the processes involved in folding and as-
sociation (cf. Scheme 1.1), it has been shown that commonly small (single-domain)
proteins can fold without helpers, in many cases undergoing fully reversible equi-
librium transitions without significant amounts of intermediates. In contrast, large
(multi-domain) proteins and subunit assemblies, owing to the above-mentioned
kinetic competition of intra- and intermolecular interactions in the processes of
folding and association, require careful optimization to reach high yields; at high
protein levels and in the absence of accessory proteins, off-pathway reactions take
over (cf. Figure 1.2B) [46–48, 61].
To the uninitiated, the above hierarchical scheme might suggest that the self-
organization of proteins necessarily occurs along a compulsory pathway with well-
separated consecutive steps, in contrast to the alternative random-search hypothe-
sis, which favors the idea that a jigsaw puzzle might be a more appropriate way to
model protein folding [65]. When inspecting available kinetic data, it becomes
clear that there is a wide variance of mechanisms that suggest ‘‘energy landscapes’’
rather than simple two-dimensional energy profiles as adequate pictographs to il-
lustrate folding paths [13, 66–68] (Figure 1.4). Relevant experimental facts include
1 For a critical discussion of the dominating role of the ‘‘pictorial molecular paradigm’’ in the
biosciences and its philosophical implications, including the question as to whether chemical
formulae and graphical representations of biopolymers such as proteins are objective
representations of reality, see Refs. [62, 63].
10 1 Paradigm Changes from ‘‘Unboiling an Egg’’ to ‘‘Synthesizing a Rabbit’’
Fig. 1.4. Free-energy profiles and energy (I and II), and folded (N) forms of the protein
landscapes for protein folding and unfolding. and for the transition states of the SH ! SS
(A) Relationships between the free energies of reshuffling reaction. The reaction coordinate is
activation for folding ðDG 0B f Þ, and unfolding represented by the three-dimensional downhill
ðDG 0B u Þ and of the equilibrium free energy sequence of arrows. The species I and II
ðDG 0 Þ, using conventional transition-state include the one- and two-disulfide interme-
theory. (B) ‘‘Energy-landscape’’ presentation diates; Nð14SH;38SHÞ and Nð30SH-51SHÞ are native-
describing the oxidative in vitro folding like intermediates [34]. (C) Hypothetical confor-
pathway of basic pancreatic trypsin inhibitor mational energy plot illustrating local energy
(BPTI) according to Creighton [68, 69]. The minima and barriers along a possible protein-
apparent free energies (in kcal mol1 ) are folding pathway. The lower ‘‘rugged potential’’
given for the fully reduced (R), intermediate depicts the vicinity of the native state [13].
the following. (1) Folding as a cooperative process rarely allows true intermediates
to be accumulated to a level suitable for a detailed characterization; this finding
does not exclude ordered pathways but just requires a proper choice of conditions
to optimize the population of intermediates. (2) For methodological reasons, the
analysis of next-neighbor interactions in the folding process is prone to ‘‘in vitro
artifacts,’’ because anomalous intermediates may be populated (see the classical
studies devoted to the folding pathway of basic pancreatic trypsin inhibitor [69]
and to the early events in the flash-induced folding of cytochrome c [70, 71]). (3)
For lysozyme, it has been shown that folding in fact follows multiple pathways
[72]. (4) Finally, in the case of bacterial luciferase, it appears that the kinetic com-
petition of folding and association of the ab-heterodimer constitute a trap on the
folding path that guides the individual subunits to the energy minimum of the
active dimer [73]. Taken together, these and other examples seem to indicate that
individual proteins show individual folding characteristics, an observation that also
holds for the stability and stabilization of proteins. Evidently, any generalization in
the world of proteins has its limitations, so that firm statements regarding ‘‘the
in vitro/in vivo issue’’ or the wealth of expressions for ‘‘different types of molten
globules’’ or the distinction between ‘‘short-range and long-range intermolecular
forces’’ should either be ignored or at least taken with a grain of salt.
1.3 Native, Intermediate, and Denatured States 11
1.3
Native, Intermediate, and Denatured States
Before entering today’s jungle of protein folding, with its roots deep in the solid
ground of topology, thermodynamics, molecular dynamics, and kinetics from the
microsecond to the minutes, hours, or days time range [74] (cf. Vol. I) and its
youngest branches in present-day mainstream biotechnology and molecular medi-
cine, it seems appropriate to briefly remind the reader what the starting points look
like and what kind of problems motivated the aborigines to settle in the woods
before the age of genomics, protein design, and folding diseases. The challenging
questions were as follows. Is there a correlation between the 1D information at
the level of the gene and the 3D structure of the correspondding protein? What
does denaturation of proteins mean? What do the native and denatured states
imply with respect to the folding mechanism of proteins? What are the best
denaturation-renaturation conditions with minimal side reactions, and which crite-
ria are best suited to characterize the authentic (native) state? What are the limits
of in vitro protein folding and reconstitution, with respect to size, compartmenta-
tion, and structural complexity? Here are just a few short answers that might help
in setting the frame for the crucial questions concerning the ‘‘folding pathology’’
in vitro and in vivo.
Owing to the large size of protein molecules and the unresolved problem of how
to summarize the weak attractive and repulsive interatomic forces in a unique po-
tential function, theorists have been unsuccessful in the a priori solution of the
protein-folding code that would allow one to translate amino acid sequences into
their corresponding three-dimensional structures. Considering the present-day out-
put of solved X-ray and NMR structures per anno and the promise of developing
high-throughput techniques, ‘‘cracking the protein folding code’’ has evidently
lost its former challenging priority. The reason for this is simply that in order to
understand the physical and functional properties of a protein at the atomic level
or to acquire insight into the mechanistic details of enzyme catalysis, a resolution
of <2 Å is required. This level of accuracy cannot be accomplished by any predic-
tive approach [75].
The question of whether the physical state of a protein obtained either from the
crystalline state or at high protein concentration in dilute buffer is related to or
indistinguishable from its functional state in vivo has been extensively discussed
since Kendrew, Perutz, and Phillips came up with their first 3D structures. Hardly
any scientific result has ever been challenged to this extent; the final answer has
been fundamentally positive [76], with one important limitation. Owing to their
marginal stability in solution, any protein undergoes steady N $ U transitions be-
tween the native and unfolded states. Under physiological conditions, the less-
structured state is unstable and the protein reassumes its native configuration.
Thus, proteins in their native functional state are dynamic systems exhibiting a
high degree of local flexibility; ligand binding, allosteric transitions, and catalytic
mechanisms depend on the interconversion of multiple states. The detailed quan-
titative understanding must necessarily take into account the structures of these
12 1 Paradigm Changes from ‘‘Unboiling an Egg’’ to ‘‘Synthesizing a Rabbit’’
alternative states and the energetics of their interconversion. The structural fluctu-
ations are on the order of 2 Å on a picosecond timescale. In the case of myoglobin,
they allow the funneling of O2 to the heme group in terms of ‘‘functionally im-
portant motions’’ [77, 78]. Whether structural heterogeneity is determined by ther-
mal fluctuations or represents well-populated conformational substates (defined by
high activation barriers beyond the thermal energy) is still under debate (Figure
1.4C). Attempts to characterize the physical and chemical properties of single mol-
ecules seem to support the second alternative [79, 80].
The term protein denaturation is essentially vague, because proteins do not un-
fold to a simple reference state [47, 81]. Instead, various modes of denaturation
lead to different denatured states, each of them consisting of an astronomically
large ensemble of different conformations but closely similar energy. Their com-
mon denominator is that they are solvated to a higher extent than the native state.
Considering the amount of hydrophobic residues present in most soluble proteins,
water is a poor solvent. Structure formation in the cytosol or in aqueous buffers is
driven by this very fact. Making use of heteropolymers with approximately equal
amounts of polar and nonpolar residues, nature allows solvation to balance by ex-
posing hydrophilic groups to the aqueous solvent, while at the same time minimiz-
ing the hydrophobic surface. Solubilization of the inner core by denaturants leads
to unfolding; however, it is obvious from the above balance of hydrophobic and
hydrophilic amino acids that complete solvation cannot be accomplished. For this
reason it is doubtful whether a polypeptide chain will ever be ‘‘fully randomized.’’
Evidence from Tanford’s observation that the intrinsic viscosity of denatured pro-
teins obeyed Einstein’s equation for spherical particles was weak from the begin-
ning [82]. Even for the nascent protein in the process of translation, the conforma-
tional space is expected to be limited, because of the space-filling properties of the
nascent polypeptide and its side chains, which do not allow all j=c angles in the
Ramachandran plot to be occupied [83]. From the practical point of view, it is im-
portant to keep in mind that the extent of unfolding of globular proteins depends
not only on the denaturant but also on the physical conditions of the solvent. For
the chaotrope, concentration and environmental parameters such as pH and tem-
perature are trivial; however, specific stabilizing ligands or kinetic barriers may be
essential as well [34, 84–86]. When analyzing the mechanisms of evolutionary
adaptation to the wide variety of extreme physical conditions in nature, it becomes
clear that extremophiles took advantage of the specific differences in protein stabil-
ity [16, 33].
The fact that U is conformationally heterogeneous gives rise to multiple folding
reactions in going from the unfolded to the native state. Partially folded intermedi-
ates (I) are to be expected as transients in the individual folding reactions. To find
out whether intermediates are populated and what their physical properties are, it
is mandatory to use different probes to follow folding and to vary the conditions
for comparison. The above-mentioned bimodal N $ I $ U folding kinetics of
RNaseA [57–60] exemplified a well-established three-state system, with populated
cis-trans proline isomers as structural intermediates; Figure 1.5 shows stopped-flow
NMR spectra of dihydrofolate reductase from E. coli to illustrate the structural de-
1.4 Folding and Merging of Domains – Association of Subunits 13
Fig. 1.5. Protein-folding kinetics studied by suggest the formation of an intermediate that
NMR spectroscopy. Stopped-flow 19 F NMR of is unable to bind NADþ strongly, having a
the refolding of 6- 19 F-tryptophan-labeled native-like side chain environment in the
dihydrofolate reductase following dilution from regions around Trp 30, 47, and 133 and little if
5.5 M to 2.75 M urea at 5 C in the presence any native side chain environment around Trp
of 4 mM NADPþ . Each spectrum represents 22 and 74. The resonance labeled 47i is that of
the sum of 41 separate rapid dilution Trp47 in the intermediate [87].
experiments. The kinetics and chemical shifts
tail and the time resolution for a specific system [87]. For further details, see Refs.
[88, 89].
1.4
Folding and Merging of Domains – Association of Subunits
Domains as compact substructures within protein molecules show the typical char-
acteristics of small globular proteins. There have been many definitions using
either the visual inspection of crystal structures or surface area calculations, topo-
logical distance considerations, sequence homology at the protein level, or the
intron-exon organization at the level of the gene. Among all these definitions [90],
the one that is most relevant in connection with the folding and assembly of large
proteins refers to the observation that domain proteins ‘‘fold by parts,’’ i.e., their
constituent parts fold and unfold independently as individual entities in a stepwise
fashion [91] (Figure 1.6). This holds not only in vitro but also in the cell, where the
14 1 Paradigm Changes from ‘‘Unboiling an Egg’’ to ‘‘Synthesizing a Rabbit’’
Evidently, for some large substrates, domains are the only way to tackle the prob-
lem of binding, activating, and releasing the substrate. The classical example to
illustrate this combinatorial playground of evolution is the NAD-dependent dehy-
drogenases, with the Rossmann fold as coenzyme-binding domain and the wide
variety of different substrate-binding domains determining the redox specificity of
the various enzymes [93]. The key to our understanding of multi-domain proteins
is multifunctionality as the common denominator of regulated networks such as
the basal membrane, the blood-clotting system, and the non-ribosomal synthesis
of peptide and polyketide antibiotics [90]. It is obvious that evolution has gained
an enormous combinatorial potential by domain shuffling.
Regarding their folding and association mechanism, multi-domain and multi-
subunit systems have been modeled as the sum of their constituent parts plus con-
tributions attributable to their mutual interactions. This approach sounds trivial
and easily accomplishable, but it is not, for the simple reason that both folding
and assembly may involve multiple-intermediate species differing in either their
local conformation or their state of association. Thus, in most cases the two-state
model does not hold. Only under ‘‘strongly native conditions,’’ i.e., far away from
the denaturation transition, can the denaturation-renaturation reaction be approxi-
mated by the two-state equation,
N,U ð1Þ
leaving first-order folding as the only significant process. For modular proteins,
this may be attributed to the last folding step connected with the docking of the
domains; for multimeric proteins, it represents the formation of the ‘‘structured
monomers’’ with the correct complementary interfaces required for the specific
recognition of subunits. In proceeding from structured monomers to the native
quaternary structure, subunit association represents a bimolecular reaction that
adds concentration-dependent steps to the first-order folding processes. If biologi-
cal function (as the criterion of the native state) requires the cooperation of do-
mains, domain pairing is expected to be essential in acquiring catalytic activity.
Taking NAD-dependent monomeric octopine dehydrogenase as an example, it has
been shown that at high solvent viscosity, domain pairing becomes rate-limiting; as
one would predict from the Stokes-Einstein model of rotational and translational
diffusion of rigid bodies, at high viscosity (in the presence of glycerol, sucrose, or
polyethylene glycol) reactivation of the enzyme after preceding denaturation is
slowed down [34, 89, 90, 94]. Thus, the overall folding kinetics for a two-domain
protein may be determined by the folding and pairing of the domains according to,
kD1 kD2 k pairing
U ! I ! N 0 ! N ð2Þ
with kD1 ; kD2 , and k pairing as rate constants for the folding of the individual domains
and their pairing. In cases in which proline isomerization or other slow reactions
participate in the overall mechanism, these steps will lead to kinetic schemes of
even higher complexity [94].
16 1 Paradigm Changes from ‘‘Unboiling an Egg’’ to ‘‘Synthesizing a Rabbit’’
Fig. 1.7. Hierarchical folding of proteins as a lytic activity requires the dimeric quaternary
multi-step reaction involving the collapse of structure; thus, reactivation is concentration-
the unstructured polypeptide and subsequent dependent: c, b, and a refer to decreasing
shuffling (and eventual assembly) of a ‘‘molten subunit concentrations (68, 17, and 8.5 nM)
globule’’ intermediate. (A) Folding-association during reactivation. The co-translational
pathway of invertase from Saccharomyces glycosylation of the enzyme does not affect
cerevisiae: changes in circular dichroism (s), the folding/association mechanism [48, 95].
fluorescence (f), and enzymatic activity (open (B) Two-step model for sequential protein
circles) allow us to monitor the sequential folding. The hypothetical structures of a
formation of (1) a-helices, (2) the correct protein in the unfolded (U), molten globule
environment of aromatic side chains, and (3) (MG), and native states (N) are shown with
the correct, ‘‘native’’ tertiary (and quaternary) the free-energy landscapes at different stages
structure. In the case of yeast invertase, cata- of structure formation [96].
some, unavoidable annoyance that you just had to live with. When I started my
own projects in physical biochemistry 40 years ago, my primary interests were the
intermolecular forces involved in heat aggregation and the question of how protein
denaturation and aggregation are correlated [97]. In asking these basic questions, I
was in good company [98, 99]; however, in contrast to protein aggregates, the inter-
pretation of protein denaturation and the question which factors are responsible for pro-
tein stability have turned out to be insoluble to this day. The formal kinetic treat-
ment is obvious: in the simplest case of a dimer, the overall sequential reaction
obeys either uni-bi-molecular or uni-bi-uni-molecular kinetics according to:
k1 k2 k1 k2 k1 0
2 Mu ! 2 Mi ! N or 2 Mu ! 2 Mi ! ðMi Þ2 ! N ð3Þ
ðMu Þm ðMi Þn
In Eqs. (3) and (4), k 1 and k2 ; k>2 are first- and second or higher-order rate
constants; Mu ; Mi , and M 0 are the monomers in the unfolded, intermediate, and
‘‘structured’’ states, respectively, and N is the native dimer. Higher oligomers have
been shown to obey the same kinetic scheme because commonly there is only one
rate-limiting assembly step in going from the structured monomer to the native
oligomer (Figure 1.8).
An enormous repertoire of methods has been devised to analyze the single
steps along the paths of folding, association, and aggregation. By applying intrin-
sic and extrinsic spectral characteristics, flash photolysis, H-D exchange, chemical
cross-linking, light-scattering and other kinetic techniques, sequential folding steps
have been resolved on the timescale down to microseconds and below. In all these
approaches, possible artifacts due to the interference of intra- and intermolecular
interactions have to be carefully excluded. Considering these precautions, different
methods have been shown to be in excellent agreement, corroborating the above
sequential folding-association mechanisms. In the case of combined folding-
association reactions, obviously, the alternative of which of the consecutive or com-
peting reactions is rate-limiting depends not only on protein concentration but also
on the specific properties of the system. For example, in the case of tetrameric lac-
tate dehydrogenase, the reactivation kinetics after acid denaturation and maximum
randomization in 6 M guanidinium chloride differ drastically: acid denaturation
starts from a partially active ‘‘structured monomer,’’ leaving the association reac-
tion as the rate-determining step, whereas reactivation from the inactive unfolded
polypeptide chain follows the sequential uni-bimolecular folding-association mech-
anism [34, 101]. As expected from the above ‘‘kinetic partitioning’’ between proper
18 1 Paradigm Changes from ‘‘Unboiling an Egg’’ to ‘‘Synthesizing a Rabbit’’
Fig. 1.8. Alternative mechanisms of the (B) Reconstitution kinetics of m-MDH after
sequential folding and association of a dimeric denaturation in 1 M glycine/H3 PO4 , pH 2.3;
protein: a case study on cytoplasmic (s-MDH) renaturation as in (A), at 0.07 (f), 0.14 (j),
and mitochondrial malate dehydrogenase 0.35 (C), 1.2 (b), 3.1 (n), and 5.0 (s)
(m-MDH). (A) Reconstitution kinetics of mg mL1 . Curves calculated according to
s-MDH in 0.2 M phosphate buffer pH 7.6 after Eq. (3), with k 1 ¼ 6:5 104 s1 and k2 ¼
denaturation in 6 M guanidinium chloride in 3 10 4 M1 s1 (20 C) [97]. Varying the
the concentration range between 1 and 13 mode of denaturation (low pH, high
mg mL1 . The process is governed by rate- guanidinium chloride, urea) or adding the
determining folding, yielding structured coenzyme (NADþ ) has no effect on the
monomers that (in a subsequent diffusion- renaturation mechanism [100].
controlled step) form the active dimer [34].
teins giving rise to inclusion bodies), they may form gel-like phases, which in the
early days of protein downstream processing were appropriately named ‘‘refractile
bodies.’’ They resemble the native protein in certain spectral properties, as far as
turbidity allows such conclusions [37, 103].
1.5
Limits of Reconstitution
The assumption that ‘‘unboiling the egg,’’ after preceding deactivation, randomiza-
tion, and coagulation (or whatever Max Perutz might have associated with ‘‘boil-
ing’’), leads to the unrestrained recovery of proteins to their ‘‘initial native state’’
has been accepted as a working hypothesis since Anson and Mirsky’s days [1]. In
1944, when the first review on protein denaturation was published, the authors
came to the conclusion that ‘‘. . . certain proteins are capable of reverting from the
denatured insoluble state to a soluble form which resembles the parent native pro-
tein in one or more properties [while] other proteins appear to be incapable of this
conversion’’ [104]. Thirty years later, when the author used this argument, Harold
Scheraga’s reply was ‘‘you didn’t try long enough.’’3 In the meanwhile, the situa-
tion had shifted from the playground of two lonely researchers at the Princeton
Institute for Animal and Plant Pathology to a topologically and functionally impor-
tant hot issue, called the protein-folding problem [105]. To give an example, the
question at which state of association on the folding-association pathway an oligo-
meric enzyme gains enzymatic or regulatory properties required optimal reversibil-
ity of both dissociation-reassociation and deactivation-reactivation. Optimization
was pure alchemy, blended with some insight into standard denaturation mecha-
nisms and the physicochemical basis of protein stability. There were at least five
variables, and the initial and final products had to be carefully compared using all
available physical and biochemical characteristics [106]. Proving the native state
the proper way, i.e., by X-ray analysis, was the only thing we were unable to do;
later we found out that the commercial ‘‘crystallized enzyme’’ we had started out
with had been purified using a routine denaturation-renaturation cycle.
As mentioned earlier (cf. section 1.3), different denaturants yield different dena-
tured states. Commonly, maximum yields of renaturation are obtained after ‘‘com-
plete randomization’’ (e.g., at high guanidinium chloride concentration), under
essentially irreversible conditions. Exceedingly low protein levels are necessary to
minimize aggregation. To escape this highly uneconomical requirement, i.e., to in-
crease the steady-state concentration of the refolding protein at low levels of aggre-
gation-competent intermediates, recycling, pulse dilution, and immobilization
techniques were devised [84]. False intermediates, trapped under strongly native
1.6
In Vitro Denaturation-Renaturation vs. Folding in Vivo
In the last chapter I tried to summarize certain limits where the physicochem-
ist’s optimism that biological morphogenesis is just a variant of abiotic self-
organization might be wrong. At this point it is amusing to meet respectable biol-
ogists who, faced with the ‘‘in vitro/in vivo issue,’’ happily welcome Aristotle again,
assuming that in the crowded cell the vitalistic doctrine is better than Descartes’
radicalism. Having in mind that in vivo folding is part of the cellular process of
protein biosynthesis and subsequent compartmentation, undoubtedly, there are
significant differences between the co- and post-translational structure formation
on the one hand, and optimally controlled reconstitution processes involving
mature proteins in vitro on the other. There are two questions: are these differ-
ences relevant, and can we combine available fragmentary evidence from in vitro
translation data with in vivo data on other cell-biological events related to self-
organization, growth, and differentiation? In his excellent review on protein fold-
ing in the cell, Freedman [112] related the paradigm change from Anfinsen’s
dogma ‘‘one sequence ! one 3D structure’’ to the ‘‘Anfinsen-cage’’ concept in
chaperon research to two experimental observations: (1) the formation of mis-
folded recombinant proteins in microbial host cells and (2) the fate of translation
products and their ultimate functional intra- or extracellular location. The wish to
understand protein folding in the context of the wide variety of co- and posttransla-
tional processes clearly emphasized that in vitro refolding studies on isolated un-
folded proteins provide an incomplete model for describing protein folding in the
22 1 Paradigm Changes from ‘‘Unboiling an Egg’’ to ‘‘Synthesizing a Rabbit’’
cell. In the context of the above observations, two examples may be taken to prove
that the model studies do provide essential background against which the cellular
events can be understood. One deals with the attempt to gain direct insight into
the detailed kinetic mechanism of the folding and assembly of the tailspike endo-
rhamnosidase from Salmonella bacteriophage P22 in vitro and in the cell. Mak-
ing use of temperature-sensitive mutants, spectral and pulse-labeling experiments
showed that in both sets of experiments the rate-limiting folding intermediates are
formed at identical rates [47] (Figure 1.9). In the other example, it was shown that
misfolding and subsequent inclusion-body formation at high levels of recombinant
Fig. 1.9. In vitro and in vivo folding and (B) Folding yields of wild-type and mutant
assembly of the trimeric tailspike endorhamno- Tsp upon biosynthesis in vivo on the one hand
sidase (Tsp) of Salmonella bacteriophage P22, (left), and dilution from denaturant in vitro
making use of the temperature-sensitive (right) on the other. The similarity of the data
folding and aggregation of folding intermedi- shows that the in vitro reconstitution applies
ates. (A) Schematic representation of the to the situation in the living cell [113, 114].
folding/assembly/aggregation pathways.
1.6 In Vitro Denaturation-Renaturation vs. Folding in Vivo 23
4 An additional spin-off of this approach was that immobilization allowed the kinetic partitioning
to be shifted from A to N (cf. Eq. (5)): the tagged proteins showed not only significantly increased
yields of reactivation but also enhanced intrinsic stabilities.
24 1 Paradigm Changes from ‘‘Unboiling an Egg’’ to ‘‘Synthesizing a Rabbit’’
rious solvent effects. Thus, if the native structure can be reached either after re-
combinant expression or after refolding in vitro, intermediates and subunits must
be stable under the respective conditions. Considering temperature, it turns out
that alterations in the folding conditions often have surprisingly little effect. A
drastic example is the expression of active enzymes from hyperthermophilic bacte-
ria in Escherichia coli, in which the temperature difference DTopt between the host
and the guest may amount to >60 C. Here, it has been shown that denaturation-
renaturation experiments at up to 100 C yield the fully active protein, indistin-
guishable from its initial native state, proving that all folding-association inter-
mediates must be stable over the whole temperature range [16]. When comparing
homologous proteins from mesophiles and thermophiles, it is interesting to note
that thermophilic proteins are often kinetically stabilized, showing drastically
decreased unfolding rates but unchanged folding kinetics [18], no doubt highly
advantageous considering short-term exposure to lethal temperatures in hydrother-
mal vents on the one hand, and kinetic competition of protein folding and proteo-
lytic degradation on the other.
Little is known about the effects of the other extreme physical parameters on
protein folding. Because the cytosolic pH is close to neutrality, even in the case of
extreme acidophiles and alkalophiles, standard in vitro conditions are close to those
in the cell. Halophilic proteins seem to require salt, not only as stabilizing agent
but also for folding and assembly [117]. For non-halophilic proteins, the ionic
strength is not as critical, apart from general Hofmeister effects that may be used
to optimize the in vitro folding conditions. Regarding possible effects of high
hydrostatic pressure on cellular processes, it is obvious that the adaptive effort
to cope with the biologically relevant extremes of pressure (<120 MPa) can be
ignored in comparison to the adaptation to low temperature prevailing in the
deep sea [118]. Finally, viscosity should be considered because of macromolecular
crowding in the cytosol. As has been mentioned, glycerol, sucrose, and other carbo-
hydrates have been used to simulate frictional effects on domain pairing. In con-
nection with protein folding, information is scarce, especially because polyols not
only increase the viscosity of the solvent but also enhance the stability of the solute
owing to excluded-volume effects and preferential solvation; there is no way to alter
viscosity alone.
1.7
Perspectives
As we have witnessed, interest in the protein-folding problem started from the sen-
sational resurrection of denatured proteins in Tim Anson’s hands, long before the
first 3D structure of a globular protein was elucidated. After self-organization had
become an accepted fact, experimental studies were centered mainly on the inves-
tigation of the mechanisms of folding and association of model proteins in terms
of the complete description of the unfolded (nascent) and final (native) states, in-
cluding all accessible intermediates along the U ! N transition. The main obstacle
1.7 Perspectives 25
in reaching this goal was the elusive nature of the folding protein, especially the
question of whether folding is in fact a sequential reaction with well-populated in-
termediates or a multiple-pathway process, as in the case of a jigsaw puzzle [119].
What was important at this point was the observation that model peptides as well
as separate protein fragments and domains were able to form ordered structures,
supporting the idea that local structures and modules may serve as ‘‘seeds’’ in the
folding process. That such nuclei did not necessarily adopt the same conformation
in unrelated protein structures was no argument against their significance as inter-
mediates. Specific ligands such as metal ions were found to be essential as guides
along the folding path; interestingly, in cases like these, estimated cellular levels
were found to be close to the optimum concentration in in vitro refolding experi-
ments [34].
Regarding folding in the cell, in the early days it was tacitly assumed that the in
vitro mechanism could be directly transferred to describe the folding and associa-
tion of the nascent polypeptide as the final step of translation. Obvious discrepan-
cies were clear from the beginning; for example, ‘‘unscrambling’’ of ribonuclease
took unreasonably long, in vitro reactivation at standard cellular protein levels
ended up with aggregates rather than active material, certain proteins were found
to be inaccessible to ‘‘renativation,’’ etc. The discovery of a whole spectrum of fold-
ing catalysts and molecular chaperones resolved most of these challenges because
they were clearly correlated with the three established rate-limiting steps in the
overall folding reaction: PDI with disulfide shuffling, PPI with proline isomeriza-
tion, and molecular chaperones with the kinetic partitioning between folding and
aggregation. Based on mutant studies and on the fact that under normal physiolog-
ical conditions several percent of the cellular proteins are molecular chaperones,
there could hardly be any doubt that all three classes of proteins were physiologi-
cally essential. As a matter of fact, complementing the in vitro experiments with
the accessory components allowed the in vitro results to mimic the reality in the
cell.
No new biophysical principles emerged from all these studies; on the contrary,
the better the mechanisms of folding catalysts and molecular chaperones were un-
derstood, the more it became clear that the basic principles of catalysis, allosteric
regulation, and compartmentation hold also in the cellular environment. In spite
of that, attempts at quantifying protein folding in vivo in physicochemical terms
failed because the cytosol represents a complex macromolecular multicomponent sys-
tem under crowding conditions, the full description of which still goes beyond the
presently available repertoire of experimental methods. As in other ‘‘real systems’’
such as the real gas or concentrated simple electrolytes, the quantitative treatment
of the non-ideality of highly charged polyelectrolytes in aqueous multicomponent
systems is still beyond the reach of a sound theoretical treatment.
Twenty-five years ago, at the end of the first International Conference on Protein
Folding in Regensburg, Cyrus Chothia quoted Johannes Kepler’s treatise On the
Six-cornered Snowflake as the first ‘‘theory of biological structures.’’ In his essay
Kepler argued that a material’s structure arises from its intrinsic properties and
that in biological structures it is important to consider the structure-function rela-
26 1 Paradigm Changes from ‘‘Unboiling an Egg’’ to ‘‘Synthesizing a Rabbit’’
tionship [120]. In this context, focusing on protein folding, at the 1979 conference
attempts were made to combine the intrinsic physical and chemical properties of
proteins and the laws of kinetics and thermodynamics in order to predict how
they determine the structure and activity. At that time, ‘‘simple protein molecules’’
such as basic pancreatic trypsin inhibitor, ribonuclease, and lactate dehydrogenase
were discussed as adequate models, the argument being that the central problem
at all levels of biological structure is to understand how the intrinsic entropy of its
various constituents is overcome to form particular stable structures in a finite
time and how the physical and chemical properties of the materials determine the
way in which they function. Cyrus, the theoretician, ended by vaguely announcing
that ‘‘the next time we meet in Regensburg we shall talk about more biological sys-
tems such as the red cell membrane or coated vesicles’’ [121]. I do not have to ex-
plain why this second conference never materialized. Now, after a quarter of a cen-
tury, two former Regensburg graduate students are keeping Cyrus’ promise,
figuring as editors of the present volume. In fact, this volume covers the state of
the art, from the principles, dynamics, and mechanisms of protein stability, protein
design, and structure prediction to the function and regulation of chaperones and
research and development in the fields of engineering protein folding and stability
and refolding technology, an amazing horizon having in mind the first steps in
protein folding in the dark age of biocolloidology, when the chemical establish-
ment still considered proteins to be loose aggregates of smaller entities, unflinch-
ingly opposing the term macromolecule.
Acknowledgements
References
2
Folding and Association of Multi-domain
and Oligomeric Proteins
2.1
Introduction
2.2
Folding of Multi-domain Proteins
2.2.1
Domain Architecture
Domains are compact substructures within protein molecules. There are different
definitions of ‘‘domain’’ based on analysis of the three-dimensional structure of the
respective protein, the autonomous folding units of a protein, and functions such
as cofactor binding or resistance to proteolysis. Interestingly, in a number of cases
the boundaries of protein domains correlate with intron-exon boundaries on the
genetic level [5, 6]. This suggests that exon shuffling (and thus domain shuffling)
is an important mechanism in generating functional diversity during evolution of
multi-domain proteins [1].
A hierarchical classification of domain structures (CATH) has been proposed, us-
ing as parameters the secondary structure composition (C for class), the gross arrange-
ment of secondary structure (A for architecture), the sequential connectivities (T for to-
pology), and the structural and functional similarity (H for homology) [7–9]. By this
procedure 27 000 known domain structures can be grouped into 1800 sequence
families, 700 fold groups, 245 superfamilies, and about 30 different architectures
(Figure 2.1, see p. 36/37) of a total of three major classes (all-a, a=b, all-b).
Based on the three-dimensional structure and structure acquisition of proteins,
domains are considered as cooperative units in the folding process. In this con-
cept, the individual domains of a multi-domain protein can be considered sim-
ply as small, one-domain proteins that fold to a native-like structure indepen-
dently of the remaining part of the protein. This holds for both in vitro folding
and co-translational folding. The latter has been shown for the folding and assem-
bly of nascent antibody chains [10]. Based on this kind of vectorial mechanism
with well-defined assembly modules, large polypeptide chains speed up folding by
many orders of magnitude. At the same time, possible wrong long-range interac-
tions that would lead to an incorrectly folded protein are minimized.
Methods used to investigate the structure formation process of multi-domain
proteins are the same as for simple model proteins and will not be discussed
here. The most important techniques are listed in Table 2.1. A detailed description
of some of these methods can be found in several chapters of Volume I of this
handbook.
Detailed analyses on the structure, thermodynamic stability, and folding of
multi-domain proteins have been performed, e.g., for gB-crystallin, lysozyme, a-
lactalbumin, the light chain of antibodies, and many other proteins [11, 12]. From
these studies it can be concluded that in most cases the domains fold indepen-
dently kinetically and that they are thermodynamically stable entities. The com-
plete refolding, however, often requires an additional kinetic phase involving intra-
molecular domain association. If the interaction sites of the domains are very
34 2 Folding and Association of Multi-domain and Oligomeric Proteins
2.2.2
g-Crystallin as a Model for a Two-domain Protein
A B
100
CD and sedimentation
1e-2
80
relative signal of
folding / unfolding
relaxation time of
60 1e-3
40
20
1e-4
0
0 2 4 6 8 0 2 4 6 8
urea (M) urea (M)
Fig. 2.2. Stability and folding of g-crystallin sedimentation (s20,w: n). (B) Kinetics of
(adapted from Ref. [18]). (A) Urea-induced folding/unfolding of g-crystallin (s,n)
equilibrium transition of g-crystallin. The measured by fluorescence and HPLC gel
protein at a concentration of 0.2 mg mL1 in filtration. The relaxation times of the kinetics
10 mM HCl (pH 2), 0.1 M NaCl was incubated are plotted against the urea concentration used
at different urea concentrations. Subsequently, for folding/unfolding.
the samples were analyzed using CD (s) and
main interactions ðDG int Þ to the overall free energy of stabilization ðDGg Þ, in addi-
tion to the sum of the intrinsic stabilities of the isolated domains ðDG N ; DGC Þ:
Under the given conditions (pH 2) the stability DGC amounts to only 4.7
kJ mol1 , whereas the DG int is 16 kJ mol1 . Thus, at low pH the C-terminal do-
1.0
fraction of native species
0.8
0.6
0.4
0.2
0.0
4 0 62 8
Urea (M)
Fig. 2.3. Stability of a single domain of g-crystallin. Urea-
induced equilibrium transition of g-crystallin and its single
domains measured in 10 mM HCl pH 2, 0.1 M NaCl. Complete
g-crystallin (u), the isolated N-terminal domain (o), and the
isolated C-terminal domain (C).
2.2 Folding of Multi-domain Proteins 39
2.2.3
The Giant Protein Titin
Titin is a protein located in the sarcomere of vertebrate striated muscle cells. A sin-
gle molecule of titin spans half the sarcomere from the Z-disc to the M-line, ex-
tending about 1 mm in length (Figure 2.4). The protein consists mostly of several
hundred copies of b-sheet domains of the immunoglobulin type and of fibronectin
type III (Fn3); the molecular mass ranges from 3 MDa to 4 MDa, depending on the
source of titin. The function of titin varies along the sarcomere. In the M-line it
forms an integral part of the protein meshwork, while in the A-band it probably
regulates the assembly of the thick filaments and might be involved in the inter-
filament interaction of myosin and the actin/nebulin thin filament [21, 22]. In
the I-band, titin serves as an elastic connection between the Z-disc and the
thick filaments. It is hypothesized that these elastic properties depend on partial
unfolding/refolding of titin during muscle tension.
In order to obtain insight into the molecular properties of this giant protein, sin-
gle immunoglobulin and Fn3 domains as well as tandem repeats of these two do-
mains have been cloned and biophysically analyzed. The individual domains vary
considerably in sequence. Using fluorescence and circular dichroism as probes in
equilibrium unfolding experiments, the thermodynamic stability of a variety of
these domains was determined to be in the range of 8.6–42 kJ mol1 [23]. Simi-
larly, the rates of chemical unfolding of individual domains vary by almost six
orders of magnitude [23]. It has been speculated that this dramatic variation in
folding behavior is important in ensuring independent folding of domains in the
multi-domain protein titin.
Site-directed mutagenesis of each position in a protein together with a detailed
analysis of the folding and unfolding kinetics of these mutants permit the struc-
tural mapping of the transition state of folding (y-value analysis, cf. Chapter 2 in
Part I). Such an analysis for the titin domain I27 reveals that the transition state
of folding is highly structured and remarkably native-like regarding its solvent
accessibility. Only a small region close to the N-terminal region of the domain re-
mains completely unfolded.
How do the molecular properties of the single domains of titin correlate with the
function of the whole molecule? Titin is involved in muscle tension and elasticity
and thus is subjected to mechanical forces. The elasticity of a protein can be mea-
sured by atomic force microscopy (AFM) [22]. To this end, the protein is fixed at
A I I-band I A-band I I-band I
40
B C D
2 Folding and Association of Multi-domain and Oligomeric Proteins
Fig. 2.4. (A) Schematic representation of a that reveals the mechanical characteristics of
sarcomere of striated muscle. (B–D) Single- the protein. (1) An anchored multi-domain
molecule AFM measurements of an engineered protein composed of four I27 domains. The
polyprotein of the titin domain I27. (B) protein is relaxed. (2) Stretching this protein
Structure of the I27 Ig-like domain. (C, D) requires a force that is measured as a deflection
Schematic representation of the sequence of of the cantilever. (3) The applied force triggers
events during the stretching of I27 multi-domain unfolding of a domain, increasing the contour
protein. Stretching the ends of the polyprotein length of the protein and relaxing the cantilever
with an atomic force microscope sequentially back to its resting position. (4) Further
unfolds the protein modules, generating a saw- stretching removes the slack and brings the
tooth pattern in the force-extension relationship protein to its new contour length Lc 1.
2.3 Folding and Association of Oligomeric Proteins 41
one end on a surface and at the other end on the tip of a so-called cantilever (Fig-
ure 2.4). Moving the cantilever slowly and on an Angstrom scale from the surface
and thus continuously increasing the mechanical force leads to a stretching of the
fixed protein. The resistance of the protein to this stretching results in a bending of
the cantilever that is optically detected. At some point a protein domain unfolds,
the resistance is partially released, and the cantilever returns to its starting posi-
tion. This procedure is repeated as often as domains are located between the static
surface and the cantilever. The experimental result is a sawtooth-shaped curve, in
which the width of a single sawtooth reflects the difference in length of the folded
and unfolded domains (28 nm for the titin domain I27) and the height of the peak
corresponds to the mechanical force necessary for domain unfolding (Figure 2.4).
The immunoglobulin domain I27 of titin unfolds at 210 pN, and the Fn3 domain
I28 unfolds at ca. 260 pN. However, the hetero-oligomer (I27–I28)4 is slightly sta-
bilized, hence the mechanical force to denature the domain I28 in this construct
increases to 306 pN [24]. Such a stabilization of the Fn3 domain I28 in the context
of the neighboring I27 domain was also observed using chemical denaturation
methods [25]. Comparison of the chemically induced unfolding and the mechani-
cally induced unfolding reveals similar mechanisms of denaturation. The forced
unfolding starts with the detachment of the N-terminal b-strand from the other-
wise stable b-sheet. This N-terminal b-strand is not structured in the transition
state of chemical denaturation. Site-directed mutagenesis studies show that the re-
gion important for kinetic stability is very similar in both cases, although there are
also significant differences in the forced and chemically induced unfolding transi-
tion states [26, 27].
Presently, it seems that chemical and mechanical denaturation processes are re-
lated, at least for proteins evolved to respond to mechanical stress. However, in the
case of barnase, a globular two-domain protein, mechanically and chemically in-
duced unfolding seem to proceed quite differently on the structural level [28].
2.3
Folding and Association of Oligomeric Proteins
2.3.1
Why Oligomers?
2.3.2
Inter-subunit Interfaces
hydrophobic core of proteins and, conversely, very different from that of the
solvent-accessible surface. In contrast to the interfaces responsible for stable sub-
unit association, the contact sites of proteins that interact only transiently with
other proteins are less hydrophobic [41].
A qualitative analysis of 136 subunit interfaces of homodimeric proteins [45]
showed that about 30% are characterized by a well-defined hydrophobic core, sur-
rounded by polar interactions [45, 46]. However, the majority of interfaces consist
of multiple small, hydrophobic patches interspersed by hydrogen bonds and polar
interactions [45].
Such large differences in interface structure might correspond to differences in
interface stability and in folding and assembly mechanisms, as proposed by Nussi-
nov and coworkers [40, 47]. This is illustrated by the tetrameric yeast pyruvate de-
carboxylase (PDC). The topology of PDC is reminiscent of a dimer of dimers. The
interface between the dimers that form the tetramer is largely polar, and the tet-
ramer spontaneously dissociates to dimers upon dilution. The dissociation con-
stant of this dimer-tetramer equilibrium is 8.1 mM [48]. The interface between the
monomers, however, is strongly hydrophobic; the dimer does not dissociate into
monomers just by dilution of the protein. Only in the presence of 2–3 M urea can
folded monomers be populated [48]. Unfortunately, there are only very few cases
where thermodynamic, kinetic, and crystallographic data of a certain protein can
be combined in order to assess these questions. Recent studies suggest that, in a
given interface, a few amino acids may play a dominant role in determining the
free energy of stabilization of the interaction. These residues are called hot spots
of the binding surface [46].
Whereas in most cases the association of proteins is mediated by globular,
native-like structured domains, a growing number of protein-protein interactions
have been discovered in which one of the partners is unstructured in its isolated
state. These natively unstructured proteins can be grouped into two classes, one
with random coil structures and the other with a pre-molten globule structure
[13]. Only upon association with their targets do these proteins develop an ordered
and biologically active structure. Examples are several ribosomal proteins that
transform to a rigid well-folded conformation only in the presence of the ribosomal
RNA [49, 50]. In the case of cyclin-dependent kinase inhibitor p21, the N-terminal
fragment lacks stable secondary and tertiary structure in the free solution state.
However, when bound to its target, Cdk 2, it adopts a well-ordered conformation
[51].
Besides the usual subunit interaction via globular domains, there is also another
major type of protein-protein interaction mediated by special helical structures: the
so-called coiled coils [52]. A detailed description of this interaction that may lead to
dimers, trimers, or tetramers, depending on the sequence of the coiled coils, can
be found in Chapter 18 in part I.
Furthermore, completely artificially designed peptides have been described, con-
ferring heterodimerization of proteins fused to them. These peptides are based on
only a stretch of eight charged amino acids and an additional cysteine. Two of
these peptides containing complementary charged side chains associate with each
44 2 Folding and Association of Multi-domain and Oligomeric Proteins
other due to their polyionic interaction; subsequently, the cysteines can form a di-
sulfide bond, thus stabilizing the heterodimerization [53–55].
2.3.3
Domain Swapping
As pointed out by Bennett et al. [56–58], the gradual accumulation of random mu-
tations on the surface of a monomeric protein that are required to create a suitable
subunit interface and stabilize a dimer cannot be accomplished within a biologi-
cally feasible time. At least in some cases the evolution of domains offers a key
to understanding the mechanism to proceed from monomers to dimers. Start-
ing from a two-domain monomer, domain swapping permits the transition from
monomers to dimers (or any other association state) simply by switching from an
inter-domain to an inter-subunit interaction. In this scheme, the interface that is
almost impossible to create by random mutagenesis in evolution is available a pri-
ori. Based on the already-existing interaction site, additional mutations might then
stabilize the domain-swapped dimer compared to the monomer (Figure 2.5).
The concept of domain swapping may be illustrated by bg-crystallins. bg-
crystallins are members of the superfamily of b-sheeted proteins containing the
Greek key motif. g-crystallin is monomeric even at extremely high protein con-
centrations, whereas b-crystallin forms dimers or higher oligomers. Both show
highly homologous two-domain structures. Comparison of the structures of g- and
b-crystallins suggests that the b dimers are the product of domain swapping and
association of a g ancestor (Figure 2.5). This hypothesis is supported by the fact
that 60% of the interface residues of the N- and C-terminal domains of g-crystallin
are identical or highly conserved compared to those of the N- and C-terminal do-
mains of the two subunits of b-crystallin [59].
Domain swapping does not correlate necessarily with structural domains. In-
stead, the swapped portion can vary from a large globular domain, as described
for bg-crystallins, to a structural element as short as a single helix or b-strand.
In fact, of the ca. 40 structurally characterized cases of domain-swapped proteins,
most swapped domains are either at the N- or C-terminus and are diverse in their
primary and secondary structure [60]. Hsp33, for example, forms a dimer in which
a C-terminal arm of one monomer consisting of two a-helices wraps around a sec-
ond monomer [61, 62]. This C-terminal arm is connected to the remaining part of
the protein by a putatively flexible hinge region. The chaperone-active dimer is
in equilibrium with a less active or even inactive monomer (KD ¼ 0:6 mM, [63]). A
detailed structural and functional analysis of Hsp33 is presented in Chapter 21.
Unfortunately, the structure of the Hsp33 monomer is not known. Thus, at pres-
ent it is only speculation that in the monomeric structure the C-terminal arm folds
back on the monomer.
Higher association states of domain-swapped proteins are found in several virus
capsids, in which capsomers swap b-strands, or pathologic aggregates in the case
of a1-antitrypsin [64].
Although domain swapping is an intriguing concept in understanding evolution
2.3 Folding and Association of Oligomeric Proteins 45
Fig. 2.5. Domain swapping. (A) The monomer stable dimers (III) to be formed [56]. (B) b-
(I) with its primary inter-domain interface may crystallin dimer (thin line) as potential product
participate in interchain interactions, of swapped domains of g-crystallin (thick line)
generating domain-swapped dimmers (II). (adapted from Ref. [11]).
Stabilizing mutations in the linker region allow
2.3.4
Stability of Oligomeric Proteins
100
denatured species
80
fraction of
60
40
20
intersect the y-axis at the same value that corresponds to the stability of the dimeric
native state in the absence of denaturant [72].
Oligomeric states often are not only thermodynamically stable but also kineti-
cally stabilized, i.e., the activation energy of dissociation under native conditions
is very high. Dissociation rates in denaturant-free buffer have been estimated by
extrapolation to be as low as 1014 s1 [73].
2.3.5
Methods Probing Folding/Association
ferent times, aliquots from both solutions were mixed, reconstitution of native
trimers was allowed to reach completion, and the differently charged hybrids and
homo-oligomers were separated by electrophoresis [75]. In samples mixed early
during renaturation, wild-type homotrimers, hybrids, and mutant homotrimers
formed at the statistically expected ratio of 1:2:2:1, whereas in samples mixed late,
only homo-oligomers were detected. The half-time of assembly can be estimated
from the amount of hybrid present in samples mixed at intermediate renaturation
times.
It should be noted that hybrid formation measures only stably associated proto-
mers; those in rapid exchange between loosely associated oligomers will be equiv-
alent to free subunits.
2.3.6
Kinetics of Folding and Association
2U ! 2M ! M2 ! N2
3M $ M2 þ M ! M3
In this reaction scheme, the first association resulting in a dimer, in which only
one of two interfaces is buried, is considered a fast equilibrium. This first associa-
tion step is unfavorable and the dimeric intermediate is not populated in equilib-
rium. The second association step will pull this equilibrium towards the native
trimeric state. In this concept, the overall process follows apparent third-order ki-
netics [84].
50 2 Folding and Association of Multi-domain and Oligomeric Proteins
A tetramer may be envisioned as a dimer of dimers. Thus, the scheme for this
reaction is just an extension of that shown for a dimeric protein [1, 88]:
4U ! 4M ! 2M 0 2 ! 2M2 ! M4 ! N4
In the case of multimers such as fibrillary proteins or virus capsids, the simplest
model comprises a stepwise addition of monomers to the growing chain:
2A $ A2 ; A2 þ A $ A3 ; A3 þ A $ A 4 ; . . .
In this scheme the equilibrium constants of association may be all identical. How-
ever, the first steps of multimerization are often thermodynamically disfavored,
i.e., the equilibrium is on the side of the non-associated species. Only after reach-
ing a certain size of the oligomer structure does further addition of monomers to
this nucleus become thermodynamically favorable. This would lead to an overall
assembly process characterized by a nucleation/polymerization reaction.
Fig. 2.7. Folding and association pathway of obtained from the fluorescence change can be
yeast invertase. (a) Changes in circular used to fit a sequential uni-bimolecular mode
dichroism (open triangles), fluorescence to the reactivation data obtained at different
emission (broken line), and enzymatic activity protein concentrations (full lines). Subunit
(open circles) occur with very different kinetics concentrations were 8.5 nM (circles), 17 nM
during the reconstitution of homodimeric yeast (triangles), and 68 nM (diamonds) [89].
invertase. (b) The first-order folding rate
ing or partial enzyme activity. Examples are mammalian aldolases [90], lactate de-
hydrogenase [91], the P22 tail spike protein [92], and yeast hexokinases [69, 70].
Monomeric folding intermediates of oligomeric proteins often possess large
interfaces for subunit association. This subunit interaction is quite similar to the
domain-domain interactions in multi-domain proteins. As shown before for g-
crystallin, the association can contribute substantially to the overall stability of the
subunits. Thus, it is not surprising that folded subunits usually are not detectable
as equilibrium unfolding intermediates if the native protein is subjected to high
temperature or chemical denaturants [1, 88]. Structured monomers of lactate dehy-
drogenase have been obtained at low pH in the presence of the stabilizing salt
sodium sulfate [93–95]. Another possibility to obtain stably folded monomeric sub-
units is based on protein engineering, as has been demonstrated for triose phos-
52 2 Folding and Association of Multi-domain and Oligomeric Proteins
phate isomerase [96, 97], tetrameric human aldolase [98], and P22 tail spike
protein [92]. In all these cases the engineered monomeric form retained some en-
zyme activity, but, as expected, the variants were strongly destabilized compared to
the oligomeric wild-type form.
Triosephosphate isomerase 27 2M ! D 3 10 5 1
Malate dehydrogenase 33 2M ! D 3 10 4 1
(mitochondrial)
Invertase 76 2M ! D 1 10 4 89
b-Galactosidase 116 2M ! D 4 10 3 74
Alcohol dehydrogenase (liver) 40 2M ! D 2 10 3 1
Phosphoglycerate mutase 27 2M ! D 6 10 3 103
2D ! T 3 04 103
Lactate dehydrogenase 36 2D ! T 2 10 4 1
Arc repressor 62 2M ! D 1 10 7 102
Trp aporepressor 12 2M ! D 3 10 8 101
2.3 Folding and Association of Oligomeric Proteins 53
2.4
Renaturation versus Aggregation
Folding and association of multi-domain and oligomeric proteins are complex pro-
cesses involving partially structured folding intermediates. These processes can be
accompanied by nonspecific side reactions leading to aggregation. The molecular
basis for aggregation is that intermediates in refolding/reassembly may engage in
wrong intermolecular rather than in correct intramolecular or assembly interac-
tions. Thus, aggregation is at least a second-order or even higher-order reaction,
depending strongly on the concentration of folding intermediates [112, 113].
Therefore, the kinetic partitioning between folding and aggregation favors aggrega-
tion at high protein concentrations. In contrast, at low protein concentrations, fold-
ing dominates over aggregation (Figure 2.8).
Using reduced lysozyme as a model system, it could be shown that commitment
to aggregation is a fast reaction preceding the commitment to renaturation; early
folding intermediates were especially prone to aggregation [114]. After a certain
intermediate state is reached, slow conformational changes lead exclusively to the
native state (Scheme 2.1 [114]).
As shown by circular dichroism and Fourier transform infrared spectroscopy
(FTIR), aggregated proteins possess a high content of secondary structure and
probably of native-like structured protein domains [115, 116].
Aggregation is not an artifact of in vitro refolding; the same phenomenon occurs
in vivo. It is especially obvious in the case of inclusion body formation upon high-
level expression of recombinant proteins in microorganisms and the accumulation
of fibrillary aggregates in human diseases [115–118]. Cells have developed a set
of chaperones and folding catalysts to deal with aggregation. These two protein
classes are discussed in detail in several chapters of this book.
2.5
Case Studies on Protein Folding and Association
2.5.1
Antibody Fragments
Antibodies are hetero-oligomeric proteins consisting of two light chains and two
heavy chains. These four polypeptides are covalently linked by several disulfide
bonds. The light chain consists of two domains and the heavy chain of four do-
mains, with each domain adopting a typical immunoglobulin-like topology of a
two-sheeted b structure. Folding of these domains has been shown to be a highly
2.5 Case Studies on Protein Folding and Association 55
14000
A B
0 0
0 1 2 3 4 5 0 5 10 15 20 100
time (min) time (min)
Fig. 2.8. Folding and aggregation of reduced ther diluted 10-fold and incubated overnight.
lysozyme. (A) Yield of enzymatic activity as a (D) Kinetics of commitment of renaturation.
function of turkey lysozyme concentration Renaturation was performed with denatured
during oxidative refolding. (B) Kinetics of and reduced turkey lysozyme. At different time
reactivation of turkey lysozyme. (C) Kinetics of points, a high concentration of denatured and
commitment of aggregation of denatured and reduced hen lysozyme was added. After com-
reduced lysozyme. Starting protein concen- pletion of renaturation, the activity of turkey
tration for renaturation was 0.185 mg mL1 lysozyme was determined (adapted from Ref.
(f) and 0.74 mg mL1 (b). At different time [114]).
points of renaturation, the samples were fur-
Scheme 2.1
56 2 Folding and Association of Multi-domain and Oligomeric Proteins
state (depending on the denaturation time) equilibration between the cis and trans
configuration occurs. Since the two configurations are separated by a high-energy
barrier, re-isomerization is a slow process. This is reflected by an additional slow
folding phase observed during refolding [119] that can be accelerated by prolyl iso-
merases [121, 122]. As soon as immunoglobulin domains are covalently linked, as
in the case of the immunoglobulin light chain, which is composed of two domains,
an additional slow folding phase is detected that may be due to interactions of the
individual domains [120]. An additional level of complexity is observed in the case
of the antibody Fab fragment, which consists of the light chain and the two N-
terminal domains of the heavy chains, the Fd fragment (Figure 2.9). The two poly-
peptide chains associate noncovalently via contact sites between both the variable
domains and the constant domains of each chain. Precise assembly of the two
Fig. 2.9. Structure of the Fab fragment of the highlighted in gray, those in cis configuration
monoclonal antibody MAK33 (k/IgG1) (pdb: in red. The intermolecular disulfide bond
1FHE5, [123]). The light chain and the Fd chain connecting the C-terminus of the two chains is
are colored in green and yellow, respectively. not resolved in the structure.
Proline residues in trans configuration are
2.5 Case Studies on Protein Folding and Association 57
Fig. 2.10. Folding of Fab, not containing the molar excess of native light chain or starting
C-terminal disulfide bond between the two from short-term denatured protein in the
chains. Refolding was achieved by diluting the absence (f) or presence (D) of a fivefold
denatured protein into 0.1 M Tris, pH 8, either molar excess of native light chain. The data
starting from a long-term denatured protein in were fitted to a single first-order or a serial
the absence (b) or presence (t) of a fivefold first-order reaction [76].
58 2 Folding and Association of Multi-domain and Oligomeric Proteins
ment, even though folding of the light chain itself is limited by prolyl isomeriza-
tion [121]. If the prolyl isomerization is catalyzed (or excluded from the folding/
assembly reaction by short-term denaturation), the time course of reconstitution
of the Fab fragment switches from a sigmoidal to a monophasic first-order kinetic;
prolyl isomerization is not rate-limiting anymore. The yield of reconstitution does
not change. The yield is determined by aggregation of the Fd fragment. Because
peptidyl-prolyl-cis/trans isomerases catalyze the rate-limiting step occurring after
subunit association of the Fab fragment, they do not effect the kinetic competition
of off-pathway aggregation and productive reconstitution.
The situation changes if the Fab reconstitution is performed in the presence of
an excess of native light chain. Under these conditions, the rate-limiting step be-
fore subunit association vanishes, and the regain of Fab activity becomes identical
to the prolyl isomerization-determined folding of the Fab fragment, in which the
two chains are disulfide-linked. This shows that (1) the rate-limiting step of recon-
stitution on the monomeric level is the folding of the light chain, presumably cor-
responding to a specific intramolecular pairing of the two folded domains [120],
and (2) the rate-limiting prolyl isomerization on the level of the already associated
subunits occurs in the Fd part of the Fab fragment (Scheme 2.2).
The folding/reconstitution of the disulfide-bonded and the non-disulfide-bonded
Fab fragment comprise domain folding and domain interactions that are identical
on the molecular level; however, in one case the domain pairing occurs intramolec-
ularly, in the other case intermolecularly. The rate-limiting folding of the light
chain can be observed only for the non-disulfide-bonded Fab fragment, clearly in-
2.5 Case Studies on Protein Folding and Association 59
dicating that in the disulfide-bonded form the folding of the light chain is intramo-
lecularly chaperoned by the nearby Fd part of the Fab fragment. On the other
hand, the reconstitution of the non-disulfide-bonded Fab shows that the isomeriza-
tion of at least one prolyl residue of the Fd part obligatorily occurs only after sub-
unit association. Thus, only the association with the folded light chain provides the
structural information for the isomerization state of the respective proline in the
Fd part. This is an intriguing example of how folding and association of subunits
are interrelated.
Whereas in the case of the Fab fragment the association of the two polypeptides
precedes prolyl isomerization, it is different for the antibody domain CH 3. The iso-
lated CH 3 domain forms a dimer. Under conditions where prolyl isomerization
does not contribute to the folding kinetics, formation of the b-sandwich structure
is a slow process, even compared to other antibody domains, while the subsequent
association of the folded monomers is fast. After long-term denaturation, the ma-
jority of the unfolded CH 3 molecules reach the native state in two serial reactions
involving the re-isomerization of one proline bond to the cis-configuration. The
folded species with the wrong isomer accumulates as a monomeric intermediate;
the proline isomerization is a prerequisite for dimerization of the CH 3 domain
[42].
2.5.2
Trimeric Tail spike Protein of Bacteriophage P22
The tail spike protein (TSP) of bacteriophage P22 from Salmonella typhimurium
has been a paradigm in studying protein folding and association in vivo since
Jonathan King’s early attempts to solve the problem of phage morphogenesis by
genetic methods [126–130]. The beauty of both the system and the approach is
that it has provided insight not only into the principles of protein self-organization
in the cytoplasm but also into the mechanisms of misfolding, aggregation, and in-
clusion body formation [85, 115, 131–134]. It has been shown that in vitro recon-
stitution is clearly related to the situation in the living cell [115, 132].
The tail spike is a multifunctional trimeric protein composed of identical 72-kDa
polypeptide chains, attached to the viral capsid by their 108-residue N-terminal do-
main. Six of these trimers assemble onto the virus head to form the tail, complet-
ing the infectious phase of the phage. In the absence of heads, tail spikes accumu-
late as soluble protein in the bacterial cytosol. Their assembly and the competing
formation of inclusion bodies at high temperature have been studied in detail us-
ing temperature-sensitive (ts) mutants. In this connection, the assembly of viable
phage was used as an in vitro assay for proper folding. Other means to follow
the folding and assembly pathway of TSP, in addition to spectroscopic and hy-
drodynamic techniques, are its endorhamnosidase activity (directed toward the O-
antigen of the host), its reactivity with monoclonal antibodies, and the thermal
stability and detergent-resistance of the mature protein.
Crystal structures of wild-type TSP and mutant variants of the protein have been
determined at high resolution [135, 136]. The major part of the trimer represents a
60 2 Folding and Association of Multi-domain and Oligomeric Proteins
A C
D
Fig. 2.11. Structure of the tail spike protein domain where the subunits form mixed b-
(TSP) from Salmonella bacteriophage P22. (A) sheets. (D) Single TSP subunit with sites of tsf
TSP trimer assembled from the structures of mutations in the main body and in the ‘‘dorsal
the N-terminal (above) and C-terminal (below) fin’’ (red), sites of global suppressors of the tsf
fragments [136]. (B) Section through the phenotype in the central part of the b-helix
parallel b-helix in the central domain of the (yellow), and sites of lethal mutations (green)
trimer. (C) Section through the C-terminal [134, 135].
Scheme 2.3
2.6
Experimental Protocols
1. Dialyze the native protein against a suitable buffer, which should not contain re-
active amino groups (e.g., Tris is not suitable).
2. Incubate the protein at concentrations between 5 and 100 mg mL1 with 100
mM glutaraldehyde at room temperature for 2 min. Depending on the amount
of cross-linking, the concentration of glutaraldehyde, the temperature, and the
time can be varied.
3. Further cross-linking is blocked by addition of 200 mM NaBH 4 in 0.1 M NaOH.
4. The cross-linking product is analyzed by SDS-PAGE. The samples at low
protein concentration have to be precipitated before. Add 0.01 volume 10%
Na-desoxycholate (dissolved in slightly alkaline buffer). Subsequently, add 0.01
volume of 85% phosphoric acid. Incubate for 15 min at room temperature. Har-
vest the precipitate by centrifugation and resolubilize the pellet in Laemmli
buffer.
5. For reconstitution experiments, choose the protein concentration at which the
cross-linking is quantitative without formation of any higher molecular mass
products (unspecific cross-linking).
6. Initiate reconstitution by diluting the denatured protein into the refolding
buffer at the respective protein concentration.
7. Take aliquots at different time points of reconstitution and cross-link the protein
as described before.
8. The bands of monomeric and oligomeric species can be quantified by densitom-
etry of Coomassie-stained gels. (Band intensities of silver-stained gels are pro-
portional to the amount of loaded protein only over a very narrow range and
are only rarely suited for quantitative analysis.)
1. Dialyze the protein extensively against an appropriate buffer. If the analytical ul-
tracentrifuge is equipped with an absorbance unit, the buffer should not absorb
strongly at the detection wavelengths (230 nm and 280 nm). The buffer should
contain at least 50 mM salt.
2. Dilute the protein with dialysis buffer to concentrations between 10 and 1000
mg mL1 . Using an eight-hole rotor and six-channel center pieces, 21 samples
(150 mL each) can be measured simultaneously.
3. Choose an appropriate rotor speed. Sedimentation equilibrium is reached when
the radial distribution of the protein no longer changes over time (identical
scans over a time span of 5 h). Depending on the sample volume and the size
of the protein, this may take 24–72 h. Ideally, at equilibrium the protein should
be completely depleted from the meniscus of the solution. The sedimentation
equilibrium can be run at temperatures between 4 C and 40 C.
4. Measure the radial distribution of the protein at different wavelength (usually
280 nm and 230 nm).
64 2 Folding and Association of Multi-domain and Oligomeric Proteins
5. Calculate the apparent molecular mass Mr from the equilibrium profile. This
can be done using the standard software provided with the ultracentrifuge. A
manual method would be the analysis of the slope of a plot of ln A versus r 2
according to Eq. (1)
with A being the absorbance, r the distance to the center of the rotor, v the par-
tial specific volume of the protein, r the density of the solvent, and o the angu-
lar velocity. The partial specific volume of the protein can be calculated from its
amino acid composition [146].
6. If a monomer-dimer equilibrium exits in the respective concentration range, the
apparent molecular mass Mr will vary with the protein concentration. A semi-
logarithmic plot of log cp (in molar units) against Mr results in a sigmoidal
curve. The dissociation constant KD can be calculated by fitting the data to the
following equations:
where cM and cD are the concentrations of the monomeric and dimeric forms,
respectively. M and D represent the molecular mass of the monomer and dimer.
cD can be expressed as a function of cM and the dissociation constant, KD (Eq.
(3)):
cD ¼ cM 2 =KD ð3Þ
1. Prepare stock solutions of the native protein, the denatured protein (e.g., in 6 M
GdmCl or 10 M urea), and the denaturation buffer (either GdmCl or urea; for
preparation of denaturation buffer, compare Chapter 3 in Part I).
2. Dilute the protein into different concentrations of denaturant and allow the
solutions to equilibrate. Depending on the protein, this may take only a few
hours or several days. Usually a transition curve should consist of 20–30 differ-
ent denaturant concentrations. To assay the reversibility of the process, the tran-
References 65
sition has to be set up starting from both the native and the denatured protein,
respectively.
3. Measure the transition at different protein concentrations (e.g., 10–100
mg mL1 ) and using different methods (e.g., enzyme activity, fluorescence,
CD). Only if the transition is reversible and the different methods yield the
same transition curve (at a given protein concentration) can the data be ana-
lyzed according to a two-state model. (A reversible transition with different tran-
sition curves measured by different methods can be analyzed by a three-state
model [15].) The higher the protein concentration, the more the transition mid-
point is shifted to higher denaturant concentrations.
4. Calculate the equilibrium constant KU for each data point within the transi-
tions. KU is defined as:
DG ¼ RT ln KU ð6Þ
References
8 Thornton, J. M., Orengo, C. A., calf eye lens. Proc. Natl Acad. Sci.
Todd, A. E. & Pearl, F. M. (1999). USA, 88, 2854–2858.
Protein folds, functions and evolution. 19 Mayr, E. M., Jaenicke, R. &
J. Mol. Biol., 293, 333–342. Glockshuber, R. (1997). The
9 Orengo, C. A., Bray, J. E., Buchan, domains in gamma B-crystallin:
D. W., Harrison, A., Lee, D., Pearl, identical fold-different stabilities. J.
F. M., Sillitoe, I., Todd, A. E. & Mol. Biol., 269, 260–269.
Thornton, J. M. (2002). The CATH 20 Palme, S., Slingsby, C. & Jaenicke,
protein family database: a resource for R. (1997). Mutational analysis of
structural and functional annotation of hydrophobic domain interactions in
genomes. Proteomics, 2, 11–21. gamma B-crystallin from bovine eye
10 Bergman, L. W. & Kuehl, M. W. lens. Protein Sci., 6, 1529–1536.
(1979). Formation of intermolecular 21 Guiterrez-Cruz, G., Van Heerden,
disulfide bonds on nascent immuno- A. H. & Wang, K. (2001). Modular
globulin polypeptides. J. Biol. Chem., motif, structural folds and affinity
254, 5690–5694. profiles of the PEVK segment of the
11 Jaenicke, R. (1999) Stability and human fetal skeletal muscle titin. J.
folding of domain proteins. Prog. Biol. Chem., 276, 7442–7449.
Biophys. Mol. Biol., 71, 155–241. 22 Smith, D. A. & Redford, S. (2000).
12 Kuwajima, Y. (1989). The molten Protein folding: Pulling back the
globule state as a clue for under- frontiers. Current Biol., 10, 662–664.
standing the folding and cooperativity 23 Head, J. G., Houmeida, A., Knight,
of globular-protein structure. Proteins, P. J., Clarke, A. R., Trinick, J. &
6, 87–103. Brady, R. L. (2001). Stability and
13 Uversky, V. N. (2002). Natively folding rates of domains spanning the
unfolded proteins: A point where large A-band super-repeat of titin.
biology waits for physics. Prot. Science, Biophys. J., 81, 1570–1579.
11, 739–756. 24 Li, H., Oberhauser, A. F., Fowler,
14 Namba, K. (2001). Roles of partly S. B., Clarke, J. & Fernandez, J. M.
unfolded conformations in macro- (2000). Atomic force microscopy
molecular self-assembly. Genes to Cells, reveals the mechanical design of a
6, 1–12. modular protein. Proc. Natl Acad. Sci.
15 Ikeguchi, M., Kuwajima, K. & Sugai, USA, 97, 6527–6531.
S. (1986). Ca2þ-induced alteration in 25 Politou, A. S., Gautel, M., Improta,
the unfolding behavior of alpha- M., Vangelista, L. & Pastore, A.
lactalbumin. J. Biochem., 99, 1191– (1996). The elastic I band region of
1201. titin is assembled in a ‘‘modular’’
16 Segel, D. J., Bachmann, A., fashion by weakly interacting Ig-like
Hofrichter, J., Hodgson, K. O., domains. J. Mol. Biol., 255, 604–616.
Doniach, S. & Kiefhaber, T. (1999). 26 Fowler, S. B. & Clarke, J. (2001).
Characterization of transient Mapping the folding pathway of an
intermediates in lysozyme folding immunoglobulin domain: structural
with time-resolved small-angle X-ray detail from Phi value analysis and
scattering. J. Mol. Biol., 288, 489–499. movement of the transition state.
17 Wildegger, G. & Kiefhaber, T. Structure, 9, 355–366.
(1997). Three-state model for lysozyme 27 Best, R. B., Fowler, S. B., Herrera,
folding: triangular folding mechanism J. L., Steward, A., Paci, E. & Clarke,
with an energetically trapped inter- J. (2003). Mechanical unfolding of a
mediate. J. Mol. Biol., 270, 294–304. titin Ig domain: structure of the
18 Rudolph, R., Siebendritt, R., transition state revealed by combining
Nesslauer, G., Sharma, A. K. & atomic force microscopy, protein
Jaenicke, R. (1990). Folding of an all- engineering and molecular dynamics
beta protein: independent domain simulations. J. Mol. Biol., 330, 867–
folding in gamma II-crystallin from 877.
References 67
28 Best, B., Li, B., Steward, A., 39 Tsai, C. J., Lin, S. L., Wolfson, H. J.
Daggett, V. & Clarke, J. (2001). Can & Nussinov, R. (1997). Studies of
non-mechanical proteins withstand protein-protein interfaces: a statistical
force? Stretching barnase by atomic analysis of the hydrophobic effect.
force microscopy and molrcular Protein Sci., 6, 53–64.
dynamics simlation. Biophys. J., 81, 40 Tsai, C. J., Xu, D. & Nussinov, R.
2344–2356. (1997). Structural motifs at protein-
29 Powers, E. T. & Powers, D. L. (2004). protein interfaces: protein cores versus
A perspective of mechanisms of two-state and three-state model
protein tetramer formation. Biophys. complexes. Prot. Sci., 6, 1793–1805.
J., 85, 3587–3599. 41 Le Conte, L., Chothia, C. & Ianin, J.
30 Seckler, R. (2000). Assembly of multi- (1999). The atomic structure of
subunit strucutres. In: Mechanisms of protein-protein recognition sites. J.
Protein Folding, Ed.: R. H. Pain, 2nd Mol. Biol., 285, 2177–2198.
ed., Oxford Press, pp. 279–308. 42 Thies, M., Mayer, S., Augustine,
31 Branden, C.-I. and Tooze, J. (1999). J. G., Frederick, C. A., Lilie, H. &
Introduction to protein structure, (2nd Buchner, J. (1999). Folding and
edn). Garland, New York, p. 410. association of the antibody domain
32 Smith, S. (1994). The animal fatty add CH 3: Prolyl isomerization preceeds
synthase: one gene, one polypeptide, dimerization. J. Mol. Biol., 293, 67–79.
seven enzymes. FASEB J., 8, 1248– 43 Chothia, C. and Janin, J. (1975).
1259. Principles of protein-protein recogni-
33 Weber, G., Schorgendorfer, K., tion. Nature, 256, 705–708.
Schneider-Scherzer, E. & Leitner, 44 Dill, K. A. (1990). Dominant forces
E. (1994). The peptide synthetase in protein folding. Biochemistry, 29,
catalyzing cyclosporin production in 7133–7155.
Tolypocladium niveum is encoded 45 Larsen, T. A., Olson, A. J. &
by a giant 45.8-kilobase open reading Goodsell, D. S. (1998). Morphology
frame. Curr. Genet., 26, 120–125. of protein-protein interfaces. Structure,
34 Dautry-Varsat, A. & Garel, J. R. 6, 421–427.
(1978). Refolding of a bifunctional 46 DeLano, W. L. (2002). Unravelling
enzyme and its monofunctional hot spots in the binding interface:
fragment. Proc. Natl Acad. Sci. USA, progress and challenges. Curr. Opin.
75, 5979–5982. Struct. Biol., 12, 14–20.
35 Creighton, T. E. (1992). Proteins: 47 Xu, D., Tsai, C. J. & Nussinov, R.
structures and molecular properties, (1998). Mechanism and evolution of
(2nd edn). W. H. Freeman, New York, protein dimerization. Protein Sci., 7,
p. 309–325. 533–544.
36 Miller, S., Schuler, B. & Seckler, 48 Killenberg-Jabs, M., Jabs, A., Lilie,
R. (1998). Phage P22 tailspike protein: H., Golbik, R. & Hubner, G. (2001).
removal of head binding domain Active oligomeric states of pyruvate
unmasks effects of folding mutations decarboxylase and their functional
on native-state thermal stability. Prot. characterization. Eur. J. Biochem., 268,
Sci., 7, 2223–2232. 1698–1704.
37 Danner, M., Fuchs, A., Miller, S. 49 Venyaminov, S. Yu., Gudkov, A. T.,
and Seckler, R. (1993). Folding and Gogia, Z. V. & Tumanova, L .G.
assembly of phage P22 tailspike endo- (1981). Absorption and cicular
rhamnosidase lacking the N-terminal, dichroism spectra of individual
head-binding domain. Eur. J. Biochem., proteins from Escherichia coli
215, 653–661. ribosomes. Pushkino, Russia.
38 Jones, S. & Thornton, J. M. (1997). 50 Yusupov, M. M., Yusupova, G. Z.,
Analysis of protein-protein interaction Baucom, A., Lieberman, K., Earnest,
sites using surface patches. J. Mol. T. N., Cate, J. H. & Noller, H. F.
Biol., 272, 121–132. (2001). Science, 292, 883–896.
68 2 Folding and Association of Multi-domain and Oligomeric Proteins
pathways: the P22 tailspike endo- Huber, R., Yu, M. H. & Seckler, R.
rhamnosidase. Methods Enzymol., (1995). Mutations that stabilize folding
131, 250–266. intermediates of phage P22 tailspike
130 Betts, S. & King, J. (1999). There’s a protein: folding in vivo and in vitro,
right way and a wrong way: in vivo stability, and structural context. J. Mol.
and in vitro folding, misfolding and Biol., 249, 185–194.
subunit assembly of the P22 tailspike. 139 Goldenberg, D. & King, J. (1982).
Structure Fold. Des., 7, 131–139. Trimeric intermediate in the in vivo
131 Mitraki, A., Fane, B., Haase folding and subunit assembly of the
Pettingell, C., Sturtevant, J. & tail spike endorhamnosidase of
King, J. (1991). Global suppression of bacteriophage P22. Proc. Natl Acad.
protein folding defects and inclusion Sci. USA, 79, 3403–3407.
body formation. Science, 253, 54–58. 140 Brunschier, R., Danner, M. &
132 Mitraki, A., Danner, M., King, J. & Seckler, R. (1993). Interactions of
Seckler, R. (1993). Temperature- phage P22 tailspike protein with GroE
sensitive mutations and second-site molecular chaperones during
suppressor substitutions affect folding refolding in vitro. J. Biol. Chem., 268,
of the P22 tailspike protein in vitro. J. 2767–2772.
Biol. Chem., 268, 20071–20075. 141 Gordon, C. L., Sather, S. K.,
133 Schuler, B. & Seckler, R. (1998). Casjens, S. & King, J. (1994).
P22 tailspike folding mutants Selective in vivo rescue by GroEL/ES
revisited: effects on the thermody- of thermolabile folding intermediates
namic stability of the isolated beta- to phage P22 structural proteins. J.
helix domain. J. Mol. Biol., 281, 227– Biol. Chem., 269, 27941–27951.
234. 142 Sather, S. K. & King, J. (1994).
134 Seckler, R. (1998). Folding and Intracellular trapping of a cytoplasmic
function of repetitive structure in the folding intermediate of the phage P22
homotrimeric phage P22 tailspike tailspike using iodoacetamide. J. Biol.
protein. J. Struct. Biol., 122, 216–222. Chem., 269, 25268–25276.
135 Steinbacher, S., Seckler, R., 143 Speed, M. A., Wang, D. I. & King, J.
Miller, S., Steipe, B., Huber, R. & (1995). Multimeric intermediates in
Reinemer, P. (1994). Crystal structure the pathway to the aggregated
of P22 tailspike protein: interdigitated inclusion body state for P22 tailspike
subunits in a thermostable trimer. polypeptide chains. Protein Sci., 4,
Science, 265, 383–386. 900–908.
136 Steinbacher, S., Miller, S., Baxa, 144 Speed, M. A., Morshead, T., Wang,
U., Budisa, N., Weintraub, A., D. I. & King, J. (1997). Conformation
Seckler, R. & Huber, R. (1997). of P22 tailspike folding and aggrega-
Phage P22 tailspike protein: crystal tion intermediates probed by mono-
structure of the head-binding domain clonal antibodies. Protein Sci., 6, 99–
at 2.3 A, fully refined structure of the 108.
endorhamnosidase at 1.56 A 145 Speed, M. A., Wang, D. I. & King, J.
resolution, and the molecular basis of (1996). Specific aggregation of partially
O-antigen recognition and cleavage. J. folded polypeptide chains: the
Mol. Biol., 267, 865–880. molecular basis of inclusion body
137 Baxa, U., Steinbacher, S., composition. Nat. Biotechnol., 14,
Weintraub, A., Huber, R. & 1283–1287.
Seckler, R. (1999). Mutations 146 Durchschlag, H. (1986). Specific
improving the folding of phage P22 volumes of biological macromolecules
tailspike protein affect its receptor and some other molecules of biologi-
binding activity. J. Mol. Biol., 293, cal interest. In: Thermodynamic data
693–701. for biochemistry and biotechnology,
138 Beissinger, M., Lee, S. C., ed.: H.-J. Hinz, Springer Verlag,
Steinbacher, S., Reinemer, P., Heidelberg, pp. 45–127.
73
3
Studying Protein Folding in Vivo
3.1
Introduction
3.2
General Features in Folding Proteins Amenable to in Vivo Study
In contrast to in vitro protein folding, protein folding in vivo can be studied only
via indirect methods. Instead of direct measurements on the protein itself, the fold-
Chaperone assistance
Lineair epitopes
ER lumen
Cytosol
Signal peptide cleavage
ing status of a protein can be followed through changes in shape, associations with
other proteins, co- and posttranslational modifications, and its compartmental loca-
tion (Figure 3.1). During in vivo folding, proteins are not purified but are part of
the cell ‘‘soup.’’ So far the best way to follow folding of a small population of
newly synthesized proteins in a cell is by labeling them with 35 S-labeled cysteine
and/or -methionine. In this way, changes in a small, synchronized protein popula-
tion can be followed with time.
A pulse-chase experiment (Figure 3.2) in intact cells is an assay in which kinetics
and characteristics of sequential steps are determined in vivo. Newly synthesized
proteins are labeled with radioactive amino acids for a very short pulse time, be-
cause folding starts during synthesis and one would ideally follow folding from
the very first folding intermediates. Labeled proteins are ‘‘chased’’ by incubation
with unlabeled amino acids. After different times of chase, cells and supernatants
can be collected and cooled on ice, and free cysteines are blocked with an alkylating
3.2 General Features in Folding Proteins Amenable to in Vivo Study 75
time
(min)
1 pulse 1'
2 pulse 2'
3 chase 3'
4 chase 4'
5 pulse 5'
6
7 stop 7'
8
9 stop 9'
33 stop 33'
Lysis
Immunoprecipitation
NR
SDS-PAGE
R
agent. Cells are lysed in a detergent-containing buffer and the protein under study
is isolated by immunoprecipitation. Samples can be analyzed by various types of
gel electrophoresis, including native gels suitable for detection of complexes, and
reducing and non-reducing SDS-PAGE to detect folding intermediates with a vari-
able set of disulfide bonds (see the Appendix and [11]). Other changing features of
folding proteins, such as their compactness or resistance to proteolytic digestion,
can be determined with this pulse-chase assay as well.
Other labeling assays that can be used to study protein folding include the pulse-
chase assay in suspension cells, microsomes, or semi-permeabilized cell systems
[12] in which mRNA is translated in vitro in the presence of a source of ER mem-
branes (see Chapter 18) and the recently developed in vitro chase assay in which a
protein is translated in vivo and chased in a detergent lysate (Maggioni et al.,
manuscript submitted). These assays will be discussed in the Appendix.
3.2.1
Increasing Compactness
One of the general features of a protein during folding is its increasing compact-
ness. During the folding process of a soluble protein, all hydrophobic side chains
will strive to be buried inside the molecule and all hydrophilic ones to be exposed
on the outside; hydrogen bonds are formed within the protein and between ex-
posed residues and water. Electrostatic, van der Waals, and hydrophobic interac-
tions play a major role during the folding process, and disulfide bridges will form
within the protein when conformation and environmental conditions allow. The
end product of the folding process will be the native conformation of the protein,
which is assumed to be the most compact, energy-favorable form. To study
the increase in compactness of proteins, non-denaturing gel electrophoresis can
be used. The formation of oligomers, protein complexes, and transient protein-
protein interactions can be detected on a native gel as well as by sucrose velocity
gradient centrifugation. To monitor disulfide bond formation within or between
polypeptide chains, non-reducing denaturing gel electrophoresis (SDS-PAGE) [13]
is an effective technique. In the assays described in this chapter, the folding process
of a (disulfide bond containing glyco)protein can be followed kinetically.
3.2.2
Decreasing Accessibility to Different Reagents
Due to the increasing compactness of a protein on its folding pathway, it will be-
come less accessible to various reagents including proteases. Changes in protein
conformation usually result in decreased sensitivity to protease digestion, since
potential cleavage sites are buried inside the protein during the folding process.
To limit the appearance of protease-specific bands, a range of proteases should be
used optimally, some of which cleave frequently in a protein. Examples of the latter
are proteinase K, TPCK-trypsin, and TLCK-chymotrypsin. Limited proteolysis was
first used to identify domains in immunoglobulin [14] and is a very powerful tool.
3.2 General Features in Folding Proteins Amenable to in Vivo Study 77
The target protein is exposed to a protease, and in a time course the generation of
stable products, i.e. protected inaccessible domains, can be monitored. To identify
these domains, the digestion pattern of full-length protein can be compared to that
of expressed isolated fragments of the protein, or domain-specific antibodies can
be used to identify protease-resistant fragments.
Other reagents that can be used to obtain more information about the folding
status of a protein include diazenedicarboxylic acid bis(N,N-dimethylamide) (di-
amide) and 4,4 0 -dithiodipyridine (4-DPS) [15, 16]. Diamide is an oxidant that,
when added to the chase medium, will over-oxidize all compartments in the cell,
resulting in oxidation of all cellular thiols, including those in proteins and gluta-
thione. Since only proteins with free sulfhydryl groups are sensitive to diamide
treatment, in the secretory pathway, only folding intermediates will oxidize, as
most mature folded proteins have disulfide bonds. If treated and non-treated
proteins are analyzed by non-reducing SDS-PAGE, folding intermediates and
the native protein can be discriminated [17–19] (Braakman and Helenius, in prep-
aration).
Like diamide, reducing agents such as dithiothreitol (DTT) or 2-mercaptoethanol
(2ME) can be used in a pulse-chase assay to determine the folding status of a pro-
tein. The influenza virus hemagglutinin (HA) [20, 21] and other proteins with
disulfide bonds [22, 23] undergo a conversion from DTT sensitivity to DTT re-
sistance during their conformational maturation in the ER. In combination with
conformation-sensitive and epitope-specific antibodies, DTT sensitivity is very use-
ful in determining the folding hierarchy in a protein.
3.2.3
Changes in Conformation
During folding, domains are formed within a protein, coinciding with changes in
conformation. Antigenic epitopes become exposed, are masked, or form because
distant parts of the polypeptide chain start to interact. Conformational changes
can be followed by probing the folding protein with conformation-sensitive anti-
bodies. Contrary to general belief, completely conformation-sensitive antibodies
are rare. Polyclonal as well as monoclonal antibodies generally recognize a limited
conformation range of a protein, depending on how the antigen used to produce
the antibody was obtained. Protein bands cut out of SDS-PA gels, when injected
into rabbits, yield antibodies that work well in Western blotting (against SDS-
denatured antigen) but that often fail to work in immunofluorescence, EM, or im-
munoprecipitation, which are conditions that allow proteins to remain more or
less native. Antibodies generated against native, properly folded and assembled
antigens often fail to interact with folding intermediates or SDS-denatured protein.
Antibodies against peptides often fail to bind protein, because the peptide is not
representative of the protein conformation or the peptide is not exposed in the
complete protein. Therefore, an antibody needs to be characterized before it be-
comes a powerful tool in protein-folding studies.
Another assay that can be used concerns biological function of the folding pro-
78 3 Studying Protein Folding in Vivo
tein: this includes binding to ligands and other proteins or an increase in enzy-
matic activity. To test, for instance, whether the protein can bind its receptor, the
receptor may be coupled to Sepharose beads, which can be used in a precipitation
assay to isolate the desired protein, or, alternatively, interaction can be tested in an
ELISA-like assay.
3.2.4
Assistance During Folding
All information required for a protein to attain its final three-dimensional structure
resides in its primary sequence [1]. However, complex proteins need the assistance
of molecular chaperones and folding enzymes to reach their native structure effi-
ciently without formation of (large) aggregates. Molecular chaperones interact
with nonnative states of proteins; they are important during folding of newly syn-
thesized polypeptides where they facilitate rate-limiting steps, stabilize unfolded
proteins, and prevent unwanted intra- and interchain interactions, which could
lead to aggregation [24–26]. Classical chaperones in the ER, such as BiP [27], rec-
ognize exposed hydrophobic patches [28, 29], while calnexin and calreticulin recog-
nize monoglucosylated glycan chains on proteins in the ER [30]. BiP is a member
of the Hsp70 family; ATP binding and hydrolysis are crucial for the cycle of
binding and release of substrate [31, 32]. The preference for binding of either
BiP or calnexin/calreticulin is thought to depend on the position of the first N-
linked glycan in the polypeptide chain [33]. When N-glycans are present in the
first @50 amino acids, a protein in general will associate with calnexin and cal-
reticulin first, whereas proteins with N-glycans downstream in the sequence may
bind BiP or other chaperones first. Other observations, however, suggest that cal-
nexin/calreticulin may interact with non-glycosylated proteins as well [34, 35]. As
a rule, a protein’s association with molecular chaperones will decrease its folding
rate but increase folding efficiency and thereby yield.
The ER in addition possesses a wealth of enzymes that assist proteins during
folding. Protein disulfide isomerase (PDI) (56 kDa) [36] is a member of the thiore-
doxin superfamily that acts catalytically in both the formation and reduction of di-
sulfide bonds, and hence in disulfide bond rearrangements. Another PDI family
member is ERp57, which facilitates disulfide bond formation in newly synthesized
proteins [37] and works in a complex with either calnexin or calreticulin [38]. A
different class of enzymes comprises the peptidyl-prolyl isomerases (PPIases) [39],
which catalyze cis/trans isomerization of peptide bonds N-terminally to proline res-
idues [40] and thereby increase folding rates [41]. The ribosome attaches amino
acids in the trans conformation, which requires an isomerization step for all cis
prolines in the final folded structure. PPIases belong to three structurally diverse
families: the cyclophilins (inhibited by cyclosporin A [CsA]), the FK506-binding
proteins (inhibited by FK506 and rapamycin), and the parvulins. They reside in
the cytoplasm as well as in various eukaryotic organelles including the ER. In
1991 Steinmann et al. [42] and Lodish and Kong [43] showed that the folding of pro-
collagen I and transferrin is slowed down by CsA due to inhibition of cyclophilin.
Folding proteins can be assisted by chaperones and folding enzymes from differ-
3.3 Location-specific Features in Protein Folding 79
ent compartments. If a protein is soluble, only the chaperones and folding en-
zymes that are located in the same compartment as the protein will assist folding.
Membrane-bound or membrane-spanning proteins, however, will have domains in
different compartments. For example, both P-glycoprotein and the cystic fibrosis
transmembrane conductance regulator (CFTR) are multiple-membrane-spanning
proteins that are members of the ABC transporter family. For both P-glycoprotein
and CFTR, approximately 10–15% of the protein is located in the ER lumen and
ER membrane, whereas the bulk is located in the cytosol. The only ER chaperone
that interacts with these membrane proteins is calnexin [44, 45]. On the cytosolic
side, the most abundant chaperone Hsp70 assists in folding these proteins [46, 47].
Interestingly, when folding attempts fail, Hsp70 also promotes proteasomal degra-
dation of misfolded CFTR by targeting the E3 ubiquitin ligase CHIP to this protein
[48]. Cytosolic Hsp90 also has been demonstrated to be involved in degrading
CFTR [49]. Since both ER and cytosolic chaperones impose a quality control on
this type of membrane protein, it will be of great interest to investigate how events
at the two sides of a membrane are communicated and coordinated.
To study a possible interaction of a substrate protein with folding enzymes and
chaperones, co-immunoprecipitations (see protocols) can be done using antibodies
against folding factors; if interactions are too weak or too transient, chemical cross-
linkers may stabilize the complex during analysis. To examine the role of a partic-
ular chaperone on the folding of a substrate protein, one may examine the effect of
its increased or decreased expression in the cell or its presence or absence in a fold-
ing assay. To test whether an ATP-dependent chaperone is involved, ATP can be
depleted from cells. Section 3.4 will address these and other conditions that may
affect the folding process.
3.3
Location-specific Features in Protein Folding
3.3.1
Translocation and Signal Peptide Cleavage
3.3.2
Glycosylation
Glycan moieties on proteins are essential for folding, sorting, and targeting of
glycoproteins through the secretory pathway to various cellular compartments. N-
linked glycans affect local conformation of the polypeptide chain they are attached
to; increase local solubility of proteins, thereby preventing aggregation; are impor-
3.3 Location-specific Features in Protein Folding 81
tant for the interaction with the two lectin chaperones in the ER, calnexin and cal-
reticulin; and are important for targeting misfolded proteins for degradation. Hence,
for most glycoproteins, their N-linked glycans are indispensable for proper folding.
The glycosylation process starts at the cytosolic side of the ER, where
monosaccharides are added to the lipid intermediate dolichol phosphate up to
Man5 GlcNac2 -PP-Dol. This precursor is translocated to the lumenal side, where
it is elongated to Glc3 Man9 GlcNAc2 -PP-Dol. The oligosaccharyl transferase then
transfers the 14-mer to an asparagine residue in the consensus sequence N-X-S/T
of a nascent polypeptide chain, wherein X is any amino acid except proline
[58]. Glucosidase I (GI, a membrane-bound enzyme) removes the first glucose
and glucosidase II (GII, a soluble enzyme), the second and third. Monoglucosy-
lated oligosaccharides associate with the ER-resident lectins calnexin (membrane-
bound) and calreticulin (soluble), which are in a complex with ERp57 [38]. When
GII cleaves the last glucose, the protein is released from calnexin/calreticulin
and, if correctly folded, can leave the ER. UDP-Glc:glycoprotein glucosyltransferase
(GT), a soluble ER protein of 160 kDa, is a sensor of glycoprotein conformation
and reglucosylates glycans close to un- or misfolded amino acid stretches [59–61].
By reglucosylating the protein, it becomes a ligand again for calnexin/calreticulin
and therefore a substrate for this quality-control cycle [9]. Two ER mannosidases
can remove one or two mannoses in the ER before the protein is transported to
the Golgi, where further oligosaccharide processing proceeds and proteins can be
O-linked glycosylated. Alternatively, a permanently misfolded protein may be tar-
geted for degradation from the ER through glycan recognition (see Section 3.3.4).
The enzyme endoglycosidase H (Endo H) often is used to monitor the move-
ment of newly synthesized glycoproteins from the ER to the Golgi complex. Gly-
cans on proteins remain sensitive to digestion by Endo H as long as they are in
the ER and in early regions of the Golgi, but they become resistant beyond. Endo
H digestion after the immunoprecipitation of a glycoprotein allows conclusions to
be made on the protein’s location. If a protein is highly heterogeneously glycosy-
lated, e.g., HIV envelope, which has @30 N-linked glycans, Endo H can also be
used after the immunoprecipitation to increase mobility of folding intermediates
in the ER and to bring about a collapse of smeary bands into one sharp band in a
gel, which enables identification and quantitation. The enzyme PNGase F in prin-
ciple removes all glycans, irrespective of composition, and theoretically is the better
deglycosylation enzyme. Its activity changes the glycosylated asparagine into an as-
partic acid, however, which in some proteins decreases mobility, increases fuzzi-
ness of bands, and may confuse the pattern. The enzyme of choice needs to be
determined for every protein one studies. To gather more information on how
glycosylation affects protein folding, glycosylation site mutants and enzymes in-
volved in glycosylation can be used (see Section 3.4).
3.3.3
Disulfide Bond Formation in the ER
3.3.4
Degradation
nated, and then degraded by the 26S proteasome [73]. Post-ER quality-control sys-
tems exist as well: proteins can be transported to the Golgi or the plasma mem-
brane before they are degraded, most often by vacuolar/lysosomal proteases.
Proteins can be degraded after synthesis as well as during their translation,
which is undetectable in a pulse-chase assay. The net result is a lower amount
of labeled protein, which can be visualized only through the use of a proteasome
inhibitor during the pulse labeling. If the total signal of labeled protein at the
end of the pulse is higher in the presence than in the absence of the inhibitor, co-
translational degradation apparently does happen in the control situation. Degrada-
tion after translation can be followed on gel, because the total protein signal will
decrease during the chase. Proteasome inhibitors and lysosomal enzyme inhibitors
can be added to prove degradation and identify the site of breakdown. A useful sys-
tem to study protein folding without degradation is an in vitro translation system
in the absence or presence of a membrane source, the latter ranging from enriched
organelles to digitonin-permeabilized cells. The hemin present in the reticulocyte
lysate inhibits proteasomal activity.
3.3.5
Transport from ER to Golgi and Plasma Membrane
3.4
How to Manipulate Protein Folding
3.4.1
Pharmacological Intervention (Low-molecular-weight Reagents)
To study the role of folding factors in vivo, their activity needs to be manipulated.
Drugs act fast and do not allow time for genetic adaptation of cells, but they
may not inhibit to 100% and may exhibit pleiotropic effects. Genetic changes
of cells circumvent this lack of specificity. One particular folding factor can be
changed, but compensatory regulation processes may completely change the pro-
tein composition of the cell. With these limitations in mind, both high- and
low-molecular-weight manipulations can increase our understanding of protein
folding in cells.
3.4.1.4 Cross-linking
Chemical cross-linkers are useful for multiple purposes in protein science, such as
the stabilization of protein-chaperone complexes. There are two prominent groups
of cross-linkers: homobifunctional cross-linking reagents, which have two identical
reactive groups, and heterobifunctional cross-linking reagents, in which the reac-
tive groups are chemically distinct. Further variation is present in spacer arm
length, cleavability, or membrane permeability, and cross-linkers may react chemi-
cally or photochemically upon UV illumination. Whereas membrane-permeable
cross-linkers are suitable for cross-linking in the cell, membrane-impermeable
reagents are useful for cross-linking at the cell surface and in cell lysates. Cross-
linkers with shorter spacer arms (4–8 Å) often are used for intramolecular
cross-linking, and reagents with longer spacers are favored for intermolecular
cross-linking. The most frequently used cross-linkers at the moment are DSP
(dithio-bis-succinimidylpropionate), a water-insoluble, homobifunctional N-hydrox-
ysuccimide ester with a spacer arm length of 12 Å, which is thiol-cleavable and pri-
mary amine reactive and is used in many applications [91], and BMH (bis-maleimi-
dohexane), a water insoluble, homobifunctional, non-cleavable cross-linker with a
spacer arm length of 16 Å, which is reactive towards sulfhydryl groups and is also
used in many applications [92, 93]. Since there is a wide variety in characteristics
and potential applications of cross-linkers, it is desirable to compare different
cross-linkers before protein-folding studies are performed.
tein. A wide variety in glycosylation inhibitors exists (see also chapter 17), each
blocking a specific step in glycosylation. Tunicamycin, for instance, was isolated
from Streptomyces lysosuperificus in the early 1970s [94]; it inhibits transfer of
N-acetylglucosamine to dolichol phosphate, thereby completely blocking N-linked
glycosylation, which most often causes misfolding of glycoproteins. The misfold-
ing of many proteins in the ER can lead to upregulation of classical chaperones
and folding-facilitating proteins through a so-called unfolded protein response
(UPR) [95].
Other than complete inhibition by tunicamycin, which can trigger the UPR,
more subtle inhibitors can be used, e.g., castanospermine. This is a plant alkaloid
that was isolated from the seeds of the Australian tree Castanospermum australe,
which inhibits glucosidase I and II. An example of a glucosidase I-specific inhibi-
tor is australine, whereas deoxynojirimycin inhibits glucosidase II better than glu-
cosidase I. Glucosidase I and II inhibitors can be useful in preventing association
of the folding protein with the molecular chaperones calnexin and calreticulin,
which bind only to monoglucosylated proteins. If a prolonged association with cal-
nexin or calreticulin is desired, a glucosidase II inhibitor can be added during the
chase period in a pulse-chase assay. This prevents cleavage of the last glucose, such
that the glycoprotein will remain bound to the lectins calnexin and calreticulin. If a
mannosidase I inhibitor such as kifunensine is added during the pulse chase, deg-
radation of misfolded glycoproteins can be prevented.
Glycosylation inhibitors are useful reagents in determining the role of oligosac-
charides during protein folding, but because they often do not inhibit 100% and
because some inhibitors can have an effect on protein synthesis, it is very impor-
tant to always include the proper controls when they are used in experiments.
3.4.2
Genetic Modifications (High-molecular-weight Manipulations)
line CEM.NKR [103], which is resistant to natural killer cells (NK cells) and has no
calnexin.
The most dramatic effect can be expected when a folding factor is completely ab-
sent. In various organisms such as yeast, genes can be deleted. In mammalian cell
lines, this is still a procedure with little success. Alternatives are cell lines derived
from knockout animals, which are becoming increasingly available. In other organ-
isms or cell lines, depletion of a gene is the maximum reachable. Gene expression
can be suppressed by siRNA, which was first discovered in the nematode worm C.
elegans; this animal is capable of sequence-specific gene silencing in response to
double-stranded RNA (dsRNA) [104]. The technology has been adapted for use in
mammalian cells, and an increasing number of siRNA constructs and libraries are
becoming available, which makes siRNA a powerful tool to study how chaperones
and folding enzymes influence protein folding.
3.5
Experimental Protocols
3.5.1
Protein-labeling Protocols
1. Prepare the pulse-chase setup, which includes a water bath at the desired tem-
perature, with tube racks and 1–2 mm water above them, which can be main-
tained at this temperature. Make sure the water level is just above the racks, such
3.5 Experimental Protocols 89
that the bottom of a tissue culture dish is completely in contact with the water but
dishes are not floating when they are without cover.
2. Wash the cells with 2 mL wash buffer and add 2 mL starvation medium. Incu-
bate for 15–40 min. at 37 C in the presence of 5% CO2 .
3. Place the dishes on the rack in the water bath. Make sure there are no air bubbles
below the dishes.
4. After 15 min pre-incubation with starvation medium, start the first pulse label-
ing. The staggering schedule (Figure 3.2) should allow labeling of the last dish
within @40 min of the start of starvation. Pulse label the cells one dish at a
time: aspirate depletion medium, add 400 mL labeling medium at time 0 s. to
the center of the dish, swirl gently, and incubate for the desired labeling time
on the water bath. For pulse times longer than 15 min, a larger volume and
incubation (on a rocker) in a 37 C incubator in the presence of 5% CO2 is rec-
ommended. The pulse time optimally is equal to or shorter than the synthesis time
of the protein under study (an average of 4–5 amino acid residues per second). La-
beling time should be long enough to detect the protein.
5. Add 2 mL chase medium at precisely the end of the pulse (i.e., at time 2 0 0 00 if a
2-min pulse is aimed for). Swirl gently to mix; the labeling stops immediately
upon addition of chase medium. Aspirate this mix of labeling and chase
medium and add another 2 mL of chase medium. For short chases: incu-
bate on the water bath. For chases longer than 30 min, incubate dishes in the
incubator. For 0 0 chase: add 2 mL chase medium at precisely the end of the
pulse. Swirl gently to mix. Aspirate chase medium, place the dish immediately
on ice, and add 2 mL of ice-cold stop buffer. This is best achieved by putting an
aluminum plate on top of an ice pan full of ice, with a wet paper towel on top to allow
optimal contact between the plastic of the cell culture dish and the ice-cold surface.
6. Stop the chase: aspirate chase medium at precisely the endpoint of the chase
time, place the dish on the aluminum plate on ice, and immediately add 2 mL
of ice-cold stop buffer. If the studied protein will be secreted, collect the chase me-
dium and use it for immunoprecipitation.
7. Just before lysis: wash the cells again with 2 mL ice-cold stop buffer.
8. Aspirate the dish as dry as possible without letting cells warm up and add 600
mL ice-cold lysis buffer.
9. Scrape the cell lysate and nuclei off the dish with a cell scraper and transfer the
lysate to a microcentrifuge tube.
10. Centrifuge lysates at 16 000 g at 4 C for 10 min to spin down the nuclei. Use
the supernatant directly for immunoprecipitations or transfer the supernatant
to a new microcentrifuge tube, snap freeze, and store at 80 C.
DTT Resistance When proteins fold and become more compact, this is usually
reflected by resistance to reduction by DTT in vivo. Influenza hemagglutinin, for
example, acquires DTT resistance when it is a completely oxidized monomer that
is trimerization-competent but not yet a trimer [21].
1. Label the protein (see pulse-chase protocol) and chase for different times.
2. After the chase: perform in a parallel dish with the same chase time an addi-
tional chase of 5 min with 5 mM DTT. Compare this sample with the sample
in which the protein is chased without DTT.
Diamide Resistance
1. Label the protein (see pulse-chase protocol) and chase for different times.
2. After the chase: perform in a parallel dish with the same chase time an addi-
tional chase of 5 min with 5 mM diamide. Compare this sample with the sam-
ple in which the protein is chased without diamide.
the minimum volume for incubating the cells during the pulse (x mL), (This is
cell line dependent.)
the number of chase points ( y), and
the desired sample volume (z). We used, for example, x ¼ 300 mL, y ¼ 3, and
z ¼ 500 mL.
1. Transfer suspension cells to a sterile 50-mL tube with cap. We used, for exam-
ple, @10 6 cells per time point.
2. Pellet cells at 500 g for 4 min at RT. Resuspend cells in 2 y mL depletion me-
dium and pellet cells again. Resuspend cells in 2 y mL depletion medium. Incu-
bate cells for 15 min at 37 C in a CO2 incubator.
3. Pellet cells at 500 g for 4 min at RT. Resuspend cells in x mL depletion medium,
change to appropriate tube if necessary, and place in a water bath at 37 C or the
desired temperature for the experiment.
4. Add 50–100 mCi 35 S-labeled methionine and cysteine per time point and mix
gently to start the pulse. Incubate for the pulse period.
5. Add byz mL chase medium. The total volume should be slightly more than
y z mL to allow for fluid loss due to evaporation during the experiment. Mix
by gently pipetting up and down.
6. Immediately take the first sample (z mL). Transfer to microcentrifuge tube with
pre-prepared z mL 2 concentrated lysis buffer on ice, mix well, and keep on ice.
92 3 Studying Protein Folding in Vivo
7. After every chase interval, collect a sample and add to 2 concentrated lysis
buffer on ice, mix well, and keep on ice.
8. Spin the lysates at 16 000 g for 10 min at 4 C to pellet the nuclei. The post-
nuclear lysate can be directly used for immunoprecipitations or transferred to a
new microcentrifuge tube, snap frozen, and stored at 80 C.
Semi-permeabilized Cell System (see Chapter 18) In vitro translation in the pres-
ence of semi-permeabilized cells [12] is a method with a complexity between the
pulse-chase assay and the in vitro chase assay, since there is only one membrane
between the protein and the experimenter, whereas there are two in the pulse-
chase assay and none during the second part of the in vitro chase assay.
In the semi-permeabilized cell system, cells are treated with the detergent digi-
tonin, which selectively permeabilizes the plasma membrane, leaving cellular
organelles such as ER and Golgi complex intact. These semi-permeabilized cells
(SP cells) are added to an in vitro translation system, where they act as a source of
ER membranes. Only the mRNA of the protein of interest is present, which pre-
cludes the need for immunoprecipitations. This assay is especially useful when
there are no antibodies available against the protein under study, for protease resis-
tance studies, or when only the ER form of a protein is under study and ER-to-
Golgi transport is undesirable.
In Vitro Chase Assay In the in vitro chase assay (Maggioni et al., manuscript sub-
mitted), a protein is pulse labeled/translated in vivo and chased in a detergent lysate.
Since there are no membranes during the chase period, the direct environment of
the folding protein can be manipulated. For example, ATP levels in the detergent
lysate can be depleted, and chaperones and organellar cell fractions can be added.
After point 4 of the basic pulse-chase protocol (now using 10-cm dishes):
For the in vitro assay 10-cm dishes are used instead of 6-cm dishes. Therefore, all buffer
volumes used in steps 1–4 of the basic pulse-chase protocol need to be multiplied by 2–
2.5.
1. Wash cells twice with 4 mL ice-cold HBSS wash buffer, aspirate the buffer, and
immediately lyse with 1.8 mL ice-cold lysis buffer without alkylating agent.
2. Scrape the cell lysate and nuclei off the dish with a cell scraper, transfer lysates
to a microcentrifuge tube, and immediately take out 1/6 of the lysate and add
NEM to this sample to a concentration of 20 mM (0 min sample in the in vitro
chase assay).
3. Spin cell lysates for 10 min at 16 000 g at 4 C to pellet nuclei. Transfer post-
nuclear cell lysate to a new tube. The supernatant of the lysate without NEM is
used in the rest of the in vitro chase assay.
After transfer to a new tube, add 5 mM GSSG and transfer the tube immediately to
a water bath at 30 C.
4. After different times of in vitro chase, take 300 mL of the lysate and transfer it
3.5 Experimental Protocols 93
into a new tube that contains NEM to a final concentration of 20 mM and im-
mediately incubate on ice.
5. Use the supernatant directly for immunoprecipitations or snap freeze and store
at 80 C.
3.5.2
(Co)-immunoprecipitation and Accessory Protocols
3.5.2.1 Immunoprecipitation
1. Mix 50 mL of washed protein A-Sepharose beads and the optimal amount of an-
tibody and shake in a shaker for 60 min at 4 C. Optimize the incubation time for
each antibody. The optimal amount of antibody is the amount that precipitates all, or
at least the maximum amount, of antigen from the solution. This needs to be tested
for each antibody-antigen combination by re-incubation of the supernatant from the
beads-antibody-antigen incubation (see step 3 below) with antibody and beads. If the
antibody has a low affinity for protein A, protein G-Sepharose beads or a bridging an-
tibody between the primary antibody and the protein A beads, such as a goat or rabbit
anti-mouse Ig, can be used. Instead of Sepharose beads, heat-killed and fixed S. aur-
eus cells can be used to immobilize the antibody. Because these cells are more difficult
to resuspend than Sepharose beads, spin conditions should be 5 min at 1500 g rather
than 1 min at max speed. Resuspension time will be longer and should be added to
the wash time.
2. Add 100 mL to 600 mL post-nuclear cell lysate and couple in the head-over-head
rotator for at least 30 min (depending on the antibody) at 4 C. Test for each
antibody the minimum incubation time needed (between 30 min and overnight).
With one lysate, different immunoprecipitations (such as one with an antibody
that recognizes all forms of the protein, a conformation-sensitive antibody, or a co-
immunoprecipitation) can be done to obtain as much information as possible on the
folding protein.
3. Pellet beads by spinning 1 min at RT max, aspirate the supernatant, and use it
for re-precipitation to test whether the antibody precipitated all antigen mole-
cules in the solution (see step 1 above). Add 1 mL washing buffer and shake
for at least 5 min in a shaker at RT. Repeat this washing procedure once. Test
different wash buffers with different concentrations of SDS, other detergents, combina-
tions thereof, and salt. Also test different wash times and wash temperatures. It is not
necessary to wash more than twice. Rather than increase the number of washes, con-
ditions of the two washes should be changed. If this does not give the desired result,
pre-clearing of the detergent cell lysate is the method of choice (see calnexin protocol).
4. Pellet beads and aspirate supernatant. Add 20 mL TE to the beads, vortex to re-
suspend the beads, and add 20 mL 2 concentrated sample buffer and vortex
again. If an Endo H digestion is performed, add here the Endo H buffer instead of
the TE. It is important to resuspend in TE before adding SDS, because addition of
SDS to a pellet may induce or increase aggregation.
94 3 Studying Protein Folding in Vivo
5. Heat samples for 5 min at 95 C, vortex when still hot, and pellet the beads. The
supernatant is the non-reduced sample.
6. To make a reduced sample, transfer 18 mL supernatant to a new tube to which a
defined drop of DTT solution has been added (final DTT concentration should
be 20 mM) and vortex. Heat samples for 5 min at 95 C. Centrifuge shortly to
spin down the condensation fluid. The samples are now ready to be loaded
onto an SDS-PA gel. Addition of DTT to the bottom of the tube allows one to check
whether DTT actually has been added to the sample. Re-oxidation during electropho-
resis, which happens if proteins have many cysteines, is prevented by adding an excess
alkylating agent, such as 100 mM NEM, to all samples before loading.
1. Aspirate the dish as dry as possible and add 600 mL ice-cold lysis buffer (2%
CHAPS in 50 mM Na-HEPES pH 7.6 and 200 mM NaCl. Do not add EDTA to
the lysis buffer, since calcium is needed for the interaction.
2. Scrape the cell lysate and nuclei from the dish and transfer into a microcentri-
fuge tube.
3. Use around 200 mL of the lysate for the co-immunoprecipitation and use a par-
allel amount for an immunoprecipitation using an antibody that recognizes all
antigen that is present.
4. Add 0.1 mL of a 10% suspension of heat-killed and fixed S. aureus cells
to the lysate to reduce background and rotate or shake for 30–60 min at
4 C.
3.5 Experimental Protocols 95
5. Pellet the fixed S. aureus and nuclei at max speed to remove the smallest par-
ticles and use the supernatant for the immunoprecipitation.
6. Couple the antibody to the protein A-Sepharose beads for 30 min at 4 C in a
shaker.
7. Add the pre-cleared lysate.
8. Couple at 4 C for 1–16 h, depending on the antibody.
9. Pellet the beads by centrifugating for 2 min at 8000 g.
10. Wash twice with 0.5% CHAPS in 50 mM Na-HEPES pH 7.6, containing 200
mM NaCl.
11. After the last wash: continue the immunoprecipitation protocol at step 4.
2. Add 10 mL MNT þ 0.5% Triton X-100 to the beads and resuspend gently.
3. Add 1 mL protease from a 10 stock. Resuspend by vortexing and incubate for
exactly 15 min on ice. Use a protease titration range (for example, 0.25, 1, 5, 25,
100, or 500 mg mL1 final concentration) to investigate protease susceptibility of
each protein. TPCK-trypsin, TLCK-chymotrypsin, proteinase K, and endoprotei-
nase Glu-C V8 are suitable proteases (Sigma) for these studies.
4. After incubation, inhibit the protease by first adding 1 mL PMSF (phenylmethyl-
sulfonyl fluoride) from a 10 stock before adding 10 mL 2 sample buffer.
TPCK-trypsin is specifically inhibited by adding a fivefold excess of soybean
trypsin inhibitor.
5. Heating the sample for 5 min at 95 C will completely inactivate the protease.
6. Analyze the radio-labeled proteolytic pattern on a 12–15% SDS-PA gel and ex-
pose to film or phosphor screen.
1. Pellet beads and aspirate all wash buffer from the protein A-Sepharose beads.
2. Add 15 mL 0.2% SDS in 100 mM sodium acetate, pH 5.5, and heat for 5 min at
95 C to denature the protein.
3. Cool and add 15 mL 100-mM sodium acetate, pH 5.5, containing 2% Triton X-
100 and protease inhibitors: 10 mg mL1 each of chymostatin, leupeptin, anti-
pain, and pepstatin and 1 mM PMSF. Add 0.0025 U Endo H (Roche).
4. Mix and incubate for 1.5–2 h at 37 C. Optimize incubation time for each protein.
5. Spin down the fluid and add 7.5 mL 5 sample buffer and mix.
6. Heat for 5 min to 95 C.
7. Spin down. At this moment a reducing sample can be prepared (see immuno-
precipitation protocol).
3.5.3
SDS-PAGE [13]
1. Prepare 0.75-mm thick polyacrylamide separating and stacking gels. The acryl-
amide percentage depends on the molecular weight of the studied protein and should
position the protein just below the middle of the gel to allow maximal separation of
different forms of the protein.
2. Load 8 mL of each sample (and 3 mL of the total lysates on a separate gel to check
labeling efficiency). Not using the outer lanes of the gel, load 1 sample buffer
in the outside as well as empty lanes to prevent ‘‘smiling’’ of the bands. When
reduced and non-reduced samples are loaded in adjacent lanes in SDS-PAGE, DTT
can diffuse into the non-reduced lanes. Either one or two lanes should be left open
between the two samples, or an excess quencher should be added to all samples (in-
cluding the non-reduced ones) just before loading. We frequently use 100 mM N-
ethylmaleimide (final concentration in the sample) for this purpose.
3. Run each gel at max speed at constant current (25 mA per gel for Hoefer mini-
gels) until the dye front is at the bottom of the gel. Maximum speed without over-
heating limits lateral diffusion and guarantees sharper bands.
4. Stain the gel with Coomassie brilliant blue for 5 min and destain to visualize
antibody bands.
5. Neutralize gels for 1–3 5 min. If incubated too long in the absence of acid, bands
will become diffuse. A minimum time of neutralization is needed to prevent precipita-
tion of salicylic acid, if gels are too acidic.
6. Treat gels with enhancer solution for 15 min. Do not incubate longer than 15 min
to prevent non-specific signal.
7. Dry gels at 80 C on 0.4-mm Schleicher & Schuell filter paper.
8. Expose to film at 80 C or to a phosphor-imaging screen.
Various types of native gels can be used to study the conformation of proteins, oli-
gomerization, aggregation, and the interactions between different proteins. The
mobility depends on both the protein’s charge and its hydrodynamic size. Blue
native gels [106] are often are used for membrane proteins.
Two-dimensional SDS-PAGE
1. Load samples on a tube gel and run, or run on a slab gel with pre-stained
markers on either side of the lane. The first dimension can be a native gel or a
non-reducing SDS-PA gel, an IEF gel, or virtually any other kind.
2. Extract tube gel from capillary, or cut the lane of interest out of the first dimen-
sion slab gel.
3. Boil the tube or strip for 10 min in reducing sample buffer.
4. Place the tube or strip on an SDS-polyacrylamide slab gel.
5. A reduced sample can be run in a separate lane close to the edge of the slab gel.
6. Run in the second dimension as a regular slab gel.
Acknowledgements
We thank Claudia Maggioni for sharing the in vitro chase assay protocol and for
assistance with the figures, and we thank other members of the Braakman lab, es-
pecially Ari Ora, Aafke Land, and Nicole Hafkemeijer, for critical reading of the
manuscript.
References
69 Sevier, C. S., Cuozzo, J. W., Vala, A., 79 Leonard, C. K., Spellman, M. W.,
Aslund, F. & Kaiser, C. A. (2001). A Riddle, L., Harris, R. J., Thomas,
flavoprotein oxidase defines a new J. N. & Gregory, T. J. (1990).
endoplasmic reticulum pathway for Assignment of intrachain disulfide
biosynthetic disulphide bond bonds and characterization of
formation. Nat Cell Biol 3, 874–882. potential glycosylation sites of the
70 Tu, B. P. & Weissman, J. S. (2004). type 1 recombinant human
Oxidative protein folding in eukaryotes: immunodeficiency virus envelope
mechanisms and consequences. J Cell glycoprotein (gp120) expressed in
Biol 164, 341–346. Chinese hamster ovary cells. J Biol
71 Creighton, T. E. (1978). Chem 265, 10373–10382.
Experimental studies of protein 80 Pelletier, L., Jokitalo, E. and
folding and unfolding. Prog Biophys Warren, G. (2000). The effect of
Mol Biol 33, 231–297. Golgi depletion on exocytic transport.
72 Creighton, T. E. (1990). Disulphide Nature Cell Biology 2, 840–846.
bonds between cysteine residues. In 81 Stevens, T., Esmon, B. & Schekman,
Protein structure. A practical approach. R. (1982). Early stages in the yeast
(Creighton, T. E., ed.), pp. 155–168. secretory pathway are required for
IRL Press, Oxford. transport of carboxypeptidase Y to the
73 Sommer, T. & Wolf, D. H. (1997). vacuole. Cell 30, 439–448.
Endoplasmic reticulum degradation: 82 Campbell, K. P., MacLennan, D. H.,
reverse protein flow of no return. Jorgensen, A. O. & Mintzer, M. C.
FASEB J 11, 1227–1233. (1983). Purification and character-
74 Hosokawa, N., Wada, I., Hasegawa, ization of calsequestrin from canine
K., Yorihuzi, T., Tremblay, L. O., cardiac sarcoplasmic reticulum and
Herscovics, A. & Nagata, K. (2001). identification of the 53,000 dalton
A novel ER alpha-mannosidase-like glycoprotein. J Biol Chem 258, 1197–
protein accelerates ER-associated 1204.
degradation. EMBO Rep 2, 415–422. 83 Lodish, H. F. & Kong, N. (1990).
75 Jakob, C. A., Bodmer, D., Spirig, U., Perturbation of cellular calcium blocks
Battig, P., Marcil, A., Dignard, D., exit of secretory proteins from the
Bergeron, J. J., Thomas, D. Y. & rough endoplasmic reticulum. J Biol
Aebi, M. (2001). Htm1p, a Chem 265, 10893–10899.
mannosidase-like protein, is involved 84 Klenk, H. D., Garten, W. & Rott, R.
in glycoprotein degradation in yeast. (1984). Inhibition of proteolytic
EMBO Rep 2, 423–430. cleavage of the hemagglutinin of
76 Oda, Y., Hosokawa, N., Wada, I. & influenza virus by the calcium-specific
Nagata, K. (2003). EDEM as an ionophore A23187. EMBO J 3, 2911–
acceptor of terminally misfolded 2915.
glycoproteins released from calnexin. 85 Thastrup, O., Cullen, P. J., Drobak,
Science 299, 1394–1397. B. K., Hanley, M. R. & Dawson, A. P.
77 Molinari, M., Calanca, V., Galli, C., (1990). Thapsigargin, a tumor
Lucca, P. & Paganetti, P. (2003). promoter, discharges intracellular
Role of EDEM in the release of Ca2þ stores by specific inhibition of
misfolded glycoproteins from the the endoplasmic reticulum Ca2(þ)-
calnexin cycle. Science 299, 1397–1400. ATPase. Proc Natl Acad Sci USA 87,
78 Cummings, R. D., Kornfeld, S., 2466–2470.
Schneider, W. J., Hobgood, K. K., 86 Lytton, J., Westlin, M. & Hanley,
Tolleshaug, H., Brown, M. S. & M. R. (1991). Thapsigargin inhibits
Goldstein, J. L. (1983). Biosynthesis the sarcoplasmic or endoplasmic
of N- and O-linked oligosaccharides of reticulum Ca-ATPase family of
the low density lipoprotein receptor. J calcium pumps. J Biol Chem 266,
Biol Chem 258, 15261–15273. 17067–17071.
References 103
4
Characterization of ATPase Cycles
of Molecular Chaperones
by Fluorescence and Transient Kinetic Methods
4.1
Introduction
4.1.1
Characterization of ATPase Cycles of Energy-transducing Systems
ATP has long been known to be involved in many important cellular processes
such as energy transduction in molecular motors, ion transport and signal trans-
mission, topological processing of nucleic acids and the fidelity of protein syn-
thesis. In spite of the diverse functions of ATPases, their mechanism includes
some universal steps: the specific recognition and binding of the nucleotide, a con-
formational change induced by nucleotide binding, phosphoryl transfer (chemical
reaction), adjustment of the conformation to the product state, and the release of
products, which is accompanied by the regeneration of the initial state of the en-
zyme.
Although ATPases in principle employ the same reaction steps, they have to
cope with entirely different tasks. Therefore, the consequence of a single reac-
tion step on a molecular level depends on the respective system. The ATP state of
DnaK, for example, represents a form that rapidly traps and releases potential sub-
strate proteins [1, 2]. In the context of molecular motor proteins, ATP binding
causes the forward movement of conventional kinesin (power stroke), but ATP
binding to myosin, on the other hand, leads to dissociation of myosin from actin
and is accompanied by the recovery stroke of its lever arm [3].
With some exceptions, an exact definition of the coupling between ATP hydroly-
sis and enzymatic action is still missing for many of the systems. History has re-
vealed repeatedly that mechanisms remain a matter of speculation until equilib-
rium constants and reaction rates of all relevant intermediates (functional states)
have been identified. In addition, the stability of enzyme-bound nucleotide as well
as the velocity of product release represent molecular timers that are carefully
regulated by accessory proteins in many ATPases. Determination of kinetics and
measurements of force and motion of single molecules are therefore essential for
a comprehensive understanding of the respective enzymatic mechanism.
4.1.2
The Use of Fluorescent Nucleotide Analogues
-O3S N
Base modified N
O O O
NH P O P O P O R N N NH2
O- O- O- γ AmNS-ATP NH2 R
2-Amino purine TP
N 6
H
7 N NH2
5 1N
CH3
8
4 2 H N
9 3 N N
Phosphate modified O O O N N N
H
O N N
HO P O P O P O
O R
O- O- O- H H
C8-maba-ATP
H 3’ 2’ H
HO OH
H H H H H H
O OH O OH O OH
NO2
107
Tab. 4.1. Commercially available AXP matrices for affinity chromatography with ATPases.
4.2
Characterization of ATPase Cycles of Molecular Chaperones
4.2.1
Biased View
The following description of the ATPase cycles of DnaK, Hsp90, and ClpB systems
certainly does not claim to give a comprehensive overview of the many roles and
aspects of these chaperones. For this purpose, other excellent reviews in this book
and outside, as mentioned in the individual chapters, should be referred to. We are
also aware that the selection of results and methods here is highly biased towards
fluorescence, nucleotide analogues, and related methods, but we feel that the
reader may benefit more from a description of the various aspects of these tech-
niques than from a (bound-to-fail) attempt to cover all conceivable approaches.
4.2.2
The ATPase Cycle of DnaK
The ATPase cycles of DnaK and its eukaryotic ortholog Hsc70 are well character-
ized from a variety of methods at this point. Comprehensive overviews may be
found in this volume and in other reviews (e.g., Refs. [24–27]).
Where applicable, references to particular aspects concerning ATPase activity or
nucleotide binding are covered in Section 4.3.
4.2.3
The ATPase Cycle of the Chaperone Hsp90
Hsp90 has a central role in communicating numerous folding as well as other pro-
cesses in the cell [28–32]. Despite great interest in its molecular mechanism, the
involvement of ATP in its function remained a matter of debate for an extended
time period. It was reported by Csermely and co-workers that Hsp90 has an ATP-
110 4 Characterization of ATPase Cycles of Molecular Chaperones
cleotide, with K d around 500 mM [14]. This fortunate fact allowed us to determine
many individual steps of the ATPase cycle of Hsp90 with stopped-flow and other
methods. Once the basic properties of P(g)-MABA-ATP binding were determined,
they served as a diagnostic tool to help clarify the role of individual residues or
domains [43, 44] and, similarly to studies concerning DnaK and its nucleotide-
exchange factor GrpE, the effect of regulating co-chaperones such as Sti1 on the
catalytic cycle [45, 46].
4.2.4
The ATPase Cycle of the Chaperone ClpB
4.2.4.1 ClpB, an Oligomeric ATPase With Two AAA Modules Per Protomer
The molecular chaperone ClpB belongs to the AAA superfamily of proteins that
couple ATP binding and hydrolysis to the disruption of protein-protein interaction
and the remodeling of protein complexes in an extraordinary variety of biological
circumstances (for reviews, see Refs. [47, 48]). AAA proteins contain at least one
ATPase module consisting of two physical domains with conserved sequence mo-
tifs (Walker A, Walker B, sensor-1) [49] and form ring-shaped oligomers. The con-
formation of the oligomer and in some cases oligomerization itself are controlled
by the bound nucleotide [50, 51].
In contrast to the related proteins ClpA and ClpX, which unfold soluble protein
substrates for proteolysis, ClpB and its S. cerevisiae homologue Hsp104 act directly
on protein aggregates to return them to the soluble state [52–54]. Recovery of
active proteins from the aggregated state by ClpB requires the assistance of the
DnaK-DnaJ chaperone system [52, 55–57].
The molecular details of the mechanism linking ATP binding and hydrolysis to
the disruption of interactions in protein aggregates remain largely undefined. In
the case of ClpB, which comprises two nucleotide-binding domains (NBD), clarifi-
cation of this mechanism is a particularly difficult problem. On the one hand, one
has to obtain site-specific information regarding the contribution of each NBD
to the activity (catalysis and binding), but, on the other hand, one has to consider
that these properties are regulated by the activity of the other site. In addition, the
accumulation of ATP-binding sites within the ClpB oligomer is suggestive of com-
plex allosteric behavior, in which inter-ring communications might contribute to
complexity.
A
4
1
0 1 2 3 4 5 10 20 30
[ClpB] ( µM)
B
4
Fluorescence (a. u.)
1
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8
[ADP] (mM)
Fig. 4.2. Equilibrium titration experiments for ClpB Tth (wt) and ClpB Tth (K204Q) and a
with ClpB Tth and MANT-ADP. (A) 0.5 mM K d of 15.9 mM (G1.0) was observed for
MANT-ADP was titrated with ClpB Tth (wt) (c), ClpB Tth (K601Q). (B) Displacement titration. A
ClpB Tth (K204Q) (s), ClpB Tth (K601Q) (C), preformed ClpB Tth *MANT-ADP complex was
and the double mutant ClpB Tth (K204Q/K601Q) titrated with ADP (T ¼ 25 C, l ex ¼ 360 nm,
(k) at 25 C. The excitation wavelength was l em ¼ 440 nm). Fitting the curves to a cubic
360 nm and emission was detected at 440 nm. equation yields K d values of 1.5 mM (G0.08)
Data were fitted according to the quadratic for ClpB Tth (wt) (c), of 2.8 mM (G0.09) for
equation assuming a 1:1 binding stoichiome- ClpB Tth (K204Q) (s), and of 2.3 mM (G0.8) for
try. A K d of 0.2 mM (G0.02) was observed ClpB Tth (K601Q) (C).
changes in response to binding at both NBDs, and for both ClpB(wt) and the NBD
mutants, a saturable increase in fluorescence was observed (Figure 4.2A). Fitting
the data to the quadratic binding equation (see Section 4.3) yielded a Kd of 0.2 mM
for ClpB(K204Q) and of 16 mM for ClpB(K601Q), indicating that the NBD2 rep-
resents a high-affinity and the NBD1 a low-affinity binding site. As the binding af-
finities differ by two orders of magnitude, MANT-ADP binding to the high-affinity
4.2 Characterization of ATPase Cycles of Molecular Chaperones 113
site of ClpB(wt) masks binding to the low-affinity site. Active site titrations with
MANT-ADP and ClpB(K204Q) showed that data are best described by a stoichiom-
etry of 1:2 (one molecule of MANT-ADP binds to a dimer of ClpB), whereas bind-
ing to the low-affinity binding site NBD1 follows a 1:1 stoichiometry.
Determination of the affinities for ADP and ATP was based on displacement ti-
trations with a preformed MANT-ADP–ClpB complex and the corresponding non-
fluorescent nucleotides (Figure 4.2B). According to the Kd values obtained by fit-
ting this data to the cubic equation (see Section 4.3), ATP binds with a ca. 10-fold
lower affinity compared to ADP. (ATP hydrolysis might tamper with the apparent
Kd calculated from displacement titrations with ATP. As ATPase activity especially
of the NBD mutants is very low, this effect could be neglected.) Surprisingly, the
displacement titrations did not confirm the assignment of the NBDs as low- and
high-affinity binding sites. With regard to the displacement titrations, both NBDs
bind nucleotides with approximately the same affinity (K d (ADP) A 2 mM; K d
(ATP) A 30 mM).
In contrast to ClpB from E.coli, which is capable of hexamerization only in the
presence of ATP [51, 58], ClpB Tth forms a hexamer in the absence of nucleotides
and in the presence of ATP. ADP binding to ClpB Tth triggers dissociation of high-
order oligomers and formation of dimers. Gel-filtration studies with ClpB Tth (wt)
and the NBD mutants indicated that ADP binding to the NBD1 is responsible for
ClpB dissociation [21]. As the bound nucleotide determines the oligomeric state of
ClpB, according to the rules of thermodynamics, the oligomeric state should influ-
ence nucleotide affinity. To investigate this, the ClpB concentration was kept con-
stant during the fluorescence equilibrium titration with increasing amounts of
MANT-ADP (Figure 4.3A). This experiment was repeated with ClpB concentrations
ranging from 0.5 mM to 50 mM, thereby covering different distributions of oligo-
mers. Analysis of the obtained data showed that the affinity of the NBD2 to
MANT-ADP indeed depends on the oligomeric state of ClpB Tth (Figure 4.3B). The
ClpB dimer represents the high-affinity state, while high-order oligomers bind with
decreased affinity.
In the case of the chaperone ClpB, application of the fluorescent MANT nucleo-
tide in combination with mutants partially deficient in nucleotide binding has
some drawbacks. In the first place MANT-ATP does not meet the requirements of
an ATP analogue. The rate of hydrolysis is 100-fold slower compared to ATP, and
binding of MANT-ATP triggers oligomer dissociation in contrast to ATP, which
stabilizes high-order oligomers. In addition, affinities for MANT-ADP and the un-
modified nucleotide differ by a factor of 10, indicating altered kinetic parameters of
the modified nucleotide. Notably, these effects are distinct from another technical
problem: that of mutations in one active site, such as the K204Q mutation in the
Walker A motif of ClpB Tth NBD1, which affects properties of the second site by
altering the oligomeric state of the protein.
Because of such problems, we suggest that the site-specific modification of ClpB
to introduce fluorescent reporter groups into the NBDs is more suitable to obtain
information regarding the properties of each module. A single tryptophan substi-
tution (Y819W) in the C-terminal domain of NBD2 of Hsp104, which has no other
114 4 Characterization of ATPase Cycles of Molecular Chaperones
A
6
0
0 0.2 0.4 0.6 0.8 1 1.2
[Mant-ADP] ( µM)
B
8
6
KD (µM)
0
0 2 4 6 8 10 12 14 16 18 20 22
[CIp B] (µM)
Fig. 4.3. Reverse equilibrium titration of free fluorophore and fitted to a quadratic
ClpB Tth . (A) 0.5 mM ClpB Tth (wt) (b) or buffer equation. (B) The plot shows the observed K d
(f) was titrated with MANT-ADP at 25 C dependent on the ClpB concentration applied
(l ex ¼ 360 nm, l em ¼ 440 nm). The titration in the titra-tion experiment.
curve was corrected for the contribution of the
0.05
0.04
0.03
k (s-1)
0.02
0.01
0
0 0.2 0.4 0.6 0.8 1
[ATP] (mM)
Fig. 4.4. Steady-state kinetic analysis of ATP of Mg-ATP at 25 C. A fit of the sigmoidal
hydrolysis by ClpB Tth . The turnover rate k was curve with the Hill equation (see ‘‘Coupled
determined in a coupled colorimetric assay by Enzymatic Assay ’’) yields n ¼ 3:0, k cat ¼ 0:044
incubating 5 mM ClpB Tth with various amounts s1 , and K m ¼ 0:34 mM.
different subunits) and heterotypic (between NBD1 and NBD2) levels. Quantitative
understanding of the catalytic properties of both ATPase domains is complicated
by the fact that effects of mutations on the ATPase activity studied to date are not
consistent for different ClpB/Hsp104 proteins [58, 60, 61]. Indeed, measurements
of ATP hydrolysis, determined in a coupled colorimetric assay with ClpB Tth [21] or
by quantifying released Pi in the case of Hsp104/Hsp78 [61, 62], identified positive
cooperativity (Figure 4.4). Mutations in both Walker A motifs of ClpB Tth caused a
complete loss of cooperativity and a 100-fold reduction of the ATPase activity. How-
ever single-turnover experiments show that both NBDs retain a basal ATPase activ-
ity [21]. Remarkably, formation of mixed oligomers, consisting of subunits from
ClpB(wt) and the Walker A mutants, leads to a decrease in the ATPase activity, in-
dicating that cooperative effects are based on inter-ring interactions.
A very detailed characterization of the properties of the two AAA modules of
Hsp104 could be attained by using [g- 32 P]ATP in combination with quantification
of a molybdate–inorganic phosphate complex [59]. Analysis of the kinetics of ATP
hydrolysis of NBD1 and NBD2 shows that one site (NBD1) is a low-affinity site
with a relatively high turnover (K m 1 ¼ 170 mM, kcat 1 ¼ 76 min1 ) and that the sec-
ond site is characterized by a much higher affinity and a 300-fold slower turnover
(K m 2 ¼ 4:7 mM, kcat 2 ¼ 0:27 min1 ). Both sites show positive cooperativity, with
Hill coefficients of 2.3 and 1.6 for the low- and high-affinity sites, respectively. Us-
ing mutations in the AAA sensor-1 motif of NBD1 and NBD2 that reduce the rate
of ATP hydrolysis without affecting nucleotide binding, Lindquist and coworkers
probed the communication between both sites [63]. Impairing ATP hydrolysis of
the NBD2 significantly alters the steady-state kinetic behavior of NBD1. Thus,
Hsp104 exhibits allosteric communication between the two sites in addition to
homotypic cooperativity at both NBD1 and NBD2.
116 4 Characterization of ATPase Cycles of Molecular Chaperones
ATP ATP
ADP ADP
1 1 1 1
k cat 1 ≈ 20 min
-1 1 k cat 1 » 20 min-1
1
2 2 2 2
2 2
ATP ATP k cat 2 ≈ 0.3 min
-1
ADP ADP
ATP ADP
Fig. 4.5. Model for regulation of ATP hydroly- NBD1 is relatively slow. Upon ATP hydrolysis
sis at Hsp104 NBD1 by the hydrolysis cycle at at NBD2, a presumed conformational change
NBD2 (adapted from Hattendorf et al. [63]). occurs, resulting in a state in which the rate at
When ATP is bound to NBD2, hydrolysis at NBD1 is elevated.
4.3
Experimental Protocols
4.3.1
Synthesis of Fluorescent Nucleotide Analogues
1
H-NMR (DMSO-d 6 , 500 MHz): d ¼ 1:57 (m; 4H, (CH2 )2 ), d ¼ 2:68 (m; 4H,
(NCH2 )2 ), d ¼ 2:88 (d; 3H, aryl-NCH3 ), d ¼ 3:33 (t; 2H, alkyl-NH2 ), d ¼ 6:73 (d;
1H, aryl-3-H), d ¼ 6:66 (m; 1H, aryl-4-H), d ¼ 7:39 (m; 1H, aryl-5-H), d ¼ 7:65 (d;
1H, aryl-6-H), d ¼ 7:78 (t; 1H, OCNH), d ¼ 8:45 (q; 1H, aryl-NH).
1
H-NMR (D2 O, 500 MHz; TSP): d ¼ 1:29 (t; 18H; CH3 ; (CH3 CH2 )3 NHþ -
counter-ion); 1.93 (m; 4H; (CH2 )2 ); 2.78 (s; 3H; aryl-NCH3 ); 2.91 (m; 4H; N(CH)2 );
3.21 (q; 12H; CH2 ; (CH3 CH2 )3 NHþ -counter-ion); 3.74 (m; 1H; 4 0 -H); 4.23 (m; 2H;
5 0 -H); 4.37 (m; 1H; 3 0 -H); 4.49 (m; 1H; 2 0 -H); 6.08 (d; 1H; 1 0 -H); 6.72 (m; 1H;
aryl-4-H); 6.79 (d; 1H; aryl-6-H); 7.30 (d; 1H; aryl-3-H); 7.37 (m; 1H; aryl-5-H);
8.16 (s; 1H; H-2); 8.47 (s; 1H; H-8).
31
P-NMR (D2 O; 500 MHz, PO43 ): d ¼ 1:24 (d; a-phosphate); 11.29 (d; b-
phosphate).
ESI-MS (negative ion mode): 629.1 gmol1 , 629.4 gmol1 were expected.
Synthesis of (Pg)MABA-ATP EDC was used to condense the tetralinker with the g-
phosphate of ATP to give adeninetriphospho-g-(N 0 -methylanthraniloylaminobutyl)-
phosphoramidate (referred to as (Pg )MABA-ATP). The synthesis was carried out as
described for (Pb )MABA-ADP with 100 mg ATP instead of ADP.
118
NH2 NH2
1.) EDC
N N
N N
O
N N O- O-
N N
(CH2)4
O (PO3)23- O P P N N
H H
O O O O
HN
2.) O NH CH3
NH CH3
(H2C)4
HO OH HO OH
NH2
Fig. 4.6. Synthesis of P(b)MABA-ADP. Condensation of ADP
4 Characterization of ATPase Cycles of Molecular Chaperones
1
H-NMR (D2 O, 500 MHz; TSP): d ¼ 1:29 (t; 27H; CH3 ; (CH3 CH2 )3 NHþ -
gegenion); 1.92 (m; 4H; (CH2 )2 ); 2.78 (s; 3H; aryl-NCH3 ); 2.91 (m; 4H; N(CH)2 );
3.21 (q; 18H; CH2 ; (CH3 CH2 )3 NHþ -gegenion); 3.78 (m; 1H; 4 0 -H); 4.26 (m; 2H;
5 0 -H); 4.38 (m; 1H; 3 0 -H); 4.53 (m; 1H; 2 0 -H); 6.08 (d; 1H; 1 0 -H); 6.71 (m; 1H;
aryl-4-H); 6.78 (d; 1H; aryl-6-H); 7.31 (d; 1H; aryl-3-H); 7.35 (m; 1H; aryl-5-H);
8.16 (s; 1H; H-2); 8.45 (s; 1H; H-8).
31
P-NMR (D2 O; 500 MHz, PO43 ): d ¼ 2:83 (d; a-phosphate); 13.35 (d; g-
phosphate); 24.73 (t; b-phosphate).
ESI-MS (negative ion mode): 709.1 gmol1 , 708.4 gmol1 were expected.
d ¼ 7:98 (s; 1H; H-2); 7.27 (t; 1H; CH3 00 ) 3 J (7.63 Hz); 7.15 (d; 1H; CH5 00 ) 3 J (7.83
Hz); 6.68 (d; 1H; CH2 00 ) 3 J (8.41 Hz); 6.50 (t; 1H; CH4 00 ) 3 J (7.46 Hz); 5.96 (d; 1H;
1 0 -H) 3 J1 0 2 0 (7.8 Hz); 4.65 (t; 1H; 2 0 -H) 3 J2 0 3 0 (6.74 Hz); 4.44 (m; 1H; 3 0 -H); 4.35
(s; 1H; 4 0 -H); 4.18 (m; 2H; 5 0 5 00 -H); 3.55 (m; 2H, (CH2 )4 ); 3.38 (m; 2H; (CH2 )1 );
2.72 (s; 3H; Ph-N-CH3 ); 1.79 (m, 2H, (CH2 )2 ); 1.72 (m; 2H; (CH2 )3 ); plus counter-
ion (triethylammonium) peaks.
Synthesis of MANT-ADP and MANT-ATP [18] The nucleotide (0.5 mmol ADP,
ATP) was dissolved in 7 mL water at 38 C. The pH was adjusted to 9.6 with
NaOH. To this solution a crystalline preparation of methylisatoic anhydride (2.5
mmol) was added in small portions with continuous stirring. The pH was main-
tained at 9.6 by titration with 2 N NaOH for 2 h. The reaction mixture was cooled
and washed three times with 10 mL chloroform; the derivative remains in the
4.3 Experimental Protocols 121
aqueous phase. The reaction products can be monitored by TLC on silica gel.
On a silica gel plate developed in 1-propanol/NH 4 OH/water (6:3:1, v/v, containing
0.5 g L1 of EDTA), the R f values behave as follows: unmodified nucleotides <
fluorescent nucleotides (R f (MANT-ADP): 0.44, R f (MANT-ATP): 0.21) < N-methyl-
anthranilic acid. The fluorescent analogue can be identified under an ultraviolet
lamp (366 nm) by its brilliant blue fluorescence, while N-methylanthranilic acid
shows violet fluorescence.
The reaction mixture was then diluted to a volume of 100 mL and the pH ad-
justed to 7.5 with 1 M acetic acid. To remove byproducts, the solution was applied
to the anion-exchange resin Q-Sepharose FF (3:0 25 cm, Pharmacia). After a
wash with 50 mM triethyl ammonium bicarbonate (TEAB), pH 7.5 at a flow rate
of 5 mL min1 , the column was eluted with a linear gradient to 0.7 M TEAB. The
effluent was monitored by UV absorbance at 260 nm and 366 nm; the fluorescent
nucleotide analogues elute after the unreacted nucleotide and the fluorescent
byproduct between 0.6 and 0.7 M TEAB in a single peak. MANT nucleotide-
containing fractions were identified by TLC and the absorption profiles. Fractions
were pooled, the solvent was distilled under vacuum, the residual was dissolved in
methanol, and the pure product was lyophilized.
Purity of the product was analyzed by reverse-phase HPLC with a C18 column
(ODS-Hypersil, 5 mM 250 4:6 mm, Bischoff ). The column was developed with a
linear gradient to 50% (v/v) acetonitrile in 50 mM potassium phosphate, pH 6.5,
over 20 min at 1.5 mL min1 and elution products were monitored by absorption
at 280 nm and fluorescence (excitation, 360 nm; emission, 440 nm). MANT-ADP
elutes as a single peak after 12.5 min, while MANT-ATP elutes after 11.7 min.
The final concentration of MANT nucleotides was determined by UV absorbance
with e255 ¼ 23 300 M1 cm1 at pH 8.0. The yield of MANT nucleotide amounts
to @50%. MANT nucleotides exist in solution as an equilibrium mixture of both
the 2 0 and 3 0 isomers (60% 3 0 isomer), which are in slow base-catalyzed exchange
(t 1/2 @ 10 min, pH 7.5, 25 C).
4.3.2
Preparation of Nucleotides and Proteins
Start 2 70 30
7 2 70 30
7.1 2 0 100
12 2 0 100
12.1 2 70 30
20 2 70 30
with a C18 column (ODS-Hypersil, 5 mM, 250 4:6 mm, Bischoff ) and detection at
254 nm. The sample (usually 20 mL of a 10-mM solution) was eluted with 50 mM
potassium phosphate (pH 6.8) over 10 min at 1.5 mL min1 . Nucleotides elute as
single peaks with retention times of 3.5 min (ATP), 4.1 min (ADP), and 5.0 min
(AMP) at room temperature.
0 1.5 100 0
2.0 1.5 100 0
22.0 1.5 0 100
23.0 1.5 100 0
30.0 1.5 100 0
50% (w/v) trichloroacetic acid and incubated for 10 min on ice. The precipitated
protein was removed by centrifugation. 10 mL of the supernatant was neutralized
with 20 mL of a 2-M solution of potassium acetate. The sample was diluted 1:10 in
50 mM potassium phosphate (pH 6.8) and 20 mL of this solution was analyzed by
HPLC (see ‘‘Assessment of Quality of Nucleotide Stock Solution’’ for HPLC condi-
tions). By comparing the integrated peaks with calibration peaks corresponding to
nucleotide stock solutions of known concentrations, the nucleotide content of the
probe can be calculated.
KB and Mg2B Depletion As two Kþ and one Mg 2þ coordinate the bound nucleo-
tide in the ATPase site of DnaK [67], removal of these ions should decrease nucleo-
tide affinity. Therefore, the nucleotide depletion procedure included extensive dial-
ysis over 2–3 days against a buffer containing 1 mM EDTA in the absence of Kþ
and Mg 2þ ions followed by size-exclusion chromatography [68]. A shortcoming of
this method is that at high protein concentrations the dissociation of a very small
percentage of the bound nucleotide provides enough free nucleotide to prevent
further dissociation. When reducing the enzyme concentration to facilitate the di-
alysis of the bound nucleotide, protein denaturation occurs easily [69].
4.3.3
Steady-state ATPase Assays
Observing the ATPase reaction under steady-state conditions implies that the
rate of reaction is constant for a relatively long period of time and that multiple
turnovers take place. This is the case when the ATP concentration exceeds the
chaperone concentration substantially and, ideally, the ATP concentration does
not change during the experiment. This could be achieved by using an ATP-
regenerating system. Steady-state ATPase assays allow determination of the enzy-
matic parameters Km and Vmax ; however, it is not possible to define individual
steps and rate constants of the ATPase mechanism. Procedures for determining
the steady-state ATPase activity of chaperones are listed below.
chaperone
ATP ADP + Pi
PK
ADP + PEP ATP + pyruvate
+
LDH
Pyruvate + NADH,H lactate + NAD+
Fig. 4.7. The coupled colorimetric assay. ADP is reduced to lactate by the LDH (lactate
produced by the hydrolysis of ATP reacts with dehydrogenase), while NADH is oxidized. The
PEP (phosphoenolpyruvate) to form pyruvate decrease of the NADH absorption at 340 nm is
in a reaction catalyzed by the PK (pyruvate a measure for hydrolytic activity.
kinase), whereby ATP is regenerated. Pyruvate
4.3 Experimental Protocols 125
The kcat and K m of the ATPase reaction can be determined by plotting the ATP
turnover rate k against the ATP concentration and by fitting the obtained curve to
the Michaelis-Menten equation (Eq. (2)). If allosteric effects play a role in ATP hy-
drolysis, the Michaelis-Menten equation should be expanded by introducing the
Hill coefficient n (Eq. (3)) (Figure 4.4).
4.3.4
Single-turnover ATPase Assays
the mixture was incubated for 5 min on ice, and 40 mL of 2-M potassium acetate
was added for neutralization. After centrifugation, 20 mL of the supernatant
was analyzed with reverse-phase HPLC on a C18 column (ODS-Hypersil, 5 mM
250 4:6 mm, Bischoff ) and detection at 254 nm (for conditions, see 4.3.2.1 ‘‘Un-
modified Nucleotides’’). The relative amounts of ATP and ADP present in each
sample were calculated by integrating peaks corresponding to ADP and ATP from
the elution profile.
The single-turnover ATPase activity of DnaK has been characterized with radio-
active [a- 32 P]ATP [70, 78] and [g- 32 P]ATP [9] using procedures similar to those de-
scribed for steady-state measurements.
4.3.5
Nucleotide-binding Measurements
Time (min)
0 50 100 150 200
0.0
µcal/sec
-0.2
-0.4
0
kcal/mole of ADP
N 0.522 0.018
Ka 2.77E5 4.6E4
Kd 3.6E-6 6.0E-7
H -5447 257
-4
binding isotherm for the interaction is obtained (Figure 4.8). ITC measurements
yield an entire set of experimental parameters for the binding process, including
binding affinity (Ka or Kd ), binding stoichiometry (n), heat (DH), and heat capacity
(DCp). ITC measurements were used to identify the ATP-binding site of Hsp90
[81]. The protocol cited below was used for equilibrium studies of the DnaK-
GrpE-ADP system from T. thermophilus [82].
A solution of 10 mM DnaKTth in the presence or absence of 42 mM GrpETth in 25
mM HEPES-NaOH (pH 7.5), 100 mM KCl, 5 mM MgCl2 (cell volume 1.3 mL), was
titrated with 142 mM MABA-ADP in 6-mL steps until saturation was achieved. In
the same way, a solution of 11 mM DnaKTth or DnaKTth pre-incubated with 20 mM
4.3 Experimental Protocols 129
MABA-ADP was titrated with a solution of 162 mM GrpETth in 7-mL steps until
saturation. Solutions were incubated for 5 h before measurements to allow for
completion of the binary DnaKTth -GrpETth or DnaKTth –MABA-ADP complex for-
mation. Data were analyzed by Eq. (4) using the program Microcal Origin for ITC
to give the enthalpy, stoichiometry of binding, and association constant.
Q: heat in J
[E]: total concentration of the protein in M
[L]: total concentration of the ligand in M
V: volume of cell in liters
DHB : molar enthalpy of the binding reaction in J mol1
n: number of active binding sites on the protein
KA : association equilibrium constant in M1
½protein
y¼ E ð5Þ
½protein þ KD
Fluorescence
6
0
0 2 4 6 8 10 12 14 16 18
[MANT-ADP] ( µM)
Fig. 4.9. Active site titration of ClpB Tth (trunc) an intersection at [MANT-ADP] ¼ 4.8 mM. The
with MANT-ADP. Concentration of volume-corrected concentration of ClpB(trunc)
ClpB(trunc), a shortened ClpB variant lacking at this point is 9.8 mM. This indicates that one
the NBD1, at start was 10 mM (measured ClpB-dimer is able to bind one molecule of
according to Ref. [106]), and initial volume was MANT-ADP.
600 mL. Interpolation of the two phases gives
mine protein concentration can deviate for certain proteins. Once these two prereq-
uisites are met, the actual experiment/analysis is quite simple. Protein is added to
the ligand in small steps, usually corresponding to concentration increases that
amount to 1/20th of the initial ligand concentration until an appreciable saturation
of the fluorescence signal is obtained. Now the protein may be added in larger
steps until (after volume correction) no further change of signal is observed. The
two resulting phases of this titration, initial nearly linear increase and saturation,
are extrapolated by hand with linear slopes. The point of intersection then deter-
mines at which concentration of added protein the ligand is saturated and as such
the stoichiometry of the binding isotherm.
An example for such an experiment is given in Figure 4.9, where MANT-ADP
was titrated to ClpB Tth (trunc), a shortened ClpB variant lacking a functional NBD1.
Interpolation of the two phases gives an intersection at [MANT-ADP] ¼ 4.8 mM,
while the volume-corrected concentration of ClpB(trunc) at this point is 9.8 mM.
This indicates that one molecule of MANT-ADP is bound per ClpB dimer. Adding
ClpB Tth (trunc) to MANT-ADP under similar conditions also results in a biphasic
curve, where the slope of the second phase is close to zero. The intersection also
points to a 2:1 stoichiometry (ClpB(trunc):MANT-ADP).
with the known Kd for the fluorescent nucleotide to obtain the affinity (see Figure
4.2B). The general mathematical description of this system is the solution of a
cubic equation (see ‘‘Cubic Equation’’ below) [88]. This complete analysis is specif-
ically required if the affinities of the two ligands are very high such that the corre-
sponding Kd values are much lower than the initial concentration of the substance
to be titrated.
4.3.6
Analytical Solutions of Equilibrium Systems
A þ B S AB
KdAB
Scheme 4.1
½A½B
KdAB ¼ ð6Þ
½AB
½A½B
KdAB ¼
½AB
A quadratic equation usually gives two solutions, but here only one is biochemi-
cally meaningful, and the final solution to calculate the concentration of the com-
plex AB as a function of the total concentrations of A and B and a given dissocia-
tion constant KdAB therefore is:
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
½Ao þ ½Bo þ KdAB ½Ao þ ½Bo þ KdAB 2
½AB ¼ ½Ao ½Bo ð10Þ
2 2
½AB
DF ¼ DFmax ð11Þ
½B0
The observed total fluorescence F in every titration step is composed out of the flu-
orescence of the free ligand F0 and the change in fluorescence DF. Substituting
[AB] with Eq. (10) yields an equation for the observed total fluorescence:
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
½A 0 þ ½B0 þ KdAB ½A 0 þ ½B0 þ KdAB 2
½A 0 ½B0
2 2
F ¼ F0 þ DFmax
B0
ð12Þ
A þ B S AB A þ C S AC
KdAB KdAC
Scheme 4.2
½A½B ½A½C
KdAB ¼ ; KdAC ¼ ð13Þ; ð14Þ
½AB ½AC
If the total concentrations of B and C are much higher than that of A, then the
observed dissociation constant of the AC complex, KdAC (obs), is defined by the
simple relationship:
½B Bo
KdAC ðobsÞ ¼ KdAC 1 þ A KdAC 1 þ
KdAB KdAB
If the total concentrations of B and C are close to that of A, however, the assump-
tion that ½C A ½Co and ½B A ½Bo no longer holds. The deviation from a simple hy-
perbolic binding is most evident if ½A o g KdAB and KdAC .
The general solution of the system in Scheme 4.2 is the solution of a cubic equa-
tion. This solution is not based on any assumptions with respect to the relationship
of ½A 0 ; ½B0 , and [C 0 ] to KdAB and KdAC .
Insert Eqs. (16) and (17) into Eq. (15):
Insert Eq. (13) into Eq. (16) and solve for [B]
½Bo
½B ¼ ð19Þ
½A
1þ
KdAB
Insert Eq. (14) into Eq. (17) and solve for [C].
½Co
½C ¼ ð20Þ
½A
1þ
KdAC
0 1 0 1
½Bo ½Co
½Ao ¼ ½A þ B½Bo C þ B½C C ð21Þ
@ ½A A @ o ½A A
1þ 1þ
KdAB KdAC
[A] has to be replaced by [AB] since we want to calculate the concentration of the
complex [AB]. Insert Eq. (19) into Eq. (18) and solve for [A]:
136 4 Characterization of ATPase Cycles of Molecular Chaperones
½Bo
½A
½A
1þ
KdAB
½AB ¼
KdAB
rearrange,
KdAB
½A ¼ ½AB 0 1
½Bo
B C
@ ½A A
1þ
KdAB
and finally
½ABKdAB
½A ¼ ð22Þ
½AB ½Bo
½A o ½Bo 2 KdAC
a3 ¼ ð28Þ
ao
4.3 Experimental Protocols 137
The general solution of the cubic equation Eq. (24) may be obtained as described
[89].
However, only one out of the three solutions given above is meaningful and has to
be selected.
The automatic selection is a bit tricky, but once it is implemented it works pretty
well. Imagine there is no substance C present; then a quadratic solution would be
sufficient to calculate the concentration of the complex [AB]. We can still apply the
cubic solution since it is valid for any arbitrary situation. If we now compare the
solutions from the quadratic equation with the three solutions of the cubic equa-
tion, we realize that only one of those three solutions is correct. Now we know
which of the three solutions to use. It turns out that this will still hold even if
[C 0 ] is not zero.
A more comprehensive treatment of this problem revealed that it is always the
first solution [AB]1 that represents the physically meaningful solution [90].
If R 2 Q 3 > 0, then there is only one solution:
2 3
qpffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi
6 3 Q 7 a1
½AB ¼ signðRÞ4 R 2 Q 3 þ jRj þ qp ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi5
3
R 2 Q 3 þ jRj 3
This case rarely occurs with the usual values of concentrations and dissociation
constants all being positive.
This approach to analyzing competition experiments is complicated and often
unnecessary, but experimental constraints do not always allow choosing conditions
such that a simple analysis is possible. The choice of appropriate experimental con-
ditions, in conjunction with the complete analysis as described above, also allows
gathering information about binding stoichiometries in a competition experiment.
138 4 Characterization of ATPase Cycles of Molecular Chaperones
Scheme 4.3
In the examples given above, we searched for analytical solutions that allowed us
to define the desired complex concentrations as a function of total concentrations
and K d values. The order of the corresponding equation was directly related to the
number of different ligands present. It was a quadratic equation (order 2) for two
compounds and a cubic equation (order 3) for three compounds. In principle we
could also obtain an analytical solution for a four-component system since the re-
sulting quadric equation would still be solvable. We will avoid this hassle and in-
stead use an iterative method that searches for the roots of a coupled equilibrium
system. This method is known as the Newton-Raphson algorithm to find roots in
systems of nonlinear equations and is also used by most commercial software
packages (e.g., Scientist). It should be noted that these computer programs solve
implicit equilibrium systems with this numerical method even with simple two-
component systems.
An early practical example of a four-component system was the competition of
Mg 2þ and Ca 2þ for the metal chelators ATP and EDTA or EGTA in muscle re-
search [91]. Solutions for multi-species equilibria are also regaining attention in
chaperone research due to the multi-enzyme–multi-ligand complexes involved.
The effects of the DnaK nucleotide-exchange factor GrpE as measured with fluores-
cent nucleotide in competition with its unlabeled form, e.g., constitute a quater-
nary system [10, 82, 92], with
½A½C ½A½D
KdAC ¼ ; KdAD ¼ ð31Þ; ð32Þ
½AC ½AD
½B½C ½B½D
KdBC ¼ ; KdBD ¼ ð33Þ; ð34Þ
½BC ½BD
Conservation equations:
½A½C ½A½D
½A þ þ ½A o ¼ 0
KdAC KdAD
½B½C ½B½D
½B þ þ ½Bo ¼ 0
KdBC KdBD
ð39Það42Þ
½B½C ½A½C
½C þ þ ½Co ¼ 0
KdBC KdAC
½A½D ½B½D
½D þ þ ½Do ¼ 0
KdAD KdBD
Eqs. (39)–(42) describe the system of equations that has to be solved. One step that
is necessary to get the Newton-Raphson (NR) algorithm to work is to generate a
matrix of partial derivatives where the species that are iterated (x) are freely select-
able. For example:
If the total concentrations are given, calculate free concentrations and concentra-
tions of complexes.
If free concentrations are fixed, calculate total concentrations and concentration
of complexes.
If complex concentrations are given, calculate free and total concentrations.
General case:
2 3
qy1 qy1
6 qx1
6 qx4 7
7
6 7
6 7
6 7
6 7
4 qy qy4 5
4
qx1 qx4
Let us assume that we define the total concentrations and want to calculate the free
concentrations; then the matrix of partial derivatives would read:
2 3
½A½C ½A½D ½A½C ½A½D
6 q ½A þ þ ½Ao q ½A þ þ ½Ao 7
6 KdAC KdAD KdAC KdAD 7
6 7
6 q½A q½D 7
6 7
6 7
6 ½B½C ½B½D ½B½C ½B½D 7
6 q ½B þ þ ½Bo q ½B þ þ ½Bo 7
6 KdBC KdBD KdBC KdBD 7
6 7
6 q½A q½D 7
6 7
6 7
6 ½B½C ½A½C ½B½C ½A½C 7
6 q ½C þ þ ½Co q ½C þ þ ½Co 7
6 KdBC KdAC KdBC KdAC 7
6 7
6 q½A q½D 7
6 7
6 7
6 7
6 q ½D þ ½A½D þ ½B½D ½Do q ½D þ
½A½D ½B½D
þ ½Do 7
4 KdAD KdBD KdAD KdBD 5
q½A q½D
ð43Þ
140 4 Characterization of ATPase Cycles of Molecular Chaperones
Why do we need this matrix of partial derivatives (see Ref. [89] p. 306)?
If x* is the vector of all x i , then each function f i may be extended in a Taylor se-
ries around x* .
X
N
qfi
fi ðx* þ dx* Þ ¼ fi ðx* Þ þ dxj þ Oðdx* 2 Þ ð45Þ
j¼1
qxj
A system of linear equations for the correction factors dx* is obtained if dx* 2 and all
higher-order terms are neglected. All equations could thus be optimized simultane-
ously:
X
N
qfi
aij dxj ¼ b i where aij 1 and b i ¼ fi ð46Þ
j¼1
qxj
a thus represents the matrix of partial derivatives. Since we reduced our problem to
a system of linear equations, it may be solved by any standard numerical technique
like partial pivoting or lower/upper triangular (LU) decomposition. This proceeds
iteratively and usually needs some 2 to 10 iterations to converge; the computer
simulates the approach to equilibrium!
Trial runs indicated that the numerical calculations became very unstable when
the matrix of partial derivatives was generated by the approximation of finite differ-
ences, e.g.,
qfi fi ðx þ Dxj Þ
G ð47Þ
qxj fi ðxÞ
2 3
1 0 0 0
6 0 1 0 0 7
6 7
6 7
4 0 0 1 0 5
0 0 0 1
All the examples presented above concern calculations of the same type: free con-
centrations, total concentrations, or complex concentrations. The question is now
whether there is a way to calculate mixed species. It is obvious that the choice of
species that may be defined or calculated is limited to a certain extent. For exam-
ple, if the total amount of A o and the complex concentration is specified, then the
concentration of A f cannot be specified since it has to be varied to give the AC de-
sired. In this case, the system would be overdetermined. There could also be cases
where the system is underdetermined; then the solution will not be unique or the
system will not converge.
4.3.7
Time-resolved Binding Measurements
4.3.7.1 Introduction
Mechanistic questions and kinetic models are closely related, since a kinetic model
is usually derived from hypotheses of the enzyme’s mechanism. The model is then
scrutinized by other appropriate kinetic measurements that should ideally be de-
signed to falsify – not ‘‘prove’’ – the model. The link between the model and exper-
imental data is provided by the equations that are obtained by solving the differen-
tial system determined by the model. Although it is becoming more popular to use
computer programs that avoid the hassle of analytical solutions (through numeri-
cal integration), it is often still more instructive to use these solutions, and some
examples of the most often needed types are shown below.
142 4 Characterization of ATPase Cycles of Molecular Chaperones
k
A!B
Under certain boundary conditions, this scheme can also be applied to enzymatic
reactions, e.g., single-step hydrolysis of ATP where either ATP binding is not rate-
limiting and saturating or the reaction is initiated with pre-bound ATP (e.g., by ad-
dition of catalytic Mg 2þ ). Another example is the displacement of fluorescently
labeled, pre-bound nucleotide by an excess of unlabeled ligand (usually also nu-
cleotide) to measure directly the rate constant for dissociation [10, 11, 97].
If the initial concentration of B at t ¼ 0 ðBo Þ is 0, then the differential equations
for that system read:
d½A
¼ k½A
dt
ðD1Þ; ðD2Þ
d½B
¼ k½A
dt
Converted to Laplace space (please refer to literature above for information on how
to set up equations)
sa ¼ ka þ a 0
ðL1Þ; ðL2Þ
sb ¼ ka
a0
a¼
sþk
ka ka 0
b¼ ¼ ðS1Þ; ðS2Þ
s sðs þ kÞ
Looking in the table for the back transformation and setting k equal to a gives
1 const const
L ¼ ð1 eat Þ
sðs þ aÞ a
and
const
L1 ¼ eat :
sþa
d½A
¼ k 1 ½A þ k1 ½B
dt
ðD1Þ; ðD2Þ
d½B
¼ k 1 ½A k1 ½B
dt
sa a 0 ¼ k 1 a þ k1 b
ðL1Þ; ðL2Þ
sb ¼ k 1 a þ k1 b
a 0 þ k1 b
)a¼ ðS1Þ
s þ k1
k a k 1 ao
b¼ 1 o ¼
k 1 k1 sðs þ k 1 þ k1 Þ
ðs þ k 1 Þ s þ k1
s þ k1
with k 1 þ k1 1 a.
144 4 Characterization of ATPase Cycles of Molecular Chaperones
const const
L1 ¼ ð1 eat Þ ðS2Þ
sðs þ aÞ a
k 1 ½A 0
and hence BðtÞ ¼ ð1 eðk 1 þk1 Þt Þ.
k 1 þ k1
d½A
¼ kþ1 ½A½B þ k1 ½C
dt
d½B
¼ kþ1 ½A½B þ k1 ½C ðD1ÞaðD3Þ
dt
d½C
¼ kþ1 ½A½B k1 ½C
dt
If the boundary conditions are such that Bo g A o (that is, the species B at t ¼ 0 is
in excess) and Co ¼ 0, then the reaction becomes pseudo-first order since the con-
centration of B does not change significantly over time, BðtÞ A Bo . The set of (ap-
proximated) differential equations then is:
d½A
G kþ1 ½A½Bo þ k1 ½C
dt
d½B
G0 ðD4ÞaðD6Þ
dt
d½C
G kþ1 ½A½Bo k1 ½C
dt
Solve for a
ao þ k1 c
aG ðS1Þ
s þ kþ1 b0
ao bo kþ1
cG ðS2Þ
sðs þ bo kþ1 þ k1 Þ
Ao Bo kþ1
½CðtÞ G ð1 eðBo kþ1 þk1 Þt Þ
Bo kþ1 þ k1
Note that the amplitude is also an approximation, but in practice a three- to four-
fold excess of Bo should be sufficient. To answer how close this approximation is to
the real solution, we perform an exemplary calculation.
1 10 1 10
½Cðt ¼ infinityÞ ¼ ¼ ¼ 0:909 mM
10 1 þ 1 11
The exact solution as calculated with the quadratic binding equation is:
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi
1 þ 1 þ 10 1 þ 1 þ 10 2
10 1 ¼ 0:90098 mM
2 2
We can conclude that the approximation of the amplitude is sufficiently good un-
der these conditions.
If B0 ¼ 2 mM, i.e., if the excess of ligand is just twofold over A 0, then the kinetic
121 2
‘‘solution’’ gives an amplitude of ¼ ¼ 0:666 in comparison to the exact
21þ1 3
pffiffiffiffiffiffiffiffiffiffiffi
quadratic equation, which gives 2 4 2 ¼ 0:5857 as a value for the final
complex concentration after equilibration. This shows that, even being far from
pseudo-first-order conditions, the amplitude derived under these conditions is still
146 4 Characterization of ATPase Cycles of Molecular Chaperones
a fair approximation. We should be aware, however, that the shape of the time trace
will be substantially different and thus large errors will be introduced for the re-
sulting rate constant after a fitting procedure, including large visible systematic de-
viations that are clearly visible with a residual plot [95].
One should also note here that in general, without the assumption that one li-
gand is in excess, binding systems are rarely solvable – as are many other nonlin-
ear systems.
kþ2
C
Scheme 4.4
d½A
¼ ðk 1 þ k2 Þ½A
dt
d½B
¼ k 1 ½A ðD1ÞaðD3Þ
dt
d½C
¼ k2 ½A
dt
Again, in Laplace space, we replace d/dt with the symbol s. Any species N that is
not initially zero (at time 0, start of the reaction) gets the additional integration
term no .
sa ao ¼ ðk 1 þ k2 Þa
sb ¼ k 1 a ðL1ÞaðL3Þ
sc ¼ k2 a
ao k 1 ao
sb ¼ k 1 )b¼ ðS2Þ
s þ k 1 þ k2 sðs þ k 1 þ k2 Þ
The table of Laplace back transformations tells us that the back transformation to
real space (L1 ) of our system is:
1 1
L1 ¼ ð1 eat Þ
sðs þ aÞ a
so that:
k1
½BðtÞ ¼ ½Ao ð1 eðk 1 þk2 Þt Þ ðS3Þ
k 1 þ k2
and accordingly:
k1
½AðtÞ ¼ ½Ao 1 ð1 eðk 1 þk2 Þt Þ
k 1 þ k2
ðS3Þ; ðS4Þ
k2
½CðtÞ ¼ ½Ao ð1 eðk 1 þk2 Þt Þ
k 1 þ k2
kobs ¼ k 1 þ k2
k1
Bobs ¼ Ao
k 1 þ k2
A obs
k1 ¼ kobs
Ao
k2 ¼ kobs k 1
148 4 Characterization of ATPase Cycles of Molecular Chaperones
d½A
¼ k 1 ½A
dt
d½B
¼ k 1 ½A k2 ½B ðD1ÞaðD3Þ
dt
d½C
¼ k2 ½B
dt
4.3 Experimental Protocols 149
This system is transformed forward into Laplace space to give the following linear
equation system: boundary conditions: Aðt ¼ 0; A 0 Þ and B0 ; C 0 ¼ 0
sa ao ¼ k 1 a
sb ¼ k 1 a k2 b ðL1ÞaðL3Þ
sc ¼ k2 b
ao
a¼ ðS1Þ
s þ k1
k 1 ao 1
b¼ ðS2Þ
s þ k 1 s þ k2
k2 k 1 ao 1
c¼ ðS3Þ
s s þ k 1 s þ k2
k1
½BðtÞ ¼ ½Ao ðek 1 t ek2 t Þ
k2 k 1
1 k 1 t k2 t
½CðtÞ ¼ ½Ao 1 þ ðk2 e k1e Þ
k 1 k2
measurements) are what the maximum of [B] is that can be obtained and at which
time this is reached.
k1
d ½Ao ðek 1 t ek2 t Þ
d½B k2 k 1
¼0¼
dt dt
k1
¼ ½Ao ðk 1 ek 1 tB max þ k2 ek2 tB max Þ
k2 k 1
k2
ln
k1
tB max ¼
k2 k 1
k1
½BðtB max Þ ¼ ½Ao ðek 1 tB max ek2 tB max Þ
k2 k 1
A plot of the maximal relative yield of the intermediate B as a function of the indi-
vidual rate constants is simulated and shown in Figure 4.11. Additionally, a similar
plot for the time when this maximum is reached is shown in Figure 4.12.
0.8
maximal yield for the
intermediate B
0.6
0.4
0.2
0
2 4 6 8 10 20 40 60 80 100
k2
Fig. 4.11. Simulation of maximal yield of B obtained for
different ratios of k 1 (10) and k2 of a consecutive reaction
k1 k2
(A ! B ! C).
4.3 Experimental Protocols 151
4
t (Bmax)
0
2 4 6 8 10 20 40 60 80 100
k2
Fig. 4.12. Plot of time point where maximal yield of B is
reached in consecutive reaction (k 1 ¼ 10, k2 as indicated on
x-axis).
For an even advanced treatment, which is necessary for two-step binding mech-
anisms including competition, the reader is referred to Refs. [98, 99] and to some
related ‘‘classics’’ that are concerned mostly with muscle research [100–104].
These mechanisms occur when, e.g., a fluorescent nucleotide serves as an estab-
lished monitor for a two-step binding process, but of interest are the properties of
the unmodified analogue that may also bind in two steps. An approach that
uses numerical integration was also described for binding/competition of N 8 -(4-
N 0 -methylanthraniloylaminobutyl)-8-aminoadenosine 5 0 -diphosphate and ADP to
DnaK from Thermus thermophilus [9].
Many binding interactions occur in two or more steps. How do we recognize this
and what are the consequences? Two types of mechanism have often been ob-
served. In principle, two different mechanisms are possible: initial ‘‘weak’’ binding
followed by an isomerization reaction
E þ L Ð EL Ð E L ðCase 1Þ
E Ð E þ L Ð E L ðCase 2Þ
To derive the according kinetic equations in the most general fashion, we use the
following model:
k1 kþ2
ASBSC
k1 k2
dc
¼ kþ2 b k2 c ¼ kþ2 ða 0 a cÞ k2 c
dt
da
¼ kþ1 a þ k1 b ¼ kþ1 ða 0 b cÞ þ k1 b
dt
dc 1 k2 c
b¼ þ
dt kþ2 kþ2
da dc 1 k2 c dc k1 k1 k2 c
¼ a0 c kþ1 þ þ
dt dt kþ2 kþ2 dt kþ2 kþ2
d 2c da dc dc
¼ kþ2 kþ2 k2
dt 2 dt dt dt
dc kþ1 k2 dc k1 k1 k2 c
¼ kþ2 ao kþ1 þ þ kþ1 c ckþ1 þ þ
dt kþ2 kþ2 dt kþ2 kþ2
dc dc
kþ2 k2
dt dt
d 2 c dc
þ ½kþ1 þ k1 þ kþ2 þ k2 þ c½kþ1 k1 þ kþ2 kþ1 þ k1 k2 ¼ kþ2 kþ1 a 0
dt dt
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
½kþ1 þ k1 þ kþ2 þ k2 G ½kþ1 þ k1 þ kþ2 þ k2 2 4½kþ1 k2 þ kþ2 kþ1 þ k1 k2
l1 ; l2 ¼
2
For the case 1ðE þ L Ð EL Ð E LÞ and assuming that l 0 g e 0 , the expression kþ1
is replaced by kþ1 lo, or briefly kþ1 0, to indicate that the product of kþ1 and lo is vir-
tually constant under these conditions:
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
½kþ1 0 þ k1 þ kþ2 þ k2 G ½kþ1 0 þ k1 þ kþ2 2 4½kþ1 0 k2 þ kþ2 kþ1 0 þ k1 k2
l1 l2 ¼
2
In the general case, this means complex kinetic behavior, with both solutions (l1
and l2 ) of the differential equation contributing to the transient observed. How-
ever, under certain conditions, simpler behavior is seen.
Examples are:
1. If k1 is much larger than kþ2 and k2 , the second term under the square root
bracket will always be much smaller than the first, and we can use the first two
terms of the binomial expansion as a good approximation.
4.3 Experimental Protocols 153
We can also simplify the first term by removing kþ2 and k2 .
na n1 b
[Binomial expansion of aða bÞ n ¼ a n . . .]
1
Thus,
1 ½kþ1 0 k2 þ kþ2 kþ1 0 þ k1 k2
½kþ1 0 þ k1 G ½kþ1 0 þ k1 4
2 ½kþ1 0 þ k1
l1 ; l2 ¼
2
or
Note: The solutions of the equation are actually all negative, but the sign will be
dropped at this point (terms in the solution are of the form Aelt ).
k2 kþ1 0
¼ k2 þ
kþ1 0 þ k1
k2 k2
¼ k2 þ ¼ k2 þ
k1 Kd 0
1þ 1 þ
kþ1 0 lo
where Kd0 is the dissociation constant for the first step. This behavior is shown in
Figure 4.13 and was, e.g., observed for the binding of ATP to DnaK [11].
2. If k1 is much smaller than kþ2 and k2 , we have the following situation when
kþ1 0 is also small (i.e., at low l ¼ lo ):
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
½kþ2 þ k2 G ½kþ2 þ k2 2 4½kþ1 lk2 þ kþ2 kþ1 l þ k1 kþ2
l1; 2 ¼ ð49Þ
2
Again, under the conditions chosen, the second term under the square root sign
is small compared to the first and to a first approximation
Fig. 4.13. Dependence of observed rate constant for large k1 (fast first step).
k1 kþ2
kþ1 l þ ð51Þ
kþ2 þ k2
k1 k1
kþ1 l þ ¼ kþ1 l þ ¼ kobs ð52Þ
k2 1
1þ 1þ
kþ2 K2
(i.e., the rate constant for production of E L can never be greater than the sum of
kþ2 and k2 . This essentially simplifies the system to be analogous to the situation
discussed previously for the rate of equilibration of A T B).
Fig. 4.15. Pre-isomerization dependence of k obs on ligand concentration for large k1 .
E Ð E þ L Ð E L
If E is mainly in the form without a star in the absence of the ligand, then k1
must be much larger than kþ1 .
If we also assume that k1 g k2 , the equation simplifies to:
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
½k1 þ kþ2 l G ½kþ1 þ kþ2 l 2 4½kþ2 lkþ1 þ k1 k2
l1; 2 ¼ ð55Þ
2
Again, the second term under the square root sign is much smaller than the first,
so that:
or
The first term tends to 0 as l tends to 0 and to kþ1 as l tends to y. The second term
tends to k2 as l tends to 0 and to 0 as l tends to y. The corresponding plot is
shown in Fig. 4.15.
References
37 Jakob, U., Scheibel, T., Bose, S., Proteins. J. Biol. Chem. 278, 25970–
Reinstein, J., & Buchner, J. (1996). 25976.
Assessment of the ATP Binding 47 Vale, R. D. (2000). AAA proteins.
Properties of Hsp90. J. Biol. Chem. Lords of the ring. J. Cell Biol. 150,
271, 10035–10041. F13–F19.
38 Scheibel, T., Neuhofen, S., Weikl, 48 Ogura, T. & Wilkinson, A. J. (2001).
T., Mayr, C., Reinstein, J., Vogel, AAAþ superfamily ATPases: common
P. D., & Buchner, J. (1997). ATP- structure–diverse function. Genes Cells
binding Properties of Human Hsp90. 6, 575–597.
J. Biol. Chem. 272, 18608–18613. 49 Neuwald, A. F., Aravind, L., Spouge,
39 Bauer, M., Baumann, J., & Trommer, J. L., & Koonin, E. V. (1999). AAAþ:
W. E. (1992). ATP binding to bovine A class of chaperone-like ATPases
serum albumin. FEBS Lett. 313, 288– associated with the assembly, opera-
290. tion, and disassembly of protein com-
40 Prodromou, C., Roe, S. M., O’Brien, plexes. Genome Res. 9, 27–43.
R., Ladbury, J. E., Piper, P. W., & 50 Parsell, D. A., Kowal, A. S., &
Pearl, L. H. (1997). Identification Lindquist, S. (1994). Saccharomyces
and Structural Characterization of cerevisiae Hsp104 Protein: Purification
the ATP/ADP-Binding Site in the and characterization of ATP-induced
HSP90 Molecular Chaperone. Cell 90, structural changes. J. Biol. Chem. 269,
65–75. 4480–4487.
41 Stebbins, C. E., Russo, A. A., 51 Zolkiewski, M., Kessel, M.,
Schneider, C., Rosen, N., Hartl, Ginsburg, A., & Maurizi, M. R.
F. U., & Pavletich, N. P. (1997). Crys- (1999). Nucleotide-dependent oligo-
tal structure of an Hsp90-geldanamy- merization of ClpB from Escherichia
cin complex: targeting of a protein coli. Protein Sci. 8, 1899–1903.
chaperone by an antitumor agent. 52 Glover, J. R. & Lindquist, S. (1998).
Cell 89, 239–250. Hsp104, Hsp70, and Hsp40: A Novel
42 Buchner, J. (1996). Supervising the Chaperone System that Rescues
fold: functional principles of molecu- Previously Aggregated Proteins.
lar chaperones. FASEB Journal 10, Cell 94, 73–82.
10–19. 53 Motohashi, K., Watanabe, Y.,
43 Wegele, H., Muschler, P., Bunck, Yohda, M., & Yoshida, M. (1999).
M., Reinstein, J., & Buchner, J. Heat-inactivated proteins are rescued
(2003). Dissection of the contribution by the DnaKJ-GrpE set and ClpB
of individual domains to the ATPase chaperones. Proc. Natl. Acad. Sci.
mechanism of Hsp90. J. Biol. Chem. U.S.A. 96, 7184–7189.
78, 39303–39310. 54 Parsell, D. A., Kowal, A. S., Singer,
44 Richter, K., Reinstein, J., & M. A., & Lindquist, S. (1994). Protein
Buchner, J. (2002). N-terminal disaggregation mediated by heat-
residues regulate the catalytic effi- shock protein Hsp104. Nature 372,
ciency of the Hsp90 ATPase cycle. 475–478.
J. Biol. Chem. 277, 44905–44910. 55 Goloubinoff, P., Mogk, A., Zvi, A.
45 Richter, K., Muschler, P., Hainzl, P., Tomoyasu, T., & Bukau, B. (1999).
O., Reinstein, J., & Buchner, J. Sequential mechanism of solubiliza-
(2003). Sti1 Is a Non-competitive tion and refolding of stable protein
Inhibitor of the Hsp90 ATPase. aggregates by a bichaperone network.
Binding prevents the N-terminal Proc. Natl. Acad. Sci. U.S.A. 96, 13732–
dimerization reaction during the 13737.
ATPase cycle. J. Biol. Chem. 278, 56 Mogk, A., Tomoyasu, T.,
10328–10333. Goloubinoff, P., Rüdiger, S.,
46 Wegele, H., Haslbeck, M., Röder, D., Langen, H., & Bukau, B.
Reinstein, J., & Buchner, J. (2003). (1999). Identification of thermolabile
Sti1 Is a Novel Activator of the Ssa Escherichia coli proteins: prevention
References 159
5
Analysis of Chaperone Function in Vitro
5.1
Introduction
Protein synthesis
Transient unfolding
Cell stress
ATP
ADP
+ Pi
Fig. 5.1. Model for assisted protein folding by (bottom, blue). They bind to nonnative poly-
molecular chaperones. Several processes peptides and thus prevent their aggregation.
generate unfolded polypeptide chains (red), Further, they may alter the conformation of the
which can fold via partially structured inter- bound polypeptide so that once it dissociates,
mediates (orange, green) to the native, bio- it is less prone to aggregation. Often, release
logically active state (right). Depending on the is triggered by a change in the nucleotide state
conditions and the nature of the polypeptide, of the chaperone or by ATP hydrolysis. Recently,
a significant fraction of the molecules may it was shown that some chaperones of the
form aggregates (top). The yield of cellular Hsp100 class such as ClpB and Hsp104 are
protein folding is significantly improved by able to reverse aggregation (top).
the action of general molecular chaperones
constitutively expressed [12], and some of them are essential for viability [13, 14].
This finding suggests that these helper proteins are of general importance for the
folding and assembly of newly synthesized polypeptide chains. Furthermore, the
observation of strong aggregation of overexpressed proteins under otherwise phys-
iological conditions is consistent with the notion of a balance between protein
synthesis/folding and chaperone expression, which has been developed through
evolution and can be modulated only within a certain range.
At first glance, the concept outlined above is in conflict with the paradigm of au-
tonomous, spontaneous protein folding established by Anfinsen and many others
[15, 16]. The apparent contradiction is that if the acquisition of the unique three-
dimensional structure of a protein is governed by its amino acid sequence and the
resulting interactions between amino acid side chains, there should be no need for
molecular chaperones (Figure 5.1). However, additional factors have to be taken
into account. The most important is the possibility of unspecific intermolecular in-
teractions during the folding process of proteins [17, 18]. This reaction depends on
the concentration of unfolded or partially folded polypeptide chains and their fold-
ing kinetics. Thus, either conditions under which proteins fold slowly or high con-
centrations of folding proteins will favor aggregation [19]. In consequence, the
fraction of molecules reaching the native state will decrease. The basic function of
molecular chaperones is to counteract this nonproductive side reaction and thereby
Hsp70 GroEL Hsp90 sHsps Hsp100
Fig. 5.2. Classes of general molecular Hsp90 chaperone consists of three domains.
chaperones. From left to right: Hsp70 is a While the C-terminal domains are responsible
monomeric chaperone consisting of an ATPase for dimerization, the N-terminal domains bind
domain and a peptide-binding domain. The and hydrolyze ATP. The small Hsps form large
bacterial chaperone GroEL forms two rings oligomeric structures that contain 24–32
comprised of seven subunits each. In the identical subunits. They are the only general
cross-section shown here, two subunits of each chaperones without ATPase function. Chaper-
ring are depicted. Interactions between the ones of the Hsp100 class form ring-shaped
subunits are mediated by the equatorial homohexamers. Each subunit contains two
domains in the center of the molecule, which similar, but not identical, ATPase sites.
also contain the ATPase site. The dimeric
increase the efficiency of folding (Figure 5.1). Molecular chaperones do not provide
steric information for the folding process and thus do not violate the concept of
autonomous folding. (The information for folding is encoded solely in the amino
acid sequence.) Rather, molecular chaperones assist the spontaneous folding of
proteins. Interestingly, the principal possibility of assisted folding had already
been hypothesized by Anfinsen in 1963 [20].
It is difficult to assess what fraction of the cellular proteome depends on molec-
ular chaperones to reach its active conformation. But clearly not all newly synthe-
sized proteins need their assistance. While for the chaperone role model GroE (see
Chapter 20) it was shown that 40% of the E. coli proteins can be influenced in their
folding [21], in the cell only about 10–15% may need this assistance under physio-
logical conditions [22, 23]. Similar values were reported for other chaperones [24,
25]. Under conditions where proteins are destabilized (such as heat shock), the re-
quirement for chaperone assistance is clearly increased.
There are only a few classes of general molecular chaperones (Figure 5.2) that
have to take care of a large number of different proteins, implying that chaperones
have to be promiscuous in their interaction with nonnative proteins, i.e., they must
be able to recognize and handle a large variety of different proteins. With the ex-
ception of small Hsps, all of them possess an intrinsic ATPase activity (Table 5.1).
ATP-dependent structural changes in molecular chaperones are essential for the
conformational processing of client proteins. While some basic functional features
seem to be conserved between different classes of chaperones, the structure and
mode of action are completely different among individual chaperones.
5.2
Basic Functional Principles of Molecular Chaperones
Representatives GroEL (eubacteria) DnaK (E. coli) HtpG (E. coli) Hsp104 (S. cerevisiae) Hsp16.5 (M. jannashii)
CCT/TRiC Ssa1 (S. cerevisiae) Hsp82 (S. cerevisiae) ClpB (bacteria) Hsp26 (S. cerevisiae)
(eukaryotes) Hsp72 (higher Hsp90 (human) ClpA (bacteria) a-crystallin
eukaryotes)
Structure: PDB 1AON (GroEL) [40] 1DKZ (DnaK 1AM1 (Hsp82 1QVR (ClpB T. 1SHS (Hsp16.5) [129]
accession number 1A6D (thermosome) peptide-binding ATPase domain) thermophilus) [128] 1GME (Hsp16.9) [151]
[125] domain) [33] [126]
1S3X(Hsp72 ATPase 1HK7 (Hsp82
domain) [150] middle domain)
[127]
Molecular mass of 57 (GroEL) 69 (DnaK) 81 (Hsp82) 102 (Hsp104) 16.5 (Hsp16.5)
the monomer 57–60 (human CCT-a) 72 (human Hsp72) 85 (human Hsp90a) 96 (ClpB) 24 (Hsp26)
(kDa) 84 (ClpA) 20 (bovine a-crystallin)
Co-chaperones GroES (E. coli) Hsp40 Hop/Sti1 Hsp70/Hsp40 Not known
Prefoldin (human) GrpE (E. coli) p23/Sba1 (Hsp104)
Hop/Sti1 Cdc37 DnaK/DnaJ/GrpE
(ClpB)
Quaternary structure Homo 14mer (GroEL) Monomer1 Dimer Dynamic hexamer Dynamic 24mer
Hetero 16mer (Hsp26)
(archaea/ 32mer (a-crystallin)
eukaryotes)
ATPases per subunit 1 1 1 22 –
ATPase rate per 3 (GroEL) [67] 0.02–0.04 (DnaK) 1.0 (Hsp82) [54] 70 (Hsp104) [93] –
subunit (min1 ) 1–7 (bovine CCT) [131] 0.05 (Hsp90) [60] 10–30 (ClpB) [133]
[130] 0.04 (Ssa1) [132]
TM ( C) 67 (GroEL) [134] 50 (Ssa1)3 62 (Hsp82)3 48 (Hsp104)3 75 (Hsp26) [135]
5.2 Basic Functional Principles of Molecular Chaperones
chaperones, applying this concept is rather difficult for three reasons. First, though
we currently lack a detailed understanding of the mechanisms of protein folding,
especially in the context of a living cell, we know it is a rather complex process
involving multiple pathways, intermediates, and potential side reactions. As a con-
sequence, molecular chaperones can interfere with and assist protein folding in
many different ways, and in most cases our knowledge of how chaperones influ-
ence protein folding on a molecular level is very limited. A second problem arises
from the heterogeneity of the substrates. As already mentioned, the in vivo sub-
strates of most chaperones are not known in detail. However, it is clear that
they include a substantial fraction of the proteome. Finally, it is very difficult to as-
sess which conformational state(s) of these client proteins interact with a certain
chaperone. In classic enzymology, reaction intermediates can often be trapped by
changing the solvent conditions or temperature. Due to their low intrinsic stability,
this is in general not possible for nonnative proteins. Here, a change in solvent
conditions will almost always result in a change in the conformational state of the
client protein. Under ‘‘folding’’ conditions, the nonnative protein will continue to
‘‘react’’ until a final state (e.g., native structure or aggregate) is reached. Therefore,
great care must be taken when the structure of these intermediates is studied.
Otherwise, the result will most likely not reflect the conformation of the protein
at the time the sample was taken [26–29].
In the following sections, we present key features that many chaperones have in
common and that we consider as the functionally most relevant.
5.2.1
Recognition of Nonnative Proteins
The most fascinating trait of a molecular chaperone is its ability to discriminate be-
tween the folded and nonnative states of a protein. It would be deleterious for the
cell if chaperones would bind to native proteins or fail to recognize the accumula-
tion of the nonnative protein species. Thus, the promiscuous general chaperones
have to use a property for recognition and binding that is common to all nonnative
proteins. This is the exposure of hydrophobic amino acid side chains, the hallmark
of unfolded or partially folded protein structures. In the unfolded state, extended
stretches of polypeptide chains containing hydrophobic residues can be recognized
by chaperones. This is well documented for proteins of the Hsp70 family [30, 31].
In partially folded proteins, these residues are often clustered in areas that are part
of a domain/subunit interface in the folded protein, and they will become buried
in subsequent folding/association events. These regions can thus be utilized by
molecular chaperones to identify potential client polypeptides, i.e., to distinguish
a nonnative conformation from a native protein. The respective binding site of the
chaperone usually contains a number of hydrophobic residues that interact with
the substrate [32, 33]. The low specificity of hydrophobic interactions is the basis
for the substrate promiscuity of molecular chaperones. Since the extended expo-
sure of hydrophobic surfaces in an unfolded or partially folded state is an impor-
tant determinant for its probability to aggregate, the shielding of these surfaces
upon binding to a chaperone is a very efficient means to prevent aggregation.
5.2 Basic Functional Principles of Molecular Chaperones 167
5.2.2
Induction of Conformational Changes in the Substrate
‘‘buffering capacity’’ appears not to be the primary task. This is especially true for
the ATP-consuming chaperones. An increasing amount of evidence suggests that
the energy provided by ATP hydrolysis is used to induce conformational changes
in the bound polypeptide. For the ClpA chaperone, Horwich and coworkers dem-
onstrated that it can actively unfold ssrA-tagged, native GFP [47]. In this case, ATP
hydrolysis is required to overcome the free energy of folding. In the case of the
Hsp90 system, an immature conformation of a bound steroid hormone receptor
(a client protein) is converted into a conformation that is capable of binding the
hormone ligand [48, 49]. ATP binding and hydrolysis are essential for this process,
as they seem to control the association of the chaperone with its various cofactors
[50] (see also Chapter 23). During the whole sequence of reactions, the receptor is
assumed to remain bound to the Hsp90 chaperone.
For the general chaperones, we know that conformational changes occur in the
chaperone upon both ATP binding and hydrolysis [51–54]. However, much less is
known about how and to what extent these changes result in conformational rear-
rangements in substrate proteins associated with the chaperone. As mentioned
above, this is mainly because the species involved are often only transiently popu-
lated and very labile and thus are not easily susceptible to methods of structure
determination. Also, the bound target protein may represent an ensemble of con-
formations [39].
Two mechanisms can be envisaged as to how a chaperone causes a structural
change in a client protein. In both cases, a conformational change in the chaper-
one is induced by the binding of a ligand (ATP or a cofactor) or by ATP hydrolysis.
This in turn results in a change in the nature/geometry of the substrate-binding
site that weakens the interaction with the bound polypeptide. As the system
ligand/chaperone tends to assume a state of minimum free energy, the substrate
has two choices: (1) it may undergo structural rearrangements to adopt a confor-
mation that has a higher affinity towards the binding site or (2) it may start to
fold. Both processes will lower the free energy of the system, but in the first case
the energy drop primarily results from interactions between chaperone and sub-
strate, whereas in the second case it results solely from interactions within the
client polypeptide. Although these rearrangements are driven by thermodynamics,
kinetics may be important as well in defining whether binding or folding domi-
nates for a given polypeptide conformation [55–57].
An alternative to this ‘‘passive’’ model is that the conformational change in the
chaperone generates a ‘‘power stroke’’ that is used to perform work on the sub-
strate, e.g., by moving apart two sections of the bound protein. If the substrate is
a single polypeptide, this could result in a forced unfolding reaction [58]. If the
substrate is an oligomeric species, e.g., a small aggregate, the power stroke may
cause its dissociation into individual molecules. This ‘‘active’’ mechanism requires
that the substrate remains associated with the chaperone during the conforma-
tional switch, which is not required in the first model. After the power stroke, the
substrate will probably undergo a series of conformational relaxation reactions.
For both models, the concept of general chaperones requires that the conforma-
tional change in the substrate be encoded by its amino acid sequence rather than
5.2 Basic Functional Principles of Molecular Chaperones 169
being forced onto the substrate by the chaperone. The chaperone merely enables
the polypeptide to use its inherent folding information.
5.2.3
Energy Consumption and Regulation of Chaperone Function
Most of the general chaperones require ATP to carry out their task (Table 5.1). Ex-
amples of ATP-hydrolyzing chaperones are GroEL; the Hsp104/ClpB proteins,
which participate in the dissociation of protein aggregates; Hsp90; and Hp70,
which is involved in a large variety of reactions ranging from the general folding
of nascent polypeptide chains to specific functions. As stated above, the cycle of
ATP binding and hydrolysis drives an iterative sequence of conformational changes
in the chaperone, which may induce structural rearrangements in a bound poly-
peptide or trigger binding/release of client proteins and co-chaperones. In contrast
to the conformational changes that occur within both the chaperone and a bound
substrate protein, the ATPase activity of molecular chaperones is experimentally
very easily accessible. Because of its crucial functional importance, ATP turnover
is usually modulated by both the substrate and helper chaperones.
Stimulation of ATP hydrolysis in the presence of unfolded proteins or peptides
has been observed, e.g., for Hsp70, Hsp90, and Hsp104 [30, 59–61]. Apparently,
the binding of the substrate induces a conformational change in these chaperones
that results in a state with a higher intrinsic ATPase activity. This ‘‘activation on
demand’’ may serve as a means to reduce the energy consumption by the empty
chaperones. In contrast, ATP hydrolysis by GroEL is only marginally affected by
the presence of protein substrates.
A well-characterized example of the regulation of ATP hydrolysis by co-chaperones
is the DnaK system of E. coli (see Chapter 15). The ATPase cycle, and thus the cycle
of substrate binding and release, is governed by two potential rate-limiting steps:
the hydrolysis reaction and the subsequent exchange of ADP by ATP [62]. Both
steps can be accelerated by specific cofactors, namely DnaJ and GrpE [59, 63, 64].
The rates of nucleotide exchange and hydrolysis determine how fast a substrate is
processed and how long it is on average associated with the ADP and ATP states of
DnaK, respectively.
While Hsp70 is subject mainly to positive regulation, other chaperone systems
appear to include negative regulators as well. Hsp90, which plays an essential role
in the activation of many regulatory proteins such as hormone receptors and
kinases, is controlled by a large number of cofactors (see Chapter 23). Two of these
co-chaperones, p23 and Hop/Sti1, were found to decrease the ATPase activity by
stabilizing specific conformational states of Hsp90 [50, 65]. Intriguingly, Hop/Sti1
from S. cerevisiae seems to have a dual regulatory function: besides its inhibitory
effect on Hsp90, Sti1 was also shown to be a very potent activator of Hsp70 [66].
Another example of a negative regulator is GroES, which reduces the ATP hydroly-
sis of its partner protein GroEL to 10–50% depending on the conditions used [67,
68]. The role of GroES in the functional cycle of the GroE chaperone, however,
goes far beyond its modulation of ATP turnover, as it is essential for the encapsu-
170 5 Analysis of Chaperone Function in Vitro
lation of a polypeptide in the folding compartment of the GroE particle (see Chap-
ter 20). The example of GroES is instructive, as it demonstrates that regulation of
ATP turnover may not be the most important function of a cofactor. Nevertheless,
the combination of experimental simplicity and functional relevance makes the
ATPase of a general chaperone a preferred choice for mechanistic studies.
Some chaperones are regulated by temperature. Hsp26, a small heat shock pro-
tein from S. cerevisiae, has no chaperone activity at 30 C, the physiologic tempera-
ture for yeast. At this temperature, Hsp26 forms hollow spherical particles consist-
ing of 24 subunits [69]. Under heat stress, e.g., at 42 C, these large complexes
dissociate into dimers that exhibit chaperone activity and bind to nonnative poly-
peptides. Another interesting example of temperature regulation is the hexameric
chaperone/protease degP of the E. coli periplasm [70]. At low temperature the pro-
tein acts as a chaperone, whereas at elevated temperature it possesses proteolytic
activity (see Chapter 28). The regulatory switch is thought to involve the reorienta-
tion of inhibitory domains, which block access to the active site of the protease at
low temperature [71]. Interestingly, switching can also be achieved by the addition
of certain peptides.
A further variation of the scheme of activating chaperones by stress conditions is
the regulation of Hsp33. Here, oxidative stress leads to the formation of disulfide
bonds within the chaperone and its subsequent dimerization, which is a prerequi-
site for efficient substrate binding [72] (see also Chapter 19).
5.3
Limits and Extensions of the Chaperone Concept
The term chaperone and the underlying concept are attractive. Therefore, it is not
surprising that the term itself has been proved to be rather promiscuous, as it is
increasingly used for different proteins and even non-proteinaceous molecules. In
its most basic interpretation, it has been applied to all proteins that have an effect
on the aggregation of another protein or that influence the yield of correct folding.
Most of these proteins have hydrophobic patches on their surface and thus, irre-
spective of their biological function, will associate with nonnative proteins. How-
ever, to qualify for a molecular chaperone, additional requirements should be met.
This includes the defined stoichiometry of the binding reaction and the regulated
release of the bound polypeptide (e.g., triggered by binding of ATP).
An extension of the term chaperone are the so-called intramolecular chaperones.
These are regions of proteins that are required for their productive folding without
being part of the mature protein because they are cleaved off after folding is com-
plete. Well-investigated examples include the pro-regions of subtilisin E and a-lytic
protease [73, 74]. The variation of the chaperone concept is that an intramolecular
chaperone is specific for one protein and is covalently attached to its target. Fur-
thermore, these chaperones act only once: after completing their job, they are
cleaved off and degraded. These properties do not agree with the basic definition
of a chaperone.
Chemical chaperone is a term that has become fashionable in recent years. It re-
fers to small molecules (not proteins or peptides) that are added to refolding reac-
5.3 Limits and Extensions of the Chaperone Concept 171
tions to increase the efficiency of protein folding in vitro. For instance, the folding
yields for many enzymes can be improved by adding a substrate to the refolding
buffer [75–77]. These compounds are thought to stabilize critical folding inter-
mediates by binding to the active site, thereby enhancing folding. Besides these
rather specific ligands, a number of substances, the most prominent of them being
Tris and arginine, were found to promote the yield of folding in many cases when
present in large excess over the folding proteins (see Chapter 38). In contrast to
molecular chaperones, both chemical chaperones and intramolecular chaperones
cannot be regulated in their activity.
5.3.1
Co-chaperones
5.3.2
Specific Chaperones
In addition to the general chaperones, a number of proteins exist for which the ba-
sic definition of chaperone function applies. However, these chaperones are dedi-
cated to one or a few target proteins. In these cases, recognition is highly specific
and does not necessarily involve hydrophobic interactions. An example for this
class of chaperones is Hsp47, a resident protein of the endoplasmic reticulum
that is a dedicated folding factor for collagen [83]. It interacts with immature colla-
gen, is required for its correct folding and association, and is not part of the final
collagen trimer. Thus, all the criteria for a molecular chaperone are fulfilled. Colla-
172 5 Analysis of Chaperone Function in Vitro
gen release is not triggered by ATP hydrolysis but by changes in the pH upon tran-
sit from the ER to the Golgi [84, 85].
Another well-studied example is the pili assembly proteins of gram-negative bac-
teria (see Chapter 29). Here, unassembled pili subunits are bound by specific chap-
erones that prevent their premature association in the periplasmic space. These
chaperones bind very specifically and can be considered as surrogate subunits.
5.4
Working with Molecular Chaperones
5.4.1
Natural versus Artificial Substrate Proteins
5.4.2
Stability of Chaperones
with a large variety of substrate proteins. For the reasons given, it is therefore plau-
sible that chaperones cannot be extremely stable and rigid proteins (Table 5.1).
This imposes certain restrictions on the design of in vitro chaperone assays (see
experimental Section).
5.5
Assays to Assess and Characterize Chaperone Function
The first in vitro assays for a molecular chaperone using purified components were
presented by Lorimer and coworkers in 1989 [86]. They analyzed and dissected the
function of the two components of the GroE system (GroEL and GroES) using un-
folded bacterial Rubisco as a substrate protein. The methods employed included
activity assays to monitor the refolding to the native state, CD spectroscopy to ana-
lyze the conformation of the nonnative substrate protein at the beginning of the
reaction, and native PAGE to demonstrate a direct physical interaction between
the chaperone and Rubisco. This landmark paper became a role model for numer-
ous studies to follow. Among other things, it showed that artificially denatured
proteins can serve as substrates for molecular chaperones and that they can be re-
folded in a strictly ATP-dependent manner. In many cases, the increase in the yield
of refolded protein is due to the suppression of aggregation during refolding. This
was first shown for GroE using light scattering [87].
5.5.1
Generating Nonnative Conformations of Proteins
5.5.2
Aggregation Assays
Since the prevention of aggregation is one of the key biological functions of gen-
eral chaperones, this test can be considered an ‘‘activity assay.’’ In the simplest
case, a temperature-labile protein, such as citrate synthase, is incubated at a tem-
5.5 Assays to Assess and Characterize Chaperone Function 175
5.5.3
Detection of Complexes Between Chaperone and Substrate
5.5.4
Refolding of Denatured Substrates
This type of assay is more complex than the ones described above because (1) it
requires some knowledge about the mechanism of substrate release and (2) suc-
cessful refolding may depend on the presence of (co-)chaperones or other factors.
The most convenient refolding assays use test substrates whose enzymatic activity
can be measured without interference from the chaperone. The use of classical
spectroscopic probes for folding such as fluorescence or far-UV CD may be difficult
176 5 Analysis of Chaperone Function in Vitro
because of the high background signal from the chaperone and (if present) ATP.
However, some chaperones such as GroEL, GroES, and Hsp104 lack intrinsic tryp-
tophans, and in these cases one may use fluorescence to directly monitor changes
in the structure of Trp-containing test proteins.
The general procedure for a refolding assay is as follows. First, a complex be-
tween the chaperone and the test substrate is formed (see Section 5.5.3). Usually,
it is not necessary to remove molecules that have not bound to the chaperone. Af-
terwards, substrate release and refolding are triggered by adding the appropriate
components, e.g., GroES and ATP in the case of GroEL. At various time points,
aliquots are withdrawn from the sample and the enzyme activity is determined.
5.5.5
ATPase Activity and Effect of Substrate and Cofactors
Many general chaperones have an intrinsic ATPase activity, though their rates of
hydrolysis can differ significantly (Table 5.1). ATP hydrolysis by a chaperone can
be measured using several assay systems (see Section 5.6). Once the basal ATPase
activity has been determined, the effect of substrates and cofactors in various com-
binations can be tested. Concerning the substrate, a polypeptide is preferred that
adopts a stable, nonnative conformation. Otherwise, its conformation and thus its
effect on ATP hydrolysis may change during the experiment. Alternatively, a small
peptide may be used.
A more extensive characterization of the ATPase may include the determination
of enzymatic parameters such as KM or kcat, in both the absence and the presence
of effectors (see Chapter 4). However, one should be aware that in the case of oligo-
meric chaperones, a detailed analysis can be very challenging due to homotropic
and heterotropic allosteric effects [91].
5.6
Experimental Protocols
5.6.1
General Considerations
Rhodanese (Bovine Liver) Rhodanese from liver mitochondria is a small (33 kDa)
monomeric enzyme that catalyzes sulfur transfer reactions [95]. It is commercially
178 5 Analysis of Chaperone Function in Vitro
available from several sources, but batches may differ in the degree of purity. The
structure of rhodanese has been determined [96]. The reactivation of rhodanese
is dependent on chaperones [97], although spontaneous refolding has been ob-
served in the presence of lauryl maltoside, b-mercaptoethanol, and thiosulfate
[75]. Rhodanese has been extensively used over the years to study assisted protein
folding. One disadvantage of rhodanese is that the activity assay uses the toxic
compound KCN.
Alpha-lactalbumin is a small protein (14.2 kDa) from bovine milk and can be
purchased from a number of suppliers. Its conformational stability is highly de-
pendent on the presence of the four disulfide bridges. To obtain a permanently un-
folded version of lactalbumin, the disulfides first have to be disrupted by DTT in
the presence of GdmCl (Protocol 1). This unfolded conformation can be main-
tained by irreversible alkylation of the eight cysteine residues with iodoacetic acid.
Compared with unmodified lactalbumin, this derivative carries eight additional
negative charges that increase its solubility but may interfere with the binding to
certain chaperones. An alternative form can be obtained by using iodoacetamide
in the alkylation step. The resulting derivative carries no additional charges but is
prone to aggregation at high concentration.
5.6.2
Suppression of Aggregation
The simplest type of activity test for a chaperone determines its capability to pre-
vent the aggregation of a model substrate. An unfolded conformation of the test
substrate is first prepared. Then the aggregation of this protein is measured, in
both the presence and the absence of the potential chaperone. A number of con-
trols have to be carried out to verify that the chaperone does not aggregate itself
and that the buffer in which the chaperone is stored does not change aggregation.
The process of aggregation is usually monitored by the increase in light scattering.
For detection, a thermostated photometer or fluorescence spectrometer is required.
Aggregates can be monitored at wavelengths between 320 nm and 500 nm (no in-
terference from aromatic residues) depending on the size of the aggregates. The
scattering intensity increases with decreasing wavelengths. Since there is no linear
relationship between the extent of aggregation and the light-scattering signal, we
advise against using a percent scale when plotting aggregation data.
Protocol 2, in which the aggregation of CS at 43 C is measured, can be per-
formed only for chaperones that are stable at this temperature. If this is not the
case, chemical denaturants have to be used to destabilize the protein whose aggre-
gation will be monitored. Two different approaches can be applied. In the cases of
insulin (Protocol 3) and alpha-lactalbumin (Protocol 4), aggregation can be induced
180 5 Analysis of Chaperone Function in Vitro
by the addition of a reducing agent such as DTT. The B chain of reduced insulin
will start to aggregate within minutes at room temperature [113]. For lactalbumin,
efficient aggregation requires reduction at higher temperatures. Additionally, ag-
gregation can be induced by diluting a concentrated solution of an unfolded test
protein, e.g., in GdmCl or urea, into buffer without denaturant. In the absence of
chaperones, the polypeptide will aggregate. This process can be enhanced by in-
creasing the temperature.
As a rule of thumb, complete suppression of aggregation requires that the
amount of chaperone – or, more precisely, the concentration of protein-binding
sites – be at least as high as that of the aggregating polypeptide. Thus, chaperones
act stoichiometrically and not catalytically. As some of the assays, especially the
ones using insulin and alpha-lactalbumin, require a high concentration of the test
protein, the consumption of chaperone can be considerable. Secondly, the interac-
tion between the chaperone and the polypeptide has to be tight (see Figure 5.3, left
panel). Otherwise, the irreversible process of aggregation will eventually pull all
polypeptide molecules from the chaperone. In this case, a deceleration of the
aggregation process may be observed in the presence of the chaperone, while the
final amplitude remains unchanged (Figure 5.3, middle panel).
It may be worthwhile to check whether the test protein is indeed associated
with the chaperone at the end of the aggregation assay by employing one of the
methods described in the following section. If a refolding assay for the test protein
is available, one can also examine whether the protein can be recovered from the
chaperone complexes by transferring the sample to refolding conditions, e.g., by
lowering the temperature (see Protocol 7). Finally, one should note that a positive
result from an aggregation assay alone is not sufficient to classify the protein under
investigation as a molecular chaperone.
500
400
300
200
0
0 5 10 15 20 25 30 0 5 10 15 20 25 30 0 5 10 15 20 25 30
time (min) time (min) time (min)
Fig. 5.3. Aggregation assays for three the rate but not the extent of aggregation,
chaperones using citrate synthase as a test indicating a transient interaction with CS.
substrate. The aggregation of 0.15 mM citrate (Right) In contrast, 0.2 mM Hsp104 (e) is not
synthase (a) at 43 C was followed by light able to suppress the aggregation of citrate
scattering at 360 nm in a fluorescence synthase. In comparison to the reaction in the
spectrometer. (Left) In the presence of 0.2 mM absence of the chaperone, the extent of
GroEL (e), complete suppression of aggrega- aggregation is even increased since Hsp104 is
tion is observed. (Middle) Unlike GroEL, not stable at this temperature.
Hsp90 (0.1 mM m and 0.2 mM C) decreases
5.6 Experimental Protocols
181
182 5 Analysis of Chaperone Function in Vitro
7. Start a new experiment in which some of the buffer in the sample cuvette is
replaced by a stock solution of your chaperone before the addition of CS; use
the following chaperone concentrations: 0.1, 0.2, and 0.5 mM (the final concen-
tration of CS in the cuvette is 0.15 mM).
8. To test the influence of the buffer in which the chaperone is stored, carry out a
set of control experiments in which the chaperone is replaced by equal volumes
of this buffer.
9. If an absorbance photometer is used, concentrations may have to be increased.
Set the wavelength to 360 nm.
Protocol 3: Aggregation of Insulin B Chain In the presence of DTT, the three disul-
fide bridges within the insulin molecule are broken, and the two chains dissociate.
While the A chain remains soluble, the B chain will readily start to aggregate [113,
114]. This process can be suppressed by molecular chaperones. A major disadvan-
tage of this assay is that it consumes substantial amounts of chaperone, of which
a highly concentrated stock solution is needed. The assay can be performed at dif-
ferent temperatures [69] and salt concentrations, but note that the extent of aggre-
gation is very sensitive to changes in pH.
The kinetics of the aggregation reaction depends on the DTT concentration and
is considerably faster at higher temperature. At 25 C, complete aggregation may
take several hours.
5.6.3
Complex Formation between Chaperones and Polypeptide Substrates
5.6.4
Identification of Chaperone-binding Sites
2.5
1.5
1.0
free RCMLA
0.5
0.0
0 10 20 30 40 50 60
time (min)
Fig. 5.4. Size-exclusion chromatography of a plexes were detected by following the fluores-
complex between GroEL and RCMLA (dashed cence of the labeled substrate. In a control
line). Complexes were formed by incubating experiment, GroEL alone was injected (solid
3 mM RCMLA (labeled with tetramethyl- line). Note that the position of the GroEL
rhodamin) with 2 mM GroEL14 for 5 min at peak at 18 min is not significantly shifted
30 C in 50 mM HEPES/KOH pH 7.5, 150 mM upon binding of RCMLA, as the apparent
KCl, 10 mM MgCl2 . 100 mL of this mixture molecular mass increases only marginally from
was applied to a Superdex-200 HR 10/30 @800 kDa to @820 kDa.
column equilibrated in the same buffer. Com-
of overlapping synthetic peptides that represent the entire primary structure of the
respective proteins. The peptides are immobilized on filters. This allows a corre-
lation between sequence and signals. Binding is usually detected by antibodies
against the respective chaperone in a Western blot–like manner. This technique
has been successfully applied for several chaperones and target proteins [120, 121].
A caveat is that these approaches will provide information on all potential bind-
ing sites in a protein. Which of these will actually be used in vivo may depend on
the kinetics of folding or on the degree of unfolding under stress conditions, simi-
lar to N-glycosylation sites in proteins. Moreover, this technique is unable to detect
binding sites that are formed by residues distant in sequence but close in space,
e.g., in a partially folded intermediate. Thus, its value may be restricted to identify-
ing binding sites for chaperones such as Hsp70, which are known to bind consec-
utive stretches of amino acids.
Identifying the binding site(s) for nonnative proteins in chaperones is even more
challenging. Here, chemical cross-linking of chaperone-substrate complexes and
subsequent analysis of the site of interaction by protease digestion and mass spec-
trometry is the method of choice. Mutational analyses using deletion fragments to
identify protein-binding sites should be considered with great caution because they
have repeatedly yielded false or contradictory results. However, after a potential
binding site has been identified biochemically, site-specific mutagenesis of resi-
dues in the respective region may be employed to further corroborate the result.
186 5 Analysis of Chaperone Function in Vitro
40
30
20
10
0 20 40 60
time (min)
Fig. 5.5. GroE-assisted reactivation of absence of chaperones (i). At the indicated
rhodanese. Urea-denatured rhodanese (0.5 time points, aliquots were withdrawn from the
mM) was refolded in the presence of 1 mM refolding reaction, and the rhodanese activity
GroEL/2 mM GroES/2 mM ATP (m) or in the was determined as described in Protocol 7.
5.6.5
Chaperone-mediated Refolding of Test Proteins
Caution: KCN is toxic. Work under a fume hood and wear gloves at all times.
1. Premix: 9% (v/v) formaldehyde.
2. Staining solution: 16% (w/v) Fe(NO3 )3 , 1 M HNO3 .
3. Prepare the number of activity tests tubes you will require (one for every time
point of the refolding kinetics): 24 mL KH2 PO4 (1 M), 12 mL EDTA (0.5 M, pH
9), 30 mL Na2 S2 O3 (1 M), 30 mL KCN (1 M), 480 mL H2 O, total volume 576 mL
each.
4. For each activity test tube, prepare six disposable cuvettes with 575 mL of pre-
mix solution.
5. Incubate test tubes at 25 C.
6. Start the activity test by adding 24 mL of rhodanese-containing solution from
the refolding reaction to the test tube. The final conditions of the activity tests
are: 25 C, 40 mM KH2 PO4 , 10 mM EDTA, 30 mM S2 O3 2 , 30 mM CN ,
20 nM rhodanese (from refolding reaction), total volume ¼ 600 mL.
7. Every 2 min, withdraw 90 mL from the activity test tube and add it to the pre-
mix solution in the cuvette.
8. Quench sample with 325 mL staining solution and mix rapidly.
9. Measure the absorbance at 470 nm (the orange color is stable for at least
2 h).
10. Blank: cuvette with premix solution.
11. Draw a straight line by plotting absorbance at 470 nm vs. time.
12. Use slope dA 470 nm /dt to calculate enzymatic activity.
5.6.6
ATPase Activity
Several assay systems have been developed to monitor ATP hydrolysis. The most
sensitive technique uses either a-[ 32 P]-ATP or g-[ 32 P]-ATP. Upon hydrolysis, either
radioactive phosphate or ADP is released and can be detected and quantified [122].
Another method uses the dye malachite green to determine the amount of phos-
phate produced by hydrolysis of ATP [123]. A more elegant approach employs a
coupled enzyme system [124]. In the first step, pyruvate kinase regenerates ATP
from ADP and phosphoenol pyruvate. This reaction also produces pyruvate, which
is reduced in a second step catalyzed by lactate dehydrogenase under the consump-
tion of NADH, a reaction that can be followed at 340 nm in a spectrometer. De-
tailed information on the experimental design and evaluation of ATPase experi-
ments is provided in Chapter 4 of this volume.
Acknowledgments
References
bacterial chaperonin GroEL at 2.8 Å. atpase cycle. J. Biol. Chem. 278,
Nature 371, 578–586. 10328–10333.
41 Buckle, A. M., Zahn, R., & Fersht, 51 Liberek, K., Skowyra, D., Zylicz, M.,
A. R. (1997). A structural model for Johnson, C., & Georgopoulos, C.
GroEL-polypeptide recognition. Proc. (1991). The Escherichia coli DnaK
Natl. Acad. Sci. U.S.A 94, 3571–3575. chaperone, the 70-kDa heat shock
42 Chen, L. & Sigler, P. B. (1999). The protein eukaryotic equivalent, changes
crystal structure of a GroEL/peptide conformation upon ATP hydrolysis,
complex: plasticity as a basis for thus triggering its dissociation from a
substrate diversity. Cell 99, 757–768. bound target protein. J. Biol. Chem.
43 Chin, D. T., Kuehl, L., & 266, 14491–14496.
Rechsteiner, M. (1982). Conjugation 52 Xu, Z., Horwich, A. L., & Sigler,
of ubiquitin to denatured hemoglobin P. B. (1997). The crystal structure of
is proportional to the rate of hemo- the asymmetric GroEL-GroES-(ADP)7
globin degradation in HeLa cells. Proc. chaperonin complex. Nature 388, 741–
Natl. Acad. Sci. U.S.A 79, 5857–5861. 750.
44 Parag, H. A., Raboy, B., & Kulka, 53 Ranson, N. A., Farr, G. W.,
R. G. (1987). Effect of heat shock on Roseman, A. M., Gowen, B., Fenton,
protein degradation in mammalian W. A., Horwich, A. L., & Saibil,
cells: involvement of the ubiquitin H. R. (2001). ATP-bound states of
system. EMBO J. 6, 55–61. GroEL captured by cryo-electron
45 Karzai, A. W., Roche, E. D., & microscopy. Cell 107, 869–879.
Sauer, R. T. (2000). The SsrA-SmpB 54 Richter, K., Muschler, P., Hainzl,
system for protein tagging, directed O., & Buchner, J. (2001). Coordinated
degradation and ribosome rescue. Nat. ATP hydrolysis by the Hsp90 dimer. J.
Struct. Biol. 7, 449–455. Biol. Chem. 276, 33689–33696.
46 van Nocker, S., Deveraux, Q., 55 Hardy, S. J. & Randall, L. L. (1991).
Rechsteiner, M., & Vierstra, R. D. A kinetic partitioning model of
(1996). Arabidopsis MBP1 gene selective binding of nonnative proteins
encodes a conserved ubiquitin by the bacterial chaperone SecB.
recognition component of the 26S Science 251, 439–443.
proteasome. Proc. Natl. Acad. Sci. 56 Weissman, J. S., Kashi, Y., Fenton,
U.S.A 93, 856–860. W. A., & Horwich, A. L. (1994).
47 Weber-Ban, E. U., Reid, B. G., GroEL-mediated protein folding
Miranker, A. D., & Horwich, A. L. proceeds by multiple rounds of
(1999). Global unfolding of a substrate binding and release of nonnative
protein by the Hsp100 chaperone forms. Cell 78, 693–702.
ClpA. Nature 401, 90–93. 57 Frieden, C. & Clark, A. C. (1997).
48 Pratt, W. B. & Toft, D. O. (1997). Protein folding: how the mechanism
Steroid receptor interactions with heat of GroEL action is defined by kinetics.
shock protein and immunophilin Proc. Natl. Acad. Sci. U.S.A 94, 5535–
chaperones. Endocr. Rev. 18, 306– 5538.
360. 58 Shtilerman, M., Lorimer, G. H., &
49 Pratt, W. B. & Toft, D. O. (2003). Englander, S. W. (1999). Chaperonin
Regulation of signalling protein function: folding by forced unfolding.
function and trafficking by the hsp90/ Science 284, 822–825.
hsp70-based chaperone machinery. 59 Jordan, R. & McMacken, R. (1995).
Exp. Biol. Med. 228, 111–133. Modulation of the ATPase activity of
50 Richter, K., Muschler, P., Hainzl, the molecular chaperone DnaK by
O., Reinstein, J., & Buchner, J. peptides and the DnaJ and GrpE heat
(2003). Sti1 is a non-competitive shock proteins. J. Biol. Chem. 270,
inhibitor of the Hsp90 ATPase. 4563–4569.
Binding prevents the N-terminal 60 McLaughlin, S. H., Smith, H. W., &
dimerization reaction during the Jackson, S. E. (2002). Stimulation of
192 5 Analysis of Chaperone Function in Vitro
the weak ATPase activity of human Chen, S., Saibil, H. R., & Buchner,
hsp90 by a client protein. J. Mol. Biol. J. (1999). Hsp26: a temperature-
315, 787–798. regulated chaperone. EMBO J. 18,
61 Cashikar, A. G., Schirmer, E. C., 6744–6751.
Hattendorf, D. A., Glover, J. R., 70 Spiess, C., Beil, A., & Ehrmann, M.
Ramakrishnan, M. S., Ware, D. M., (1999). A temperature-dependent
& Lindquist, S. L. (2002). Defining a switch from chaperone to protease in
pathway of communication from the a widely conserved heat shock protein.
C-terminal peptide binding domain to Cell 97, 339–347.
the N-terminal ATPase domain in a 71 Krojer, T., Garrido-Franco, M.,
AAA protein. Mol. Cell 9, 751–760. Huber, R., Ehrmann, M., &
62 Gamer, J., Multhaup, G., Tomoyasu, Clausen, T. (2002). Crystal structure
T., McCarty, J. S., Rudiger, S., of DegP (HtrA). reveals a new
Schonfeld, H. J., Schirra, C., protease-chaperone machine. Nature
Bujard, H., & Bukau, B. (1996). A 416, 455–459.
cycle of binding and release of the 72 Jakob, U., Muse, W., Eser, M., &
DnaK, DnaJ and GrpE chaperones Bardwell, J. C. (1999). Chaperone
regulates activity of the Escherichia coli activity with a redox switch. Cell 96,
heat shock transcription factor 341–352.
sigma32. EMBO J. 15, 607–617. 73 Kobayashi, T. & Inouye, M. (1992).
63 Liberek, K., Marszalek, J., Ang, D., Functional analysis of the intramolec-
Georgopoulos, C., & Zylicz, M. ular chaperone. Mutational hot spots
(1991). Escherichia coli DnaJ and GrpE in the subtilisin pro-peptide and a
heat shock proteins jointly stimulate second-site suppressor mutation within
ATPase activity of DnaK. Proc. Natl. the subtilisin molecule. J. Mol. Biol.
Acad. Sci. U.S.A 88, 2874–2878. 226, 931–933.
64 Packschies, L., Theyssen, H., 74 Jaswal, S. S., Sohl, J. L., Davis, J. H.,
Buchberger, A., Bukau, B., Goody, & Agard, D. A. (2002). Energetic
R. S., & Reinstein, J. (1997). GrpE landscape of alpha-lytic protease
accelerates nucleotide exchange of the optimizes longevity through kinetic
molecular chaperone DnaK with an stability. Nature 415, 343–346.
associative displacement mechanism. 75 Mendoza, J. A., Rogers, E., Lorimer,
Biochemistry 36, 3417–3422. G. H., & Horowitz, P. M. (1991).
65 Sullivan, W. P., Owen, B. A., & Unassisted refolding of urea unfolded
Toft, D. O. (2002). The influence of rhodanese. J. Biol. Chem. 266, 13587–
ATP and p23 on the conformation of 13591.
hsp90. J. Biol. Chem. 277, 45942– 76 Zhi, W., Landry, S. J., Gierasch,
45948. L. M., & Srere, P. A. (1992).
66 Wegele, H., Haslbeck, M., Renaturation of citrate synthase:
Reinstein, J., & Buchner, J. (2003). influence of denaturant and folding
Sti1 is a novel activator of the Ssa assistants. Protein Sci. 1, 522–529.
proteins. J. Biol. Chem. 278, 25970– 77 Aghajanian, S., Walsh, T. P., &
25976. Engel, P. C. (1999). Specificity of
67 Gray, T. E. & Fersht, A. R. (1991). coenzyme analogues and fragments in
Cooperativity in ATP hydrolysis by promoting or impeding the refolding
GroEL is increased by GroES. FEBS of clostridial glutamate dehydro-
Lett. 292, 254–258. genase. Protein Sci. 8, 866–872.
68 Todd, M. J., Viitanen, P. V., & 78 Szabo, A., Korszun, R., Hartl, F. U.,
Lorimer, G. H. (1994). Dynamics of & Flanagan, J. (1996). A zinc finger-
the chaperonin ATPase cycle: like domain of the molecular
implications for facilitated protein chaperone DnaJ is involved in binding
folding. Science 265, 659–666. to denatured protein substrates.
69 Haslbeck, M., Walke, S., Stromer, EMBO J. 15, 408–417.
T., Ehrnsperger, M., White, H. E., 79 Bose, S., Weikl, T., Bugl, H., &
References 193
98 Jaenicke, R., Rudolph, R., & Heider, homogenates. Acta Chem. Scand. 17,
I. (1979). Quaternary structure, sub- 129–134.
unit activity, & in vitro association of 107 Buchner, J., Grallert, H., & Jakob,
porcine mitochondrial malic dehydro- U. (1998). Analysis of chaperone
genase. Biochemistry 18, 1217–1223. function using citrate synthase as
99 Staniforth, R. A., Cortes, A., nonnative substrate protein. Methods
Burston, S. G., Atkinson, T., Enzymol. 290, 323–338.
Holbrook, J. J., & Clarke, A. R. 108 Wiech, H., Buchner, J.,
(1994). The stability and hydropho- Zimmermann, R., & Jakob, U. (1992).
bicity of cytosolic and mitochondrial Hsp90 chaperones protein folding in
malate dehydrogenases and their vitro. Nature 358, 169–170.
relation to chaperonin-assisted folding. 109 Shao, F., Bader, M. W., Jakob, U., &
FEBS Lett. 344, 129–135. Bardwell, J. C. (2000). DsbG, a
100 Roderick, S. L. & Banaszak, L. J. protein disulfide isomerase with
(1986). The three-dimensional struc- chaperone activity. J. Biol. Chem. 275,
ture of porcine heart mitochondrial 13349–13352.
malate dehydrogenase at 3.0-Å 110 Cyr, D. M., Lu, X., & Douglas, M. G.
resolution. J. Biol. Chem. 261, 9461– (1992). Regulation of Hsp70 function
9464. by a eukaryotic DnaJ homolog. J. Biol.
101 Ranson, N. A., Dunster, N. J., Chem. 267, 20927–20931.
Burston, S. G., & Clarke, A. R. 111 Panse, V. G., Swaminathan, C. P.,
(1995). Chaperonins can catalyse the Surolia, A., & Varadarajan, R.
reversal of early aggregation steps (2000). Thermodynamics of substrate
when a protein misfolds. J. Mol. Biol. binding to the chaperone SecB.
250, 581–586. Biochemistry 39, 2420–2427.
102 Lee, G. J., Roseman, A. M., Saibil, 112 Scholz, C., Stoller, G., Zarnt, T.,
H. R., & Vierling, E. (1997). A small Fischer, G., & Schmid, F. X. (1997).
heat shock protein stably binds heat- Cooperation of enzymatic and
denatured model substrates and can chaperone functions of trigger factor
maintain a substrate in a folding- in the catalysis of protein folding.
competent state. EMBO J. 16, 659– EMBO J. 16, 54–58.
671. 113 Sanger, F. (1949). Fractionation of
103 Goloubinoff, P., Mogk, A., Zvi, oxidized insulin. Biochem. J. 44, 126–
A. P., Tomoyasu, T., & Bukau, B. 128.
(1999). Sequential mechanism of 114 Horwitz, J., Huang, Q. L., Ding, L.,
solubilization and refolding of stable & Bova, M. P. (1998). Lens alpha-
protein aggregates by a bichaperone crystallin: chaperone-like properties.
network. Proc. Natl. Acad. Sci. U.S.A Methods Enzymol. 290, 365–383.
96, 13732–13737. 115 Stromer, T., Ehrnsperger, M.,
104 Remington, S., Wiegand, G., & Gaestel, M., & Buchner, J. (2003).
Huber, R. (1982). Crystallographic Analysis of the interaction of small
refinement and atomic models of two heat shock proteins with unfolding
different forms of citrate synthase at proteins. J. Biol. Chem. 278, 18015–
2.7 and 1.7 Å resolution. J. Mol. Biol. 18021.
158, 111–152. 116 Schonfeld, H. J. & Behlke, J. (1998).
105 Haslbeck, M., Schuster, I., & Molecular chaperones and their
Grallert, H. (2003). GroE-dependent interactions investigated by analytical
expression and purification of pig ultracentrifugation and other
heart mitochondrial citrate synthase in methodologies. Methods Enzymol. 290,
Escherichia coli. J. Chromatogr. B 269–296.
Analyt. Technol. Biomed. Life Sci. 786, 117 Blond-Elguindi, S., Cwirla, S. E.,
127–136. Dower, W. J., Lipshutz, R. J.,
106 Srere, P. A., Brazil, H., & Gonen, L. Sprang, S. R., Sambrook, J. F., &
(1963). Citrate synthase assay in tissue Gething, M. J. (1993). Affinity
References 195
136 Holl-Neugebauer, B., Rudolph, R., 144 Farahbakhsh, Z. T., Huang, Q. L.,
Schmidt, M., & Buchner, J. (1991). Ding, L. L., Altenbach, C.,
Reconstitution of a heat shock effect Steinhoff, H. J., Horwitz, J., &
in vitro: influence of GroE on the Hubbell, W. L. (1995). Interaction of
thermal aggregation of alpha- alpha-crystallin with spin-labeled
glucosidase from yeast. Biochemistry peptides. Biochemistry 34, 509–516.
30, 11609–11614. 145 Miller, A. D., Maghlaoui, K.,
137 Hayer-Hartl, M. K., Ewbank, J. J., Albanese, G., Kleinjan, D. A., &
Creighton, T. E., & Hartl, F. U. Smith, C. (1993). Escherichia coli
(1994). Conformational specificity of chaperonins cpn60 (groEL) and cpn10
the chaperonin GroEL for the compact (groES) do not catalyse the refolding
folding intermediates of alpha- of mitochondrial malate dehydro-
lactalbumin. EMBO J. 13, 3192–3202. genase. Biochem. J. 291, 139–144.
138 Lindner, R. A., Kapur, A., & Carver, 146 Sparrer, H., Lilie, H., & Buchner, J.
J. A. (1997). The interaction of the (1996). Dynamics of the GroEL-protein
molecular chaperone, alpha-crystallin, complex: effects of nucleotides and
with molten globule states of bovine folding mutants. J. Mol. Biol. 258, 74–
alpha-lactalbumin. J. Biol. Chem. 272, 87.
27722–27729. 147 Hayer-Hartl, M. K., Weber, F., &
139 Schmidt, M., Buchner, J., Todd, Hartl, F. U. (1996). Mechanism of
M. J., Lorimer, G. H., & Viitanen, chaperonin action: GroES binding and
P. V. (1994). On the role of groES in release can drive GroEL-mediated
the chaperonin-assisted folding reac- protein folding in the absence of ATP
tion. Three case studies. J. Biol. Chem. hydrolysis. EMBO J. 15, 6111–6121.
269, 10304–10311. 148 Viitanen, P. V., Lubben, T. H., Reed,
140 Schroder, H., Langer, T., Hartl, J., Goloubinoff, P., O’Keefe, D. P.,
F. U., & Bukau, B. (1993). DnaK, & Lorimer, G. H. (1990). Chaperonin-
DnaJ and GrpE form a cellular chaper- facilitated refolding of ribulosebis-
one machinery capable of repairing phosphate carboxylase and ATP
heat-induced protein damage. EMBO hydrolysis by chaperonin 60 (groEL)
J. 12, 4137–4144. are Kþ dependent. Biochemistry 29,
141 Buchberger, A., Schroder, H., 5665–5671.
Hesterkamp, T., Schonfeld, H. J., & 149 Rye, H. S., Burston, S. G., Fenton,
Bukau, B. (1996). Substrate shuttling W. A., Beechem, J. M., Xu, Z.,
between the DnaK and GroEL systems Sigler, P. B., & Horwich, A. L.
indicates a chaperone network (1997). Distinct actions of cis and trans
promoting protein folding. J. Mol. ATP within the double ring of the
Biol. 261, 328–333. chaperonin GroEL. Nature 388, 792–
142 Zolkiewski, M. (1999). ClpB 798.
cooperates with DnaK, DnaJ, & GrpE 150 Sriram, M., Osipiuk, J., Freeman, B.,
in suppressing protein aggregation. A Morimoto, R., & Joachimiak, A.
novel multi-chaperone system from (1997). Human Hsp70 molecular
Escherichia coli. J. Biol. Chem. 274, chaperone binds two calcium ions
28083–28086. within the ATPase domain. Structure
143 Dumitru, G. L., Groemping, Y., 5, 403–414.
Klostermeier, D., Restle, T., 151 van Montfort, R. L., Basha, E.,
Deuerling, E., & Reinstein, J. Friedrich, K. L., Slingsby, C., &
(2004). DafA Cycles Between the Vierling, E. (2001). Crystal structure
DnaK Chaperone System and and assembly of a eukaryotic small
Translational Machinery. J. Mol. Biol. heat shock protein. Nat. Struct. Biol. 8,
339, 1179–1189. 1025–1030.
197
6
Physical Methods for Studies of Fiber
Formation and Structure
6.1
Introduction
Elucidating protein fiber assembly and analyzing the acquired regular fibrous
structures have engaged the interest of scientists from different disciplines for a
long time. The complexity of the problem, together with the paucity of interpret-
able data, has generated a large volume of literature concerned with speculations
about possible assembly mechanisms and structural conformations.
Fibrous proteins are essential building blocks of life, providing a scaffold for our
cells, both intracellularly and extracellularly. However, despite nearly a century of
research of the assembly mechanisms and structures of fibrous protein, we have
only recently begun to get insights. In the last 10 years, atomic models for some
of the most important fibrous proteins have been proposed. We are increasingly
knowledgeable about how fibrous proteins form and how they function (for exam-
ple, the interaction of actin and myosin to produce muscular movement). Develop-
ment of methods specific to the study of fibrous proteins has enabled these inves-
tigations, since fibrous proteins are generally insoluble and often heterogeneous in
length, and therefore methods used for examining the structure of globular pro-
teins cannot be used.
In this chapter, an overview of some commonly used methods to examine the
assembly and structure of fibrous proteins is given. However, additional methods
can be suitable to analyze fibrous proteins. Among the methods not discussed are
solid-state NMR (ssNMR), near-field scanning optical microscopy (NSOM), scan-
ning transmission electron microscopy (STEM), and light microscopy, for all of
which reviews can be found elsewhere. Other complementary methods, specific
for the particular protein, include cross-linking studies and antibody-binding epit-
ope studies. These are valuable techniques that can help us to interpret structural
data and contribute to the understanding of the assembly pathway, but due to
space limitations they can not be discussed in this chapter.
6.2
Overview: Protein Fibers Formed in Vivo
6.2.1
Amyloid Fibers
6.2.2
Silks
Arthropods produce a great variety of silks that are used in the fabrication of struc-
tures outside of the body of the animal. Silks are typically fibrous filaments or
threads with diameters ranging from a few hundred nanometers to 50 mm. Insects
and spiders use their silks for lifelines, nests, traps, and cocoons. Characteristically,
the silk is produced in specialized glands and stored in a fluid state in the lumen of
the gland. As the fluid passes through the spinning duct, a rapid transformation
to the solid state takes place and the silk becomes water-insoluble [12–15]. How-
ever, some of the materials that are called silks are not produced by silk glands in
the traditional sense but by extrusions of the guts, e.g., in antlions, lacewings, and
many beetles [16].
The major components of traditional silks are fibrous proteins, which in the fi-
nal insoluble state have a high content of regular secondary structure. Wide varia-
tions in composition and structure have been found among different silks. There-
fore, the classification of silks is often based on the predominant form of regular
secondary structure in the final product as determined by X-ray diffraction studies.
Silks are classified as beta silks (which are rich in b-sheets; see Section 6.3.1.3), al-
pha silks (which resemble the X-ray diffraction pattern obtained from a-keratin and
which have a much lower content of Gly and a much higher content of charged
residues than any other silk; see Section 6.3.2.4), Gly-rich silks (which form a poly-
glycine II-like structure), and collagen-like silks (which resemble the X-ray diffrac-
tion pattern given by collagen; see Section 6.3.2.1). While silks from the class of
Arachnida (spiders) are generally b-silks, in the class of Insecta all four classes of
silk can be found, e.g., b-silks in the family of Bombycidae (moths, includes
Bombyx mori, whose caterpillar is the silkworm), a-silks in the family of Apidae
(bees), Gly-rich silks in the family of Blennocampinae (subspecies of sawflies),
and collagen-like silks in the family of Nematinae (subspecies of sawflies) [15,
17–20]. In contrast, silks from extrusions of the guts do not form traditional silk
filaments or threads; rather, they are hardened fibrous feces.
6.2.3
Collagens
Collagens are a family of highly characteristic fibrous proteins found in all multi-
cellular animals (see chapter 30 in Part I). In combination with elastin (see Section
6.2.8), mucopolysaccharides, and mineral salts, collagens form the connective tis-
sue responsible for the structural integrity of the animal body. The characteristic
feature of a typical collagen molecule is its long, stiff, triple-stranded helical struc-
ture, in which three collagen polypeptide chains, called a-chains, are wound
around each other in a rope-like superhelix (see Section 6.3.2.1). The polypeptide
chains in collagens have a distinctive conformation that is intimately related to
their structural function [21–25]. All collagen polypeptides have a similar primary
structure, in which every third amino acid is Gly (Gly-Xaa-Yaa)n with a predomi-
nance of Pro residues at position Xaa and Yaa. Many of the Pro residues, as well
200 6 Physical Methods for Studies of Fiber Formation and Structure
6.2.4
Actin, Myosin, and Tropomyosin Filaments
All eukaryotic species contain actin, which is the most abundant protein in many
eukaryotic cells. Actin filaments (F actin) can form both stable and labile structures
6.2 Overview: Protein Fibers Formed in Vivo 201
in cells. Stable actin filaments are a crucial component of the contractile apparatus
of muscle cells in so-called myofibrils (see below). Many cell movements, however,
depend on labile structures constructed from actin filaments. Actin filaments are
about 8 nm wide and consist of a tight helix of uniformly oriented actin molecules
also known as globular actin (G actin). Like a microtubule (see Section 6.2.7), an
actin filament is a polar structure with two structurally different ends. The so-
called minus end is relatively inert and slow growing, while the so-called plus end
is faster growing [43, 44]. Some lower eukaryotes, such as yeasts, have only one
actin gene encoding a single protein. All higher eukaryotes have several isoforms
encoded by a family of actin genes. At least six types of actin are present in mam-
malian tissue, which fall into three classes depending on their isoelectric point: a
actins are found in various types of muscle, whereas b and g actins are the princi-
pal constituents of non-muscle cells. Although there are subtle differences in the
properties of different forms of actin, the amino acid sequences have been highly
conserved in evolution and all assemble into filaments in vitro. Polymerization of
actin in vitro requires ATP as well as both monovalent and divalent cations, such as
Kþ and Mg 2þ . ATP hydrolysis weakens the bonds in the polymer (F actin) and
thereby promotes depolymerization [43].
Myofibrils are cylindrical structures 1–2 mm in diameter, which in addition to
thin filaments formed by actin contain thick filaments made of muscle isoforms
of myosin II. Muscle myosin has two heads with ATPase and motor activity and a
long, rod-like tail. A myosin II molecule is composed of two identical heavy chains,
each of which is complexed to a pair of light chains. The amino-terminal portion of
the heavy chain forms the motor-domain head, while the carboxy-terminal half of
the heavy chain forms an extended a-helix. Two heavy chains associate by twisting
their a-helical tail domains together into a coiled coil to form a stable dimer that
has two heads and a single rod-like tail [45–47]. A major role of the rod-like tail is
to allow the molecules to polymerize into bipolar filaments. Myosin II filaments
play an important role in muscle contraction, drive membrane furrowing during
cell division, and generate tension in stress fibers.
Tropomyosins are a group of proteins that bind to the sides of actin filaments
and (together with troponins) regulate the interaction of the filaments with myosin
in response to Ca 2þ . They are parallel, a-helical, coiled-coil dimers. Vertebrates typ-
ically express approximately 20 different tropomyosins from alternative splicing
from about four tropomyosin genes [48]. Tropomyosins exist as both homodimers
and heterodimers held together by a series of salt bridges and hydrophobic interac-
tions. They may also interact through their amino and carboxyl termini in solution
(especially in low-ionic-strength buffers). There are two major groups of tropomyo-
sin, the high- and low-molecular-weight tropomyosins. The former type exists pri-
marily in muscle tissues (cardiac, smooth, and skeletal), whereas non-muscle tis-
sues express both high- and low-molecular-weight tropomyosins. Tropomyosins
bind cooperatively to actin filaments [49–51], probably because there is end-to-end
binding of the tropomyosin molecules, but possibly also because tropomyosin re-
stricts the movement of the filament, causing the filament repeat distance to
be made more regular [52]. The binding of tropomyosin to actin is strongly
202 6 Physical Methods for Studies of Fiber Formation and Structure
6.2.5
Intermediate Filaments/Nuclear Lamina
Intermediate filaments are tough, durable, 10-nm wide protein fibers found in the
cytoplasm of most, but not all, animal cells, particularly in cells that are subject to
mechanical stress [54]. Unlike actin and tubulin, which are globular proteins,
monomeric intermediate filament proteins are highly elongated, with an amino-
terminal head, a carboxy-terminal tail, and a central rod domain. The central rod
domain consists of an extended a-helical region containing long tandem repeats
of a distinct amino acid sequence motif called heptad repeats. These repeats pro-
mote the formation of coiled-coil dimers between two parallel a-helices. Next, two
of the coiled-coil dimers associate in an antiparallel manner to form a tetrameric
subunit. The antiparallel arrangement of dimers implies that the tetramer and the
intermediate filaments formed thereof are non-polarized structures. This distin-
guishes intermediate filaments from microtubules (see Section 6.2.7) and actin fila-
ments (see Section 6.2.4), which are polarized and whose functions depend on
this polarity [55–58].
The cytoplasmic intermediate filaments in vertebrate cells can be grouped into
three classes: keratin filaments, vimentin/vimentin-related filaments, and neuro-
filaments. By far the most diverse family of these proteins is the family of a-keratins,
with more than 20 distinct proteins in epithelia and over eight more so-called hard
keratins in nails and hair. The a-keratins are evolutionarily distinct from b-keratins
found in bird feathers, which have an entirely different structure and will not be
further discussed in this context. Based on their amino acid sequence, a-keratins
can be subdivided into type I (acidic) and type II (neutral/basic) keratins, which al-
ways form heterodimers of type I and type II but never homodimers [59]. Unlike
keratins, the class of vimentin and vimentin-related proteins (e.g., desmin, periph-
erin) can form homopolymers [60]. In contrast to the first two classes of interme-
diate filaments, the third one, neurofilaments, is found uniquely in nerve cells,
wherein they extend along the length of an axon and form its primary cytoskeletal
component [61].
The nuclear lamina is a meshwork of intermediate filaments that lines the inside
surface of the inner nuclear membrane in eukaryotic cells. In mammalian cells the
nuclear lamina is composed of lamins, which are homologous to other intermedi-
ate filament proteins but differ from them in several ways. Lamins have a longer
central rod domain, contain a nuclear localization signal, and assemble into a
6.2 Overview: Protein Fibers Formed in Vivo 203
6.2.6
Fibrinogen/Fibrin
Fibrinogen is a soluble blood plasma protein, which during the process of clotting
is modified to give a derived product termed fibrin. Fibrinogen is a multifunctional
adhesive protein involved in a number of important physiological and patholog-
ical processes. Its multifunctional character is connected with its complex multi-
domain structure. Fibrinogen consists of two identical subunits, each of which is
formed by three nonidentical polypeptide chains, Aa, Bb, and g, all of which are
linked by disulfide bonds and assemble into a highly complex structure [63–65].
Conversion of fibrinogen into fibrin takes place under the influence of the enzyme
thrombin. The proteolysis results in spontaneous polymerization of fibrin and for-
mation of an insoluble clot that prevents the loss of blood upon vascular injury.
This clot serves as a provisional matrix in subsequent tissue repair. The polymer-
ization of fibrin is a multi-step process. First, two stranded protofibrils are gener-
ated, which associate with each other laterally to produce thicker fibrils. The fibrils
branch to form the fibrin clot, which is subsequently stabilized by the formation of
intermolecular cross-linkages [66]. Fibrinogen is a rather inert protein, but after
conversion fibrin interacts with different proteins, such as tissue-type plasminogen
activator, fibronectin, and some cell receptors, and participates in different physio-
logically important processes, including fibrinolysis, inflammation, angiogenesis,
and wound healing [67–69].
6.2.7
Microtubules
6.2.8
Elastic Fibers
The proteins elastin, fibrillin, and resilin occur in the highly elastic structures
found in vertebrates and insects, which are at least five times more extensible
than a rubber band of the same cross-sectional area. Elastin is a highly hydropho-
bic protein that, like collagen, is unusually rich in Pro and Gly residues, but unlike
collagen it is not glycosylated and contains little hydroxyproline and no hydroxyly-
sine. Elastin molecules are secreted into the extracellular space and assemble into
elastic fibers close to the plasma membrane. After secretion, the elastin molecules
become highly cross-linked between Lys residues to generate an extensive network
of fibers and sheets. In its native form, elastin is devoid of regular secondary struc-
ture [76–79].
The elastic fibers are not composed solely of elastin, but the elastin core is cov-
ered with microfibrils that are composed of several glycoproteins including the
large glycoprotein fibrillin [80, 81]. Microfibrils play an important part in the
assembly of elastic fibers. They appear before elastin in developing tissues and
seem to form a scaffold on which the secreted elastin molecules are deposited. As
the elastin is deposited, the microfibrils become displaced to the periphery of the
growing fiber [81].
Elastin and resilin are not known to be evolutionarily related, but their functions
are similar. Resilin forms the major component in several structures in insects
possessing rubber-like properties. Similar to elastin, the polypeptide chains are
three-dimensionally cross-linked and are devoid of regular secondary structure.
The amino acid composition of resilin resembles that of elastin insofar as the Gly
content is high, but otherwise there is little sequence homology between these two
proteins [19, 79, 82].
6.2.9
Flagella and Pili
Prokaryotes and archaea use a wide variety of structures to facilitate motility, most
of which include fibrillar proteins. The best studied of all prokaryotic motility
structures is the bacterial flagellum, which is composed of over 20 different pro-
teins. The bacterial flagellum is a rotary structure driven from a motor at the base
and a protein fiber acting as a propeller. The propeller consists of three major sub-
structures: a filament, a hook, and a basal body. The filament is a tubular structure
composed of 11 protofilaments about 20 nm in diameter, and the protofilaments
are typically built by a single protein called flagellin. Less commonly, the filament
6.2 Overview: Protein Fibers Formed in Vivo 205
6.2.10
Filamentary Structures in Rod-like Viruses
Filamentous phages comprise a family of simple viruses that have only about 10
genes and grow in gram-negative bacteria. Different biological strains have been
isolated, all of which have a similar virion structure and life cycle. The most prom-
inent filamentous phage is bacteriophage fd (M13), which infects E. coli. The wild-
type virion is a flexible rod about 6 nm in diameter and 800–2000 nm in length.
Several thousand identical a-helical proteins form a helical array around a single-
stranded circular DNA molecule at the core. It is assumed that the protein sub-
units forming the helical array pre-assemble within the membrane of the host
into ribbons with the same inter-subunit interactions as in the virion. A ribbon
grows by addition of newly synthesized protein subunits at one end, and at the
same time the other end of a ribbon twists out of the membrane and merges into
the assembling virion [93–95].
Some proteins of plant viruses, such as the tobacco mosaic virus (TMV), also
form self-assembled macromolecules. The virion of TMV is a rigid rod (18 300
nm) consisting of 2130 identical virion coat protein (CP) subunits stacked in a he-
206 6 Physical Methods for Studies of Fiber Formation and Structure
lix around a single strand of plus-sense RNA. Depending upon solution conditions,
TMV CP forms three general classes of aggregates, the 4S or A protein composed
of a mixture of low-ordered monomers, dimers, and trimers; the 20S disk or helix
composed of approximately 38 subunits; and an extended virion-like rod. Under
cellular conditions, the 20S helix makes up the predominant CP aggregate [96–
99]. In the absence of RNA, the structure within the central hole of the 20S fila-
ment is disordered, but upon RNA binding structural ordering occurs, which locks
the nucleoprotein complex into a virion-like helix and allows assembly to proceed
[100]. Virion elongation occurs in both the 5 0 and 3 0 directions of the RNA. Assem-
bly in the 5 0 direction occurs rapidly and is consistent with the addition of 20S ag-
gregates to the growing nucleoprotein rod. Assembly in the 3 0 direction occurs
considerably slower and is thought to involve the addition of 4S particles [97, 99].
6.2.11
Protein Fibers Used by Viruses and Bacteriophages to Bind to Their Hosts
Certain viruses and bacteriophages use fibrous proteins to bind to their host recep-
tors. Examples are adenovirus, reovirus, and the T-even bacteriophages, such as
bacteriophage T4. The fibers of adenovirus, reovirus, and T4 share common struc-
tural features: they are slim and elongated trimers with asymmetric morphologies,
and the proteins consist of an amino-terminal part that binds to the viral capsid, a
thin shaft, and a globular carboxy-terminal domain that attaches to a cell receptor.
Strikingly, the fibers formed of these proteins are resistant to proteases, heat, and,
under certain conditions, sodium dodecyl sulfate (SDS) [101]. The assembly of the
tail spike from Salmonella phage P22 is the most extensively studied model system
and has allowed detailed study of in vitro and in vivo folding pathways in light of
structural information [102].
6.3
Overview: Fiber Structures
Structural elucidation of fibrous proteins is nontrivial and often involves the use of
several structural techniques combined to derive a model structure. Structural
studies have often involved X-ray crystallography from single crystals of mono-
meric or complexed proteins to elucidate the structure of fibrous protein subunits.
Electron microscopy with helical image reconstruction and X-ray fiber diffraction
have then been used to examine the structure of the fiber, and, where possible,
modeling of a crystal structure has been matched with the electron microscopy
maps or X-ray diffraction data. Protein fiber structures that have been extensively
studied include natural proteins such as actin, tubulin, collagen, intermediate fila-
ments, silk, myosin, and tropomyosin as well as misfolded protein fibers such as
amyloid, paired helical filaments, and serpin polymers. In this section, a few se-
lected examples will be discussed.
6.3 Overview: Fiber Structures 207
6.3.1
Study of the Structure of b-sheet-containing Proteins
6.3.1.1 Amyloid
Amyloid fibrils extracted from tissue are extremely difficult to study due to contam-
inants and the degree of order within the fibrils themselves. For this reason, struc-
tural studies of amyloid fibrils have concentrated on examining the structure of
amyloid formed in vitro from various peptides with homology to known amyloid-
forming proteins. It was clear from the early work of Eanes and Glenner [2] that
amyloid fibrils resembled cross-b silk (see Section 6.3.1.3), in that the fiber diffrac-
tion pattern gave the cross-b diffraction signals. This means that the diffraction
pattern consists of two major diffraction signals at 4.7 Å and around 10 Å, on per-
pendicular axes, that correspond to the spacing between hydrogen-bonded b-
strands and the intersheet distance, respectively. Models of amyloid fibrils have
been built using X-ray diffraction data [1, 103–106], solid-state NMR data [107–
109], and electron microscopy [110, 111]. Many of these models have a common
core structure in which the b-strands run perpendicular to the fiber axis and are
hydrogen-bonded along the length of the fiber. Whether the amyloid fibrils are
composed of parallel or antiparallel b-sheets may vary depending on the composite
protein. X-ray diffraction and FTIR data from various amyloid fibrils have sug-
gested an antiparallel b-sheet arrangement for some amyloid fibrils formed from
short peptides [105]. Solid-state NMR data from full-length Ab peptides appear to
be consistent with a parallel arrangement [108]. Alternative models of amyloid
fibril core structure involving b-helical structure have been suggested [112, 113].
The mature amyloid fibril is thought to be composed of several protofilaments;
evidence for this comes from electron microscopy [111, 114] and atomic force
microscopy data [115].
6.3.1.3 b-Silks
All silk fibroins appear to contain crystalline regions or regions of long-range order,
interspersed with short-range-ordered, non-crystalline regions [14]. Silks spun by
208 6 Physical Methods for Studies of Fiber Formation and Structure
different organisms vary in the arrangement, proportions, and length of the crys-
talline and non-crystalline regions. Generally, b-pleated sheets are packed into tight
crystalline arrays with spacer regions of more loosely associated b-sheets, a-helices,
and/or b-spirals [14]. X-ray fiber diffraction has been used extensively to examine
the structure of silks made by different spiders and insects [14]. The crystalline do-
mains of Bombyx mori fibroin silk are composed of repeating glycine-alanine resi-
dues and form an antiparallel arrangement of b-pleated strands that run parallel to
the fiber axis. For this structural arrangement, a model was proposed by Marsh,
Corey, and Pauling [119], in which the intersheet distance is about 5.3 Å, which ac-
commodates the small side chains. A recent refinement of the model [120] showed
that the methyl groups of the alanine residues alternately point to either side of the
sheet structure along the hydrogen-bonding direction, so that the structure con-
sists of two antipolar, antiparallel b-sheet structures. Major ampullate (dragline)
silk from the spider Nephila clavipes has been examined by X-ray diffraction and
solid-state NMR, and, again, the silk structure is dominated by b-strands aligned
parallel to the fiber axis [121]. Tussah silk, which is the silk of the caterpillar of
the moth Antherea mylitta, also has an antiparallel arrangement of b-strands paral-
lel to the fiber axis and gives an X-ray diffraction pattern similar to that of Bombyx
mori silk, although some significant differences are observed. These differences are
thought to arise from different modes of sheet packing due to the high polyalanine
content of Tussah silk [19].
Cross-b silk was identified in the egg stalk of the green lacewing fly, Chrysopa
flava [122]. The X-ray diffraction pattern was examined in detail and found to arise
from an arrangement where the b-strands run orthogonal to the fiber axis and
hydrogen bonding extends between the strands forming a b-sheet that runs the
length of the fiber. A model was proposed by Geddes et al. [4] in which the b-
pleated strand folded back on to itself to form antiparallel b-sheet ribbons. The
length of the b-strands was about 25 Å, giving the ribbon a width of 25 Å. Several
ribbons then stacked together in the crystallite. This model has been used as the
basis of the amyloid core structure (see Section 6.3.1.1).
Several proteins have been shown to fold into a parallel b-helix, including
pectate lyase [124] and the bacteriophage P22 tail spike [125]. These have right-
handed, single-chain helices that have a bean-shaped cross-section, and the b-
strands run perpendicular to the long axis. Left-handed b-helices (e.g., UDP N-
acetylglucosamine acyltransferase) have a triangular cross-section, as each turn of
the helix is made up of three b-strands, connected by short linkers [126]. A triple b-
helix fold has been identified in part of the short-tail fiber of bacteriophage T4,
gp12. It has a triangular cross-section and is made up of three intertwined chains
[127]. The b-strands are again perpendicular to the long axis. This arrangement
has also been identified in a region of P22 tail spike, and a similar fold has been
found in bacteriophage T4 protein gp5. The arrangement forms the needle used by
T4 to puncture host cell membranes [128]. Very recently, the structure of the T4
baseplate-tail complex was determined to 12 Å by cryo-electron microscopy, and
this has enabled the known crystal structures to be fitted into the cryo-electron mi-
croscopy map to build up a full structure of the T4 tail tube and the arrangement
of proteins at the baseplate [129].
6.3.2
a-Helix-containing Protein Fibers
6.3.2.1 Collagen
The structure of the collagen superfamily has been studied for over 50 years. Colla-
gen has a complex supramolecular structure, involving the interaction of different
molecules giving different functions. It has a characteristic periodic structure,
forming many levels of association [130]. The primary structure consists of a re-
peating motif, Gly-Xaa-Yaa, interspersed by more variable regions. The secondary
structure involves the central domain of collagen a-chain folding into a tight,
right-handed helix with an axial residue-to-residue distance determined by the
spacing of the Pro residues (Xaa) and hydroxyprolines (Yaa). This arrangement re-
sults in a Gly residue row following a slightly left-handed helix winding around the
coil, and this is responsible for the packing to form the tertiary structure, a triple
helix, where the Glys are buried inside the supercoiled molecules [130]. Depending
on the collagen type, the triple helices may be comprised of differing collagen
chains or of three identical chains [130]. The triple helix is a long, rod-like mole-
cule around 1.5 nm wide. Recently, the crystal structure of the collagen triple helix
formed synthetically from [(Pro-Pro-Gly)10 ]3 peptide was solved to a resolution of
1.3 Å [25] (Figure 6.1). The triple helix undergoes fibrillogenesis to form supramo-
lecular structures. Non-helical ends that do not contain the Gly-Xaa-Yaa sequence
are known as the telopeptides. Many models of the molecular packing have been
presented (summarized in Ref. [130]). The most recent model of type I collagen
appears to fit data from X-ray diffraction, electron microscopy, and cross-linking
studies [130]. The proposed model is a one-dimensional, staggered, left-handed mi-
crofibril. The packing allows for interconnections between the microfibrils via the
telopeptides [131]. This model was deduced by careful testing of several models to
compare calculated and observed X-ray diffraction data [132].
210 6 Physical Methods for Studies of Fiber Formation and Structure
6.3.2.2 Tropomyosin
Tropomyosin molecules are arranged within the thin filament of muscle bound to
actin filaments (see below). The molecules are two-chain, a-helical, coiled-coil pro-
teins around 400 Å in length that link end to end to form continuous strands. The
crystal structure of full-length tropomyosin was solved from low-resolution X-ray
diffraction data with a contribution from electron microscopy, revealing a pitch of
around 140 Å [133, 134].
6.3.3
Protein Polymers Consisting of a Mixture of Secondary Structure
6.3.3.1 Tubulin
The structure of microtubules has been studied using cryo-electron microscopy,
electron diffraction, and X-ray crystallography. The structure of a bacterial homo-
logue of tubulin called FtsZ has also been solved using X-ray diffraction [137],
and synthetic polymers of FtsZ have been studied by electron microscopy and im-
age reconstruction [138]. Tubulin exists as ab heterodimers that polymerize into
protofilaments that make up the microtubule. The crystal structure of tubulin het-
erodimers was solved by electron crystallography [139] from two-dimension crys-
Fig. 6.2. Structure of the microtubule from heads of kinesin dimers in orange. Thanks to
electron microscopy image reconstruction. The Linda A. Amos, LMB, Cambridge for the
microtubule is shown in green with directly donation of this figure.
bound kinesin heads in yellow and tethered
212 6 Physical Methods for Studies of Fiber Formation and Structure
tals of zinc-induced tubulin sheets in ice, and the structural interpretation was sup-
ported by the X-ray crystal structure of FtsZ [137]. Each monomer is a compact el-
lipsoid made up of three domains, an amino terminal nucleotide-binding domain,
a smaller second domain, and a helical carboxy terminal domain [140]. The tubulin
heterodimers polymerize in a polar fashion to make longitudinal protofilaments
that in turn bundle to form the microtubule. The dimer structure was docked into
20-Å maps from cryo-electron microscopy of microtubules, creating a near-atomic
model of the microtubule [140]. This model revealed that the carboxy terminal hel-
ices form a crest on the outside of the protofilaments and that the long loops de-
fine the microtubule lumen [140]. The microtubule is polar, with plus and minus
ends, and the structure showed that the exchangeable nucleotide bound to b-tubu-
lin was exposed at the plus end of the microtubule, whereas the proposed cata-
lytic residue in a-tubulin was exposed at the minus end.
Cryo-electron microscopy and helical image reconstruction techniques have
proved particularly powerful in examining the arrangement and interaction of
motor proteins bound to microtubules [141]. Three-dimensional maps of microtu-
bules decorated with motor domains or kinesin in different nucleotide states could
be compared to naturally occurring microtubules (Figure 6.2, see p. 211) [141].
sive. Four arrangements of the heads are possible, whereby the heads may be par-
allel or antiparallel. In each case the two heads may lie on the same helical track in
a compact arrangement or on neighboring tracks in a splayed arrangement [148].
Recent work on electron microscopy data from filaments of the spider Brachypelma
smithi (tarantula) [149] tested the four possible arrangements by fitting the crystal
structure of myosin heads [150] into the map generated from electron microscopy
of negatively stained thick filaments. A good fit was obtained when the myosin
heads were arranged antiparallel and packed on the same helical ridge.
6.4
Methods to Study Fiber Assembly
6.4.1
Circular Dichroism Measurements for Monitoring Structural Changes Upon Fiber
Assembly
6.4.1.1 Theory of CD
Circular dichroism (CD) is a phenomenon that gives information about the un-
equal absorption of left- and right-handed circularly polarized light by optically
active molecules in solution [151–154] (see chapter 2 in Part I). CD signals are
observed in the same spectral regions as the absorption bands of a particular com-
pound, if the respective chromophores or the molecular environment of the com-
pound are asymmetric. CD signals of proteins occur mainly in two spectral re-
gions. The far-UV or amide region (170–250 nm) is dominated by contributions
of the peptide bonds, whereas CD signals in the near-UV region (250–320 nm)
originate from aromatic amino acids. In addition, disulfide bonds give rise to mi-
nor CD signals around 250 nm. The two main spectral regions give different kinds
of information about protein structure.
CD signals in the amide region contain information about the peptide bonds
and the secondary structure of a protein and are frequently employed to monitor
changes in secondary structure in the course of structural transitions [155–157].
Since many fiber-forming proteins undergo certain conformational changes upon
fiber assembly (e.g., from an intrinsically unfolded state to a b-sheet-rich fiber as
seen in some amyloid-forming proteins; see Figure 6.3), far-UV CD measurements
can be used to monitor the kinetics of fiber assembly.
CD signals in the near-UV region are observed when aromatic side chains are
immobilized in a folded protein and thus transferred to an asymmetric environ-
ment. Since Tyr and Phe residues yield in very small near-UV CD signals, only
Trp residues lead to utilizable near-UV CD spectra. The CD signal of aromatic re-
sidues is very small in the absence of ordered structure (e.g., in short peptides) and
therefore serves as a good tool for monitoring fiber assembly of peptides (in case
Trp residues are present). The magnitude as well as the wavelengths of the aro-
matic CD signals cannot be predicted, since they depend on the structural and
electronic environment of the immobilized chromophores. Therefore, the individ-
214 6 Physical Methods for Studies of Fiber Formation and Structure
Fig. 6.3. Far-UV CD spectra of the yeast prion line represents the spectrum of protein after
protein Sup35p-NM [9]. The green line fiber assembly, which reflects a b-sheet-rich
represents the spectrum of soluble protein that conformation.
depicts a random coil conformation. The black
ual peaks in the complex near-UV CD spectrum of a protein usually cannot be as-
signed to transitions in the vicinity of specific amino acid side chains. However, the
near-UV CD spectrum represents a highly sensitive criterion for the structural
state of a protein. It can thus be used as a fingerprint for either conformational
state, the soluble and the fibrous one [156, 157].
its contribution to the total optical density of the sample should be as high as pos-
sible. This implies that the absorbance of the solvent should be as small as possi-
ble. Absorbance spectra of the samples prior to the CD experiment should be re-
corded to determine the protein concentration necessary for the calculation of the
molar CD and to select the optimal conditions for the CD measurement. Most buf-
fers are transparent in the near-UV region, and path lengths of 1 cm or more can
be selected in this spectral region. For measurements in the far-UV region, short
path lengths (0.2–0.01 cm) and increased protein concentrations should be em-
ployed to minimize the contributions of the solvent to the total absorbance. As
a rule, 0.1-cm cells can be used for the spectral region down to 200 nm. For mea-
surements extending below 200 nm, 0.01-cm cells are advisable. Solutions to be
employed for CD measurements should be filtered or centrifuged to remove dust
or other solid suspended particles. The presence of light scattering in CD measure-
ments leads to artifacts that result in anomalous, red-shifted spectra. Since light
scattering increases during assembly of protein fibers, this point has to be taken
into consideration before evaluating the received data. When protein samples are
exposed to UV light around 220 nm for an extended time, as is frequently the
case when measuring far-UV CD spectra or conformational transition kinetics,
photochemical damage can occur. It is advisable to examine some functional or
structural properties of the protein under investigation after long-term CD mea-
surements [152, 155, 156].
6.4.2
Intrinsic Fluorescence Measurements to Analyze Structural Changes
dues, fluorescence of Tyr is barely detectable due to the strong emission of Trp, due
to wavelength shifts of Trp emission towards the wavelength of Tyr emission, and
due to non-radiative energy transfer from Tyr to Trp residues in compactly folded
proteins.
Changes in protein conformation very often lead to large changes in fluores-
cence emission, since fluorescence of aromatic amino acids depends on their envi-
ronmental conditions. In Trp-containing proteins, both shifts in emission wave-
length and changes in intensity can generally be observed upon conformational
changes. In hydrophobic environments, such as in the interior of a folded protein
or protein fibers, Trp emission occurs at lower wavelengths than in aqueous ones.
The fluorescence of Trp residues can be investigated selectively by excitation at
wavelengths greater than 295 nm. Due to the red shift and the increased intensity
of the absorbance spectrum of Trp residues, when compared with that of Tyr resi-
dues, protein absorbance above 295 nm originates almost exclusively from Trp res-
idues. Unlike absorbance, the changes in fluorescence upon folding can be very
large, which makes intrinsic Trp fluorescence useful as a sensitive probe for con-
formational changes of proteins upon fiber assembly and in analyzing the kinetics
thereof.
The fluorescence maximum of Tyr residues remains around 303 nm irrespective
of its molecular environment. Therefore, conformational changes of proteins are
accompanied by changes in the Tyr emission intensity, but not in the wavelength
of emission, which makes Tyr fluorescence a less sensitive tool for studying confor-
mational changes in comparison to Trp fluorescence [159, 160, 163, 164].
compartment of the instrument and cuvettes with specially designed bottoms for
magnetic bars are commercially available.
Protein emission spectra are broad and without detectable fine structure.
Changes upon structural transitions frequently involve the entire fluorescence
spectrum. Therefore, a variety of wavelengths and wide emission slits can be used
to follow conformational changes of proteins. The fluorescence intensity generally
decreases with increasing temperature. This decrease is substantial, with Tyr emis-
sion decreasing b1% per degree increase in temperature and an even more pro-
nounced temperature dependence of Trp fluorescence. In contrast, there is no
linear relationship between fluorophore concentration and emission intensity. The
actual form of the concentration dependence varies with the optical geometry of
the fluorometer and the cell. The nonlinearity is caused by inner filter optical
effects. Between the entrance window of the cells and the volume element of the
sample, from which fluorescence emission is collected, the exciting light is attenu-
ated by absorbance of the protein and the solvent. In strongly absorbing samples
(i.e., at high protein concentrations), the emission intensity is actually decreased,
as almost the entire incident light is absorbed before it can reach the center of the
cuvette [158, 159, 161]. Since growing fibers are large enough to scatter light, light
scattering as measured in the spectrofluorometer can be used as an analytic tool to
investigate fiber assembly kinetics (see Section 6.4.5).
6.4.3
Covalent Fluorescent Labeling to Determine Structural Changes of Proteins with
Environmentally Sensitive Fluorophores
6.4.4
1-Anilino-8-Naphthalensulfonate (ANS) Binding to Investigate Fiber Assembly
6.4.4.2 Experimental Guide to Using ANS for Monitoring Protein Fiber Assembly
ANS has been used to study the partially folded states of globular proteins as well
as the binding pockets of a number of carrier proteins and enzymes. ANS has also
been used to characterize fibrillar forms of proteins, such as fibronectin, islet amy-
loid polypeptide, or infectious isoforms of the prion protein PrP [177–180]. After
binding to fibers, ANS can show an increased fluorescence and a blue shift of its
fluorescence maxima. Protein concentrations should be in the range of 0.5–20 mM
and ANS should be added in a 10- to 100-fold excess. The concentration of ANS
can be determined by using a molar extinction coefficient of 6:8 10 3 M1 cm1
at 370 nm in methanol [181]. In a 1-cm cell the fluorescence inner-filter effect (see
Section 6.4.2.2) becomes significant at @30 mM ANS. Therefore, correction factors
have to be used for the fluorescence of ANS when its total concentration exceeds
10 mM. The magnitude of the inner-filter effect must be experimentally determined
for each instrument and whenever the instrumental configuration is altered. The
magnitude of the correction depends on the wavelength range and the path length,
but not on slit-width or sample turbidity for most of the instruments [182].
To determine whether ANS binds to the fibrous protein, but not to the soluble
counterpart, fluorescence emission spectra should be recorded from 400–650 nm
with an excitation wavelength of 380 nm. After measuring ANS fluorescence spec-
tra before and after fiber assembly, it can be determined whether ANS serves as a
tool for investigating fiber assembly of the chosen protein. Extreme care has to be
taken in determining whether ANS fluorescence changes are based solely on fiber
formation or whether ANS is also able to bind non-fibrous assembly intermediates.
If ANS binding is specific to the fibrous form of the protein, the wavelength with
maximal fluorescence differences before and after assembly can be used to moni-
tor fiber assembly kinetics.
6.4.5
Light Scattering to Monitor Particle Growth
6.4.6
Field-flow Fractionation to Monitor Particle Growth
ary surfaces are formed by just a few macromolecules. In contrast, most dissolved
macromolecules, especially protein fibers, have to be described in terms of an
asymmetrical, expanded and inhomogeneous coil. Considering the complex na-
ture of protein fiber conformations, two parameters are important: the root-mean-
square radius and the hydrodynamic radius of the macromolecule. Although a
theory predicts the hydrodynamic size of eluting species as a function of elution
time, absolute determination of size without reference to standards, assumptions
about the conformation of the particles, etc., can be obtained only by combination
of FFF with multi-angle light-scattering (MALS) detectors (see Section 6.4.3) [186–
188]. FFF in combination with MALS allows separation of protein fibers of differ-
ent sizes and therefore serves as a tool to monitor assembly and elongation of pro-
tein fibers.
6.4.7
Fiber Growth-rate Analysis Using Surface Plasmon Resonance
thick). The choice of metal used is critical, since the metal must exhibit free elec-
tron behavior. Suitable metals include silver, gold, copper, and aluminum. Silver
and gold are the most commonly used, since silver provides a sharp SPR reso-
nance peak and gold is highly stable [201]. The evanescent wave of the incoming
light is able to couple with the free oscillating electrons (plasmons) in the metal
film at a specific angle of incidence, and thus the plasmon resonance is resonantly
excited. This causes energy from the incident light to be lost to the metal film, re-
sulting in a reduction in the intensity of reflected light, which can be detected by a
two-dimensional array of photodiodes or charge-coupled detectors. Therefore, if the
refractive index directly above the metal surface changes by the adsorption of a pro-
tein layer, a change in the angle of incidence required to excite a surface plasmon
will occur [202]. By monitoring the angle at which resonance occurs during an ad-
sorption process with respect to time, an SPR adsorption profile can be obtained.
The difference between the initial and the final SPR angle gives an indication of
the extent of adsorption, and the positive gradient of the SPR adsorption curve
determines the rate of adsorption, allowing analysis of, e.g., assembly kinetics of a
growing protein fiber.
6.4.8
Single-fiber Growth Imaging Using Atomic Force Microscopy
mon points easily visible on the surface should be chosen as a reference (such as
protein aggregates or other impurities). The use of short, narrow-legged silicon ni-
tride cantilevers (nominal spring constant 0.032 N/m) is emphasized. The best
imaging results can be obtained using a tapping frequency in the range of 9 to 10
kHz. Typical scan rates are between 1 and 2 Hz, with drive amplitudes in the range
of 150 to 300 mV [115, 210].
6.4.9
Dyes Specific for Detecting Amyloid Fibers
CR to amyloid fibers formed by Ab peptides and insulin was highly specific and
could be quantified [214], the fact that CR also binds to native, partially unfolded
conformations of several proteins indicates that it must be used with caution as a
diagnostic test for the presence of amyloid fibrils in vitro.
The benzothiazoles thioflavin S and thioflavin T (ThT) are other classical amy-
loid stains that bind specifically to b-sheet-rich fibers but not to monomeric or oli-
gomeric intermediates [217, 218]. ThT proved to be particularly attractive because
of a substantial hypochromic shift of the excitation maximum of the bound dye
from 336 to 450 nm, permitting selective excitation of the bound species. ThT fluo-
resces only when it is bound to many but not all amyloid fibrils and fibrillar b-
sheet species [218]. The binding reaction is rapid (completed within 30 s) and is
sensitive enough to avoid significant contributions from light scattering. The
advantage of ThT over CR is that ThT does not inhibit fiber assembly and can
therefore also be used as a dye for real-time fiber growth, although it is important
to investigate in each particular case whether ThT stimulates assembly kinetics or
stabilizes the fibers [219].
plate is approximately 3 min. Protein fiber adsorption to the plates has to be inves-
tigated by incubating protein fiber solutions in the plate for 10 min at 37 C with
stirring and by determining the protein concentration before and after incubation
by measuring the UV absorbance.
Alternatively, fibril formation can be performed in glass vials. Ten-microliter ali-
quots are withdrawn from the glass vials and added directly to a fluorescence cu-
vette (1-cm path length semi-micro quartz cuvette) containing 1 mL of a ThT mix-
ture (5 mM ThT, 50 mM Tris buffer, and 100 mM NaCl pH 7.5). Emission spectra
are recorded immediately after addition of the aliquots to the ThT mixture from
470 nm to 560 nm (excitation at 450 nm). For each sample, the signal can be ob-
tained as the ThT intensity at 482 nm, from which a blank measurement recorded
prior to addition of protein to the ThT solution is subtracted.
6.5
Methods to Study Fiber Morphology and Structure
6.5.1
Scanning Electron Microscopy for Examining the Low-resolution Morphology of a
Fiber Specimen
imaging. SEM allows a large depth of field compared to the optical microscope,
permitting the visualization of rough surfaces in focus across the whole speci-
men. In addition, the contrast within scanning micrographs provides a three-
dimensional effect when the specimen is tilted with respect to the electron beam.
6.5.2
Transmission Electron Microscopy for Examining Fiber Morphology and Structure
Metal Shadowing for Enhanced Contrast of Fibrous Proteins The contrast provided
by negative-stain techniques may be inadequate for visualization of rod-shaped par-
ticles, and metal shadowing may increase the contrast [224], allowing better imag-
ing of the morphological features. Metal shadowing may be unidirectional or ro-
tary. The protein to be examined must be in a solution of glycerol (50% v/v) and
232 6 Physical Methods for Studies of Fiber Formation and Structure
then be sprayed onto a piece of freshly cleaved mica and dried. The protein-covered
mica is coated with platinum or another heavy metal (such as tantalum/tungsten)
by evaporation and then coated with a layer of carbon. The metal/carbon replicas
are then floated off onto cleaned grids and finally examined under the electron mi-
croscope. Contact prints of the micrographs provide the best images for further
analysis (Figure 6.6). Details of this procedure can be found in Ref. [224].
6.5.3
Cryo-electron Microscopy for Examination of the Structure of Fibrous Proteins
graph images, they must be digitized (some electron microscopes have fitted CCD
cameras, thus avoiding this step). Ideally, a fiber image will be scanned such that
the fiber is vertical in order to avoid interpolation resulting in loss of data at the
later stages. Examination of images can be carried out using a suitable program
such as Ximdisp [226]. Regions of fiber images may be selected and must be boxed
and floated. This involves masking off all density not belonging to the particle and
surrounding the particle box to reduce the density step at the edge of the box [227].
Fiber images may benefit from Fourier filtering and/or background subtraction.
The fibers may be straightened by a number of established techniques [228]. If
necessary, the image may be interpolated. Interactive Fourier transforms are cal-
culated to look for repeating features with the fiber. If a diffraction pattern is
observed, then further analysis may be carried out on the fiber image using helical
image reconstruction [227, 229, 230]. The determination of the position and tilt of
the helix axis must first be performed, followed by indexing of the layer line spot
positions. The indexing is tested and phases are measured, allowing a reconstruc-
tion to be carried out [227, 229, 230]. This results in a three-dimensional density
function for the object. This methodology can be followed in detail [222, 229].
6.5.4
Atomic Force Microscopy for Examining the Structure and Morphology of Fibrous
Proteins
Fig. 6.7. (a) AFM image of amyloid fibers ampullate silk from Araneus diadematus on
assembled with biotinylated yeast Sup35p-NM mica scanned in tapping mode. The left image
on streptavidin scanned in contact mode [251]. reflects the sample height above the surface,
The scale bar gives the color code for the and the right image depicts the tip deflection.
height information. (b) AFM images of major
to the shape and size of the tip. Of course, height measurements taken from speci-
mens in air may be affected by dehydration of the specimen, by the effect of grav-
ity, or by the force of the cantilever compressing soft protein. However, morphologi-
cal features may be clearly observed.
236 6 Physical Methods for Studies of Fiber Formation and Structure
6.5.5
Use of X-ray Diffraction for Examining the Structure of Fibrous Proteins
However, there will be regions of reciprocal space for which no data is recorded. By
tilting the specimen, additional data can be collected.
Sample Preparation Fibers in general are weakly diffracting, and therefore speci-
men preparation is particularly important. The specimen should be of an optimum
size and thickness to fill the beam. Otherwise background scatter may obscure
weak reflections. The amount of information that can be obtained from an X-ray
fiber diffractogram is closely associated with the degree of alignment. Some fi-
brous proteins occur naturally in a highly oriented form (e.g., keratin and colla-
gen). Often regions of orientation are quite limited, and therefore it can be useful
to use a microfocus X-ray beam for data collection. Orientation may be improved
by physical manipulation or by chemical treatment. For example, the orientation
of Chrysopa egg stalk silk fiber orientation was increased by stretching in water or
urea [4]. Preparation of oriented films and fibers from soluble fibrous proteins has
involved a number of procedures that include stretching, rolling, casting films,
drawing fibers from precipitate, shearing, magnetic alignment, centrifugation,
and capillary flow. Long-chain molecules can be drawn from the solution in a man-
ner similar to that used for making diffraction samples of DNA. Samples that can-
not be pulled into long fibers may be aligned by hanging a drop between two wax-
plugged capillary tubes arranged 1–2 mm apart [1]. Casting flat sheets is often
used as a method for aligning polymers [233–235], and this can be achieved by
placing a high concentration of a fiber solution on a suitable surface and allowing
the solution to dry to form a thin film that can be removed from the surface (such
238 6 Physical Methods for Studies of Fiber Formation and Structure
as Teflon). The film will show alignment of the fibers in the direction parallel to
the film surface but no alignment in the perpendicular direction. Further improve-
ment to alignment of fiber bundles or films may be achieved by annealing in high
humidity. Some fibrous protein specimens are diamagnetically anisotropic [236]
and may benefit from magnetic alignment [105, 237, 238]. The effect of the mag-
netic field on a single molecule is small, and therefore it is necessary to embed the
molecules in a liquid crystal so that the molecules act together and to use a high-
energy magnet [239]. The alignment can be fixed in the specimen by slow drying
while the specimen is still in the magnetic field.
Data Collection Since fibrous proteins are often weak scatterers, long exposure
times may be needed. Cameras specially designed for the particular qualities of
the specimen may be helpful. The arrangement of the apparatus will depend on
whether a high- or low-angle diffraction is being collected. The use of a helium
chamber and an evacuated camera can be helpful to reduce air scatter as well as
to allow the humidity around the specimen to be controlled if required. In the
past, the recording of diffraction data was performed using photographic film.
This allowed the placement of several films in the detector to record a range of in-
tensities. However, it is now more common to use detectors used for crystallogra-
phy, such as a MarResearch image plate or CCD cameras. If film has been used,
then the diffraction image must be digitized and converted to an appropriate for-
mat for data analysis.
6.5.6
Fourier Transformed Infrared Spectroscopy
the spectra observed from the formation of all-b-sheet protein fibers are more sub-
tle [248]. Observed and calculated infrared spectra from a-helix and antiparallel and
parallel b-structures can be found in Ref. [247].
(25–50 mm) may be used. Thin films can be prepared by drying the protein suspen-
sion or solution onto thin silver chloride disks.
Data analysis of FTIR spectra involves some expertise. Water or buffer subtrac-
tion must be performed before analysis of the spectrum. The conformation of the
polypeptide chain has a signature in the amide I band. However, the bands may
overlap, making interpretation difficult. This can be tackled in two ways. One
approach is resolution enhancement methods or deconvolution to decrease the
widths of the infrared bands and to allow determination of the overlapping compo-
nents. A second approach is to use pattern-recognition methods for which software
packages are available (e.g., GRAMS, Galactic Corp.).
6.6
Concluding Remarks
In the present chapter, fiber-forming proteins and methods to determine their fiber
assembly and fiber morphology have been described. There is a wide variation
among fibrous proteins with regard to the content of regular secondary structure.
Fibers of proteins such as collagen and tropomyosin with highly elongated mole-
cules tend to have a high content of regular secondary structure (generally greater
than 90%). In contrast, fibers of globular proteins, such as actin, feather keratin,
and flagellin, generally have much lower contents of regular structure (often less
than 50%). In another group of proteins, comprising elastin, resilin, and the ma-
trix proteins of mammalian keratins, the content of regular secondary structure is
very small. In these cases the protein fibers are cross-linked and have rubber-like
elastic properties.
So far, any attempt to correlate sequence with structure and function in fibrous
proteins is highly speculative. Therefore, from a given sequence it cannot be con-
cluded whether a protein forms fibers or not. However, the structure-function rela-
tion based on sequence data is the aim of many biologists, chemists, physicists,
and material scientists. One main conformation-determining factor in fibrous pro-
teins appears to be regions with repeating sequences. The simplest examples of
repeating sequences are found among the silks, which have, for instance, runs of
Glyn. The corresponding sections of the chains adopt conformations that appear
to be identical to those observed in synthetic homopolypeptides. Four other
conformation-determining factors in fibrous proteins are interactions involving
side chains, interactions between apolar side chains, hydrogen bond formation,
and the content and distribution of Gly and Pro residues.
Aggregation studies on amyloids, collagens, myofibrils, and silks illustrate the
capacity of fibrous proteins for self-assembly into filaments and other organized
aggregates. The mode of aggregation is generally found to be sensitive to changes
in pH, ionic strength, and the presence of other solutes, indicating that the assem-
bly process depends upon a fine balance among various types of weak secondary
bonding. Packing considerations are also clearly involved, but in the case of assem-
bly in aqueous environment, these will be less important than in a solvent-free
242 6 Physical Methods for Studies of Fiber Formation and Structure
crystalline phase. Several features of the self-assembly process in vivo are still not
well understood, such as nucleation and termination of fiber formation. For termi-
nation of fiber formation, two possibilities can be envisaged: the assembly process
could be intrinsically self-limiting, with the information residing in the individual
protein molecules, or the size of the aggregate is potentially unlimited, but regula-
tion of size is achieved through some external agency.
The future aim in investigating fibrillar proteins is to understand the kinetics of
fiber assembly (which will be a basis for controlling fiber assembly) and to gain
more insights into fibrous structure to learn about sequence-structure relation-
ships. Such knowledge will be a basis for elucidating the relevance of assembly
processes in cellular development, cellular structure, and cellular stability; for as-
sessing fundamental influences leading to protein-folding diseases; and for devel-
oping products through material science.
Acknowledgements
References
1 Sunde, M. & Blake, C. (1997). The basis of protein folding and its links
structure of amyloid fibrils by electron with human disease. Philos. Trans. R.
microscopy and X-ray diffraction. Adv. Soc. Lond. B Biol. Sci. 356, 133–145.
Protein Chem. 50, 123–159. 7 Lindquist, S. (1997). Mad cows meet
2 Eanes, E. D. & Glenner, G. G. psi-chotic yeast: the expansion of the
(1968). X-ray diffraction studies on prion hypothesis. Cell 89, 495–498.
amyloid filaments. J. Histochem. 8 Wickner, R. B., Edskes, H. K.,
Cytochem. 16, 673–677. Maddelein, M. L., Taylor, K. L. &
3 Teplow, D. B. (1998). Structural and Moriyama, H. (1999). Prions of yeast
kinetic features of amyloid beta- and fungi. Proteins as genetic
protein fibrillogenesis. Amyloid 5, material. J. Biol. Chem. 274, 555–558.
121–142. 9 Scheibel, T. (2002). [PSI]-chotic
4 Geddes, A. J., Parker, K. D., Atkins, yeasts: protein-only inheritance of a
E. D. & Beighton, E. (1968). ‘‘Cross- yeast prion. in Recent Research in
beta’’ conformation in proteins. J. Mol. Molecular Microbiology 1 (ed.
Biol. 32, 343–358. Pandalai, S. G.), Hindustan Publ.
5 Rochet, J. C. & Lansbury, P. T. Corp., Delhi, 71–89.
(2000). Amyloid fibrillogenesis: 10 Iconomidou, V. A., Vriend, G. &
themes and variations. Curr. Opin. Hamodrakas, S. J. (2000). Amyloids
Struct. Biol. 10, 60–68. protect the silkmoth oocyte and
6 Dobson, C. M. (2001). The structural embryo. FEBS Lett. 479, 141–145.
References 243
fibers. Austr. J. Mar. Freshwater Res. 3, and function. J. Muscle Res. Cell Motil.
199–204. 22, 5–49.
36 Rudall, K. M. (1955). The distribution 49 Wegner, A. (1979). Equilibrium of the
of collagen and chitin. Symp. Soc. Exp. actin-tropomyosin interaction. J. Mol.
Biol. 9, 49–71. Biol. 131, 839–853.
37 Melnick, S. C. (1958). Occurrence of 50 Yang, Y.-Z., Korn, E. D. &
collagen in the phylum Mollusca. Eisenberg, E. (1979). Cooperative
Nature 181, 1483. binding of tropomyosin to muscle and
38 DeVore, D. P., Engebretson, G. H., Acanthamoeba actin. J. Biol. Chem.
Schactele, C. F. & Sauk, J. J. (1984). 254, 7137–7140.
Identification of collagen from byssus 51 Zot, A. S. & Potter, J. D. (1987)
threads produced by the sea mussel Structural aspects of troponin-
Mytilus edulis. Comp. Biochem. Physiol. tropomyosin regulation of skeletal
77B, 529–531. muscle contraction. Ann. Rev. Biophys.
39 Benedict, C. V. & Waite, J. H. Biophys. Chem. 16, 535–539.
(1986). Location and analysis of byssal 52 Stokes, D. L. & DeRosier D. J.
structural proteins. J. Morphol. 189, (1987). The variable twist of actin and
261–270. its modulation by actin-binding
40 Pikkarainen, J., Rantanen, J., proteins. J. Cell Biol. 104, 1005–1017.
Vastamaki, M., Lapiaho, K., Kari, A. 53 Pante, N. (1994). Paramyosin polarity
& Kulonen, E. (1968). On collagens in the thick filament of molluscan
of invertebrates with special reference smooth muscles. J. Struct. Biol. 113,
to Mytilus edulis. Eur. J. Biochem. 4, 148–163.
555–560. 54 Albers, K. & Fuchs, E. (1992). The
41 Qin, X. X. & Waite, J. H. (1995). molecular biology of intermediate
Exotic collagen gradients in the byssus filament proteins. Int. Rev. Cytol. 134,
of the mussel Mytilus edulis. J. Exp. 243–279.
Biol. 198, 633–644. 55 Chou, Y. H., Bischoff, J. R., Beach,
42 Coyne, K. J., Qin, X. X. & Waite, D. & Goldman, R. D. (1990).
J. H. (1997). Extensible collagen in Intermediate filament reorganization
mussel byssus: a natural block during mitosis is mediated by p34cdc2
copolymer. Science 277, 1830–1832. phosphorylation of vimentin. Cell 62,
43 Carlier, M. F. (1991). Actin: protein 1063–1071.
structure and filament dynamics. J. 56 Stewart, M. (1993). Intermediate
Biol. Chem. 266, 1–4; Kabsch, W. & filament structure and assembly. Curr.
Vandekerckhove, J. (1992). Structure Opin. Cell Biol. 5, 3–11.
and function of actin. Annu. Rev. 57 Herrmann, H. & Aebi, U. (2000).
Biophys. Biomol. Struct. 21, 49–76. Intermediate filaments and their
44 Janmey, P. A., Shah, J. V., Tang, J. X. associates: multi-talented structural
& Stossel, T. P. (2001). Actin filament elements specifying cytoarchitecture
networks. Results Probl. Cell. Differ. 32, and cytodynamics. Curr. Opin. Cell
181–199. Biol. 12, 79–90.
45 Cheney, R. E., Riley, M. A. & 58 Strelkov, S. V., Herrmann, H. &
Mooseker, M. S. (1993). Phylogenetic Aebi, U. (2003). Molecular
analysis of the myosin superfamily. architecture of intermediate filaments.
Cell Motil. Cytoskeleton 24, 215–223. Bioessays 25, 243–251.
46 Titus, M. A. (1993). Myosins. Curr. 59 Coulombe, P. A. (1993). The cellular
Opin. Cell. Biol. 5, 77–81. and molecular biology of keratins:
47 Berg, J. S., Powell, B. C. & Cheney, beginning a new era. Curr. Opin. Cell
R. E. (2001). A millennial myosin Biol. 5, 17–29.
census. Mol. Biol. Cell. 12, 780– 60 Garrod, D. R. (1993). Desmosomes
794. and hemidesmosomes. Curr. Opin.
48 Perry, S. V. (2001). Vertebrate Cell Biol. 5, 30–40.
tropomyosin: distribution, properties 61 Nigg, E. A. (1992). Assembly and cell
References 245
129 Kostyuchenko, V. A., Leiman, P. G., 140 Nogales, E., Whittaker, M.,
Chipman, P. R., Kanamaru, S., Milligan, R. A. & Downing, K. H.
van Raaij, M. J., Arisaka, F., (1999). High-resolution model of the
Mesyanzhinov, V. V. & Rossman, microtubule. Cell 96, 79–88.
M. G. (2003). Three-dimensional 141 Hirose, K., Amos, W. B., Lockhart,
structure of the bacteriophage T4 A., Cross, R. A. & Amos, L. A. (1997).
baseplate. Nat. Struct. Biol. 10, 688– Three-dimensional cryo-electron
693. microscopy of 16-protofilament
130 Ottani, V., Martini, D., Franchi, microtubules: structure, polarity and
M., Ruggeri, A. & Raspanti, M. interaction with motor proteins.
(2002). Hierarchical structures in J. Struct. Biol. 118, 140–148.
fibrillar collagens. Micron 33, 587– 142 Squire, J. & Morris, E. (1998). A new
596. look at thin filament regulation in
131 Wess, T. J., Hammersley, A. P., Wess, vertebrate skeletal muscle. FASEB J.
L. & Miller, A. (1998). A consensus 12, 761–771.
model for molecular packing of type I 143 Hanson, E. J. & Lowry, J. (1963). The
collagen. J. Struct. Biol. 122, 92–100. structure of F-actin and of actin
132 Wess, T. J., Hammersley, A. P., Wess, filaments isolated from muscle. J. Mol.
L. & Miller, A. (1995). Type I Biol. 6, 48–60.
Collagen packing conformation of the 144 Kabsch, W., Mannherz, H. G., Suck,
triclinic unit cell. J. Mol. Biol. 248, D., Pai, E. F. & Holmes, K. C. (1990).
487–493. Atomic structure of the actin filament.
133 Phillips, G. N., Fillers, J. P. & Nature 347, 44–49.
Cohen, C. (1986). Tropomyosin crystal 145 Holmes, K., Popp, D., Gerhard, W.
structure and muscle regulation. & Kabsch, W. (1990). Atomic Model
J. Mol. Biol. 192, 111–131. of the Actin Filament. Nature 347,
134 Whitby, F. G., Kent, H., Stewart, F., 44–49.
Stewart, M., Xie, X., Hatch, V., 146 Lorenz, M., Poole, K., Popp, D.,
Cohen, C. & Phillips, G. N. (1992). Rosenbaum, G. & Holmes, K. (1995).
Structure of tropomyosin at 9 An Atomic Model of the Unregulated
Angstroms resolution. J. Mol. Biol. Thin Filament Obtained by X-ray
227, 441–452. Fiber Diffraction on Oriented Actin-
135 Crick, F. H. C. (1952). Is alpha- Tropomyosin Gels. J. Mol. Biol. 246,
keratin a coiled coil. Nature 170, 882– 108–119.
883. 147 Squire, J. M., Knupp, C., Al-Khayat,
136 Strelkov, S. V., Herrman, H., H. A. & Harford, J. J. (2003). Milli-
Geisler, N., Wedig, T., Zimbelmann, second time-resolved low-angle X-ray
R., Aebi, U. & Burkhard, P. (2002). fiber diffraction: a powerful high-
Conserved segments 1A and 2B of the sensitivity technique for modelling
intermediate filament dimer: their real-time movements in biological
atomic structures and role in filament macromolecular assemblies. Fib. Diff.
assembly. EMBO J. 21, 1255–1266. Rev. 11, 28–35.
137 Lowe, J. & Amos, L. A. (1998). Crystal 148 Morris, E., Squire, J. & Fuller, G.
structure of the bacterial cell-division (1991). The 4-Stranded Helical
protein FtsZ. Nature 391, 203–206. Arrangement of Myosin Heads on
138 Lowe, J. & Amos, L. A. (1999). Insect (Lethocerus) Flight Muscle
Tubulin-like protofilaments in Ca2þ Thick Filaments. J. Struct. Biol. 107,
induced FtsZ sheets. EMBO J. 18, 237–249.
2364–2371. 149 Padron, R., Alamo, L., Murgich, J.
139 Nogales, E., Wolf, S. & Downing, & Craig, R. (1998). Towards an
K. H. (1998). Structure of the alpha- atomic model of the thick filaments of
beta tubulin dimer by electron muscle. J. Mol. Biol. 275, 35–41.
crystallography. Nature 391, 199– 150 Rayment, I., Rypniewski, W.,
203. Schmidt-Base, K., Smith, R.,
References 249
175 Matulis, D., Baumann, C. G., 184 Shen, C. L., Scott, G. L., Merchant,
Bloomfield, V. A. & Lovrien, R. F. & Murphy, R. M. (1993). Light
(1999). 1-anilino-8-naphthalene scattering analysis of fibril growth
sulfonate as a protein conformational from the amino-terminal fragment
tightening agent. Biopolymers 49, 451– beta(1–28) of beta-amyloid peptide.
458. Biophys. J. 65, 2383–2395.
176 Kirk, W., Kurian, E. & Prendergast, 185 Harding, S. E. (1997) in Protein
F. (1996). Affinity of fatty acid for structure – a practical approach (ed.
(r)rat intestinal fatty acid binding Creighton, T. E.) Oxford University
protein: further examination. Biophys. Press, 219–251.
J. 70, 69–83. 186 Wyatt, P. J. (1991). Combined
177 Safar, J., Roller, P. P., Gajdusek, differential light scattering with
D. C. & Gibbs, C. J. (1984). Scrapie various liquid chromatography
amyloid (prion) protein has the separation techniques. Biochem. Soc.
conformational characteristics of an Trans. 19, 485.
aggregated molten globule folding 187 Roessner, D. & Kulicke, W. M.
intermediate. Biochemistry 33, 8375– (1994). On-line coupling of flow field-
8383. flow fractionation and multi-angle
178 Litvinovich, S. V., Brew, S. A., Aota, laser light scattering J. Chromatogr.
S., Akiyama, S. K., Haudenschild, C. 687, 249–258.
& Ingham, K. C. (1998). Formation of 188 Wyatt, P. J. & Villalpando, D.
amyloid-like fibrils by self-association (1997). High-Precision Measurement
of a partially unfolded fibronectin type of Submicrometer Particle Size
III module. J. Mol. Biol. 280, 245–258. Distributions. Langmuir 13, 3913–
179 Kayed, R., Bernhagen, J., 3914.
Greenfield, N., Sweimeh, K., 189 Harding, S. E. (1986). Applications of
Brunner, H., Voelter, W. & light scattering in microbiology.
Kapurniotu, A. (1999). Conforma- Biotechnol. Appl. Biochem. 8, 489–509.
tional transitions of islet amyloid poly- 190 Walsh, D. M., Lomakin, A.,
peptide (IAPP) in amyloid formation Benedek, G. B., Condron, M. M. &
in vitro. J. Mol. Biol. 287, 781–796. Teplow, D. B. (1997). Amyloid beta-
180 Baskakov, I. V., Legname, G., protein fibrillogenesis. Detection of a
Baldwin, M. A., Prusiner, S. B. & protofibrillar intermediate. J. Biol.
Cohen, F. E. (2002). Pathway complex- Chem. 272, 22364–22372.
ity of prion protein assembly into 191 Tseng, Y., Fedorov, E., McCaffery,
amyloid. J. Biol. Chem. 277, 21140– J. M., Almo, S. C. & Wirtz, D. (2001).
21148. Micromechanics and ultrastructure of
181 Mann, C. J. & Matthews, C. R. actin filament networks crosslinked by
(1993). Structure and stability of an human fascin: a comparison with
early folding intermediate of alpha-actinin. J. Mol. Biol. 310, 351–
Escherichia coli trp aporepressor 366.
measured by far-UV stopped-flow 192 Kita, R., Takahashi, A., Kaibara, M.
circular dichroism and 8-anilino-1- & Kubota, K. (2002). Formation of
naphthalene sulfonate binding. fibrin gel in fibrinogen-thrombin
Biochemistry 32, 5282–5290. system: static and dynamic light
182 Subbarao, N. K. & MacDonald, R. C. scattering study. Biomacromolecules 3,
(1993). Experimental method to 1013–1020.
correct fluorescence intensities for the 193 Giddings, J. C., Yang, F. J. & Myers,
inner filter effect. Analyst 118, 913– M. N. (1977). Flow field-flow fractiona-
916. tion as a methodology for protein
183 Benoit, H., Freund, L. & Spach, G. separation and characterization.
(1967) in: Poly-a-amino acids (ed. Anal. Biochem. 81, 394–407.
Fasman, G. D.) Marcel Dekker Inc. 194 Wahlund, K. G. & Giddings, J. C.
New York, 105–155. (1987). Properties of an asymmetrical
References 251
7
Protein Unfolding in the Cell
7.1
Introduction
7.2
Protein Translocation Across Membranes
7.2.1
Compartmentalization and Unfolding
mately 24 Å [4–6]. The channel across the inner membrane is flexible, but its max-
imum diameter is smaller than that of the outer-membrane channel [6–8]. The
majority of mitochondrial proteins are synthesized in the cytosol as preproteins
with positively charged N-terminal targeting signals [9]. After synthesis, prepro-
teins are localized to the mitochondria.
Several lines of evidence indicate that most precursor proteins are not in their
native conformation during translocation across the mitochondrial matrix. First,
import of mouse dihydrofolate reductase (DHFR) is blocked when its unfolding is
inhibited by the tightly binding ligand methotrexate [10]. Second, studies using
Neurospora crassa mitochondria at a low temperature identified a translocation in-
termediate during the import of F1 -ATPase b-subunit and cytochrome c1 precur-
sors [11]. This intermediate appeared to span both mitochondrial membranes,
which is a conformation that is possible only when the precursor protein is at least
partially unfolded. Third, attaching oxidized BPTI (containing three intramolecular
disulfide bridges) to the C-terminus of mouse DHFR prevented the import of the
DHFR moiety completely inside the mitochondria [12]. Finally, a study using bar-
nase mutants showed that a single disulfide bridge in a precursor protein slowed
import [7]. It is now widely accepted that proteins are in a fully unfolded confor-
mation during import.
The next question is when and where preproteins are unfolded. One possibility
could be that proteins never fold before import. However, it is thought that multi-
domain proteins fold co-translationally in eukaryotes [13]. The N-terminal domain
of nascent polypeptide chains folds before the synthesis of the C-terminal domain
is complete [13, 14]. In vitro studies using three model proteins demonstrated that
pre-sequences of differing lengths do not affect the stability or folding and unfold-
256 7 Protein Unfolding in the Cell
ing kinetics of the mature proteins [15–17]. Two in vivo experiments show that
precursor proteins can be in the native conformation prior to import. Studies in
S. cerevisiae showed that a hybrid protein consisting of the amino-terminal third
of cytochrome b2 followed by the DHFR domain accumulates outside the mito-
chondrial matrix as a translocation intermediate in the presence of the substrate
analogue aminopterin [18]. This shows that the DHFR domain was completely
translated and folded before translocation across the membrane. Similarly, precur-
sors containing the tightly folded heme-binding domain could be completely im-
ported into mitochondria only when the pre-sequence could actively engage the
mitochondrial unfolding machinery [19], suggesting that an authentic mitochon-
drial protein folds in the cytosol.
A subset of precursors, such as subunits of larger complexes or integral mem-
brane proteins, is unable to fold in the cytosol and interact with chaperones [20,
21]. These precursors require ATP outside the mitochondrial matrix for import.
The requirement for cytosolic ATP can be negated by denaturing the precursors
with urea [21]. Thus, cytosolic chaperones can facilitate import of precursors that
are unable to fold in the cytosol (precursors prone to aggregation). Although cyto-
solic chaperones affect import of some proteins, it is well documented that purified
mitochondria can import chemically pure folded precursor proteins [10, 22].
7.2.2
Mitochondria Actively Unfold Precursor Proteins
Mitochondria can import folded precursor proteins much faster than the spontane-
ous unfolding measured in free solution [22, 23]. In other words, mitochondria
actively unfold proteins during import. Two models can explain the nature of the
unfolding activity during translocation into mitochondria. One possibility is that
the mitochondrial membrane surface destabilizes importing proteins. For exam-
ple, it has been shown that lipid vesicles induce the partially unfolded molten-
globule state observed in diphtheria toxin [24–28]. However, the mitochondrial
membrane surface does not appear to affect the stability of importing proteins.
This question has been studied extensively using artificial precursor proteins com-
posed of barnase. Barnase is a small ribonuclease that is bound by its stabilizing
ligand barstar. Mutations had the same effect on the stability of barnase at the im-
port site on the mitochondria and in free solution [29]. In addition, the dissociation
constant of the barnase-barstar complex at the mitochondrial surface coincides
with that in free solution [29].
It appears that mitochondria can accelerate unfolding of at least some precursor
proteins by changing their unfolding pathway (Figure 7.2, see p. 258). Experiments
that measured the effect of mutations on the unfolding rate and stability of barnase
showed that the spontaneous unfolding pathway of barnase begins with a specific
sub-domain (formed by the second and third a-helices and some loops packed against
the edge of a b-sheet) and follows with the rest of the protein (the N-terminal a-
helix and the five-stranded b-sheet located at the C-terminus). The catalyzed un-
folding pathway during mitochondrial import is different: the mitochondrial im-
7.2 Protein Translocation Across Membranes 257
port machinery begins unfolding by unraveling the N-terminal a-helix and then
processively unravels the protein to its C-terminus [15]. Thus, mitochondria unfold
this precursor protein by unraveling it from its targeting signal [15]. Additional
evidence for this model comes from the observation that some model substrates
that are stabilized against spontaneous unfolding are not stabilized during unfold-
ing by the mitochondrial translocase [15]. However, stabilizing other proteins can
completely prevent their import, and, therefore, mitochondrial unfolding machin-
ery relies on the structure of the precursor protein near the targeting signal.
7.2.3
The Protein Import Machinery of Mitochondria
Fig. 7.2. Unfolding pathways of barnase. parts of the structure shown in red unfold
Structure of barnase, color-coded according to early, whereas those shown in blue unfold
the order in which structure is lost (left) during late. Figure reproduced from Ref. [15] with
spontaneous global unfolding in vitro and permission.
(right) during import into mitochondria. The
Fig. 7.3. The mitochondrial protein import the matrix always requires an electrical
machinery. Proteins in the outer/inner potential across the inner membrane and the
membrane are called Tom/Tim, followed by the ATP-dependent action of mHsp70. mHsp70 is
number indicated in the figure. The number found bound to the import machinery through
reflects their approximate molecular weight. Tim44 and free in the matrix. Precursors begin
During import, precursor proteins first interact to interact with mHsp70 while they are still
with the Tom20 and Tom22 receptors through associated with the import channels. G, J:
their targeting sequence. The Tom70 receptor mGrpE, Mdj1, two co-chaperones of Hsp70;
binds precursors associated with cytosolic IM/OM: inner/outer membrane; m-70:
chaperones. Targeting sequences insert into mHsp70. Figure reproduced from Ref. [40] with
the Tom40 channel and pass through the permission.
Tom23 complex into the matrix. Import into
7.2 Protein Translocation Across Membranes 259
only for the insertion of the targeting sequence in to the import channel. Unfold-
ing can be induced by mHsp70 in the absence of an electrical potential. The actual
length of natural targeting sequences is uncertain. However, databases indicate
that the mean cleavage site is at amino acid position 31. This suggests that most
targeting signals are too short to interact with mHsp70 before they unfold [40].
7.2.4
Specificity of Unfolding
7.2.5
Protein Import into Other Cellular Compartments
Unfolding may also play an important role in protein import into ER and chloro-
plasts. Chloroplasts are surrounded by two membranes and are divided into two
compartments, the stroma and the thylakoids. Most chloroplast proteins are im-
ported posttranslationally from the cytosol and, presumably, some precursors will
fold before translocation. However, in the case of chloroplast protein import, it is
not clear whether unfolding is always required. For example, one study indicates
that chloroplast membranes can import small folded proteins [46]. On the other
hand, import is blocked when large proteins or protein complexes are stabilized
against unfolding [47–50]. Within the chloroplast, translocation of proteins from
stroma into thylakoids occurs through two different machineries that seem to im-
pose different steric requirements on the transporting proteins. One subset of pro-
teins uses the ATP-dependent Sec system [51], while other proteins are transported
by a mechanism that does not require ATP but is instead dependent on the pH
gradient across the thylakoidal membrane [51]. The Sec-related pathway requires
protein unfolding, whereas the DpH-dependent pathway tolerates folded proteins.
Protein translocation into the endoplasmic reticulum generally does not require
unfolding because it occurs mostly co-translationally. However, at least in S. cerevi-
siae, there are examples of posttranslational translocation, and it is possible that
some of these precursor proteins fold in the cytosol. The internal diameter of the
translocon complex is 20–40 Å at its narrowest point, and experiments suggest that
several proteins have to be in an unfolded conformation to fit through the translo-
260 7 Protein Unfolding in the Cell
cation channel [9, 52]. Together these findings suggest that protein unfolding may
play a role in the import of a subset of ER-associated proteins. The ER import ma-
chinery resembles those of mitochondria and chloroplasts in that an Hsp70 homo-
logue is located at the exit of the protein import channel [53].
Protein unfolding may also play a key role in ER retro-translocation of misfolded
or unassembled polypeptides back into the cytoplasm for degradation [54–56].
Misfolded or unassembled polypeptides inside the ER are directed to proteasome-
mediated degradation in the cytoplasm in a process called ER-associated degrada-
tion (ERAD) [54, 56]. During this process, the polypeptide chain may be translo-
cated backwards into the cytoplasm by a pulling force generated by cytoplasmic
chaperones such as CDC48/p97/VCP, by proteins involved in the ubiquitination
process, or by the proteasome itself [55, 57–59].
7.3
Protein Unfolding and Degradation by ATP-dependent Proteases
7.3.1
Structural Considerations of Unfoldases Associated With Degradation
ATP-dependent proteases share similar overall structures [60, 61, 63–67]. The 26S
eukaryotic proteasome is a 2-MDa structure composed of two structurally and
functionally separated subunits: a catalytic subunit, called the 20S core particle,
and the ATPase subunit, called the 19S regulatory particle [60, 64]. The core par-
ticle is composed of four stacked heptameric rings, two of which consist of a-
subunits and two of b-subunits. The two b-rings are stacked together at the core of
the proteasome and contain the active sites for proteolysis. One ring of a-subunits
flanks the b-rings on each side. Together, the a- and b-rings form a cylindrical
structure that is flanked on each side by the regulatory subunit. The 19S cap,
which has a molecular weight of 700 kDa, contains 18 different subunits that rec-
7.3 Protein Unfolding and Degradation by ATP-dependent Proteases 261
ognize ubiquitinated proteins and present the substrate proteins to the proteolytic
core in an ATP-dependent manner [60, 61].
Substrate proteins are targeted to the proteasome by the covalent attachment of
multiple ubiquitin chains to their surface-exposed lysine residues [68]. A cascade of
reactions, catalyzed by ubiquitin-activating enzyme (E1), ubiquitin-conjugating en-
zymes (E2), and ubiquitin ligases (E3), forms isopeptide linkages between the C-
terminus of the ubiquitin moieties and the e-amino group of lysine residues on
the acceptor protein. Successive rounds of ubiquitination result in the formation
of polyubiquitin chains, and at least four ubiquitins are required for efficient tar-
geting [69]. The polyubiquitin chains are recognized by the S5a subunit of the
19S cap of the 26S proteasome [70]. However, at least one protein (ornithine decar-
boxylase [71]) has been identified that is targeted to the proteasome without ubiq-
uitin modification. In addition, the 19S regulatory cap can interact with misfolded
or natively unfolded intermediates in an ubiquitin-independent manner [72].
Other ATP-dependent proteases, such as ClpAP and ClpXP, share structural
similarities to the 26S proteasome (Figure 7.4). ClpAP and ClpXP are elongated cy-
lindrical complexes composed of hexameric ATPase single rings (ClpA and ClpX),
juxtaposed to a proteolytic component of two stacked heptameric rings (ClpP) [60,
61, 63, 65, 66]. The active sites of proteolysis in the Clp proteases are located on the
inside of the ClpP ring. Lon and the HflB proteases, unlike other ATP-dependent
proteases, are homo-oligomeric complexes. However, they share the overall cylin-
drical structure of other proteases, with the active sites buried deep inside the cen-
tral cavity [61, 62, 67].
Substrates of prokaryotic ATP-dependent proteases are recognized through their
targeting tags, which are mostly N- or C-terminal extensions of varying length. The
sequences of these targeting signals often show some homology, but definitive con-
sensus sequences for these targeting signals are still being determined and may be
specific to the various proteases. Interestingly, there are cases where specific target-
ing signals can specify more than one ATP-dependent protease. For example, sub-
strates tagged with the ssrA peptide are targeted to ClpAP, ClpXP, and HflB [73–
75]. Yet ClpAP, ClpXP, and HflB can also possess specific substrate preferences
[62, 76, 77]. Specificity is often conferred by adaptor proteins, which bind sub-
strates and deliver them in a trans-targeting mechanism to the protease [78].
7.3.2
Unfolding Is Required for Degradation by ATP-dependent Proteases
Fig. 7.4. Structures of the proteasome and proteolysis are indicated by red dots, and the
ClpAP. Top left: Structure of the eukaryotic slice surface is shown in green. Top right:
proteasome holo-enzyme from Xenopus laevis Structure of the ClpAP holo-enzyme as
as determined by electron microscopy; the determined by electron microscopy. Bottom
core particle is shown in blue, and the ATPase right: Structure of the ClpP particle, the
caps are in pink. Bottom left: Medial sections proteolytic core of ClpAP, as determined by
of the proteasome core particles from yeast as crystallography. Figure reproduced from Ref.
determined by crystallography; active sites of [62] with permission.
7.3.3
The Role of ATP and Models of Protein Unfolding
Fig. 7.5. The process of protein unfolding and ATPase component. The unfolded polypeptide
degradation by ATP-dependent proteases. ATP chain is fed into the proteolytic chamber in a
binding and hydrolysis are required for the process called translocation. Subsequently,
assembly of a functional ATP-dependent the polypeptide reaches the active sites of
protease. Targeted substrates for degradation proteolysis, where it is rapidly degraded and
bind noncovalently to the ATPase particle. released. The assembled ATP-dependent
Repeated cycles of ATP binding and hydrolysis protease can unfold and degrade several
drive unfolding of the substrate within the substrates before disassembly.
substrate-binding site at the ends of the cylindrical protease particle to the proteo-
lytic chamber where degradation occurs. The translocation would result in a pull-
ing force on the native polypeptide, which could help to collapse the structure of
the substrate [83, 84, 85]. An alternative model proposes that unfolding is not
coupled to translocation into the proteolytic compartment [86]. Instead, unfolding
occurs on the surface of the ATPase ring, where ATP binding and hydrolysis cause
the substrate-binding domains to undergo concerted conformational changes that
exert mechanical strain on the bound protein and result in unfolding. The pro-
posed role of ATP in this model draws parallels to the role of ATP in the action of
the molecular chaperone GroEL [87–89].
7.3.4
Proteins Are Unfolded Sequentially and Processively
7.3.5
The Influence of Substrate Structure on the Degradation Process
The local structure of the protein near the degradation tag determines the ability of
the protein to be degraded [83, 84]. This is best exemplified by studies with circular
permutants of DHFR where the original N- and C-termini are connected by a short
series of glycine residues and the structure is disrupted to create new termini [93].
The resulting proteins have almost identical structure and enzymatic activities [84,
93] and differ primarily in the location of the signals. However, these circular per-
mutants showed substantially different susceptibilities towards unfolding and deg-
radation by ATP-dependent proteases [84]. In these studies, susceptibility to un-
folding correlated with the local structure adjacent to the targeting signal, not
with the stability against spontaneous global unfolding. When the degradation sig-
nal leads into a stretch of polypeptide chain that forms an a-helix or a surface loop,
the proteins were unfolded and degraded efficiently. Substrates were more difficult
to unfold and degrade when the degradation signal led into an internal b-strand
[84].
7.3.6
Unfolding by Pulling
into the proteolytic chamber, the mechanics of force exertion are still under inves-
tigation. In unfolding during both translocation and the degradation process, the
unraveling mechanism is more reminiscent of unfolding by atomic force micros-
copy than of unfolding by chemical denaturants or heat. The spontaneous unfold-
ing pathway of proteins by chaotropic agents follows a global mechanism. When
unfolded by AFM, substrate proteins are unfolded by pulling mechanically on the
polypeptide chain [95–99]. However, the processes of protein unfolding during
AFM experiments and during protein degradation and translocation are not identi-
cal [79, 100]. During AFM experiments, a mechanical force is applied continuously
from both ends of the polypeptide chain until the protein unfolds. This resembles
a stretching force. In unfolding mediated by ATP-dependent proteases and mito-
chondrial import machinery, a force would be applied by pulling the substrate
against the sterically restrictive entrance of the translocation channel. In addition,
studies on ClpXP-mediated protein unfolding and degradation suggest that the un-
folding force is applied iteratively on the substrates, possibly by tugging at them
repeatedly [83].
7.3.7
Specificity of Degradation
The fact that the susceptibility to degradation depends on the local structure near
the degradation tag may contribute to the specificity of degradation. This particular
mechanism may have several physiological consequences. First, it allows the pro-
teases to degrade specific subunits of a multi-protein complex without affecting
other components. For example, the proteasome specifically degrades the cell-cycle
inhibitor Sic1 while it is associated with the yeast cyclin-cyclin-dependent kinase
(CDK) complex to release active CDK [101]. Second, the observation that b-
structures are difficult to unfold is important in the context of diseases that are
characterized by the accumulation of large intracellular protein aggregates, such
as Parkinson’s and Huntington’s diseases [102, 103]. Protein aggregates are found
associated with ubiquitin and components of the proteasome, suggesting that the
cell tries to degrade the aggregates but is unable to do so. The aggregates associ-
ated with amyloid diseases are characterized by the accumulation of long fibers
with extensive b-sheet character [104–106]. Third, in multi-domain proteins, differ-
ences in susceptibility to unfolding of individual domains will influence the end
product of the degradation reaction and may provide a mechanism for processing
proteins by partial degradation. Experimental evidence suggests that this mecha-
nism explains the activation of NFkB, which plays a central role in the regulation
of immune and inflammatory responses in mammals [84, 107]. The p105 precur-
sor of the p50 subunit of NF-kappa B is processed by proteasome-mediated degra-
dation. The proteasome degrades the domain at the C-terminus of p105 and spares
the N-terminal p50 domain. Processing by a partial degradation mechanism may
occur elsewhere in the cell. For example, the transcription factors cubitus interrup-
tus in Drosophila [108] and Spt23 and Mga2 in yeast [109] are activated by partial
proteasome processing.
266 7 Protein Unfolding in the Cell
7.4
Conclusions
Regulated protein unfolding is a critical step in several processes in the cell, in-
cluding translocation across membranes and degradation by ATP-dependent pro-
teases.
Translocases and ATP-dependent proteases catalyze unfolding using machi-
neries driven by ATP. These machineries denature proteins and translocate their
polypeptide chains across the barrier. In both of these processes, the proteins un-
ravel sequentially and processively as the machinery pulls the polypeptide chain at
one end, resulting in a cooperative collapse of the protein. As a consequence, the
susceptibility of a protein to unfolding depends on its local structure as well as on
its overall stability.
7.5
Experimental Protocols
7.5.1
Size of Import Channels in the Outer and Inner Membranes of Mitochondria
The diameters of the import channels in the outer and inner membranes of puri-
fied yeast mitochondria are measured by using precursor proteins in which the C-
termini are cross-linked to rigid compounds of specific dimensions [6]. The effects
of these modifications on translocation across the outer and inner mitochondrial
membranes are then determined.
Radioactive precursor proteins are synthesized by in vitro transcription and
translation in a rabbit reticulocyte lysate supplemented with [ 35 S] methionine
(Promega). Ribosomes and their associated incompletely translated polypeptide
chains are removed by centrifugation at 150 000 g for 15 min. Precursor proteins
are then partially purified by precipitation with 50% (v/v) saturated ammonium
sulfate for at least 30 min on ice, pelleted by centrifugation at 20 800 g for 15
min, and resuspended in import buffer (50 mM HEPES-KOH, pH 7.4, 50 mM
KCl, 10 mM MgCl2 , 2 mM KH2 PO4 , 5 mM unlabeled methionine, 1 mg mL1
fatty acid-free BSA). Size probes (Monomaleimido-Nanogold and Monomalei-
mido-Undecagold) are then attached to barnase precursors containing a single cys-
teine residue at their C-termini. After incubation for 2 h at room temperature,
modified precursors are used directly in the import experiments. To assess the di-
ameter of the inner-membrane protein import pore, the outer membrane of mito-
chondria is ruptured by hypo-osmotic shock before the import experiments.
7.5.2
Structure of Precursor Proteins During Import into Mitochondria
7.5.3
Import of Barnase Mutants
7.5.4
Protein Degradation by ATP-dependent Proteases
7.5.5
Use of Multi-domain Substrates
7.5.6
Studies Using Circular Permutants
teins have almost identical structures and similar enzymatic activities, differing
only in the location of the N- and C-termini within their structure [93]. In the eas-
ily digested proteins, the degradation signals lead directly into a-helices or surface
loops, whereas in the structures that can be too stable to be degraded, the degrada-
tion tags are attached to b-strands. These findings implicate that the local structure
near the degradation signal strongly influences the ability of a protein to be de-
graded by an ATP-dependent protease.
References
8
Natively Disordered Proteins
8.1
Introduction
8.1.1
The Protein Structure-Function Paradigm
The structure-function paradigm states that the amino acid sequence of a protein
determines its 3-D structure and that the function requires the prior formation of
this 3-D structure. This view was deeply engrained in protein science long before
the 3-D structure of a protein was first glimpsed almost 50 years ago. In 1893 Emil
Fischer developed the ‘‘lock-and-key’’ hypothesis from his studies on different
types of similar enzymes, one of which could hydrolyze a- but not b-glycosidic
bonds and another of which could hydrolyze b- but not a-glycosidic bonds [1]. By
1930 it had become clear that globular proteins lose their native biological activity
(i.e., they become denatured) when solution conditions are altered by adding heat
or solutes. Anson and Mirsky showed in 1925 that denatured hemoglobin could be
coaxed back to its native state by changing solution conditions [2]. This reversibility
is key because it means that the native protein and the denatured protein can be
treated as separate thermodynamic states. Most importantly, this treatment leads
directly to the idea that a protein’s function is determined by the definite structure
of the native state. In short, the structure is known to exist because it is destroyed
by denaturation. Wu stated as much in 1931, but his work was probably unknown
outside China, even though his papers were published in English [3]. The West
had to wait until 1936, when Mirsky and Pauling published their review of protein
denaturation [4].
Anfinsen and colleagues used the enzyme ribonuclease to solidify these ideas.
Their work led in 1957 to the ‘‘thermodynamic hypothesis.’’ Anfinsen wrote [5],
‘‘This hypothesis states that the three-dimensional structure of a native protein in
its normal physiological milieu is the one in which the Gibbs free energy of the
whole system is lowest: that is, the native conformation is determined by the total-
ity of interatomic interactions and hence by the amino acid sequence, in a given
environment.’’ Merrifield and colleagues then performed an elegant experiment
that drove home the idea; they synthesized ribonuclease in the test tube from the
amino acid sequence [6]. Their experiments provided direct evidence that the
amino acid sequence determines all other higher-order structure, function, and
stability.
As mentioned above, the observation that denaturation is reversible allows the
application of equilibrium thermodynamics. It was soon realized that many small
globular proteins exist in only two states, the native state or the denatured state.
That is, each protein molecule is either completely in the native state or completely
in the denatured state. Such two-state behavior leads directly to expressions for the
equilibrium constant, KD , and free energy, DGD , of denaturation:
KD ¼ ½D=½N ð1Þ
DGD ¼ RT lnðKD Þ ð2Þ
where R is the gas constant, T is the absolute temperature, and ½N and ½D repre-
sent the concentrations of the native and denatured states, respectively (see chap-
ters 2 and 3 in Part I).
The definition of KD , Eq. (1), is straightforward, but quantifying KD is more dif-
ficult than defining it. The difficulty arises because KD usually cannot be quantified
at the conditions of interest, i.e., at room temperature in buffered solutions near
physiological pH. Specifically, most biophysical techniques can only sensitively
measure KD between values of about 10 and 0.1, but the overwhelmming majority
(> 99.9%) of two-state globular protein molecules are in the native state at the con-
ditions of interest. The difficulty can be overcome by extrapolation. Increasing the
temperature or adding solutes such as urea or guanidinium chloride pushes KD
into the quantifiable region. Plots of RT lnðKD Þ versus temperature or solute con-
centration are then extrapolated back to the unperturbed condition to give DGD at
the conditions of interest. Many such studies indicate that small globular proteins
have a stability of between about 1 and 10 kcal mol1 at room temperature near
neutral pH. We introduced this formal definition of stability so that later we can
discuss the idea that the higher-order structure within some intrinsically dis-
ordered proteins is simply unstable.
What exactly is higher-order structure? Pauling showed that the protein chain
was organized in definite local structures, helices and sheets [7], but it was unclear
how these structures interact to form the native state. Because the conceptual ac-
cessibility of physical representations is greater than that of thermodynamics, the
advent of X-ray crystallography strongly reinforced the sequence-structure-function
8.1 Introduction 277
paradigm. In 1960, Kendrew and Perutz used X-ray crystallography to reveal the
intricate, atomic-level structures of myoglobin [8] and hemoglobin [9], effectively
locking in the sequence-to-structure paradigm. The paradigm took on the aura of
revealed truth when Phillips solved the first structure of an enzyme, lysozyme, in
1965 [10]. The position of the bound inhibitor revealed the structure of the active
site, making it clear that the precise location of the amino acid side chains is what
facilitates catalysis.
Given all this evidence, it appeared that the case was closed: the native state of
every protein possesses a definite and stable three-dimensional structure, and this
structure is required for biological function. But even early on there were worrying
observations. Sometimes loops were missing from high-resolution structures, and
these loops were known to be required for function [11, 12]. Nuclear magnetic res-
onance spectroscopy also showed that some proteins with known biological func-
tions did not possess stable, defined structure in solution [13].
Perhaps the most important difference to bear in mind when relating the
sequence-structure-function paradigm to intrinsically disordered proteins is the
difference between a structural state and a thermodynamic state. The native state
of a globular protein is both a structural state and thermodynamic state, but
the disordered (and denatured) state is only a thermodynamic state. That is, all the
molecules in a sample of the native state of a globular protein have nearly the same
structure, and this structure is what is lost upon denaturation. On the other hand,
the denatured state consists of a broad ensemble of molecules – each having a dif-
ferent conformation. Therefore, averaged quantities have different meanings for
native and disordered states. For a native globular protein, an averaged quantity,
such as the CD signal (see Section 8.2.4), gives information about each molecule
in the sample because nearly all the molecules are in the same structural state.
For a denatured or disordered protein, an averaged quantity contains information
about the ensemble, and this information may or may not be applicable to individ-
ual molecules in the sample.
8.1.2
Natively Disordered Proteins
Many proteins carry out function by means of regions that lack specific 3-D struc-
ture, existing instead as ensembles of flexible, unorganized molecules. In some
cases the proteins are flexible ensembles along their entire lengths, while in other
cases only localized regions lack organized structure. Still other proteins contain
regions of disorder without ascribed functions, but functions might be associated
with these regions at a later date. Whether the lack of specific 3-D structure occurs
wholly or in part, such proteins do not fit the standard paradigm that 3-D structure
is a prerequisite to function.
Various terms have been used to describe proteins or their regions that fail to
form specific 3-D structure, including: flexible [14], mobile [15], partially folded
[16], rheomorphic [400], natively denatured [17], natively unfolded [18], intrinsi-
cally unstructured [19], and intrinsically disordered [20].
None of these terms or combinations is completely appropriate. ‘‘Flexible,’’ ‘‘mo-
278 8 Natively Disordered Proteins
bile,’’ and ‘‘partially folded’’ have the longest histories and most extensive use;
however, all three of these terms are used in a variety of ways, including many
that are not associated with proteins that exist as structural ensembles under ap-
parently native conditions. For example, ordered regions with high B factors are
often called flexible or mobile. ‘‘Partially folded’’ is often used to describe transient
intermediates involved in protein folding. With regard to the term ‘‘natively,’’ it is
difficult to know whether a protein is in its native state. Even when under appar-
ently physiological conditions, a protein might fail to acquire a specific 3-D struc-
ture due to the absence of a critical ligand or because the crowded conditions
inside the cell are needed to promote folding. Because of such uncertainties, ‘‘in-
trinsically’’ is often chosen over ‘‘natively.’’ Unfolded and denatured are often
used interchangeably, so the oxymoron ‘‘natively denatured’’ has a certain appeal
but has not gained significant usage. ‘‘Unfolded’’ and ‘‘unstructured’’ both imply
lack of backbone organization, but natively disordered proteins often have regions
of secondary structure, sometimes transient and sometimes persistent. There are
even examples of apparently native proteins with functional regions that resemble
molten globules [21]; the molten globule contains persistent secondary structure
but lacks specific tertiary structures, having instead regions of non-rigid side chain
packing that leads to mobile secondary structure units [22, 23, 32] (see chapters 23
and 24 in Part I). Rheomorphic, which means ‘‘flowing structure’’ [400], provides
an interesting alternative to random coil, but again would not indicate native mol-
ten globules. Since ‘‘disordered’’ encompasses both extended and molten globular
forms, this descriptor has some advantage, but the widespread use of ‘‘disorder’’ in
association with human diseases complicates computer searches and gives a nega-
tive impression.
Here we use ‘‘natively disordered,’’ which has been infrequently used if at all,
mainly to distinguish this work from previous manuscripts on this topic. Develop-
ing a standardized vocabulary in this field would be of great benefit. We propose
‘‘collapsed disorder’’ for proteins and domains that exist under physiological con-
ditions primarily as molten globules and ‘‘extended disorder’’ for proteins and re-
gions that exist under physiological conditions primarily as random coil.
A significant body of work suggests that the unfolded state is not a true random
coil but instead possesses substantial amounts of an extended form that resembles
the polyproline II helix [24–26] as well as other local conformations that resemble
the native state (see chapter 20 in Part I). For this reason, extended disorder may be
a preferable term to random coil, but the latter term continues to have widespread
usage; therefore, for convenience, we will continue to use this term here – with the
understanding that by the term random coil we do not mean the true random coil
defined by the polymer chemist.
It is useful to introduce the topic of natively disordered proteins with a specific,
very clear example. Calcineurin (Figure 8.1A) makes a persuasive case for the exis-
tence and importance of native disorder [27–29]. This protein contains a catalytic A
subunit and a B subunit with 35% sequence identity to calmodulin. The A subunit
is a serine-threonine phosphatase that becomes activated upon association with the
Ca 2þ -calmodulin complex. Thus, calcineurin, which is widespread among the eu-
karyotes, connects the very important signaling systems based on Ca 2þ levels and
8.1 Introduction 279
Fig. 8.1. (A) 3-D structure of calcineurin, is located within the disordered region.
showing the A subunit (yellow), the B subunit (B) Side and top views of calmodulin (blue)
(blue), the autoinhibitory peptide (green), and binding a target helix (yellow). Note that the
the location of a 95-residue disordered region calmodulin molecule surrounds the target helix
(red). The calmodulin-binding site (red helix) when bound.
8.1.3
A New Protein Structure-Function Paradigm
The standard view is that amino acid sequence codes for 3-D structure and that 3-D
structure is a necessary prerequisite for protein function. In contrast, not only cal-
cineurin but also many additional proteins, as we have shown elsewhere [30, 31],
are natively disordered or contain natively disordered regions. For such proteins or
protein regions, lack of specific 3-D structures contributes importantly to their
functions. Here we explore the distinctions between the standard and our alterna-
tive view of protein structure-function relationships.
The standard structure-function paradigm as discussed above arose initially from
the study of enzymes. The original lock-and-key proposal [1] was based on distinc-
tive substrate recognition by a pair of enzymes. In the approximately 110 years
since this initial proposal, studies of enzyme catalysis have continuously reinforced
this view.
While more than 20 proposals have been made to explain enzyme catalysis [33],
it has become accepted that the most profitable way to think about the problem is
in terms of transition-state stabilization [34]. That is, as first suggested by Polanyi
[35] and later by Pauling [36], for an enzyme to carry out catalysis, it must bind
more tightly to the transition state than to the ground state. This tighter binding
lowers the transition-state energy and accelerates the reaction rate [34–36]. Transi-
tion states are derived from ground states by very slight movements of atoms (typ-
ically half the length or so of a chemical bond). Attempts to understand the mech-
anistic details of how an enzyme can bind more tightly to the transition state have
occupied a number of protein chemists and theoreticians, and this is not com-
pletely clear in every aspect even today [37], with some researchers arguing that
electrostatic contributions provide the dominant effects [38] (see chapter 7 in
Part I) and others emphasizing the importance of entropic effects [39] or other
contributions [40]. Despite these uncertainties, there seems to be universal agree-
ment that tighter binding to the transition state depends on an accurate prior posi-
tioning of the key residues in the enzyme. This prior positioning requires a well-
organized protein 3-D structure. Thus, in short, the evolution of ordered structure
in proteins was likely reinforced by or resulted directly from the importance of en-
zyme catalysis.
For not only the calcineurin example given above but also for most of the na-
tively disordered proteins we have studied [30, 31], the function of the disorder is
for signaling, regulation, or control. Compared to order, disorder has several clear
advantages for such functions. When disordered regions bind to signaling part-
ners, the free energy required to bring about the disorder-to-order transition takes
away from the interfacial, contact free energy, with the net result that a highly spe-
cific interaction can be combined with a low net free energy of association [20, 41,
77]. High specificity coupled with low affinity seems to be a useful pair of proper-
ties for a signaling interaction so that the signaling interaction is reversible. It
would appear to be more difficult to evolve a highly specific yet weak interaction
between two ordered structures. In addition, a disordered protein can readily bind
to multiple partners by changing shape to associate with different targets [20, 42,
8.2 Methods Used to Characterize Natively Disordered Proteins 281
43]. Multiple interactions are now being commonly documented, and proteins hav-
ing 20 or more partners are being described. In protein interaction or signaling
networks, proteins with multiple partners are often called hubs. We previously sug-
gested that the ability to interact with multiple partners may depend on regions of
native disorder [44], but so far we have investigated only a limited number of ex-
amples. Whether hub proteins utilize regions of native disorder to enable their
binding diversity is an important question for proteins of this class.
Given the above information, we propose a new protein structure-function
paradigm. Simply put, we propose a two-pathway paradigm, with sequence 3 3-D
structure 3 function for catalysis and sequence 3 native disorder 3 function for
signaling and regulation.
8.2
Methods Used to Characterize Natively Disordered Proteins
8.2.1
NMR Spectroscopy
ture of the structure and dynamics of intrinsically disordered proteins. This objec-
tive is most easily accomplished by unifying the different types of NMR data col-
lected on intrinsically disordered proteins while avoiding the pitfall described in
the following quotation from a classic paper on NMR and protein disorder [56]:
‘‘[U]ntil the amount of structural information provided by NMR methods increases
by several orders of magnitude, descriptions of non-native structure will probably
consist of simple lists of estimates of fractional population of secondary-structure
segments and of side-chain interactions.’’
It is interesting that the author of this quote has provided one of the most com-
plete descriptions of the structure and dynamics of an unfolded protein, a frag-
ment of staphylococcal nuclease referred to as D131D [56–64].
The most complete description of the structure and dynamics of intrinsically
disordered proteins would include the population and structure of the different
members of the rapidly interconverting ensemble along with the rate of intercon-
version between unique structures. Some of these parameters can be estimated
from NMR chemical shifts and resonance line shape measurements [48, 50, 51,
65–67]. Because liquid-state NMR detects the ensemble average, chemical shifts
can be treated as macroscopic equilibrium constants for secondary structure for-
mation [50, 51]. This information can be combined with measurements of hy-
drodynamic radii, using pulsed field gradient methods, to place limits on the
conformational space that is being explored, which ultimately facilitates structure
calculation of ensemble members [68–70]. Of course, it remains unclear how
many discrete states are represented in the resonance line shape measurements,
which presents the fundamental limitation of current NMR practice and theory to
provide a more detailed description of the ensemble members. Many consider this
problem insoluble, given the large number of possible conformations, even for a
small polypeptide. However, there is ample evidence that intrinsically disordered
proteins do not explore all of these conformations. Further, several systems have
been characterized where resonance line shape measurements were deconvoluted
to provide information about the number of ensemble members or to characterize
the interconversion rate between conformations [71–73, 397].
stant for folding from the unfolded state [50]. The extension of this relationship to
intrinsically disordered proteins makes the same two-state assumption. In this
case, the fraction of helix or strand present can be determined by comparing the
chemical shift value observed in an intrinsically disordered protein to the value ex-
pected in a stably folded protein [50, 51]. These data combined with knowledge of
the Stokes radius can be used to restrict the available conformations in a structure
calculation. This assumption is in agreement with a recent analysis of the effective
hydrodynamic radius of protein molecules in a variety of conformationnal states
[76]. In fact, based on the analysis of 180 proteins in different conformational
states, it has been shown that the overall protein dimension could be predicted
based on the chain length, i.e., the protein molecular weight, with an accuracy of
10%. Furthermore, it has been emphasized that the incorporation of biophysical
constraints, which can be rationalized based on conventional biophysical measure-
ments, might lead to considerable improvement in structure simulation proce-
dures. Clearly, the size and shape of the bounding volume used for structure
simulations play a crucial role in determining the efficiency and accuracy of any
algorithm. The incorporation of a size/shape constraint derived from experimental
data might lead to considerable improvement of simulation procedures [76].
In a study of an intrinsically disordered negative regulator of flagellar synthesis,
FlgM, chemical shift deviations from random coil values were observed for 13 Ca ,
13
Cb , and 13 C(O) nuclei [77]. Similar deviations were not observed for 1 Ha nuclei,
and this may be a general property of intrinsically disordered proteins. For FlgM,
the chemical shift deviations indicated the presence of helical structure in the
C-terminal half of the protein and a more extended flexible structure in the N-
terminal half. Two regions with significant helical structure were identified, con-
taining residues 60–73 and 83–90. Additional NMR and genetic studies demon-
strated that these two helical regions were necessary for interactions with the
sigma factor. The chemical shift deviations observed for FlgM exhibited a charac-
teristic variation where the central portion of the helix had a larger helical chem-
ical shift difference than the edges. To test whether the helical structure based on
chemical shifts represented an ensemble of nonrandom conformations, the 13 Ca
chemical shifts were measured in the presence of increasing concentrations of
the chemical denaturant urea. As the concentration of urea was increased, the
13
Ca chemical shifts moved toward the random coil value in a manner characteris-
tic of a non-cooperative unfolding transition.
In a study of the basic leucine zipper transcription factor GCN4, transient helical
structure was observed in the basic region based on chemical shift deviations
from random coil values for 13 Ca , 13 Cb , and 13 C(O) nuclei [78]. The tempera-
ture dependence of the three chemical shifts was also monitored. For the folded
leucine zipper region, a small dependence between chemical shift and temperature
was observed. Conversely, the unfolded basic region showed large changes in 13 Ca ,
13
Cb , and 13 C(O) chemical shifts when the temperature was changed [78]. The dy-
namic behavior of the basic region and rapidly interconverting structures are re-
sponsible for the temperature dependence of the chemical shifts [45]. In contrast,
the temperature dependence of the chemical shifts in the folded leucine zipper
284 8 Natively Disordered Proteins
was small, and this behavior is generally observed for both folded and random coil
regions.
presence of intrinsic protein disorder that it should be incorporated into all screen-
ing protocols developed for structural genomics. It is generally a waste of time and
resources to directly pursue crystallization of proteins that are intrinsically dis-
ordered. Crystallization meets with limited success for proteins containing in-
trinsically disordered regions. This is because the presence of structural inter-
conversions for intrinsically disordered proteins prohibits the formation of an
isomorphous lattice, unless there are one or a few stable, low-energy conforma-
tions that can be populated. During screening, proteins determined to have intrin-
sically disordered regions can be marked for further NMR analysis. On a technical
note, a range of relaxation and saturation delays should be used when measuring
the NH-NOE values in intrinsically disordered proteins [85].
Some of the most valuable contributions to understanding the structure and dy-
namics of intrinsically disordered proteins have come from using R1 , R 2 , and NH-
NOE data to model molecular motion or to solve the spectral density function. The
two most successful models for relaxation data analysis are the so called ‘‘model-
free’’ approach of Lipari and Szabo [86, 87] and reduced spectral density mapping
introduced by Peng and Wagner [80, 88].
( )
2 S 2 tm ð1 þ S 2 Þt
JðoÞ ¼ þ ð3Þ
5 ð1 þ ðotm Þ 2 Þ ð1 þ ðotÞ 2 Þ
where 1=t ¼ 1=tm þ 1=te and S 2 is the square of the generalized order parameter
describing the amplitude of internal motions. The overall correlation time, tm , is
determined based on an assumption of isotropic Brownian motion. For a folded
protein of known structure, the diffusion tensor can be incorporated into the
model to compensate for anisotropic motions [90–92]. It is unclear how valid the
assumption of isotropic rotation is for intrinsically disordered proteins. It is as-
sumed that anisotropic structures will populate the conformational ensemble of
an intrinsically disordered protein. This is in the absence of little direct evidence
on the subject. It has been suggested that the effects of rapid interconversion be-
tween more isotropic structures will tend to smooth out static anisotropy and result
in an isotropic average conformation [57].
Regardless of the analytical limitations, the model-free analysis of several
natively disordered proteins has provided to be a useful qualitative picture of the
heterogeneity in rotational diffusion that is observed for these systems [58, 77,
93–95]. Some general trends are observed: (1) correlation times are greater than
286 8 Natively Disordered Proteins
those calculated based on polymer theory, and (2) transient secondary structure
tends to induce rotational correlations.
Changes in S 2 have been used to estimate changes in conformational entropy
due to changes in ns-ps bond vector motions during protein folding and for intrin-
sically disordered protein folding that is coupled to binding [45, 63, 77, 96, 97]. In
aqueous buffer and near neutral pH, the N-terminal SH3 domain of the Drosophila
signal transduction protein Drk is in equilibrium between a folded, ordered struc-
ture and an unfolded, disordered ensemble. The unfolded ensemble is stabilized
by 2 M guanidine hydrochloride and the folded structure is stabilized by 0.4 M so-
dium sulfate. Order parameters were determined for both the unfolded and folded
Drk SH3 domains. The unfolded ensemble had an average S 2 value of 0:41 G 0:10
and the folded structure had an average S 2 value of 0:84 G 0:05. Based on the dif-
ference in S 2 between the unfolded and folded structures, the average conforma-
tional entropy change per residue was estimated to be 12 J (molK)1 . This approach
does not address additional entropy contributions from slower motional processes
or the release of solvent. Interestingly, the estimate of 12 J (molK)1 is similar to
the average total conformational entropy change per residue estimated from other
techniques (@14 J (molK)1 [96]. Another study has also suggested that changes
in order parameters provide a reliable estimate of the total conformational entropy
changes that occur during protein folding [63].
8.2.1.5 Using Reduced Spectral Density Mapping to Assess the Amplitude and
Frequencies of Intramolecular Motion
The Stokes-Einstein equation defines a correlation time for rotational diffusion of a
spherical particle:
hVsph
tc ¼ ð4Þ
TkB
In Eq. (4), this rotational correlation time, tc , depends directly on the volume of
the sphere ðVsph Þ and the solution viscosity ðhÞ and inversely on the temperature
ðTÞ. The spectral density function, JðoÞ, describes the frequency spectrum of rota-
tional motions of the NaH bond vector relative to the external magnetic field and is
derived from the Fourier transform of the spherical harmonics describing the rota-
tional motions [80]. For isotropic Brownian motion, JðoÞ is related to tc in a
frequency-dependent manner:
2 tc
JðoÞ ¼ ð5Þ
5 1 þ o 2 tc2
ond to millisecond range contributes positively to Jð0Þ but that this effect can be
attenuated by measuring R 2 under spin-lock conditions [80].
Reduced spectral density mapping is a robust approach for analyzing intrinsic
disorder because it does not depend on having a model of the molecular motions
under investigation. Several groups have characterized intrinsic protein disorder
using reduced spectral density mapping, providing new insights into protein dy-
namics [77, 78, 80, 94, 98–101]. In one of these studies, reduced spectral density
mapping was used to help demonstrate that the intrinsically disordered anti-sigma
factor, FlgM, contained two disordered domains [77]. One domain, representing
the N-terminal half of the protein, was characterized by fast internal motions and
small Jð0Þ values. The second domain, representing the C-terminal half of the
protein, was characterized by larger Jð0Þ values, representing correlated rotational
motions induced by transient helical structure. NMR was also used to help demon-
strate that the C-terminal half of FlgM contained the sigma factor–binding do-
main, and it was proposed that the transient helical structure was stabilized by
binding to the sigma factor. However, more recent NMR studies of FlgM showed
that this structure appears to be stabilized in the cell due to molecular crowding,
which would reduce the role of conformational entropy on the thermodynamics
of the FlgM/sigma factor interaction [102].
In another study, a temperature-dependent analysis of the reduced spectral den-
sity map was performed on the GCN4 bZip DNA-binding domain [78]. In the ab-
sence of DNA, GCN4 exists as a dimer formed through a coiled-coil C-terminal
domain and a disordered N-terminal DNA-binding domain. This disordered DNA-
binding domain becomes structured when DNA is added [54]. In the GCN4 study,
the reduced spectral density map was evaluated at three temperatures; 290 K,
300 K, and 310 K. Jð0Þ values increased in a manner consistent with changes in
solvent viscosity induced by increasing temperature. When Jð0Þ was normalized
for changes in solvent viscosity, values for the C-terminal dimerization domain
and the intrinsically disordered N-terminal DNA-binding domain were identical
within experimental error. This behavior suggested that the correlated rotational
motion of the folded leucine zipper had the dominant influence on the solution-
state backbone dynamics of the basic leucine zipper of GCN4. A recently com-
pleted study monitored the temperature dependence of the reduced spectral den-
sity map for a partially folded fragment of thioredoxin [401]. Unlike GCN4, the
thioredoxin fragment does not contain a stably folded domain. In this study, nor-
malizing Jð0Þ for changes in solvent viscosity induced a trend in the data, with
Jð0Þ increasing with increasing temperature. In the absence of chemical exchange,
these data suggest that we are monitoring an increase in the hydrodynamic volume
of the fragment (G. W. Daughdrill, unpublished data).
jority of D131D residues, their experimental data was best described by a modified
‘‘model-free’’ formalism that included contributions from internal motions on in-
termediate and fast timescales and slow overall tumbling. They observed that the
generalized order parameter S 2 correlated with sequence hydrophobicity and the
fractional populations of three alpha helices in the protein. In a more recent study,
residual dipolar couplings were measured for native staphylococcal nuclease and
D131D [60–63]. Native-like dipolar couplings were observed in D131D, even in
high concentrations of denaturant.
An extensive NMR dataset was used to identify an ensemble of three-
dimensional structures for the N-terminal SH3 domain of the Drosophila signal
transduction protein Drk, with properly assigned population weights [103, 104].
This was accomplished by calculating multiple unfolding trajectories of the protein
using the solution structure of the folded state as a starting point. Of course this
approach is limited to proteins that have a compact rigid conformation. However,
the ability to integrate multiple types of data describing the structural and dy-
namic properties of disordered proteins is pertinent to this discussion. Population
weights of the structures calculated from the unfolding trajectories were assigned
by optimizing their fit to experimental data based on minimizing pseudo energy
terms defined for each type of experimental constraint. This work marks the first
time that NOE, J-coupling, chemical shifts, translational diffusion coefficients, and
tryptophan solvent-accessible surface area data were used in combination to esti-
mate ensemble members. As seen with many other studies of the structure and
dynamics of intrinsically disordered proteins, the unfolded ensemble for this do-
main was significantly more compact than a theoretical random coil (see chapter
20 in Part I).
8.2.2
X-ray Crystallography
Wholly disordered proteins would not be expected to crystallize and thus could not
be studied by X-ray crystallography. On the other hand, proteins consisting of both
ordered and disordered regions can form crystals, with the ordered parts forming
the crystal and the disordered regions occupying spaces between the ordered parts.
Regions of disorder vary in location from one molecule to the next and therefore
fail to scatter X-rays coherently. The lack of coherent scattering leads to missing
electron density.
A given protein crystal is made up of identical repeating unit cells, where each
unit cell contains one to several protein molecules depending on the symmetry of
the crystal. Each atom, j, in the protein occupies a particular position, xj; yj, and zj.
It is convenient to express the positions as dimensionless coordinates that are frac-
tions of the lengths of the unit cell, yielding Xj; Yj, and Zj as the indicators of the
position of atom j. The result of an X-ray diffraction experiment is a 3-D grid of
spots that are indexed by three integers, h; k, and l. The positions of the spots are
determined by the crystal lattice. The intensities of the spots are determined by
both the symmetries in the crystal and the positions of the atoms in the molecule.
Specifically, each spot has an intensity that is the square of the magnitude of the
8.2 Methods Used to Characterize Natively Disordered Proteins 289
X
atoms
Fðh; k; lÞ ¼ fð jÞ exp½2p iðhXj þ kYj þ lZj Þ ð6Þ
j¼1
where h; k, and l are the indices of the spots in the diffraction pattern, fðjÞ is the
scattering power of atom j (dependent on the square of the number of electrons
in the atom), i is the square root of 1, and Xj ; Yj , and Zj are the coordinates of
atom j given as fractions of the unit cell dimensions as mentioned above.
Each Fðh; k; lÞ has both a magnitude and a phase. The magnitude of Fðh; k; lÞ is de-
termined by the square root of the intensity of each spot in the diffraction pattern.
For wavelets from two different spots, the phase is the shift in the peak values and
is 0 degrees for two wavelets that are exactly in phase (constructive interference)
and 180 degrees for two wavelets that are totally out of phase (which leads to de-
structive interference). The phase for two arbitrary wavelets can be found to be
any value between 0 and 360 degrees. Thus, given a structure, both the magnitude
and phase of each Fðh; k; lÞ can be calculated by carrying out the summation in Eq.
(6) over all of the atoms in the structure. Taking the Fourier transform of Eq. (6)
gives:
1 XXX
rðx; y; zÞ ¼ Fðh; k; lÞ exp½2p iðhx þ ky þ lzÞ ð7Þ
V h k l
where rðx; y; zÞ is the electron density of the protein at x; y, and z and V is the vol-
ume of the unit cell (the other terms have been defined above).
Thus, to determine a structural model of the given protein, one merely has to
carry out the summation of Eq. (7) and then fit the resulting electron density map
with a set of atoms that correspond to the connected set of residues in the struc-
ture. To carry out this triple-vector sum, it is necessary to know both the magnitude
and the phase of each Fðh; k; lÞ . While the magnitude is obtainable simply as the
square root of the intensity of the spot, the phase information is lost during data
collection because the time of arrival of the peak of each wavelet relative to those
of the others cannot be recorded with current technologies. The loss of this critical
information is commonly called the phase problem.
Three main methods have been developed to find the missing phase information
for each structure factor, Fðh; k; lÞ . Each of these methods has advantages and dis-
advantages.
The earliest successful approach for proteins was to add heavy atoms. Since scat-
tering power depends on the number of electrons squared, even a small number
of heavy atoms can perturb the diffraction values enough to allow the determina-
tion of the phase values. The mathematical details are fairly involved, but for this
approach to work, the positions of the heavy atoms must be determined by com-
paring the phase intensities with and without the heavy atoms. In addition, the ad-
dition of the heavy atoms cannot significantly perturb the structure of the protein.
With regard to this second point, the protein structure with and without the heavy
atom must be isomorphous. Thus, this approach is called isomorphous replace-
290 8 Natively Disordered Proteins
ment, and at least two such heavy atom replacements must be made to determine
each phase value.
A second approach is to use multiple X-ray wavelengths that traverse the absorp-
tion edge of a selected heavy atom. The change in scattering over the absorption
edge becomes substantial, which enables the phases to be determined. This
approach is called multi-wavelength anomalous dispersion, or MAD. For protein
structure determination by this approach, it is common to introduce selenium in
the form of selenomethionine. This is a convenient atom that has an absorption
edge at an appropriate wavelength value, and often (but not always) substitution
of methionine with selenomethionine does not significantly affect the structure or
activity of the protein of interest.
Finally, if the structure of a closely related protein is already known, it is often
possible to use the phases from the related protein with the intensities from the
protein of interest to generate the initial model structure. This approach is called
molecular replacement. Often the homologous structure is not similar enough, so
the phases are too inaccurate to give a reasonable starting structure. A second ma-
jor difficulty is determining the correct orientation of the known structure relative
to that of the unknown so that the phases can be correctly associated with the in-
tensities. Since a very large number of possible orientations exist, a complex search
of the possibilities needs to be developed. Evolutionary algorithms have recently
been found to be useful for this task [105].
Structure determination by X-ray crystallography typically involves not only
straightforward calculation and equation solving (as indicated above) but also sig-
nificant modeling and simulation. Once the phase problem has been solved by ex-
periment, an electron density map is generated by the calculations described in Eq.
(7). However, there are important technical difficulties in solving the phases, so the
phase values usually contain large errors and the resulting electron density map
contains many mistakes. To give an example of the difficulty, it is unclear whether
the heavy atom derivatives are truly isomorphous; even small protein movements
upon heavy atom binding can lead to significant errors in the estimation of the
phases. Modeling is then used to fit the amino acid sequence into the error-
containing electron density map. The structural model that emerges is typically ad-
justed by dynamics and simulation to improve bond lengths, bond angles, and
inter-atom contacts. The model is then used to calculate improved phases and the
whole process is reiterated until the structure no longer changes in successive
cycles.
In the overall process of protein crystallography, there are no clear-cut rules for
dealing with regions of low intensity or missing electron density. Some crystallo-
graphers are more aggressive than others in attempting to fit (or model) such
low-density or missing regions. This variability means that missing coordinates in
the Protein Data Bank (PDB) have not been assigned by uniform standards, which
in turn leads to variability in the assignment of disorder. All of this is compounded
by the imperfections (packing defects) in real crystals, which can additionally con-
tribute to the absence of electron density.
Crystallographers have classified regions of missing electron density as being
8.2 Methods Used to Characterize Natively Disordered Proteins 291
either static or dynamic [106, 107]. Static disorder is trapped into different confor-
mations by the crystallization process, while dynamic disorder is mobile. A dynam-
ically disordered region could potentially freeze into a single preferred position
upon cooling, while static disorder would be fixed regardless of temperature. By
collecting data and determining structures at a variety of temperatures, dynamic
disorder sometimes becomes frozen and thus distinguished from static disorder
[108].
From our point of view, more important than static or dynamic is whether the
region of missing electron density has a single set of coordinates along the back-
bone or whether the region exists as an ensemble of structures. A missing region
with one set of coordinates could be trapped in different positions by the crystal
lattice (statically disordered) or could be moving as a rigid body (dynamically dis-
order). In either case, the wobbly domain [20] is not an ensemble of structures,
i.e., is not natively disordered, but rather is an ordered region that adopts different
positions due to a flexible hinge.
Given the difficulties in ascribing disorder to regions of missing electron density
or to missing coordinates in the PDB, it is useful to use a second method, such as
NMR or protease sensitivity, to confirm that a missing region is due to intrinsic
disorder rather than some other cause. While such confirmation has been carried
out for a substantial number of proteins [20], the importance of disorder has not
been generally recognized, so such confirmation is not routine.
Despite the uncertainties described above, missing electron density provides a
useful sampling of native disorder in proteins. To estimate the frequency of such
regions, a representative set of proteins was studied for residues with missing
coordinates [109]. The representative set, called PDB_Select_25, was constructed
by first grouping the PDB into subsets of proteins with greater than 25% se-
quence identity and then choosing the highest-quality structure from each subset
[110]. Out of 1223 chains with 239 527 residues, only 391 chains, or 32%, dis-
played no residues with missing backbone atoms (Table 8.1). Thus, 68% of the
non-redundant proteins contained some residues that lacked electron density. The
832 chains with missing electron density contained 1168 distinct regions of dis-
order, corresponding to @1.4 such regions/chain. These 1168 disordered regions
contained 12 138 disordered residues, or @10 residues/disordered region on aver-
age. While most of the disordered regions are short, a few are quite long: 68 of
the disordered regions are greater than 30 residues in length. Overall, the residues
with missing coordinates are about 5% of the total.
A substantial fraction of the disorder-containing proteins in PDB are fragments
rather than whole proteins. Since disordered regions tend to inhibit crystallization,
they are often separated from ordered domains by genetic engineering or protease
digestion prior to crystallization attempts. In either case, longer regions of disorder
become shortened, presumably leading to improved probability of obtaining pro-
tein crystals. Given that wholly disordered proteins do not crystallize, and given
that the proteins in PDB often contain truncated disordered regions or are ordered
fragments that have been separated from flanking regions of disorder, it is clear
that the PDB substantially underrepresents the amount of protein disorder in
nature.
To further understand the distribution of order and disorder in crystallized pro-
teins, we carried out a comparison between the PDB and the Swiss-Prot databases.
Swiss-Prot provides an easy mechanism to extract information about various pro-
teins [111], so this comparison allows a convenient means to study disorder across
the different kingdoms. The overall results of this comparison are given in Table
8.2. Of 4175 exact sequence matches between proteins in PDB_Select_25 and
Swiss-Prot, 2258 were from eukaryotes, 1490 were from bacteria, 170 were from
archaea, and 257 were from viruses. Proteins with no disorder or only short disor-
der were estimated by considering proteins for which at least 95% of the primary
structure was represented as observed coordinates in the PDB structures. The
number and percentage of proteins in each set for these mostly ordered proteins
were 428 (19%) for eukaryotes, 594 (40%) for bacteria, 82 (48%) for archaea, and
42 (16%) for viruses. Proteins with substantial regions of disorder were estimated
by considering proteins for which the crystal structure contained less than half of
the entire sequence. The number and percentage of proteins in each set for these
proteins likely to contain substantial amounts of disorder were 713 (32%) for eu-
Tab. 8.2. Comparison of disorder between the PDB_Select_25 and Swiss-Prot databases.
karyotes, 97 (6.5%) for bacteria, 4 (2.3%) for archaea, and 123 (48%) for viruses.
These data suggest that both eukaryotes and viruses are likely to have proteins
with large regions of disorder flanking fragmentary regions of ordered, crystalliz-
able domains.
8.2.3
Small Angle X-ray Diffraction and Hydrodynamic Measurements
Both small-angle X-ray scattering (SAXS) and hydrodynamic methods such as gel-
exclusion chromatography or dynamic light scattering (see chapter 19 in Part I)
have been used to estimate the sizes of protein molecules in solution. Analytical
ultracentrifugation is another method for determining the molecular weight and
the hydrodynamic and thermodynamic properties of a protein [112]. The observed
size of a given protein can then be compared with the size of a globular protein of
the same mass. As expected, by these methods molten globules have overall sizes
similar to those of the globular proteins of the same mass, whereas random coil
forms are considerably extended and thus have a significantly larger size than their
globular counterparts. Natively disordered proteins and globular proteins can be
compared using several features of these techniques.
With regard to SAXS, one approach is to plot the normalized IðSÞS 2 versus S,
where I is the scattering intensity at a given scattering angle and S is the given
scattering angle. This method emphasizes changes in the signal at higher scatter-
ing angles, which in turn strongly depends on the dimensions of the scattering
molecule. The resulting graph, called a Kratky plot [113–115], readily distinguishes
random coil proteins from globular structures. That is, globular proteins give
parabolas, with scattering intensity increasing and then dropping sharply due to
the reduced scattering intensity at large angles. On the other hand, random coil
forms give monotonically increasing curves. Importantly, the natively disordered
proteins with extended disorder are characterized by low (coil-like) intramolecular
packing density, reflected by the absence of a maximum on their Kratky plots
[116–122]. This statement is illustrated by Figure 8.2, which compares the Kratky
plots of five natively disordered proteins with those of two rigid globular proteins.
One can see that the Kratky plots of natively disordered proteins do not exhibit
maxima. Maxima on Kratky plots are typical of the folded conformations of globu-
lar proteins such as staphylococcal nuclease. In other words, the absence of a max-
imum indicates the lack of a tightly packed core under physiological conditions in
vitro [123].
A second SAXS approach is to compare the radius of gyration, Rg , with that from
a globular protein of the same size. The Rg value is estimated from the Guinier ap-
proximation [124]:
!
S 2 Rg2
IðSÞ ¼ Ið0Þ exp ð8Þ
3
where IðSÞ is the scattering intensity at angle S. The value of Rg can then be deter-
294 8 Natively Disordered Proteins
mined from plots of ln½IðSÞ versus S 2 . As tabulated by Millet, Doniach, and Plaxco
[125], the radii of gyration of proteins unfolded by low pH, methanol, urea, or gua-
nidinium chloride are typically 1.5–2.5 times larger than the radii of gyration of the
same proteins in their native globular states. Note, however, that denaturation does
not always result in complete conversion to extended coil formation: some struc-
ture may persist [126, 127].
SAXS can also be used to estimate the maximum dimension of the protein. The
distance distribution function PðrÞ can be estimated by Fourier inversion of the
scattering intensity, IðSÞ [128, 129], where PðrÞ is the probability of finding a di-
mension of length r. A plot of PðrÞ versus r yields the maximum dimension in
the limit as PðrÞ goes to zero. The observed maximal dimension can then be com-
pared with those observed for globular and random coil forms of various proteins
[130, 131].
A fourth method is to compare hydrodynamic volumes. Globular proteins differ
significantly from fully or partly unstructured proteins in their hydrodynamic
properties. For both native, globular proteins and fully denatured proteins, empiri-
cal relationships have been determined between the Stoke’s radius, Rs, and the
number of residues in the chain [68, 132, 133]. These relationships in turn provide
estimates of the overall hydrodynamic sizes compared to the respective controls.
8.2 Methods Used to Characterize Natively Disordered Proteins 295
Two common methods for estimating the Stoke’s radius are gel-exclusion chro-
matography and dynamic light scattering. In the former, the mobility of the pro-
tein of interest is compared to the mobilities of a collection of protein standards
[132, 133]. In the latter, the translational diffusion coefficient is estimated and the
Stoke’s radius is calculated from the Stoke’s-Einstein equation.
For a given Stoke’s radius, Rs, the hydrodynamic volume is given by:
4
Vh ¼ pðRsÞ 3 ð9Þ
3
where N, MG, PMG, and U-GdmCl correspond to the native, molten globule, pre-
molten globule, and guanidinium chloride-unfolded globular proteins, respectively.
For the non-molten globular natively disordered proteins, their logðRS Þ versus
logðMÞ dependence may be divided in two groups: natively disordered proteins be-
having as random coils in poor solvent (denoted as NU-coil in Eq. (14)) and essen-
tially more compact proteins, which are similar to pre-molten globules with respect
to their hydrodynamic characteristics (denoted as NU-PMG in Eq. (15)) [123, 135]:
quence distinction between the two classes has not been found [123, 135]. But even
if a sequence distinction is found, it remains difficult to understand the origin of
this partition into two distinct classes. For example, if a hybrid protein were cre-
ated, with one half being random coil-like and one-half being pre-molten globule-
like, would its hydrodynamic properties lie between those of the two classes? Why
haven’t such chimeras been found in nature? Perhaps the simple explanation is
that not enough natively disordered proteins have been characterized. But if a
wide variety of natively disordered proteins have not been found in the current
sampling, even though such proteins do exist, what is leading to the partition ob-
served so far? This question deserves further study.
8.2.4
Circular Dichroism Spectropolarimetry
Circular dichroism (CD) measures the difference in the absorbance of left versus
right circularly polarized light and is therefore sensitive to the chirality of the envi-
ronment [137]. There are two types of UV-absorbing chromophores in proteins:
side chains of aromatic amino acid residues and peptide bonds [137, 138]. CD
spectra in the near-ultraviolet region (250–350 nm), also known as the aromatic
region, reflect the symmetry of the aromatic amino acid environment and, con-
sequently, characterize the protein tertiary structure. Proteins with rigid tertiary
structure are typically characterized by intense near-UV CD spectra, with unique
fine structure, which reflect the unique asymmetric environment of individual aro-
matic residues. Thus, natively disordered proteins are easily detected since they are
characterized by low-intensity, near-UV CD spectra of low complexity. The far-UV
region of a protein’s absorbance spectrum (190–240 nm) is dominated by the elec-
tronic absorbance from peptide bonds. The far-UV CD spectrum provides quantifi-
able information about the secondary structure of a protein because each category
of secondary structure (e.g., a-helix, b-sheet) has a different effect on the chiral en-
vironment of the peptide bond. Because the timescale for electronic absorbance
(@1012 seconds) is so much shorter than that of folding or unfolding reactions
(at least @106 seconds), CD provides information about the weighted average
structure of all the peptide bonds in the beam.
CD provides different information about native globular versus denatured or dis-
ordered proteins. In a native globular protein, the weighted average can be applied
to each molecule because nearly all the molecules are in the native state and each
molecule of the native state has a similar structure. For instance, if the CD data
indicate that 30% of the peptide bonds in a sample of a 100-residue native globular
protein are in the a-helical conformation and 70% are in the b-sheet conformation,
then 30 peptide bonds in each protein molecule are helical and the other 70 are
sheet. Furthermore, it is the same 30 and 70 in each molecule. For a denatured or
disordered protein, the CD spectrum provides information about the ensemble of
conformations, and the information will not be generally applicable to any single
molecule. For instance, changing the above example to a disordered protein, we
298 8 Natively Disordered Proteins
can say that 30% of the peptide bonds in the sample are helical and 70% are sheet,
but with only these data in hand, nothing can be inferred about the structure of an
individual molecule – neither the helix and sheet percentage nor the location of
those structures along the chain.
A good correlation exists between the relative decrease in hydrodynamic volume
and the increase in secondary structure content. This correlation was shown for a
set of 41 proteins that had been evaluated by both far-UV CD and hydrodynamic
methods [139]. Study of the equilibrium unfolding of these globular proteins re-
vealed that the Stokes radii ðRS Þ and secondary structure of native and partially
folded intermediates were closely correlated. Results of this analysis are presented
3 U
in Figure 8.4 as the dependence of ðRU S =RS Þ versus ½y222 =½y222 , which represent
relative compactness and relative content of ordered secondary structure, respec-
tively. Significantly, Figure 8.4 illustrates that data for both classes of conforma-
tions (native globular and partially folded intermediates) are accurately described
by a single dependence (correlation coefficient, r 2 ¼ 0:97) [139]:
Fig. 8.4. Correlation between the degree of of the unfolded conformation, while the
compactness and the amount of ordered amount of ordered secondary structure,
secondary structure for native globular proteins ½y222 =½yU222 , was calculated from far-UV CD
(filled circles) and their partially folded spectra as the increase in negative ellipticity at
intermediates (open circles). The degree of 222 nm relative to the unfolded conformation.
compactness, ðRUS =RS Þ 3 , was calculated for The data used to plot these dependencies are
different conformational states as the decrease taken from Uversky [139].
in hydrodynamic volume relative to the volume
8.2 Methods Used to Characterize Natively Disordered Proteins 299
3 !
RU
S ½y222
¼ ð1:047 G 0:010Þ ð0:31 G 0:12Þ ð16Þ
RS ½yU
222
This correlation means that the degree of compactness and the amount of
ordered secondary structure are conjugate parameters. In other words, there is no
compact equilibrium intermediate lacking secondary structure or any highly
ordered but non-compact species among the proteins analyzed. Therefore, hydro-
phobic collapse and secondary structure formation occur simultaneously rather
than as two subsequent independent processes. This conclusion generalizes earlier
observations made for several individual proteins including DnaK [140], apomyo-
globin [141], and staphylococcal nuclease [142].
Tiffany and Krimm noted over 35 years ago that CD spectra of reversibly dena-
tured proteins resemble that of a left-handed polyproline II helix [143]. It was only
recently, however, that Creamer revisited these data and put them on a more sound
footing with studies of non-proline model peptides [144, 145]. Polyproline helices
made from non-proline residues have a negative CD band at 195 nm and positive
band at 218 nm. These wavelengths are shorter than those of true polyproline
helices, (205 nm and 228 nm, respectively) because of the different absorption
maxima for primary and secondary amides. Inspection of the literature shows
that many intrinsically disordered proteins exhibit a negative feature near 195
nm, with near-zero ellipticity near 218 nm [17, 18, 146–149], and some exhibit
both the negative and the positive features [150, 151]. The interpretation of CD
spectra in terms of polyproline II helix content remains controversial for several
reasons. First, it is not yet clear what the difference is between the CD spectrum
of a random coil and the CD spectrum of a polyproline II helix. Second, there is
not yet a way to quantify the amount of polyproline II helix.
The conclusion is clear: many intrinsically disordered proteins resemble revers-
ibly denatured proteins and may exist, on average, in a polyproline II helix. How-
ever, some of these proteins exhibit small amounts of other secondary structures
[77]. Sometimes [102, 152–157], but not always [148, 158], secondary structure
can be induced by molecular crowding and ‘‘structure-inducing’’ co-solutes.
8.2.5
Infrared and Raman Spectroscopy
Infrared (IR) spectra are the result of intramolecular movements (bond stretching,
change of angles between bonds along with other complicated types of motion) of
various functional groups (e.g., methyl, carbonyl, amide, etc.) [159]. The merits of
spectral analysis in the IR region originate from the fact that the modes of vibra-
tion for each group are sensitive to changes in chemical structure, molecular
conformation, and environment. In the case of proteins and polypeptides, two in-
frared bands that are connected with vibrational transitions in the peptide back-
bone and reflect the normal oscillations of simple atom groups are of the most in-
terest. These bands correspond to the stretching of NaH and CbO bonds (amide I
band) and the deformation of NaH bonds (amide II band). The amide I and amide
300 8 Natively Disordered Proteins
Fig. 8.5. FTIR spectra in the amide I region pestis capsular protein Caf1 (dashed line), and
measured for the natively disordered a- a-synuclein amyloid fibril with cross-b-structure
synuclein (solid line), a-helical human a- (dash-dot-dotted line). Spectra are normalized
fetoprotein (dotted line), b-structural Yersinia to have same area.
8.2 Methods Used to Characterize Natively Disordered Proteins 301
weights corresponding to their content in the protein, and (3) the spectral charac-
teristics of secondary structure are the same for all proteins and for all the struc-
tural elements of one type in the protein. The finding that Fourier transform infra-
red spectroscopy (FTIR) exhibited a high sensitivity to the conformational state of
macromolecules resulted in numerous studies where this approach was used to
analyze protein molecular structure (including a number of natively disordered
proteins) and to investigate the processes of protein denaturation and renaturation.
Raman is complementary to IR, depending on inelastic light scattering rather
than absorption to obtain vibrational information, and is further complementary
to IR in emphasizing vibrations lacking changes in dipole moments. Raman spec-
troscopy has long been used for the analysis of protein structure and protein con-
formational changes [398]. Recently, specialized Raman methods, namely surface
enhanced resonance Raman [399] and Raman optical activity [400], have demon-
strated significant usefulness for the characterization of natively disordered proteins.
FTIR also provides a means of keeping track of conformational changes in pro-
teins. These are followed by monitoring the changes in the frequencies of the IR
bands that result from deuteration (substitution of hydrogen atoms by deuterium)
of the molecule [159]. Since it is usually known which band belongs to which func-
tional group (carbonyl, oxy-, or amino group), one can identify the exchangeable
groups by observing the changes in band position as a result of deuteration. The
rate of deuteration depends on the accessibility of a given group to the solvent
that, in turn, varies according to changes in conformation. Thus, by tracking the
changes in the deuterium-exchange rates for different solvent compositions and
other environmental parameters, one can obtain information about the resultant
conformational changes within a given protein.
8.2.6
Fluorescence Methods
means that lmax of tryptophan fluorescence contains some basic information about
whether the protein is compact under the experimental conditions. For this reason,
the analysis of intrinsic protein fluorescence is frequently used to study protein
structure and conformational change.
I0
¼ ð1 þ KSV ½QÞe V½Q ð17Þ
I
where I0 and I are the fluorescence intensities in the absence and presence of
quencher, KSV is the dynamic quenching constant, V is a static quenching con-
stant, and ½Q is the quencher concentration.
To some extent, the information obtained from dynamic quenching of intrinsic
fluorescence is similar to that obtained from studies of deuterium exchange (see
Section 8.2.8 and chapter 18 in Part I), since it reflects the accessibility of defined
protein groups to the solvent. However, in distinction from the deuterium ex-
change, this method can be used to evaluate the amplitude and timescale of dy-
namic processes by using quenchers of different size, polarity, and charge. In one
example, it has been shown that the rate of diffusion of oxygen (which is one of the
smallest and most efficient quenchers of intrinsic protein fluorescence) within a
protein molecule is only two to four times slower than in an aqueous solution
[167, 168]. Furthermore, oxygen was shown to affect even those tryptophan resi-
dues that, according to X-ray structural analysis, should not be accessible to the sol-
vent. These observations clearly demonstrated the presence of substantial struc-
tural fluctuations in proteins on the nanosecond timescale [167, 168].
Acrylamide is one of the most widely used quenchers of intrinsic protein fluores-
cence [169, 170]. Acrylamide, like oxygen, is a neutral quencher but has a much
larger molecular size. This size difference results in a dramatic decrease in the
rate of protein fluorescence quenching over that of oxygen [170, 171]. This
decrease is due to the inaccessibility of the globular protein interior to the acryla-
mide molecule. Thus, acrylamide actively quenches only the intrinsic fluorescence
of solvent-exposed residues. As applied to conformational analysis, acrylamide
quenching was shown to decrease by two orders of magnitude as unstructured
polypeptide chains transitioned to globular structure [170, 171]. Importantly, the
degree of shielding of tryptophan residues by the intramolecular environment of
the molten globule state was shown to be close to that determined for the native
globular proteins, whereas the accessibility of tryptophans to acrylamide in the
pre-molten globule state was closer to that in the unfolded polypeptide chain
[142]. The fluorescence quenching by acrylamide of the single tryptophan residue
in the beta 2 subunit of tryptophan synthase was used to verify the presence of a
8.2 Methods Used to Characterize Natively Disordered Proteins 303
(see chapter 17 in Part I). It should be noted that the transfer of excitation energy
between the donor and the acceptor originates only upon the fulfillment of several
conditions: (1) the absorption (excitation) spectrum of the acceptor overlaps with
the emission (luminescence) spectrum of the donor (an essential prerequisite for
resonance), (2) spatial proximity of the donor and the acceptor (within a few dozen
angstroms), (3) a sufficiently high quantum yield of the donor, and (4) a favorable
spatial orientation of the donor and acceptor. The biggest advantage, and hence
the attractiveness, of fluorescence resonance energy transfer (FRET) is that it can
be used as a molecular ruler to measure distances between the donor and acceptor.
According to Förster, the efficiency of energy transfer, E, from the excited donor, D,
to the non-excited acceptor, A, located from the D at a distance RDA is determined
by the equation [181]
1
E¼ ð18Þ
RDA 6
1þ
Ro
where the parameter fD is the fluorescence quantum yield of the donor in the ab-
sence of the acceptor, n is the refractive index of the medium, N is the Avogadro’s
number, l is the wavelength, FD ðlÞ is the fluorescence spectrum of the donor with
the total area normalized to unity, and eA ðlÞ is the molar extinction coefficient of
the acceptor. These parameters are obtained directly from independent experi-
ments. Finally, hk 2 i represents the effect of the relative orientations of the donor
and acceptor transition dipoles on the energy transfer efficiency. For a particular
donor-acceptor orientation, this parameter is given as
where a is the angle between the transition moments of the donor and the acceptor
and b and g are the angles between the donor and acceptor transition moments
and the donor-acceptor vector, respectively. Experimentally, the efficiency of direct
energy transfer, E, is calculated as the relative loss of donor fluorescence due to
the interaction with the acceptor [163, 181]:
fD; A
E ¼1 ð21Þ
fD
where fD and fD; A are the fluorescence quantum yields of the donor in the ab-
sence and the presence of acceptor, respectively.
8.2 Methods Used to Characterize Natively Disordered Proteins 305
For FRET experiments, one typically uses the intrinsic chromophores (tyrosines
or tryptophans) as donors and covalently attached chromophores (such as dansyl)
as acceptors. According to Eq. (18), the efficiency of energy transfer is proportional
to the inverse sixth power of the distance between donor and acceptor. Obviously,
structural changes within a protein molecule might be accompanied by changes in
this distance, giving rise to the considerable changes in energy transfer. FRET has
been used to show that the urea-induced unfolding of proteins is accompanied by a
considerable increase in hydrodynamic dimensions. This expansion resulted in a
significant decrease in the efficiency of energy transfer from the aromatic amino
acids within the protein (donor) to the covalently attached dansyl (acceptor).
An elegant approach based on the unique spectroscopic properties of nitrated
tyrosine (which has maximal absorbance at @350 nm, does not emit light, and
serves as an acceptor for tryptophan) has been recently elaborated [182–187]. For
these experiments tyrosine residues are modified by reaction with tetranitrome-
thane to convert them to a nitro form, Tyr(NO2 ). The extent of decrease of trypto-
phan fluorescence in the presence of Tyr(NO2 ) provides a measure of average dis-
tance ðRDA Þ between these residues. This approach has been applied to study the
3-D structure of apomyoglobin in different conformational states [187]. These con-
formations included the native, molten globule, and unfolded conformations. The
A-helix of horse myoglobin contains Trp7 and Trp14, the G-helix contains Tyr103,
and the H-helix contains Tyr146. Both tyrosine residues can be converted suc-
cessively into the nitro form [186]. Comparison of tryptophan fluorescence in un-
modified and modified apomyoglobin permits the evaluation of the distances from
tryptophan residues to individual tyrosine residues (i.e., the distances between
identified points on the protein). Employing this energy transfer method to a
variety of nonnative forms of horse apomyoglobin revealed that the helical com-
plex formed by the A-, G-, and H-helices exists in both the molten globule and pre-
molten globule states [187].
Fig. 8.6. ANS interaction with proteins. (A) spectrum comparable with that of the molten
ANS fluorescence spectra measured for free globular state of a-lactalbumin (see text).
dye (solid line) and in the presence of natively (B) Dependence of the ANS fluorescence
disordered coil-like a-synuclein (dotted line), intensity on urea concentration measured for
natively disordered pre-molten globule-like a series of rigid globular proteins capable of
caldesmon 636–771 fragment (dash-dot- ANS binding: bovine serum albumin (circles),
dotted line), and molten globule state of a- apomyoglobin, pH 7.5 (squares), and
lactalbumin (dashed line). The native molten hexokinase (inverse triangles).
globular domain of clusterin has an ANS
8.2 Methods Used to Characterize Natively Disordered Proteins 307
8.2.7
Conformational Stability
denatured and exists as a molten globule [194, 199]. To perform this type of analy-
sis, the values of Dneff (the difference in the number of denaturant molecules
‘‘bound’’ to one protein molecule in each of the two states) should be determined.
This quantity should be compared to the Dneff eff
N!U and DnMG!U values corresponding
to the native-to-coil and molten globule-to-coil transitions in globular protein of
that given molecular mass [194].
8.2.8
Mass Spectrometry-based High-resolution Hydrogen-Deuterium Exchange
the past 40 years [206]. Hydrogen-deuterium exchange (HDX) rates are dependent
on thermodynamics and dynamic behavior and thus yield information regarding
the structural stability of the protein under study. The study of HDX as applied to
proteins was initiated by Kaj Linderstrøm-Lang in the 1950s as a method to inves-
tigate Pauling’s newly postulated a-helix and b-sheet secondary structures [207].
The two-step model of HDX is described as [208]
kop kch kcl
C H 8 OH 8 OD 8 C D ð22Þ
kcl kch kop
The EX2 HDX mechanism, where kcl g kch , describes most proteins at or below
neutral pH. In this case, kex can be found using the following equation:
kop kch
kex ¼ ¼ K op kch ð24Þ
kcl
where K op is the unfolding equilibrium constant that is also equal to the ratio of
the observed rate ðkex Þ to that of a random coil amide ðkch Þ.
HDX rates for random coil amides ðkch Þ, which depend on pH, primary sequence,
and temperature, can be obtained from studies of peptide behavior [209]. The ratio
of the observed rate of exchange ðkex Þ to that of a random coil ðkch Þ relates to the
free energy of unfolding by the following equation:
kex
DGop ¼ RT ln ¼ RT lnðK op Þ ð25Þ
kch
Eq. (25) shows that there is a logarithmic correlation between the exchange rate
and the conformation stability, i.e., the faster the exchange rate, the less stable the
folded state. Being initially developed to study protein folding on a global scale,
this formalism could also be used to analyze the intrinsic dynamic behavior of pro-
teins and, most attractively, to quantify the stability of localized regions within pro-
tein molecules.
Hydrogen bonding, accessibility to the protein surface, and flexibility of the im-
mediate and adjacent regions affect the HDX rate. The combined contribution of
these structural and dynamic factors is termed the ‘‘protection factor,’’ which has
8.2 Methods Used to Characterize Natively Disordered Proteins 311
8.2.9
Protease Sensitivity
@20 years [217]. Increasing the stability of a protein, e.g., by ligand or substrate
binding, in turn often leads to reduced digestion rates [218].
In essentially all biochemistry textbooks, the structure of proteins is described
as a hierarchy. The amino acid sequence is the primary structure; the local folding
of the backbone into helices, sheets, turns, etc. is the secondary structure; and the
overall arrangement and interactions of the secondary structural elements make
up the tertiary structure. Many details of this hierarchy were determined from the
first protein 3-D structure [16], but the original proposal for the primary-secondary-
tertiary hierarchy was based on differential protease digestion rates over the three
types of structure [217].
While acknowledging that flexible protein regions are easy targets for proteoly-
sis, some researchers argued that the surface exposure of susceptible amino acids
in appropriate conformations could provide important sites for accelerated diges-
tion rates [219]. Indeed, proteins that inhibit proteases, such as soybean trypsin in-
hibitor [220], have local regions of 3-D structure that fit the active sites and bind
irreversibly to their target proteases. Despite the examples provided by the inhibi-
tors, susceptible surface digestion sites in ordered proteins are not very common
[221, 222]. Despite the paucity of examples, the importance of surface exposure
rather than flexibility or disorder seems to have become widely accepted as the
main feature determining protease sensitivity.
In the well-studied protease trypsin, most of the lysine and arginine target resi-
dues are located on protein surfaces, but, as mentioned above, very few of these are
actually sites of digestion when located in regions of organized structure. Starting
with a few protein examples having specific surface-accessible digestion sites, at-
tempts were made to use molecular simulation methods to dock the backbones of
susceptible residues into the active site of trypsin. These studies suggested the
need to unfold more than 10 residues to fit each target residue’s backbone into
trypsin’s active site [223]. Indeed, recent studies show that backbone hydrogen
bonds are protected from water by being surrounded or wrapped with hydrophobic
side chains from nearby residues, with essentially every ordered residue being well
wrapped [224]. Backbone hydrogen bond wrapping and binding the backbone deep
within trypsin’s active site are mutually exclusive, and so the requirement for local
unfolding (unwrapping) is easily understood.
Direct support for the importance of local unfolding has been provided by com-
parison of the digestion rates of myoglobin and apomyoglobin [225]. Myoglobin is
digested very slowly by several different proteases. Removal of the heme leads to
local unfolding of the F-helix. This disordered loop becomes the site of very rapid
digestion by several different proteases including trypsin. The trypsin digestion
rate following the order-to-disorder transition of the F-helix is at least five orders
of magnitude faster.
Many fully ordered proteins, such as the myoglobin example given above, are di-
gested by proteases without observable intermediates of significant size. That is, if
digestion progress were followed using PAGE, myoglobin would be seen to disap-
pear and small fragments would appear, but mid-sized intermediates would not be
observed. In contrast, digestion of the F-helix in apomyoglobin leads to two rela-
8.2 Methods Used to Characterize Natively Disordered Proteins 313
tively stable mid-sized fragments. The likely cause of the lack of observable mid-
sized intermediates for fully ordered proteins is that digestion proceeds at multiple
sites within the ordered regions of the protein. Access to these sites is mediated by
transient unfolding events within ordered regions that allow initial digestion
events. These initial cuts perturb the local structure and inhibit refolding. The
sum of these changes leads to the exposure of multiple protease sites, which are
rapidly cleaved in a random order. Additional transient unfolding events elsewhere
in the structure lead to the same outcome. Thus the regions of the protein where
structure has been disturbed become rapidly digested into a multitude of different-
sized intermediate fragments without the accumulation of an observable quantity
of any particular mid-sized intermediate [225].
Molten globules have also been probed by protease digestion [226]. Protease di-
gestion of molten globules leads rapidly to multiple mid-sized fragments; upon
further digestion, the intermediates convert to smaller-sized fragments consistent
with nearly complete digestion. The transient stability of several mid-sized inter-
mediates suggests that molten globules have multiple regions of high flexibility
that are easy to digest interspersed with stabler structured regions that are resistant
to digestion.
At this time we have not found published, systematic studies of fully unfolded,
natively disordered proteins. One would expect such proteins to undergo digestion
in a random fashion with all of the target residues equally accessible to digestion
and with only sequence-dependent effects to modulate the digestion rates at the
various sites. If this general picture were true, then proteolytic digestion would
lead to rapid conversion of the primary protein into small fragments, without
significant amounts of transient, mid-sized intermediates. Highly disordered pro-
teins such as p21Waf1=Cip1=Sdi1 [402, 403] and cytochrome c [404] undergo digestion
to small fragments without any observable intermediates, as suggested here. The
digestion patterns of extensively unfolded proteins likely resemble those of fully
folded proteins: both evidently lack the accumulation of mid-sized intermediates.
However, the two types of proteins can be distinguished by their vastly different
digestion rates: extensively unfolded proteins would be expected to be digested at
least 10 5 times faster than fully folded forms.
8.2.10
Prediction from Sequence
Amino acid sequence codes for protein 3-D structure [5]. Since native disorder can
be viewed as a type of structure, we wanted to test whether the amino acid compo-
sitions of disordered regions differed from the compositions of ordered regions
and, if so, whether the types of enrichments and depletions gave insight into the
disorder. Thus, we studied amino acid compositions of collections of both ordered
and natively disordered protein regions longer than 30 amino acids. We chose such
long regions to lessen the chance that non-local interactions would complicate our
analysis.
Natively disordered sequences characterized by X-ray diffraction, NMR spectros-
314 8 Natively Disordered Proteins
copy, and CD spectroscopy all contain similar amino acid compositions, and these
are very different from the compositions of the ordered parts of proteins in PDB.
For example, compared to the ordered proteins, natively disordered proteins con-
tain (on average) 50% less W, 40% less F, 50% less isoleucine, 40% less Y, and
25% or so less C, V, and L, but 30% more K, 40% more E, 40% more P, and 25%
more S, Q, and R [31]. Thus, natively disordered proteins and regions are substan-
tially enriched in typically hydrophilic residues found on protein surfaces and sig-
nificantly depleted of hydrophobic, especially aromatic, residues found in protein
interiors. These data help to explain why natively disordered proteins fail to form
persistent, well-organized 3-D structure.
Because ordered and disordered proteins contain significantly different amino
acid compositions, it is possible to construct order/disorder predictors that use
amino acid sequence data as inputs. By now, several researchers have published
predictors of order and disorder [13, 30, 109, 227–233]. The first predictor, devel-
oped by R. J. P. Williams, separated just two natively disordered proteins from a
small set of ordered proteins by noting the abnormally high charge/hydrophobic
ratio for the two natively disordered proteins. This paper is significant as being
the first indication that natively disordered proteins have amino acid compositions
that differ substantially from those of proteins with 3-D structure. The predictor by
Uversky et al. [227] also utilizes the relative abundance of charged and hydropho-
bic groups, but on a much larger set of proteins. Most of the other predictors [30,
109, 229–233] utilize neural networks.
Predicting order and disorder was included in the most recent cycle of the Criti-
cal Assessment of Protein Structure Prediction (CASP) [234]. While the summary
publication contained papers from two sets of researchers [109, 233], several addi-
tional groups participated. All of the groups achieved similar levels of accuracy
[234]. Furthermore, prediction accuracies were on the order of 90% or so for re-
gions of order and 75–80% for regions of intrinsic disorder. These values are
significantly above what could be expected by chance. The predictability of native
disorder from sequence further supports the conjecture that natively disordered
proteins and regions lack specific 3-D structures as a result of their amino acid se-
quences. That is, amino acid sequence codes for both order and disorder.
8.2.11
Advantage of Multiple Methods
Given the limitations of the various physical methods, it is useful if natively dis-
ordered proteins are characterized by multiple methods. Each approach gives a
slightly different view, with better understanding arising from the synthesis of the
different perspectives (see chapter 2 in Part I).
A significant difficulty with long disordered regions identified by X-ray diffrac-
tion is that the missing coordinates could indicate either native disorder or a wob-
bly domain that moves as a folded, ordered, rigid body. Thus, it is especially useful
if X-ray-indicated disorder is confirmed by additional methods. The X-ray-indicated
regions of disorder in calcineurin [29, 235], clotting factor Xa [236–238], histone
H5 (C. Crane-Robinson, personal communication) [239–241], and topoisomerase
8.3 Do Natively Disordered Proteins Exist Inside Cells? 315
8.3
Do Natively Disordered Proteins Exist Inside Cells?
8.3.1
Evolution of Ordered and Disordered Proteins Is Fundamentally Different
Fig. 8.7. Idealized steady-state heteronuclear agreement with X-ray-determined structure; all
15
N{ 1 H}-NOE for the backbone amides of the regions of missing electron density in the X-ray
hypothetical protein discussed in the text. The crystal structure have negative values. (B)
15
domain profile of this protein based on the X- N{ 1 H}-NOE indicating that the C-terminal
ray crystal structure is: short disordered N- domain is a wobbly domain (a structured
terminal region/ordered region/disordered domain attached via a short, flexible linker)
region/ordered region/long disordered C- and is not totally disordered as determined by
terminal region. (A) 15 N{ 1 H}-NOE that is in X-ray crystallography.
8.3 Do Natively Disordered Proteins Exist Inside Cells? 317
namic structures of natively disordered proteins limits our ability to predict their
existence based on sequence data. It also limits our understanding of how the
sequences of such regions specify function, the presence or absence of residual
structure, and degree of flexibility.
The evolution of globular protein structures depends on the maintenance of a
nonpolar interior and a polar exterior to promote collapse and folding through the
hydrophobic effect [261]. This is achieved by the placement and distribution of
nonpolar amino acids so that those making favorable nonpolar contacts in the
folded structure will be far apart in the linear sequence (so-called long-range inter-
actions). Therefore, the evolution of globular protein structure is partly dependent
on selection for this property. In addition to the hydrophobic effect, desolvation of
backbone hydrogen bonds appears to be of similar importance [224].
The evolution of intrinsically disordered protein structure seems to depend on
selection for other properties. Natively disordered proteins with extended disorder
do not form globular structures and therefore do not have a requirement to main-
tain long-range interactions such as those required by globular proteins. This cre-
ates a potential for these sequences to accumulate more variation and to generate
more sequence divergence than globular proteins (see Section 8.3.1.2). For globular
proteins, the selection for structural motifs is apparent by sequence and structural
comparison of evolutionarily related protein structures. In an attempt to develop
evolutionary relationships for intrinsically disordered proteins, the Daughdrill lab
is currently investigating the structure, dynamics, and function of a conserved flex-
ible linker from the 70-kDa subunit of replication protein A (RPA70). The flexible
linker for human RPA70 is @70 residues long [269, 272]. For the handful of se-
quenced RPA70 homologues, the similarity among linker sequences varies signifi-
cantly, going from 43% sequence identity between Homo sapiens and Xenopus laevis
to no significant similarity between Homo sapiens and Saccharomyces cerevisiae. It is
unclear which selective processes have resulted in the observed sequence variations
among RPA70 linkers. It is also unclear how the observed sequence variation af-
fects the structure and function of the linkers. If natural selection works to pre-
serve flexible structures, then one would expect that the linkers from different
species have evolved to the same level of flexibility although adopting different se-
quences. By testing this hypothesis, we will begin understanding the rules govern-
ing the evolution of natively disordered protein sequences.
ments of the RPA70 homologues lend support to this hypothesis and suggest that
the linker has evolved at a rate 1.5 to 5 times faster than the rest of RPA70 [275].
This study tested the evolutionary rate heterogeneity between intrinsically disor-
dered regions and ordered regions of proteins by estimating the pairwise genetic
distances among the ordered and disordered regions of 26 protein families that
have at least one member with a region of disorder of 30 or more consecutive res-
idues that has been characterized by X-ray crystallography or NMR. For five of the
protein families, there were no significant differences in pairwise genetic distances
between ordered and disordered sequences. Disordered regions evolved more rap-
idly than ordered regions for 19 of the 26 families. The known functions for some
of these disordered regions were diverse, including flexible linkers and binding
sites for protein, DNA, or RNA. The functions of other disordered regions were un-
known. For the two remaining families, the disordered regions evolved signifi-
cantly more slowly than the ordered regions. The functions of these more slowly
evolving disordered regions included sites for DNA binding. According to the au-
thors, much more work is needed to understand the underlying causes of the vari-
ability in the evolutionary rates of intrinsically ordered and disordered proteins.
Figure 8.8 illustrates the point, showing the contrast between the protein sequence
alignment for the flexible linker and a fragment of the ssDNA binding domain
from eight RPA70 homologues. Because of the known functional consequences of
linker deletion in RPA70 [272, 276–279], we are interested in how the level of func-
tional selection for flexible regions is related to their evolutionary rates. As is ob-
served for folded proteins, most likely this relationship depends on the function
under selection [259–261, 273]. However, because the substitution rate for flexible
regions can be uncoupled from the maintenance of long-range interactions, the ex-
act dependence between the rate of evolution and functional selection should differ
significantly from that observed for folded proteins. Experiments planned to test
the functional consequences of swapping divergent flexible linkers between species
will begin to address this subject.
to incorporate structural classes that are based on the presence of residual second-
ary structure and the degree of flexibility estimated from NMR relaxation measure-
ments.
Another important feature of protein evolution is the pattern of amino acid sub-
stitutions observed over time. For ordered proteins, the change of an amino acid
into one of a similar chemical type is commonly observed, whereas the change to
a chemically dissimilar one is rare. For example, isoleucine to leucine, aspartate to
glutamate, and arginine to lysine are all commonly observed in related ordered
proteins, while tryptophan to glycine, lysine to aspartate, and leucine to serine are
all uncommon changes, especially for buried residues.
Patterns of amino acid substitutions are readily observed in substitution matrices
used to assign values to various possible sequence alignments. These scoring ma-
trices are constructed by first assembling a set of aligned sequences, typically with
low rates of change in pairs of sequences so that there is confidence in the align-
ments. Given such a set of high-confidence alignments, the frequencies of the
various amino acid substitutions are then calculated and used to build a scoring
matrix. Important and commonly used scoring matrices are the PAM [283] and
the Blossum [284] series.
To improve alignments of disordered regions, a scoring matrix for disordered re-
gions was developed [285]. First, homologous groups of disordered proteins were
aligned by standard protocols using the Blossum 62 scoring matrix. From this set
of aligned sequences, a new scoring matrix was calculated and then realignment
was carried out with this new scoring matrix. As compared to the first alignment,
the new alignment was improved as estimated by a reduction in the sizes and
number of gaps in the pairwise alignments. The new set of alignments was then
used to develop a new scoring matrix, and a new set of alignments was generated.
These steps were repeated in an iterative manner until little or no change was ob-
served in successive sets of alignments and in successive scoring matrices.
The resulting scoring matrix for disordered regions showed significant dif-
ferences from the Blossum 62 or PAM 250 scoring matrices [286]. Glycine/
tryptophan, serine/glutamate, and alanine/lysine substitutions, for example, were
much more common in aligned regions of native disorder as compared to aligned
regions of ordered proteins. Since natively disordered regions lack specific struc-
ture and the accompanying specific amino acid interactions, substitutions of dispa-
rate amino acids are not so strongly inhibited by the need to conserve structure.
Thus the commonness of such disparate substitutions in natively disordered re-
gions is readily understood.
The commonly observed higher rates of amino acid change [287] and the distinc-
tive pattern of amino acid substitutions [286] both strongly support the notion that
native disorder exists inside the cell.
8.3.2
Direct Measurement by NMR
The inside of cells are extremely crowded, and proteins themselves do most of the
crowding since they occupy 40% of a cell’s volume and achieve concentrations of
8.3 Do Natively Disordered Proteins Exist Inside Cells? 321
greater than 500 g L1 [288–290]. Despite the crowded nature of the cell’s interior,
almost all proteins are studied outside cells and in dilute solutions, i.e., total solute
concentrations of less than 1 g L1 . It is also clear that macromolecular and small-
molecule crowding can increase protein stability [291–293]. Could some disor-
dered proteins be artifacts of the way proteins are studied? Are some just unstable
proteins that unfold in dilute solution? A recent study by Dedmon et al. [102] on a
protein called FlgM shows that the answer to the question is yes and no. Some dis-
ordered proteins probably have structure in cells while others do not.
Bacteria use rotating flagella to move through liquids [294, 295]. The protein
FlgM is part of the system controlling flagellar synthesis. It binds the transcription
factor s 28, arresting transcription of the genes encoding the late flagellar proteins.
Transcription can resume when FlgM leaves the cell, most probably via extrusion
through the partially assembled flagellum.
Free FlgM is mostly disordered in dilute solution, but NMR studies in dilute
solution indicate that the C-terminal half of FlgM becomes structured on bind-
ing s 28 , as is shown by the disappearance of cross-peaks from residues in the C-
terminal half of FlgM in the FlgM-s 28 complex [77, 267]. One signature of protein
structure can be the absence of cross-peaks in 1 H- 15 N HSQC NMR spectra because
of conformational exchange [296–298]. The disappearance of cross-peaks results
from chemical exchange between a disordered and more ordered form. Specifically,
cross-peaks broaden until they are undetectable when the rate of chemical ex-
change between states is about the same as the difference in the resonance fre-
quencies of the nuclei undergoing exchange. The bipartite behavior of FlgM (i.e.,
disappearance of cross-peaks from the C-terminal half with retention of cross-
peaks from the N-terminal half ) provides a valuable built-in control for studying
the response of FlgM to different solution conditions.
How do these observations about the ability of FlgM to gain structure on binding
its partner relate to what might happen in cells? Until recently, all protein NMR
was performed in vitro on purified protein samples in dilute solution. Two years
ago, Dötsch and colleagues showed the feasibility of obtaining the spectra of 15 N-
labeled proteins inside living Escherichia coli cells [299–301]. Overexpression is key
to the success of in-cell NMR. The protein of interest must contain a large propor-
tion of the 15 N in the sample so that the spectrum of the overexpressed protein can
be observed on top of signals arising from other 15 N-enriched proteins in the cell,
which contribute to a uniform background.
15
N-enriched FlgM was found to give excellent in-cell NMR data [102]. About
half the cross-peaks disappear in cells. Most importantly, the cross-peaks that dis-
appear are the same ones that disappear on s 28 binding in simple buffered solu-
tion, and the cross-peaks that persist are the same ones that persist on s 28 binding.
These data suggest that FlgM gains structure all by itself under the crowded con-
ditions found in the cell, but it is important to rule out alternative explanations.
There is a homologue of S. typhimurium s 28 in E. coli, but there is not enough of
the homologue present in E. coli (i.e., FlgM is overexpressed, the s 28 homologue is
not) for s 28-FlgM binding to explain the results. Furthermore, the same behavior is
observed in vitro – in the complete absence of s 28 – when intracellular crowding
was mimicked by using 450 g L1 glucose, 400 g L1 bovine serum albumin, or
322 8 Natively Disordered Proteins
450 g L1 ovalbumin. The lack of cross-peaks from the C-terminal half of the pro-
tein is not caused by degradation, because the FlgM can be isolated intact at the
end of the in-cell experiment. The gain of structure in cells does not seem to be
an artifact of FlgM overexpression because the total protein concentration is inde-
pendent of FlgM expression. Two observations show that the presence or absence
of cross-peaks is not simply a matter of viscosity. First, cross-peaks from the N-
terminal half of FlgM are present under all conditions tested even though the
relative viscosities of the solutions differ dramatically. Second, the absence of
cross-peaks does not correlate with increased viscosity. Taken together, these data
strongly suggest that even in the absence of its binding partner, the C-terminal por-
tion of FlgM gains structure in cells and in crowded in vitro conditions.
Does this observation about the C-terminal part of FlgM mean that all intrinsi-
cally disordered proteins will gain structure under crowded conditions? No. First,
the N-terminal part of FlgM remains disordered under crowded conditions both in-
side and outside the cell. Second, the same observation has been made for another
protein under crowded conditions in vitro [302]. Third, as discussed below, several
functions of disordered proteins require the absence of stable structure. Perhaps
the N-terminal half of FlgM gains structure only upon binding some yet unknown
molecule, or maybe it needs to remain disordered to ensure its exit from the cell.
It is also important to note, however, that crowding can induce compaction even
when it does not introduce structure [158].
In summary, some, but certainly not all, so-called natively disordered proteins
will gain structure in cells. In terms of the equilibrium thermodynamics of protein
stability (Eqs. (1) and (2)), these proteins are best considered as simply very unsta-
ble. This instability may be essential for function, e.g., allowing rapid degradation
or facilitating exit from the cell. And, finally, it is important to consider this discus-
sion in terms of Anfinsen’s thermodynamic hypothesis. Specifically the last four
words of his statement [5]: ‘‘the native conformation is determined by the totality
of inter-atomic interactions and hence by the amino acid sequence, in a given envi-
ronment.’’
8.4
Functional Repertoire
8.4.1
Molecular Recognition
[303]. The release of solvent will provide a favorable entropic contribution to the
binding. The extent of this favourable effect depends on the amount of hydropho-
bic burial that occurs during the formation of the nonspecific encounter complex.
This encounter complex will undergo a sequential selection to achieve a specific
complex. Sequential selection is achieved by consecutive structural interconver-
sions that increase the surface complementarity. A subsequent increase in surface
complementarity leads to a stabler complex. Of course, this model does not address
the possibility that folding to a single structure does not occur but rather the bound
structure is dynamic and the overall stability is governed by a collection of compet-
ing interactions.
tropy was observed for the mouse major urinary protein (MUP-I) upon binding the
hydrophobic mouse pheromone 2-sec-butyl-4,5-dihydrothiazole [320]. 15 N relax-
ation measurements of free and pheromone-bound MUP-I were fitted using the
Lipari-Szabo model-free approach [86, 87]. Order parameter differences were ob-
served between free and bound MUP-I that were consistent with an increase in
the conformational entropy. Out of 162 MUP-I residues, 68 showed significant re-
ductions in S 2 upon pheromone binding. The changes were distributed through-
out the b-barrel structure of MUP-I, with a particular prevalence in a helical turn
proposed to form a ligand entry gate at one end of the binding cavity. The authors
called into question assumptions that the protein backbone becomes more re-
stricted upon ligand binding.
Increased side chain entropy facilitates effector binding to the signal trans-
duction protein Cdc42Hs [321]. Cdc42Hs is a member of the Ras superfamily of
GTP-binding proteins that displays a wide range of side chain flexibility. Methyl
axis order parameters ranged from 0:3 G 0:1 (highly disordered) in regions near
the effector-binding site to 0:9 G 0:1 in some helices. Upon effector binding, the
majority of methyl groups showed a significant reduction in their order param-
eters, indicating increased entropy. Many of the methyl groups that showed
increased disorder were not part of the effector-binding interface. The authors pro-
pose that increased methyl dynamics balance entropy losses as the largely unstruc-
tured effector peptide folds into an ordered structure upon binding.
Actually, there is an entire class of proteins for which ligand binding may be
accompanied by destabilization of the native state [306, 307]. For example, the in-
troduction of a Ca 2þ -binding amino acid sequence did not affect the structure or
stability of the T4 lysozyme in the absence of calcium. However, in the presence
of this cation, the stability of the mutant protein was detectably less than that of
wild-type T4 lysozyme. This instability suggests that the binding of Ca 2þ might be
accompanied by considerable conformational changes in the modified loop that
lead to destabilization of the protein [322]. Similar effects have been described for
the calcium-binding N-domain fragments of Paramecium calmodulin [323], rat
calmodulin [324], and isolated domains of troponin C [325].
Similarly, the tertiary structure of calreticulin, a 46.8-kDa chaperone involved in
the conformational maturation of glycoproteins in the lumen and endoplasmic re-
ticulum, was distorted by Zn 2þ binding, which resulted in a concomitant decrease
in the conformational stability of this protein [326]. The zinc-induced structural
perturbations and destabilization are characteristic of several calcium-binding pro-
teins, including a-lactalbumin, parvalbumin, and recoverin [307].
Of special note are some biomedical implications of the observed destabilization
upon binding. For example, b2-microglobulin is able to bind Cu 2þ . The binding is
accompanied by a significant destabilization of the protein, suggesting that the ion
has a higher affinity for the unfolded form [327]. b2-microglobulin is a 12-kDa pol-
ypeptide that is necessary for the cell surface expression of the class I major histo-
compatibility complex (MHC). Turnover of MHC results in release of soluble
b2-microglobulin, followed by its catabolism in the kidney. In patients suffering
from kidney disease treated by dialysis, b2-microglobulin forms amyloid deposits
8.4 Functional Repertoire 325
8.4.2
Assembly/Disassembly
Protein crystallographers are often frustrated in attempts to ply their trade because
disordered N- and C-termini prevent crystallization. The idea that such sequences
have been retained by nature to frustrate crystallographers, although enticing, is
probably invalid since the termini have been around longer than crystallographers.
Instead, disordered termini are often conserved to facilitate assembly and disas-
sembly of complex objects, such as viruses.
The main idea of disorder-assisted assembly is that sequences from several dif-
ferent proteins are required for the disordered termini to fold into a defined struc-
ture that stabilizes the assembly. In some instances, only one player is disordered,
while in others disordered chains must interact to gain structure. Disorder-assisted
assembly accomplishes two beneficial goals. First, it ensures assembly only when
all the players are in their correct positions. Second, it prevents aggregation of in-
dividual components. For example, the N-terminal 20 residues of porcine muscle
lactate dehydrogenase form a disordered tail in the monomer that is essential for
tetramerization [328]. Namba has reviewed several of these systems, including ex-
amples from the assembly of tobacco mosaic virus, bacterial flagella, icosahedral
viruses, and DNA/RNA complexes [329].
8.4.3
Highly Entropic Chains
These are the elite of protein disorder – proteins, whole or in part, that function
only when disordered. Terms such as ‘‘entropic bristles,’’ ‘‘entropic springs,’’ and
‘‘entropic clocks’’ have been used to describe these systems, but these terms can
be misleading because all matter, except for perfect crystals at 0 Kelvin, has non-
zero entropy values.
Hoh set down the main concept in his contribution about entropic bristle do-
mains [330]. Disordered regions functioning as entropic bristles within a binding
site will block binding until the bristle is modified. The modification causes the
disordered region to move to one side of the binding site. Members of this class
can be recognized by the effects of deletion. That is, removal of the bristle should
lead to permanent activation [330]. The C-terminal region of p53 has been shown
to function as an entropic bristle domain of this type [330, 331]. Bristles can also
act as springs when two sets of disordered proteins are brought into contact with
each other. The interactions of neurofilaments may be controlled in this way [332–
334].
An extended unfolded region is important for the timed inactivation of some
326 8 Natively Disordered Proteins
Fig. 8.9. Example of an entropic clock. side of the pore assembly excludes binding of
Simplified model of a Shaker-type voltage- the ball domain. (B) Open state following
gated Kþ ion channel (dark grey) with a ‘‘ball- membrane depolarization. (C) After
and-chain’’ timing mechanism. The ball-and- depolarization, the cytoplasmic side of the
chain is comprised of an inactivation, or ball, pore opening assumes a negative charge that
domain (light grey) that is tethered to the facilitates interaction with the positively
pore assembly by a disordered chain of @60 charged ball domain. (D) Inactivation of the
residues. For simplicity, only four of the channel occurs when the ball domain occludes
proposed 10 states are shown [336]. The the pore. The transition from (C) to (D) does
cytoplasmic side of the assembly is oriented not involve charge migration and can be
downward. (A) Closed state prior to membrane modeled as a random walk of the ball domain
depolarization. Note that conformational towards the pore opening. (Portions of the
changes in the pore have sealed the channel figure are based on Antz et al. [396]).
and that a positive charge on the cytoplasmic
8.4 Functional Repertoire 327
examples described above, there are counter examples from related systems or pro-
teins where the region either is absent or is replaced by a globular domain.
8.4.4
Protein Modification
gions of disorder, while the complexity distribution of the residues flanking non-
phosphorylation sites matches almost exactly the complexity distribution obtained
for a collection of ordered proteins. In addition, there is a high correspondence be-
tween the prediction of disorder and the occurrence of phosphorylation and, con-
versely, the prediction of order and the lack of phosphorylation (unpublished ob-
servations). This is expected from the amino acid compositions of the residues
flanking phosphorylation and non-phosphorylation sites. A new predictor of phos-
phorylation exhibited small but significant improvement if predictions of order
and disorder were added [343]. These observations support the suggestion that
sites of protein phosphorylation occur preferentially in regions of native disorder.
Data support the suggestion that protease digestion and phosphorylation both
occur preferentially within regions of disorder. Also, several other types of protein
modification, such as acetylation, fatty acid acylation, ubiquitination, and methyla-
tion, have also been observed to occur in regions of intrinsic disorder [30, 31].
From these findings, it is tempting to suggest that sites of protein modification in
eukaryotic cells universally exhibit a preference for natively disordered regions.
What might be the basis of a preference for locating sites of modification within
regions of native disorder? For all of the examples discussed above, the modifying
enzyme has to bind to and modify similar sites in a wide variety of proteins. If all
the regions flanking these sites are disordered before binding to the modifying en-
zyme, it is easy to understand how a single enzyme could bind to and modify a
wide variety of protein targets. If instead all of these regions bind as ordered struc-
tures, then there is the complicating feature that the proteins being regulated by
the modification must all adopt the same local structure at the site of modification.
This imposes significant constraints on the site of modification. The structural
simplification that arises from locating the sites of modification within regions of
disorder is herein proposed to be an important principle. Elsewhere we point out
that a particular advantage of disorder for regulatory and signaling regions is that
changes such as protein modification lead to large-scale disorder-to-order structural
transitions: such large-scale structural changes are not subtle and so could be an
advantage for signaling and regulation as compared to the much smaller changes
that would be expected from the decoration of an ordered protein structure (see
Section 8.1.3).
8.5
Importance of Disorder for Protein Folding
Interest in the unfolded protein state has increased markedly in recent years. A
major motivation has been to better understand the structural transitions that
occur as a protein acquires 3-D structure, both from the point of view of the mech-
anism of folding and from the point of view of the energetics. A major effort
has been to connect structural models of the unfolded polypeptide chain with ex-
perimental data supporting the given model. The overall sizes of guanidinium
chloride-denatured proteins fit the values expected for random coils when excluded
8.5 Importance of Disorder for Protein Folding 329
volume effects were taken into account [351]. For a true random coil, the F and C
angles of a given dipeptide are independent of the angles of the dipeptides before
and after. This is often called the Flory isolated-pair hypothesis [352]. Both experi-
ments and calculations call the isolated-pair hypothesis into question [353]. For ex-
ample, using primarily repulsive terms, Pappu et al. [354] computed the effects of
steric clash (or excluded volume) on blocked polyalanines of various lengths. Con-
trary to Flory’s isolated hypothesis, they found that excluded volume effects were
sufficient to lead to preferred backbone conformations, with the polyproline II he-
lix being among the most preferred. Representing unfolded proteins as polymer
chain models, which have the advantage of simplifications that result from ignor-
ing most atomic details, is proving useful for modeling experimental data pertain-
ing to protein stability, folding, and interactions [355].
While there has been substantial emphasis on the study of unfolded proteins as
precursors to 3-D structures, as briefly described above, to our knowledge there has
been no systematic discussion or studies of how natively disordered regions affect
protein 3-D structure and folding kinetics. Here we will consider several possibil-
ities, each of which is a simple hypothesis that has not yet been tested by experi-
ment. These hypotheses form the basis for experiments into the effects of native
disorder on protein structure and folding.
First, a well-known example with ample experimental support is provided. Tryp-
sinogen folds into a stable 3-D structure, but compared to trypsin, the folding is
incomplete and the protein is inactive. Trypsinogen’s folding does involve the for-
mation of the crucial catalytic triad, and yet trypsinogen remains inactive evidently
because the binding pocket for the substrate lysine or arginine fails to form com-
pletely, e.g., remains natively disordered [12, 356]. Thus, trypsinogen, the precur-
sor to trypsin, remains inactive and thus does not harm the interior of the cell.
The folding into active trypsin is inhibited by a short region of native disorder at
the amino terminus; this region is completely missing in the electron density
map of trypsinogen [357]. Once trypsinogen has been exported from the cell, this
disordered region is cleaved by trypsin. The amino terminus changes from a highly
charged moiety into a hydrophobic terminus of isoleucine followed by valine (IV).
In the absence of the natively disordered, charged extension, this IV terminus be-
comes capable of binding into a particular site elsewhere in the structure. This
binding of the IV moieties brings about a disorder-to-order transition of trypsin’s
binding pocket, now enabling the protein to bind arginine or lysine and thereby
converting inactive trypsinogen into active trypsin [358]. Even in the absence of
the cleavage, high levels of an IV dipeptide can stimulate the protease activity of
uncleaved trypsinogen by binding into the site used by the IV terminus [12]. We
speculate that proteolytic removal of a natively disordered region may be a com-
mon mechanism for regulating protein folding and function, but we have not yet
searched systematically for other examples.
While trypsinogen provides an example of a protein for which a region of native
disorder inhibits protein folding, we speculate that there are proteins for which re-
gions of native disorder promote protein folding. Suppose, for example, that a pro-
tein contained a short, highly charged region of native disorder and a folded do-
330 8 Natively Disordered Proteins
main with an opposite charge of significant magnitude. From various analyses sug-
gesting that a high net charge can lead to a natively unfolded protein [123, 227,
359], the region of oppositely charged disorder might be essential for overall
charge balance and, if so, would be required for protein folding. A simple experi-
ment is proposed here: (1) identify example proteins in the PDB with short, highly
charged tails and oppositely charged, folded domains; (2) determine which do-
mains are likely not to fold without their oppositely charged tail by means of the
net charge versus hydropathy plot [227]; (3) remove the charged tail by proteolysis
or genetic engineering; and (4) compare the folding rate of the shorter protein with
that of the full-length protein. The prediction is that, if located in the unfolded re-
gion of the net charge hydropathy plot, the shorter protein would fail to fold or
would require higher ionic strength to fold as compared to the full-length protein.
Various theoretical analyses on protein folding have suggested relationships
among the size, stability, and topology of a protein fold and the rate and mecha-
nism by which the fold is achieved. The characterization of the folding of a large
number of simple, single-domain proteins enabled detailed studies to test the vari-
ous models and assertions regarding the mechanism of protein folding. The sim-
ple proteins in this set are characterized by having single domains; by the lack of
prosthetic groups, disulfide bonds, and cis proline residues; and by two-state refold-
ing kinetics. Despite the simplicity of this set, more than a million-fold variation in
refolding rates was observed. A remarkable finding was that the statistically signif-
icant correlations among the folding rate, the transition state placement, and the
relative contact order are observed, where the transition state placement is an esti-
mate of the fraction of the burial of the hydrophobic surface area in the transition
state as compared to the native state, and where the relative contact order is the
length-normalized sequence separation between contacting residues in the native
state [360]. A number of alternative empirical measures of topology were later
shown to correlate about equally well with folding rates as does the relative contact
order, including local secondary structure content [406], the number of sequence-
distant contacts per residue [361], the fraction of contacts that are distant in the
sequence [362], and the total contact distance [363]. The quality of these alternative
measures supports the conjecture that contact order predicts rates, not because it is
directly related to the mechanism of folding, but rather because it is related to an
alternative physical parameter of importance [364].
Attempts to reconcile the observed relationship between contact order or other
measures of topology and folding rates has led to the topomer search model
[365], which was based substantially on similar prior models [366, 367]. Simply
put, the topomer search model stipulates that the rate-limiting step in protein fold-
ing is the search for an unfolded conformation with the correct overall topology.
The unfolded form with the correct topology then rapidly folds into the native state
[364].
Assuming for the sake of discussion the basic correctness of the topomer search
model, the expected effects of native disorder can be readily described. First let us
consider natively disordered regions at the amino or carboxy terminus that do not
affect the overall protein stability and that do not stabilize misfolded intermediates.
8.6 Experimental Protocols 331
Such natively disordered regions would be expected to have very small effects on
the folding rates. On the other hand, natively disordered internal loops would
be expected to slow the rate of folding in a length-dependent manner. Indeed, a
systematic study of the folding rates of simple, two-state proteins with natively
disordered loops of varying lengths could provide a useful test of the topomer
search model.
The relationship between the log of the folding rates of simple two-state proteins
and the contact order value exhibits an r 2 value of about 0.8, suggesting that this
measure captures in excess of 3/4 of the variance in the logarithms of the reported
folding rates [364]. The topomer search model, which evidently captures the de-
pendence of protein folding on the contact order, ignores variation in foldability
along the sequence. On the other hand, predictors of order and disorder capture
local sequence tendencies for order or disorder. Such a local tendency for disorder
could be overridden by non-local interactions in the final native structure, and so a
local tendency for disorder could be important for a folded protein even if the na-
tive structure does not exhibit actual disorder. We wonder whether there is any re-
lationship between protein folding rates and variations in order/disorder tenden-
cies along an amino acid sequence. That is, we wonder whether the 1/4 variability
in the logarithms of the folding rate that is not captured by relative contact order
could be related to differences in the amounts or in the organization of disorder
tendencies along the amino acid sequences. Local regions with high tendencies
for disorder could, for example, lead to very non-uniform polymer chain models
by local alterations in the stiffness or persistence length; locating such anomalies
at topologically critical sites could greatly speed up or slow down protein folding
rates. These ideas could be tested both computationally and experimentally.
8.6
Experimental Protocols
In this section we include several protocols for NMR spectroscopy, X-ray crystallo-
graphy, and circular dichroism spectropolarimetry. We focus on these methods be-
cause they are traditionally the most frequently used techniques to characterize na-
tive disorder in proteins. Protocols for small angle X-ray diffraction, hydrodynamic
measurement, fluorescence methods, conformational stability, mass spectrometry-
based high-resolution hydrogen-deuterium exchange, protease sensitivity, and pre-
diction from sequence are available from the references cited in the above methods
section.
8.6.1
NMR Spectroscopy
tein sample at a concentration of @1 mM. These methods also assume that the in-
corporation of an NMR-active isotope such as 15 N and/or 13 C is possible.
where g and d are, respectively, the magnitude and duration of the gradient pulses,
D is the time between gradient pulses, and g is the gyromagnetic ratio of the ob-
served nucleus [372]. Nonlinear least-squares regression of a decaying exponential
function onto the data can be used to extract D. It is useful to combine these mea-
surements with another technique such as sedimentation to obtain information
about the shape of the protein.
dimensional NMR methods [81]. Spin-lattice relaxation rates are typically deter-
mined by collecting 8–12 two-dimensional spectra using relaxation delays from
10–1500 ms. Spin-spin relaxation rates can determined by collecting 8–12 two-
dimensional spectra using spin-echo delays of 10–300 ms. Peak heights from
each series of relaxation experiments are then fitted to a single decaying exponen-
tial. To measure the 1 H- 15 N NOEs, one spectrum is acquired with a 3-s mixing
time for the NOE to buildup and another spectrum is acquired with a 3-s recycle
delay for a reference. It is often necessary to optimize this delay, as intrinsically
disordered proteins and longer delays will attenuate larger-than-predicted negative
NHNOE values for highly dynamic regions of the protein [85]. For all experiments,
water suppression can be achieved by using pulsed field gradients. Uncertainties
in measured peak heights are usually estimated from baseline noise level and,
with good tuning and shimming, are typically less than 1% of the peak heights
from the first R1 and R 2 delay points.
gN
sNH ¼ R1 ðNOE 1Þ ð30Þ
gH
4sNH
JðoH Þ ¼ ð31Þ
5d 2
334 8 Natively Disordered Proteins
½4R1 5sNH
JðoN Þ ¼ ð32Þ
½3d 2 þ 4c 2
½6R 2 3R1 2:72sNH
Jð0Þ ¼ ð33Þ
½3d 2 þ 4c 2
This approach estimates the magnitude of the spectral density function at the
given frequencies, making no assumptions about the form of the spectral density
function or about the molecular behavior giving rise to the relaxation.
8.6.2
X-ray Crystallography
teins. Failure to obtain crystals is the single greatest experimental problem in X-ray
crystallography. Here, we present calsequestrin as a case study illustrating the suc-
cessful crystallization of a natively disordered protein.
Calsequestrin is a calcium-storage protein located within the sarcoplasmic retic-
ulum that binds 40–50 calcium ions with @1 mM affinity. Calsequestrin is a highly
acidic protein with many of its acidic residues located in clusters. The physio-
chemical properties of purified calsequestrin as revealed by tryptophan fluores-
cence [378–381], circular dichroism [378, 380–383], Raman spectroscopy [380,
384], NMR [380], and proteolytic digestion [381, 384, 385] indicated that calseques-
trin was mostly unfolded at low ionic strength. As ionic strength was increased,
folding into a compact structure was observed. This structure can be induced by
calcium as well as by other ions such as Naþ , Zn 2þ , Sr 2þ, Tb 2þ , Kþ , and Hþ . The
high Ca 2þ -binding capacity of calsequestrin was believed to require the formation
of aggregates, thus transitioning from a soluble disordered form to a solid crystal-
line form [381, 386, 387]. However, crystallization attempts in the presence of Ca 2þ
resulted in needle-like crystals. These liner polymeric forms were also observed
in vivo, suggesting that this was the physiologically relevant form of the complex
[388]. These narrow crystals were unsuitable for structure determination but did
demonstrate that crystallization of calsequestrin was possible. The facts that calse-
questrin adopted structure in the presence of mono- and divalent cations other
than calcium and that these forms were not observed to aggregate and precipitate
as the needle-like crystals did suggested that alternate crystal forms could be
possible.
The following hypothesis was formed: the growth of needle-like crystals of calse-
questrin is a two-step process [389]. The first step was the induction of structure by
high ionic strength. The second was the calcium-specific cross-bridging of individ-
ual calsequestrin molecules by means of the unneutralized charges remaining af-
ter initial nonspecific binding of ions by the monomers. This cross-bridging could
account for growth in a single direction, thereby producing the needles. To test this
hypothesis, crystallization in the absence of Ca 2þ was attempted [389, 390].
Calsequestrin was purified from the skeletal muscle of New Zealand white rab-
bits as previously described [381]. Approximately 500 initial crystallization experi-
ments were conducted using the hanging-drop vapor-diffusion method [391] with
an incomplete factorial approach [392] to cover a wide range of conditions. Each
condition was tested in a volume of 1 mL containing 5–10 mg of calsequestrin. Con-
ditions with high monovalent cation concentrations were emphasized, along with a
range of 16 different precipitation reagents. Good-sized non-needle crystals were
obtained when 2-methyl-2,4 pentane diol (MPD) was used as the precipitating re-
agent. The best crystals were grown from a solution containing 10% (v/v) MPD,
0.1 M sodium citrate, 0.05 M sodium cacodylate, pH 6.5, and 5 mg mL1 calse-
questrin. The nominal initial Naþ concentration was 0.35 M. The rectangular crys-
tals formed within one week and grew to 0:2 0:2 0:8 mm by the second week.
Structure determination details can be found elsewhere [389, 390].
Surprisingly, the resulting structure of calsequestrin exhibited three identical do-
mains, each with a thioredoxin protein fold. The three domains interact to form a
disk-like shape with an approximate radius of 32 Å and a thickness of 35 Å. No
336 8 Natively Disordered Proteins
clues to this three-domain structure were obtained from sequence analysis [393,
394]. Rather, analysis pointed towards a globular N-terminal domain and a C-
terminal disordered region [381]. Even in hindsight, no significant similarities
among the three similar domains could be deduced from the sequence.
A common approach to disordered regions in proteins is to remove the coding
regions for the disordered parts from the recombinant expression constructs so
that these regions do not prevent the protein from crystallizing [344]. In the case
of calsequestrin, the fact that structure could be induced in the disordered protein
by increasing cation concentration led to attempts at crystallization under non-in-
tuitive conditions. The result was the elucidation of an interesting and important
structure.
8.6.3
Circular Dichroism Spectropolarimetry
There is an excellent book on protein circular dichroism that contains several sec-
tions on the collection and interpretation of spectra [137]. As discussed above (see
Section 8.2.4), the interpretation of spectra from disordered proteins remains con-
troversial. However, the two most important experimental concepts are well agreed
upon.
First, collect data as far into the ultraviolet region as possible. The far-UV region,
from 260 nm to 190 nm, contains a great deal of information about secondary
structure. With a powerful UV source (which often means a new lamp) and a
well-behaved sample, data can be obtained down to wavelengths below 190 nm.
Second, only the electronic transitions of the protein being studied should contrib-
ute to the absorbance. Any extrinsic absorbance degrades the instrument’s ability
to detect the small differences between left and right circularly polarized light ab-
sorbed by the protein under study.
The two most common sources of unwanted absorbance are light scattering and
buffer/co-solute absorbance. The protein must not aggregate to such an extent that
the sample scatters light. It is important to pass every sample through a 0.2-micron
filter prior to data acquisition. A common mistake is to use a buffer that absorbs in
the ultraviolet region. Histidine is a common buffer because it is used to elute His-
tagged proteins from Ni 2þ -affinity columns, but at the concentrations used for elu-
tion, the histidine contributes to an excessive amount of ultraviolet absorbance.
Tris is also a poor choice. Phosphate and acetate are more useful buffers, at least
in terms of absorbance. Absorbance can also be a problem when collecting spectra
at high co-solute (e.g., sugars, urea, etc.) concentrations.
The best way to proceed is to collect a spectrum of filtered water or buffer and
then compare this spectrum to that of the solution. After choosing appropriate
solution conditions, there is no better protocol for preparing the sample than thor-
ough dialysis followed by filtration. The reference solution is then made from a fil-
tered sample of the solution from outside the dialysis bag.
CD analysis is especially sensitive to error due to misestimation of protein con-
centration. It is important that extreme care be taken during concentration deter-
References 337
mination. Additionally, we prefer the method of Gill and von Hippel [395] rather
than any of the popular colorimetric assays. This method is based on the calcula-
tion of a molar extinction coefficient based on the amino acid content of the pro-
tein under study and is typically accurate to G5%.
Acknowledgements
A.K.D. and M.S.C. are supported by NIH grant R01 LM 007688 (Indiana Univer-
sity School of Medicine). A.K.D. is additionally supported by NIH grants R43
CA099053, R43 CA 097629, R43 CA 097568, and R43 GM 066412 (Molecular Ki-
netics, Inc.). G.J.P. thanks the NSF and NIH for support and Trevor Creamer for
helpful discussions. G.W.D. is supported by NIH grant P20 RR 16448 from the
COBRE Program of the National Center for Research Resources.
References
exchange processes on the estimation 83 Kordel, J., Skelton, N. J., Akke, M.,
of equilibrium-constants by NMR. J. Palmer, A. G. & Chazin, W. J.
Magn. Reson. 33, 519–529. (1992). Backbone dynamics of
74 Spera, S. & Bax, A. (1991). Empirical calcium-loaded calbindin D9k studied
correlation between protein backbone by 2-dimensional proton-detected N15
conformation and C-alpha and C-beta NMR spectroscopy. Biochemistry 31,
C-13 nuclear-magnetic-resonance 4856–4866.
chemical-shifts. J. Am. Chem. Soc. 113, 84 McEvoy, M. M., de la Cruz, A. F. &
5490–5492. Dahlquist, F. W. (1997). Large
75 Wishart, D. S., Sykes, B. D. & modular proteins by NMR. Nat. Struct.
Richards, F. M. (1991). Relationship Biol. 4, 9.
between nuclear magnetic resonance 85 Renner, C., Schleicher, M.,
chemical shift and protein secondary Moroder, L. & Holak, T. A. (2002).
structure. J. Mol. Biol. 222, 311–33. Practical aspects of the 2D N-15-{H-1}-
76 Tcherkasskaya, O., Davidson, E. A. NOE experiment. J. Biomol. NMR 23,
& Uversky, V. N. (2003). Biophysical 23–33.
constraints for protein structure 86 Lipari, G. & Szabo, A. (1982). Model-
prediction. J. Proteome Res. 2, 37–42. Free Approach to the Interpretation
77 Daughdrill, G. W., Hanely, L. J. & of Nuclear Magnetic-Resonance
Dahlquist, F. W. (1998). The C- Relaxation in Macromolecules .1.
terminal half of the anti-sigma factor Theory and Range of Validity. J. Am.
FlgM contains a dynamic equilibrium Chem. Soc. 104, 4546–4559.
solution structure favoring helical 87 Lipari, G. & Szabo, A. (1982). Model-
conformations. Biochemistry 37, 1076– Free Approach to the Interpretation
1082. of Nuclear Magnetic-Resonance
78 Bracken, C., Carr, P. A., Cavanagh, Relaxation in Macromolecules .2.
J. & Palmer, A. G., 3rd. (1999). Analysis of Experimental Results. J.
Temperature dependence of intra- Am. Chem. Soc. 104, 4559–4570.
molecular dynamics of the basic 88 Peng, J. W. & Wagner, G. (1992).
leucine zipper of GCN4: implications Mapping of the spectral densities of
for the entropy of association with NaH bond motions in eglin c using
DNA. J. Mol. Biol. 285, 2133–2146. heteronuclear relaxation experiments.
79 Palmer, A. G., 3rd. (1993). Dynamic Biochemistry 31, 8571–86.
properties of proteins from NMR 89 Abragam, A. (1961). The principles of
spectroscopy. Curr. Opin. Biotechnol. 4, nuclear magnetism. The International
385–91. series of monographs on physics,
80 Lefevre, J. F., Dayie, K. T., Peng, Clarendon Press, Oxford.
J. W. & Wagner, G. (1996). Internal 90 Bruschweiler, R., Liao, X. &
mobility in the partially folded DNA Wright, P. E. (1995). Long-range
binding and dimerization domains of motional restrictions in a multi-
GAL4: NMR analysis of the NaH domain zinc-finger protein from
spectral density functions. Biochemistry anisotropic tumbling. Science 268,
35, 2674–86. 886–9.
81 Kay, L. E., Torchia, D. A. & Bax, A. 91 Tjandra, N., Feller, S. E., Pastor,
(1989). Backbone dynamics of proteins R. W. & Bax, A. (1995). Rotational
as studied by 15N inverse detected diffusion anisotropy of human
heteronuclear NMR spectroscopy: ubiquitin from N-15 NMR relaxation.
application to staphylococcal nuclease. J. Am. Chem. Soc. 117, 12562–12566.
Biochemistry 28, 8972–9. 92 Lee, L. K., Rance, M., Chazin, W. J.
82 Skelton, N. J., Palmer, A. G., Akke, & Palmer, A. G. (1997). Rotational
M., Kordel, J., Rance, M. & Chazin, diffusion anisotropy of proteins from
W. J. (1993). Practical aspects of 2- simultaneous analysis of N-15 and C-
dimensional proton-detected N-15 spin 13(alpha) nuclear spin relaxation. J.
relaxation measurements. J. Magn. Biomol. NMR 9, 287–298.
Reson. B 102, 253–264. 93 Buck, M., Schwalbe, H. & Dobson,
342 8 Natively Disordered Proteins
114 Doniach, S., Bascle, J., Garel, T. & 122 Munishkina, L. A., Fink, A. L. &
Orland, H. (1995). Partially folded Uversky, V. N. (In press). Formation
states of proteins: characterization by of amyloid fibrils from the core
X-ray scattering. J. Mol. Biol. 254, 960– histones in vitro. J. Mol. Biol.
7. 123 Uversky, V. N. (2002). What does it
115 Kataoka, M. & Goto, Y. (1996). X-ray mean to be natively unfolded? Eur. J.
solution scattering studies of protein Biochem. 269, 2–12.
folding. Fold. Des. 1, R107–14. 124 Guinier, A. & Fournet, G. (1955).
116 Uversky, V. N., Gillespie, J. R., Small-angle scattering of X-rays, John
Millett, I. S., Khodyakova, A. V., Wiley & Sons.
Vasiliev, A. M., Chernovskaya, T. V., 125 Millett, I. S., Doniach, S. & Plaxco,
Vasilenko, R. N., Kozlovskaya, G. K. W. (2002). Toward a taxonomy of
D., Dolgikh, D. A., Fink, A. L., the denatured state: small angle
Doniach, S. & Abramov, V. M. scattering studies of unfolded proteins.
(1999). Natively unfolded human Adv. Protein Chem. 62, 241–62.
prothymosin alpha adopts partially 126 Jaenicke, R. & Seckler, R. (1997).
folded collapsed conformation at Protein misassembly in vitro. Adv.
acidic pH. Biochemistry 38, 15009–16. Protein Chem. 50, 1–59.
117 Uversky, V. N., Gillespie, J. R., 127 Rose, G. D. (2002). Unfolded Proteins.
Millett, I. S., Khodyakova, A. V., Advances in Protein Chemistry (F. M.,
Vasilenko, R. N., Vasiliev, A. M., R., S., E. D. & J., K., Eds.), 62,
Rodionov, I. L., Kozlovskaya, G. D., Academic Press, New York.
Dolgikh, D. A., Fink, A. L., 128 Severgun, D. I. (1992). Determina-
Doniach, S., Permyakov, E. A. & tion of the regularization parameter
Abramov, V. M. (2000). Zn(2þ)- in indirect-transform methods
mediated structure formation and using perceptual criteria. J. Appl.
compaction of the ‘‘natively unfolded’’ Cryst. 25, 495–503.
human prothymosin alpha. Biochem. 129 Bergmann, A., Orthaber, D.,
Biophys. Res. Commun. 267, 663–668. Scherf, G. & Glatter, O. (2000).
118 Li, J., Uversky, V. N. & Fink, A. L. Improvement of SAXS measurements
(2001). Effect of familial Parkinson’s on Kratky slit systems by Göbel
disease point mutations A30P and mirrors and imaging-plate detectors. J.
A53T on the structural properties, Appl. Cryst. 33, 869–875.
aggregation, and fibrillation of human 130 Panick, G., Malessa, R., Winter, R.,
alpha-synuclein. Biochemistry 40, Rapp, G., Frye, K. J. & Royer, C. A.
11604–13. (1998). Structural characterization of
119 Uversky, V. N., Li, J. & Fink, A. L. the pressure-denatured state and
(2001). Evidence for a partially folded unfolding/refolding kinetics of
intermediate in alpha-synuclein fibril staphylococcal nuclease by synchro-
formation. J. Biol. Chem. 276, 10737– tron small-angle X-ray scattering and
10744. Fourier-transform infrared spectros-
120 Uversky, V. N., Li, J., Souillac, P., copy. J. Mol. Biol. 275, 389–402.
Jakes, R., Goedert, M. & Fink, A. L. 131 Pérez, J., Vachette, P., Russo, D.,
(2002). Biophysical properties of the Desmadril, M. & Durand, D. (2001).
synucleins and their propensities to Heat-induced unfolding of neocarzino-
fibrillate: inhibition of alpha-synuclein statin, a small all-beta protein
assembly by beta- and gamma- investigated by small-angle X-ray
synucleins. J. Biol. Chem. 25, 25. scattering. J. Mol. Biol. 308, 721–43.
121 Permyakov, S. E., Millett, I. S., 132 Uversky, V. N. (1993). Use of fast
Doniach, S., Permyakov, E. A. & protein size-exclusion liquid
Uversky, V. N. (2003). Natively unfolded chromatography to study the
C-terminal domain of caldesmon unfolding of proteins which denature
remains substantially unstructured through the molten globule.
after the effective binding to Biochemistry 32, 13288–13298.
calmodulin. Proteins 53, 855–62. 133 Uversky, V. N. (1994). Gel-permeation
344 8 Natively Disordered Proteins
236 Brandstetter, H., Bauer, M., topoisomerase II. Mol. Cell. Biol. 14,
Huber, R., Lollar, P. & Bode, W. 3197–3207.
(1995). X-ray structure of clotting 245 Muchmore, S. W., Sattler, M.,
factor IXa: active site and module Liang, H., Meadows, R. P., Harlan,
structure related to Xase activity and J. E., Yoon, H. S., Nettesheim, D.,
hemophilia B. Proc. Natl. Acad. Sci. Chang, B. S., Thompson, C. B.,
U.S.A. 92, 9796–9800. Wong, S. L., Ng, S. L. & Fesik, S. W.
237 Brandstetter, H., Kuhne, A., Bode, (1996). X-ray and NMR structure of
W., Huber, R., von der Saal, W., human Bcl-xL , an inhibitor of
Wirthensohn, K. & Engh, R. A. programmed cell death. Nature 381,
(1996). X-ray structure of active site- 335–341.
inhibited clotting factor Xa. 246 Yamamoto, K., Ichijo, H. &
Implications for drug design and Korsmeyer, S. J. (1999). BCL-2 is
substrate recognition. J. Biol. Chem. phosphorylated and inactivated by an
271, 29988–29992. ASK1/Jun N-terminal protein kinase
238 Padmanabhan, K., Padmanabhan, K. pathway normally activated at G(2)/M.
P., Tulinsky, A., Park, C. H., Bode, Mol. Cell. Biol. 19, 8469–8478.
W., Huber, R., Blankenship, D. T., 247 Cheng, E. H., Kirsch, D. G., Clem,
Cardin, A. D. & Kisiel, W. (1993). R. J., Ravi, R., Kastan, M. B., Bedi,
Structure of human des(1–45) factor A., Ueno, K. & Hardwick, J. M.
Xa at 2.2 A resolution. J. Mol. Biol. (1997). Conversion of Bcl-2 to a Bax-
232, 947–966. like death effector by caspases. Science
239 Aviles, F. J., Chapman, G. E., 278, 1966–1968.
Kneale, G. G., Crane-Robinson, C. 248 Chang, B. S., Minn, A. J.,
& Bradbury, E. M. (1978). The Muchmore, S. W., Fesik, S. W. &
conformation of histone H5. Isolation Thompson, C. B. (1997).
and characterisation of the globular Identification of a novel regulatory
segment. Eur. J. Biochem. 88, 363–371. domain in Bcl-X(L) and Bcl-2. EMBO
240 Ramakrishnan, V., Finch, J. T., J. 16, 968–977.
Graziano, V., Lee, P. L. & Sweet, R. 249 Holliger, P., Riechmann, L. &
M. (1993). Crystal structure of globular Williams, R. L. (1999). Crystal
domain of histone H5 and its structure of the two N-terminal
implications for nucleosome binding. domains of g3p from filamentous
Nature 362, 219–223. phage fd at 1.9 A: evidence for
241 Graziano, V., Gerchman, S. E., conformational lability. J. Mol. Biol.
Wonacott, A. J., Sweet, R. M., 288, 649–657.
Wells, J. R., White, S. W. & 250 Holliger, P. & Riechmann, L.
Ramakrishnan, V. (1990). Crystalliza- (1997). A conserved infection pathway
tion of the globular domain of histone for filamentous bacteriophages is
H5. J. Mol. Biol. 212, 253–257. suggested by the structure of the
242 Shaiu, W. L., Hu, T. & Hsieh, T. S. membrane penetration domain of the
(1999). The hydrophilic, protease- minor coat protein g3p from phage fd.
sensitive terminal domains of Structure 5, 265–275.
eucaryotic DNA topoisomerases have 251 Nilsson, N., Malmborg, A. C. &
essential intracellular functions. Pac. Borrebaeck, C. A. (2000). The phage
Symp. Biocomput. 4, 578–589. infection process: a functional role for
243 Berger, J. M., Gamblin, S. J., the distal linker region of bacterio-
Harrison, S. C. & Wang, J. C. (1996). phage protein 3. J. Virol. 74, 4229–
Structure and mechanism of DNA 4235.
topoisomerase II. Nature 379, 225– 252 Lee, C. H., Saksela, K., Mirza, U. A.,
232. Chait, B. T. & Kuriyan, J. (1996).
244 Caron, P. R., Watt, P. & Wang, J. C. Crystal structure of the conserved core
(1994). The C-terminal domain of of HIV-1 Nef complexed with a Src
Saccharomyces cerevisiae DNA family SH3 domain. Cell 85, 931–942.
350 8 Natively Disordered Proteins
253 Arold, S., Franken, P., Strub, M. P., 262 Harrison, R. W., Chatterjee, D. &
Hoh, F., Benichou, S., Benarous, R. Weber, I. T. (1995). Analysis of six
& Dumas, C. (1997). The crystal protein structures predicted by
structure of HIV-1 Nef protein bound comparative modeling techniques.
to the Fyn kinase SH3 domain Proteins 23, 463–71.
suggests a role for this complex in 263 Skolnick, J. & Fetrow, J. S. (2000).
altered T cell receptor signaling. From genes to protein structure and
Structure 5, 1361–1372. function: novel applications of com-
254 Geyer, M., Munte, C. E., Schorr, J., putational approaches in the genomic
Kellner, R. & Kalbitzer, H. R. era. Trends Biotechnol. 18, 34–9.
(1999). Structure of the anchor- 264 Skolnick, J., Fetrow, J. S. &
domain of myristoylated and non- Kolinski, A. (2000). Structural
myristoylated HIV-1 Nef protein. J. genomics and its importance for gene
Mol. Biol. 289, 123–138. function analysis. Nat. Biotechnol. 18,
255 Arold, S. T. & Baur, A. S. (2001). 283–287.
Dynamic Nef and Nef dynamics: how 265 Thornton, J. M., Orengo, C. A.,
structure could explain the complex Todd, A. E. & Pearl, F. M. (1999).
activities of this small HIV protein. Protein folds, functions and evolution.
Trends Biochem. Sci. 26, 356–363. J. Mol. Biol. 293, 333–42.
256 Karlin, D., Longhi, S., Receveur, V. 266 Dunker, A. K., Obradovic, Z.,
& Canard, B. (2002). The N-terminal Romero, P., Garner, E. C. & Brown,
domain of the phosphoprotein of C. J. (2000). Intrinsic protein disorder
morbilliviruses belongs to the natively in complete genomes. Genome Inform.
unfolded class of proteins. Virology Ser. Workshop Genome Inform. 11, 161–
296, 251–262. 171.
257 Karlin, D., Ferron, F., Canard, B. 267 Daughdrill, G. W., Chadsey, M. S.,
& Longhi, S. (2003). Structural dis- Karlinsey, J. E., Hughes, K. T. &
order and modular organization in Dahlquist, F. W. (1997). The C-
Paramyxovirinae N and P. J. Gen. terminal half of the anti-sigma factor,
Virol. 84, 3239–52. FlgM, becomes structured when
258 Longhi, S., Receveur-Brechot, V., bound to its target, sigma 28. Nat.
Karlin, D., Johansson, K., Darbon, Struct. Biol. 4, 285–291.
H., Bhella, D., Yeo, R., Finet, S. & 268 Donne, D. G., Viles, J. H., Groth,
Canard, B. (2003). The C-terminal D., Mehlhorn, I., James, T. L.,
domain of the measles virus nucleo- Cohen, F. E., Prusiner, S. B.,
protein is intrinsically disordered and Wright, P. E. & Dyson, H. J. (1997).
folds upon binding the C-terminal Structure of the recombinant full-
moiety of the phosphoprotein. J. Biol. length hamster prion protein PrP(29–
Chem. 278, 18638. 231): the N terminus is highly flexible.
259 Lesk, A. M., Levitt, M. & Chothia, Proc. Natl. Acad. Sci. U.S.A. 94,
C. (1986). Alignment of the amino 13452–13457.
acid sequences of distantly related 269 Jacobs, D. M., Lipton, A. S., Isern,
proteins using variable gap penalties. N. G., Daughdrill, G. W., Lowry, D.
Protein Eng. 1, 77–8. F., Gomes, X. & Wold, M. S. (1999).
260 Lesk, A. M. & Chothia, C. (1980). Human replication protein A: global
How different amino acid sequences fold of the N-terminal RPA-70 domain
determine similar protein structures: reveals a basic cleft and flexible C-
the structure and evolutionary terminal linker. J. Biomol. NMR 14,
dynamics of the globins. J. Mol. Biol. 321–331.
136, 225–70. 270 Lee, H., Mok, K. H., Muhandiram,
261 Chothia, C. & Lesk, A. M. (1987). R., Park, K. H., Suk, J. E., Kim, D.
The evolution of protein structures. H., Chang, J., Sung, Y. C., Choi,
Cold Spring Harb. Symp. Quant. Biol. K. Y. & Han, K. H. (2000). Local
52, 399–405. structural elements in the mostly