Cláudio M. Gomes
Patrícia F. N. Faísca
Protein Folding
An Introduction
SpringerBriefs in Molecular Science
Series editor
Cláudio M. Gomes, Faculty of Sciences, Biosystems & Integrative Sciences
Institute, University of Lisbon, Lisbon, Portugal
About the Series
Cláudio M. Gomes
Department of Chemistry and Biochemistry, Faculty of Sciences
Biosystems & Integrative Sciences Institute, University of Lisbon
Lisbon, Portugal
Patrícia F. N. Faísca
Department of Physics, Faculty of Sciences
Biosystems & Integrative Sciences Institute, University of Lisbon
Lisbon, Portugal
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
In memory of Professor Mark T. Fisher (1954–2018),
a dear friend and expert on protein folding whose
fascination with the GroEL folding chaperonin, one of his
favourite molecular machines, was only surpassed by
his curiosity and keen spirit.
Mark Fisher in Lisbon,
February 2010
Preface
research topic at the forefront of modern molecular biology and that many of its
younger readers will choose one of the many facets of protein folding as their future
research topic.
The authors are long-standing friends who share a passion for protein folding. Despite their distinct
academic backgrounds and tracks, they are now faculty members at the University of Lisbon, where
they teach and research in a highly collaborative environment.
Protein Folding: An Introduction
We have come a long way since the coining of the term protein and the early findings
that proteins are charged macromolecules composed of strings of amino acids linked
by peptide bonds. Today, structural biologists have technologies that in many
cases allow an atomic-level understanding of protein structure, dynamics and
folding; protein physics approaches have made substantial contributions to
understanding the intricacies of folding mechanisms and their energetics; biochemists have
developed conceptual frameworks to relate protein structure to biological
function. Yet, despite the efforts of a vibrant community of protein scientists, many
questions remain to be answered in the field of protein structure and folding.
Without aiming to outline a historical chronology of the field, we nevertheless feel
it is important to start this book by giving the reader a sense of some of
its scientific landmarks. For an in-depth scholarly perspective, interested readers are
referred to the beautiful account of the history of proteins by Tanford and Reynolds
in their book Nature’s Robots [1].
The term protein was coined in the early nineteenth century, in a
proposal from Berzelius in correspondence with Mulder. In a letter written in 1838,
discussing the results of the elemental analysis of albumins and
fibrin performed by Mulder, Berzelius derived the term from the Greek word
proteios, which means primary (or of primary importance), as he considered
proteins to be the primitive substances in animal nutrition. After proteins were named,
it took over 100 years for the field of protein structural
biology to dawn. Paradoxically, or not so much as we shall find out in this book, the
importance of protein structure and folding became evident from unfolding studies
by Anson and Mirsky in the 1930s, which showed that protein denaturation can actually
be reversed [2]. These findings prompted subsequent investigations, such as those
carried out by Anfinsen in the late 1950s, aimed at determining how protein
structure is maintained and which interactions are essential to hold the
structure of a native protein together. The development of methods for determining
the structure of crystals by X-ray diffraction and their application to proteins,
pioneered among others by Astbury with his studies on fibrous proteins in the
1930s [3], was critical for this outcome [4]. The regular patterns defining the
structural features of proteins became clear when Pauling proposed the structures of
the α-helix and β-sheets in 1951 [5], which was soon followed by the complete
determination of the structures of myoglobin and then haemoglobin
by Kendrew and Perutz, respectively, in 1959 [6].
Interestingly, many of the subsequent questions in the field focused on
understanding protein folding, i.e. the process through which a linear chain of amino
acids acquires a biologically functional three-dimensional structure. The
so-called protein folding problem encapsulates three fundamental questions: (1) to
decipher the physical code according to which the amino acid sequence dictates a
protein’s native structure; (2) to establish the mechanisms that allow proteins to fold
so fast; and (3) to determine how the structure of a protein can be predicted from its
sequence. As pointed out in a recent review on the subject, ‘What began as three
questions of basic science one half-century ago has now grown into the full-fledged
research field of protein physical science’ [7].
[Fig. 1 plot: y-axis, number of entries; x-axis, year; inset, electron microscopy structures]
Fig. 1 Evolution of the number of known protein structures. Protein-only structures; inset: growth
of structures from 3D electron microscopy experiments released per year. Source PDB (rcsb.org)
Fig. 2 Diversity of protein topologies. Proteins adopt a diversity of structures with distinct
structural classifications, as in the depicted global representation of the protein fold space, which
includes information from the Structural Classification of Proteins (SCOP) database [9, 10]
Fig. 3 Topological knots and knotted proteins. a A topological (or mathematical) knot with no
more than three crossings on a planar projection is termed a trefoil (or $3_1$) knot. b Knotted
proteins have their backbones tangled in a physical knot. A physical knot is different from a
topological knot because the curve it forms in space is open instead of closed. c A simple
representation of a slipknot. d Minimal, smoothed and ‘topologically equivalent’ representation of
the backbone of the bacterial protein YibK (PDB ID: 1j85) following the application of Taylor’s
algorithm. Taylor’s algorithm reduces the protein backbone such that the knotted core, i.e. the
minimal segment of the polypeptide chain that contains the knot, is sufficiently far from both chain
ends for the knot type to be well defined. In this case, it is possible to identify a trefoil knot
The biological functions of proteins are in many cases closely tied to their
three-dimensional structure, and this constitutes the so-called structure-function
paradigm of structural biology. This is certainly a generally valid principle for
most globular proteins that are involved in structural or catalytic functions. In these
cases, disorganisation of the protein structure and loss of tertiary interactions impair
function, which is tightly associated with a given structural scaffold, for example to
accommodate a ligand or substrate or to assure a particular arrangement of a cat-
alytic site.
However, in recent years, it has become increasingly evident that many
polypeptides, or segments within polypeptides, exist under physiological
conditions in the cell without folding into a well-defined tertiary structure. These
intrinsically disordered proteins (or IDPs) instead adopt an ensemble of
unstructured or disordered conformations which are nevertheless functional [27,
28], defying the structure-function paradigm. Indeed, rather than being detrimental,
these characteristics will, in some proteins, result in improved biological function. This
is the case for proteins involved in signalling processes, which are able to accommodate
considerable ‘fuzziness’ within their structure, and this actually results in
increased functional efficiency.
This is explained by the fact that a signalling protein must engage in multiple
protein–protein interactions that are favoured by the high mobility of disordered
conformers, allowing more efficient sampling of protein-target interactions. In this
scenario, the fact that a segment within these proteins is unstructured (Box 2—
Intrinsically Disordered Proteins) and can accommodate and fold upon binding to
multiple targets results in a functional advantage, sustaining the emerging disor-
der–function paradigm.
[Fig. 4 labels: intrinsically disordered protein; folding upon binding]
Fig. 4 Folding upon binding. Schematic representation of folding upon binding of an IDP to a
target protein, resulting in a fuzzy complex with some regions still disordered (blue shadows)
Proteins are flexible molecules that have an overall shape but ‘wobble’ internally.
Motions in proteins involve rapid local motion with infrequent (slow) changes to a
different conformation. These changes involve rapid transitions between states.
Protein motion is thermally driven: water and solute interactions with proteins
induce vibrations and rocking motions (librations). Rapid local motions are har-
monic (symmetric vibrations) and uncorrelated [33].
The rotation of amino acid side chains is influenced by steric effects: bulkier side
chains are less symmetrical and undergo slower rotations, usually involving dis-
placement of nearby groups; on the other hand, short side chains rotate faster as
steric conflicts are minimised. Protein dynamics is influenced by interactions as
well as by binding of a ligand, which will induce order within the segment
comprising its ligands. This is illustrated by cofactor-containing proteins, in which apo
forms are frequently much more dynamic than their holo counterparts. The same
Fig. 5 Cartoon representation of bovine RNase A (PDB ID: 1FS3), highlighting the four S–S
bonds that hold the protein’s tertiary structure. Labels in the figure: N-term, C26, C40, C58,
C65, C72, C84, C95, C110
[Fig. 6, panels c and d: (c) reoxidised under denaturing conditions, giving disulfide-scrambled RNase A; (d) removal of urea and addition of a minute amount of β-mercaptoethanol. Cysteine labels: 26, 40, 58, 65, 72, 110; N terminus marked]
upon removal of the denaturant under oxidative conditions that favour the
re-establishment of disulphide bonds, the protein regains its native conformation
that corresponds to an enzymatically active form.
However, if the initial mixture of RNase A was reoxidised under denaturing
conditions, Anfinsen noted that this resulted in scrambling of the RNase A disulphide
bonds (Fig. 6c). By scrambling, we mean that a random set of S–S linkages is
formed among all the possible pairings of the eight cysteine
side chains. Even if urea was subsequently removed under oxidising conditions,
RNase A was only around 1% active. This observation suggested that the functional
RNase A conformation can only be achieved if the polypeptide arranges itself in
such a way that the specific cysteine pairs that are supposed to form disulphide bonds
are brought together through folding. This implied that the formation of the RNase A
structure is dictated by the primary sequence and that, as Anfinsen put it, ‘the
scrambled protein appears to be devoid of the various aspects of structural
regularity that characterize the native molecule’ [42]. Indeed, removal of urea from
scrambled inactive RNase A and addition of a minute amount of β-mercaptoethanol
would result in the interchange of the disulphides and in its gradual interconversion
into a fully active conformer, indistinguishable from native RNase A (Fig. 6d).
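The scale of the scrambling can be made concrete with a small counting sketch. It assumes that all eight cysteines pair up into four disulphide bonds and that every pairing is equally likely, which is a simplification; the function name is illustrative. Under these assumptions there are 7 × 5 × 3 × 1 = 105 distinct pairings, so only about 1 in 105 scrambled molecules would carry the native set, in line with the roughly 1% activity reported for the scrambled protein.

```python
def count_full_pairings(n_cysteines: int) -> int:
    """Number of ways to pair all cysteines into disulphide bonds.

    Assumes an even number of cysteines and that every one is paired.
    The first cysteine can pick any of (n - 1) partners, the next free
    one any of (n - 3), and so on: (n - 1) * (n - 3) * ... * 1.
    """
    if n_cysteines < 0 or n_cysteines % 2 != 0:
        raise ValueError("need an even, non-negative number of cysteines")
    count = 1
    for partners in range(n_cysteines - 1, 0, -2):
        count *= partners
    return count


pairings = count_full_pairings(8)      # RNase A has eight cysteines
native_fraction = 1 / pairings         # only one pairing is the native one
print(pairings, f"{100 * native_fraction:.2f}%")
```

The printed fraction is a statistical expectation for random pairing only; it ignores any bias from residual structure in the denatured chain.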
What, then, are the dominant driving forces for protein folding? In a nutshell, protein
folding can be depicted as a process during which the exposure of non-polar side
chains to the surrounding aqueous environment is minimised, while packing
interactions and hydrogen bonds are optimised. The former describes the so-called
hydrophobic effect, which, as we shall see, constitutes the major thermodynamic
driving force for protein folding, while the latter refers to energetic contributions
from low-magnitude interactions, notably van der Waals interactions, which are
reviewed in detail in Sect. 1.3.
Thermodynamically, the stability of a protein is given by the free energy
difference between its folded and unfolded states. For a simple two-state system in
which a protein is in equilibrium between unfolded (U) and folded (F) states,
$$U \rightleftharpoons F, \qquad (1)$$
the free energy change of the folding process ($\Delta G_{folding}$) is given by:
$$\Delta G_{folding} = -RT \ln K_{eq} = \Delta H - T\Delta S_{conf}, \qquad (2)$$
where $K_{eq}$ is the equilibrium constant, $\Delta H$ the enthalpy change and $\Delta S_{conf}$ the
conformational entropy change. R is the universal gas constant and T is the absolute
temperature.
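As a numerical illustration of the two-state relation between the free energy of folding and the equilibrium constant, the sketch below converts between $K_{eq}$ = [F]/[U], the free energy of folding and the fraction of folded molecules. The function names and the example value of $K_{eq}$ are assumptions made for illustration, not values from the text.

```python
import math

R = 8.314e-3  # universal gas constant in kJ mol^-1 K^-1


def delta_g_folding(k_eq: float, temperature_k: float) -> float:
    """Free energy of folding (kJ/mol) from K_eq = [F]/[U]."""
    return -R * temperature_k * math.log(k_eq)


def fraction_folded(delta_g: float, temperature_k: float) -> float:
    """Equilibrium fraction of folded molecules in a two-state system."""
    k_eq = math.exp(-delta_g / (R * temperature_k))
    return k_eq / (1.0 + k_eq)


# Hypothetical example: ten thousand folded molecules per unfolded one.
dg = delta_g_folding(1.0e4, 298.15)
print(f"dG_folding = {dg:.1f} kJ/mol, folded fraction = "
      f"{fraction_folded(dg, 298.15):.5f}")
```

A negative free energy of folding corresponds to $K_{eq}$ > 1, i.e. the folded state is the more populated one.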
The experimental determination of the free energy difference between the
unfolded and the folded states can be obtained through gradual protein denaturation
of the folded protein under equilibrium conditions (Box 3—Protein Denaturation).
This can be achieved by applying the linear extrapolation method to analyse
chemical denaturation curves [43, 44], or using calorimetric approaches during
protein thermal unfolding [45].
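A minimal sketch of the linear extrapolation step is given below. It assumes the usual form of the method, in which the free energy of unfolding in the transition region varies linearly with denaturant concentration, $\Delta G_{unf}$ = $\Delta G_{water}$ − m[D]; this relation is not written out in the text and is stated here as an assumption. The data points are invented for illustration and the function name is hypothetical.

```python
def linear_extrapolation(denaturant_molar, delta_g_unfolding):
    """Least-squares line dG_unf = dG_water - m * [D].

    Returns (dG_water, m): the free energy of unfolding extrapolated to
    zero denaturant and the slope m (both in the units of the input).
    """
    n = len(denaturant_molar)
    mean_x = sum(denaturant_molar) / n
    mean_y = sum(delta_g_unfolding) / n
    sxx = sum((x - mean_x) ** 2 for x in denaturant_molar)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(denaturant_molar, delta_g_unfolding))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, -slope


# Invented transition-region data: urea (M) and dG_unf (kJ/mol).
urea = [3.0, 3.5, 4.0, 4.5, 5.0]
delta_g = [12.0, 9.0, 6.0, 3.0, 0.0]

dg_water, m_value = linear_extrapolation(urea, delta_g)
midpoint = dg_water / m_value   # denaturant concentration where dG_unf = 0
print(dg_water, m_value, midpoint)
```

Note that the fitted quantity is the free energy of unfolding, which has the opposite sign to the free energy of folding.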
Thermal Denaturation
Temperature increase leads to the destabilisation of non-covalent interactions
resulting in native structure loss. During thermal denaturation, the confor-
mational entropy of the unfolded state also increases and the entropic dif-
ference between the folded and unfolded states becomes large enough to
overcome the hydrophobic effect, thus resulting in protein unfolding.
Chemical Denaturation
The most common form of protein chemical denaturation is the one achieved
by chaotropic agents such as urea, guanidinium chloride (GuHCl) or guani-
dinium thiocyanate (GuSCN) (Fig. 7). While the exact mechanisms of action
are still under debate, one can take the general view that these compounds
disrupt the network of hydrogen bonds between water molecules, weakening
the hydrophobic effect, and therefore reducing the stability of the folded state.
In other words, by creating disorder in the solvent’s structure (i.e. by
increasing its entropy), chaotropic agents facilitate the solvation of the
non-polar side chains, and water molecules compete with atoms in the protein
for intraprotein interactions. After disruption of the secondary structure, these
denaturants will also interact directly with polar residues and the protein
peptide backbone resulting in the stabilisation of non-native conformations,
which easily convert into unfolded conformations. The relative efficacy of
these chemicals is GuSCN > GuHCl > urea. Organic solvents that interact
with non-polar groups in the protein interior will stabilise the unfolded states
of proteins and might as well result in protein denaturation.
pH Denaturation
Protein structure and activity are optimal within a given pH range. Whenever
the solution’s pH changes in a way that affects the protonation state of the side
chains of charged residues (Lys, Arg, His, Glu, Asp), there will be a
weakening of stabilising electrostatic interactions involving those groups.
In fact, the determined values for $\Delta G_{folding}$ are very small: the difference in
thermodynamic stability between the folded and the unfolded conformations is as
low as 20–80 kJ mol−1. This difference is comparable to the magnitude of some of
the stabilising forces that hold proteins together (hydrogen bonds, electrostatic
interactions, van der Waals interactions), and much lower than, for example, the
dissociation energy of a single covalent bond (200–500 kJ mol−1). The inventory
of thermodynamic contributions to the protein folding process discussed in
Sect. 1.3 shows that the net driving force for protein folding results
from the difference between energetic and entropic contributions of high magnitude
with opposing effects. However, proteins are dynamic entities and their
conformational flexibility, which is essential for biological function, would not be
attained if proteins were highly stable. This is the reason why proteins are said to
be marginally stable, a property believed to have been positively selected during
evolution [46] (Fig. 8).
The unfolded state is stabilised by a high conformational entropy ($-T\Delta S_{conf}$),
a term that results from the fact that random polypeptides can adopt a multitude of
distinct conformations with high mobility. The loss of conformational entropy is
thus a major opposing factor in the folding process, and its magnitude is higher
when the residual structure of the unfolded state is the lowest. For a completely
disorganised unfolded polypeptide, the configurational entropy would be the
highest; however, the fact that some proteins retain secondary structure in the
unfolded state indicates that the energetic penalty from this component is variable.
The main driving force for the folding process is the hydrophobic effect,
leading to non-polar interactions within the protein core. It illustrates the impor-
tance of water molecules, water structure and protein hydration in protein structure
and stability. The hydrophobic effect can be interpreted as follows. An unfolded
polypeptide exposes a high surface area of non-polar side chains to water mole-
cules, and this decreases the water H-bonding network, creating an energetically
[Figure labels: ΔH (internal interactions); −TΔS (conformational entropy); −TΔS (hydrophobic effect). Plot: x-axis, temperature (°C), 20 to 100; curves labelled Psycro, Meso and Therm]
The beginning of research in protein folding can be traced back to the work
developed in the 1930s by Anson and Mirsky, who studied and discussed the
reversibility of protein denaturation [57]. Indeed, the refolding process observed
after complete unfolding of a protein under controlled experimental
conditions (of pH, ionic strength, ion concentration, etc.) offers an operational
(although not perfectly adequate) model to mimic the process of protein folding
inside the living cell (i.e. in vivo).
An interesting feature of reversible (thermal or chemical) protein denaturation
curves is their peculiar shape (Fig. 10a). The sigmoidal (or ‘S-shaped’) curve
indicates an abrupt change of a measurable protein property (e.g. energy, radius of
gyration) from native to denatured state values. Furthermore, a narrow transition
region (as quantified by $\Delta T$) indicates that many amino acids are involved in the
process. The ‘S-shaped’ curve is thus considered the hallmark of a cooperative
process. Protein folding cooperativity comes in two flavours: two-state (first-order
or ‘all-or-none’) cooperativity and one-state (or higher-order) cooperativity. When
the folding transition can be modelled by a two-state process, there is a
temperature (Fig. 10a, b), the so-called transition midpoint (or melting
temperature) $T_m$, at which the distribution of molecules over a measurable property
is bimodal (Fig. 10c).
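The sigmoidal shape and the meaning of the transition midpoint can be illustrated with a short two-state sketch. It assumes that the unfolding enthalpy is constant with temperature (no heat capacity change), so that the free energy of unfolding is $\Delta H_m (1 - T/T_m)$; this simplification, the function name and the parameter values are assumptions for illustration only.

```python
import math

R = 8.314e-3  # universal gas constant in kJ mol^-1 K^-1


def fraction_denatured(temp_k, t_m, delta_h_m):
    """Two-state fraction of denatured molecules at temperature temp_k.

    Assumes a temperature-independent unfolding enthalpy delta_h_m
    (kJ/mol), so that dG_unfold(T) = delta_h_m * (1 - T / t_m).
    """
    delta_g_unfold = delta_h_m * (1.0 - temp_k / t_m)
    k_unfold = math.exp(-delta_g_unfold / (R * temp_k))
    return k_unfold / (1.0 + k_unfold)


t_m = 330.0  # hypothetical melting temperature in K
for temp in (310.0, 320.0, 330.0, 340.0, 350.0):
    broad = fraction_denatured(temp, t_m, 200.0)
    sharp = fraction_denatured(temp, t_m, 400.0)
    print(f"{temp:.0f} K  dH=200: {broad:.3f}  dH=400: {sharp:.3f}")
```

At the midpoint both states are equally populated, and in this model a larger unfolding enthalpy gives a narrower transition region.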
This means that at $T_m$, only two states (the native and denatured) co-exist in
thermodynamic equilibrium, and the process resembles a first-order phase transition
between homogeneous phases (e.g. the gas-liquid, or solid-liquid phase transition).
This type of transition is typical of small, single-domain proteins (~150 amino
acids). Larger proteins, on the other hand, are likely to fold by populating inter-
mediate states as indicated by the presence of plateaus and shoulders in the tran-
sition curves, as well as multiphasic kinetics. Studies on molten globule states have
provided some experimental insights into what a common intermediate of the
protein folding process might be (Box 5—Molten Globules)
molecule melts as a single unit as expected in a two-state transition [61]. When the
van’t Hoff criterion is fulfilled, the model protein is said to exhibit thermodynamic
or calorimetric cooperativity.
$$C_P = \frac{\sigma_H^2}{k_B T^2},$$
with $\sigma_H$ being the standard deviation of H (see also Fig. 10c), the van’t
Hoff enthalpy can be re-written as $\Delta H_{VH} = 2\sigma_{H,MAX}$, i.e. the van’t Hoff enthalpy
is equal to two times the standard deviation of enthalpy at $T_{MAX}$. For the van’t
Hoff enthalpy to be approximately equal to the calorimetric enthalpy, the
enthalpy distribution should be bimodal with one peak corresponding to the
average enthalpy of the native state and the other to that of the denatured state
(Fig. 10c). This does not mean that there is absolutely no conformation with
enthalpy values in between, though the van’t Hoff criterion does require their
population to be very small.
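The link between bimodality and the van’t Hoff criterion can be checked with a toy enthalpy distribution. The sketch below assumes discrete enthalpy levels with equal native and denatured populations at the temperature of interest; the enthalpy values, populations and function name are illustrative assumptions. For a strictly bimodal distribution, twice the standard deviation equals the enthalpy gap between the two peaks, whereas a populated intermediate lowers the ratio.

```python
import math


def two_sigma(enthalpies, populations):
    """Twice the standard deviation of a discrete enthalpy distribution."""
    total = sum(populations)
    weights = [p / total for p in populations]
    mean = sum(w * h for w, h in zip(weights, enthalpies))
    variance = sum(w * (h - mean) ** 2 for w, h in zip(weights, enthalpies))
    return 2.0 * math.sqrt(variance)


h_native, h_denatured = 0.0, 300.0     # kJ/mol, illustrative values
calorimetric = h_denatured - h_native  # enthalpy gap between the two states

# Strictly bimodal: half of the molecules native, half denatured.
two_state = two_sigma([h_native, h_denatured], [0.5, 0.5])

# Same end states, but 20% of the molecules sit in an intermediate.
with_intermediate = two_sigma([h_native, 150.0, h_denatured],
                              [0.4, 0.2, 0.4])

print(two_state / calorimetric)          # ratio of 1 in the bimodal case
print(with_intermediate / calorimetric)  # ratio below 1 with an intermediate
```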
fitting can be done by assuming that the only populated states are the native
(N) and the denatured states (D) (Fig. 12a). This does not mean that there are
no folding intermediates; indeed, these could be high-energy conformations,
which are transiently populated, and will not be detected by conventional
methods.
Two-state folding kinetics is generally studied in the context of transition
state (TS) theory by assuming that the folding rate constant, $k_f$, exhibits an
Arrhenius-like dependence on the activation energy of folding, i.e. the free
energy difference between the thermodynamically unstable folding transition
state (TS) and D,
$$k_f \propto \exp(-\Delta G_{TS-D}/RT),$$
and the unfolding rate constant is determined by the difference in free
energy between N and TS,
$$k_u \propto \exp(-\Delta G_{TS-N}/RT),$$
where R is the gas constant and T the absolute temperature.
Experiments show that two-state relaxation times have ‘chevron plot’
dependences on the concentration of denaturant (e.g. [GuHCl]) (Fig. 12b).
More precisely, $\ln k_{obs}$ versus [denaturant] gives a V-shaped kinetics curve.
In the chevron plot,
$$k_{obs} = k_f + k_u \quad (\mathrm{s}^{-1}),$$
with
$$k_f = k_f^{H_2O} \exp(-m_{k_f}[\mathrm{denaturant}])$$
and
$$k_u = k_u^{H_2O} \exp(m_{k_u}[\mathrm{denaturant}]),$$
where $k_{u(f)}^{H_2O}$ is the value extrapolated in the absence of denaturant and $m_k$ is
a constant of proportionality [62]. Therefore, the observation of a chevron
plot is often used as a signature of two-state folding kinetics. To construct a
chevron plot, one needs to perform folding experiments and unfolding
experiments to measure $k_{obs}$ under different denaturing conditions. In folding
experiments, a sample of unfolded protein in high-denaturant conditions is
rapidly mixed with an excess of buffer resulting in a low overall concentration
of denaturant. On the other hand, in unfolding experiments, a jump to
unfolding conditions driven by the rapid mixing of the protein with an excess
of denaturant solution results in a high-denaturant concentration. Folding (or
unfolding) progress is monitored by recording structural changes with an
optical probe (e.g. CD or fluorescence, see Sect. 5.1) and $k_{obs}$ is measured at
selected denaturant concentrations. In folding experiments, $k_{obs}$ approximates
$k_f$ at low denaturant concentrations, while in unfolding experiments, it
approximates $k_u$ at high denaturant concentrations. The chevron plot has a
characteristic V shape because the protein folds more slowly and unfolds
more rapidly in the presence of denaturant than in pure buffer. $\ln k_{u(f)}^{H_2O}$ can be
determined by extrapolating back $\ln k_{obs}$ to zero-denaturant conditions. The
folding transition is considered to exhibit two-state cooperativity when its
kinetics shows chevron behaviour. On the other hand, the occurrence of the
so-called chevron ‘rollovers’ (where chevron plots flatten out at very low
Fig. 12 Two-state folding kinetics. The two-state model of folding kinetics is generally addressed
in the context of transition state theory, where a free energy barrier (on top of which lies the
transition state) separates the denatured and the native states (a). Dependence of the observed rate
constant on denaturant concentration showing typical linear chevron behaviour (b). The deviation
from linear behaviour, the so-called chevron rollover (grey line), indicates the presence of
intermediate states
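The V shape of a chevron plot follows directly from adding a folding rate that decays with denaturant to an unfolding rate that grows with it. The sketch below assumes that both m-values are positive constants, so that folding slows down and unfolding speeds up as denaturant is added, as described in the text; the parameter values and the function name are invented for illustration.

```python
import math


def ln_k_obs(denaturant, kf_water, m_kf, ku_water, m_ku):
    """ln(k_obs) for two-state kinetics, with k_obs = k_f + k_u."""
    k_f = kf_water * math.exp(-m_kf * denaturant)  # folding slows down
    k_u = ku_water * math.exp(m_ku * denaturant)   # unfolding speeds up
    return math.log(k_f + k_u)


# Illustrative parameters (rates in s^-1, m-values in M^-1), not measured.
params = dict(kf_water=100.0, m_kf=2.0, ku_water=1e-3, m_ku=1.0)

concentrations = [0.5 * i for i in range(17)]   # 0 to 8 M in 0.5 M steps
curve = [ln_k_obs(c, **params) for c in concentrations]
minimum_at = concentrations[curve.index(min(curve))]

for c, value in zip(concentrations, curve):
    print(f"{c:4.1f} M  ln k_obs = {value:7.3f}")
print("chevron minimum near", minimum_at, "M")
```

On the folding arm the curve is close to the straight line for ln k_f, and on the unfolding arm it is close to the line for ln k_u, which is why the two arms can be extrapolated back to zero denaturant.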
The two-state model of protein folding, which became widely adopted in the 1960s,
led Cyrus Levinthal to establish the so-called Levinthal paradox [68]. He argued
that if folding is really a two-state process, without intermediates, a protein must
randomly explore all accessible unfolded conformations in order to find the native
one, which is a global free energy minimum according to Anfinsen’s
thermodynamic hypothesis [42] (see Sect. 2.1). If the search is unbiased (i.e. all unfolded
conformations are equally probable), a simple counting argument leads to an absurd
estimate of the folding time: assume that each amino acid within a protein can only
adopt two different conformations; assume as well that the conformational change is
so fast that amino acids can switch conformations in just one picosecond (the
timescale of a thermal vibration). Then, a small protein with 100 amino acids would
have access to a total of $2^{100}$ (~$10^{30}$) conformations, therefore requiring
about $2^{100}$ ps (i.e. ~$10^{10}$ years) to find the native one. Yet, proteins typically fold on
the timescale of milliseconds to seconds (exceptions include model systems where
proline isomerisation or other specific issues slow down folding considerably). To
bypass this paradox, Levinthal proposed that protein folding should occur under
kinetic control rather than thermodynamic control. What this means is that instead
of folding to the structure that is the most thermodynamically stable, as implied by
Anfinsen’s dogma, a protein must fold to a metastable structure (i.e. a local energy
minimum) that is accessible through the fastest folding pathway and is biologically
active. In Levinthal’s view, a folding pathway is a well-defined sequence of events
‘which follows one another so as to carry the protein from the unfolded random coil
to a uniquely folded metastable state’ [68]. According to Levinthal, the goal of
achieving a global free energy minimum is not compatible with doing it fast, and ‘if
the final folded state turned out to be the one of lowest conformational energy, it
would be a consequence of biological evolution not of physical chemistry’. The
Levinthal paradox is an important landmark in the history of protein science
because it stimulated research on the kinetics and mechanisms of protein folding,
i.e. how do proteins acquire their native structure on a biologically relevant timescale?
Needless to say, the search for mechanisms based on the formation of intermediate
states dominated the folding arena for quite a while [69]. Eventually, in 1996, a model
for a kinetically controlled folding pathway as envisioned by Levinthal was
suggested for the serpin plasminogen activator inhibitor 1 (PAI-1); the latter
forms a metastable active structure that converts slowly to the more stable but
low-activity, ‘latent’ conformation [70].
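Levinthal’s counting argument is easy to reproduce numerically. The sketch below uses the same assumptions as the text (two conformations per residue, one picosecond per conformational step, every conformation visited once); the function name and the choice of a 365.25-day year are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_time_years(n_residues, states_per_residue=2,
                                seconds_per_step=1e-12):
    """Time needed to visit every conformation once, in years."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations * seconds_per_step / SECONDS_PER_YEAR


years = levinthal_search_time_years(100)
print(f"{2 ** 100:.2e} conformations, {years:.1e} years")
```

For 100 residues the exhaustive search takes of the order of $10^{10}$ years, against observed folding times of milliseconds to seconds.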
increase in computer power that started in the 1990s. Indeed, the possibility to use
computer simulations of simplified protein models combined with theoretical
studies framed in statistical mechanics played a decisive role towards unravelling
this remarkable biological puzzle leading to two landmark concepts in protein
folding. One is that of the folding funnel, developed in the context of the free
energy landscape theory, while the other is the nucleation condensation
(NC) mechanism of protein folding. Interestingly, the NC mechanism was predicted
in Monte Carlo lattice simulations at about the same time it was first supported by
experiments in vitro. Given their importance in the development of a theory of
protein folding, we address them briefly.
Although a model for folding as being limited by nucleation had been originally
proposed by Baldwin and co-workers in the early 1970s [77], it was only in the
1990s that Shakhnovich and co-workers reported the first detailed microscopic
study, based on Monte Carlo simulations of a simple lattice model, which supports
the hypothesis of a nucleation mechanism, akin to the nucleation growth mechanism
of first-order phase transitions, being at the heart of the folding process [78]. Indeed,
Shakhnovich and co-workers observed that in the folding of the lattice model sys-
tem, the rate-limiting step is the formation of a specific set of native interactions
predominantly involving residues far apart in the sequence, termed folding nucleus
(FN). Once the FN is formed, the native state is promptly (and reproducibly)
achieved. Since the formation of the FN is rate limiting, then it should coincide with
the formation of the TS in a two-state folding transition. Therefore, nucleation and
TS became inextricably linked topics in the context of protein folding.
where $k_{mut}$ and $k_{WT}$ are the folding rates of the mutant and wild-type (WT) proteins,
respectively, and $\Delta\Delta G_{N-D}$ is the change in the free energy of folding upon mutation
[62]. For a conservative mutation (which causes a small perturbation in the folding
process), $-RT \ln(k_{mut}/k_{WT})$ measures the change in the activation energy of
folding, $\Delta\Delta G_{TS-D}$, and therefore
$$\phi = \Delta\Delta G_{TS-D}/\Delta\Delta G_{N-D}$$
Likewise, for two-state folding proteins, a phi-value near unity means that the TS
is energetically perturbed upon mutation as much as the native state is perturbed.
This can be interpreted as if the mutated residue is fully native (i.e. has all its native
interactions established) in the TS. On the other hand, a phi-value near zero can be
taken as evidence that the residue is as unstructured in the TS as it is in the
denatured ensemble. The occurrence of fractional phi-values may indicate the
existence of multiple folding pathways or a unique transition state with genuinely
weakened interactions [62]. Moreover, the interpretation of the so-called nonclas-
sical phi-values (u > 1 and u < 0) is also not straightforward and alternative
models for the phi-value have been proposed [80].
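A phi-value can be computed from two folding rates and one equilibrium stability change. The sketch below assumes the sign convention in which a destabilising mutation gives a positive change in both free energies, with the change in activation free energy taken as −RT ln(k_mut/k_WT); the function name and the numerical inputs are invented for illustration.

```python
import math

R = 8.314e-3  # universal gas constant in kJ mol^-1 K^-1


def phi_value(k_mut, k_wt, ddg_native_denatured, temperature_k=298.15):
    """Phi = ddG(TS-D) / ddG(N-D), with ddG(TS-D) = -RT ln(k_mut / k_wt)."""
    ddg_ts_d = -R * temperature_k * math.log(k_mut / k_wt)
    return ddg_ts_d / ddg_native_denatured


# Hypothetical mutant that folds five times more slowly and is
# destabilised by 4 kJ/mol at equilibrium: phi close to 1.
print(round(phi_value(k_mut=10.0, k_wt=50.0, ddg_native_denatured=4.0), 2))

# Hypothetical mutant with the same destabilisation but an unchanged
# folding rate: phi equal to 0.
print(phi_value(k_mut=50.0, k_wt=50.0, ddg_native_denatured=4.0))
```

Because the denominator is the equilibrium destabilisation, phi-values become unreliable when that quantity is close to zero.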
According to Fersht and co-workers, the picture of the TS that emerges from the
phi-value analysis is compatible with CI2 folding via a nucleation mechanism like
that reported for lattice proteins. The lack of tertiary structure in the TS of CI2 was
taken as evidence that secondary and tertiary structures form concomitantly in a
process that is triggered by the formation of the FN, a set of local interactions
stabilised by a few long-range interactions which are mainly associated with the
residues displaying the highest phi-values. Such a process was coined the nucle-
ation–condensation (NC) mechanism of protein folding [81]. Subsequent studies,
focusing on other target proteins, provided further evidence that the NC mechanism
is common among small, single-domain proteins (reviewed in [82]).
In the early 1990s, Wolynes, Onuchic and co-workers developed the free energy
landscape theory of protein folding in the framework of simple statistical
mechanics models of polymers, theory of spin glasses and computer simulation
[83–87].
The landscape theory, its underlying concepts and subtleties have been
discussed in depth by Dill and Chan in a series of pedagogical papers [88–90].
In what follows, we provide a brief summary of the theory’s pivotal concepts
following Refs. [88, 89].
The free energy landscape is a multidimensional representation of the folding
process where the vertical axis represents the internal free energy of a single
conformation, while the multiple lateral axes represent the conformational
coordinates (e.g. the dihedral bond angles, $\phi_1$, $\psi_1$, $\phi_2$, $\psi_2$, …). The internal free energy of
one conformation accounts for everything (i.e. hydrogen bonds, salt bridges, torsion
angle energies, hydrophobic and solvation free energies, etc.) except for the
conformational entropy. It is called a free energy (and not merely energy) because
of the solvation terms, which can involve entropic contributions due to water
ordering.
Also, it does not represent the macroscopic free energy that would be measured
in a folding experiment in vitro because the internal free energy describes a
single chain only, and not the ensemble average over all chain conformations. The
interested reader is advised to read [91] where an illuminating discussion regarding
the relation between the internal free energy and macroscopic free energy is
provided.
A point in the free energy landscape represents a conformation, and geometrically
similar conformations are close to each other on the free energy landscape. In
this framework, folding is viewed as a succession of random conformational
transitions starting from an arbitrary unfolded conformation. In its search for the
native state, a protein will lower its internal free energy by shielding its
hydrophobic residues in the core, by increasing its hydrogen bond content, by
increasing the number of salt bridges, etc. while it becomes progressively more
compact. The energy landscape has a funnelled shape that reflects this behaviour,
i.e. the concomitant decrease in the chain’s internal free energy and conformational
entropy that occurs during folding (Fig. 13a).
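As a rough sketch of how the funnel picture can be put into symbols, let Ω(E) denote the number of conformations whose internal free energy is E. The notation (E, Ω, S_conf, F and the temperature T) is introduced here for illustration only and is an assumption, not the authors' own formulation. In this commonly used form, the width of the funnel at height E stands for a conformational entropy, and combining it with E gives a free energy profile:

```latex
S_{\mathrm{conf}}(E) = k_{\mathrm{B}} \ln \Omega(E),
\qquad
F(E) = E - T\, S_{\mathrm{conf}}(E)
```

In a funnelled landscape both E and Ω(E) decrease towards the native state, so the two terms in F(E) compete: the energetic gain favours folding, whereas the loss of conformational entropy opposes it.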
Roughly speaking, the conformational entropy measures the number of conformations
with a given value of internal free energy; it is represented by the funnel’s width. In
the landscape’s framework, it does not make sense to think of a folding pathway in
the Levinthal sense, i.e. a succession of transitions between specific conformations
leading to the native state in a directed manner. Rather, in the landscape view, the
process of protein folding can be metaphorically pictured like that of water flowing
down a mountain. Within this scenario, instead of thinking in terms of specific
Fig. 13 Protein folding funnels. Pictorial representation of a perfectly funnel-shaped free energy
landscape representing a fast two-state folding transition (a). The Levinthal paradox can be framed
in terms of a ‘golf-course’-shaped free energy landscape featuring only the denatured and the
native states. In the absence of an energetic bias, the protein needs to randomly explore the
ensemble of denatured conformations until it finds one that gives access to the native state (b)
3 Folding Kinetics and Mechanisms: How Is Structure Acquired? 31
In 1998, Plaxco et al. proposed the relative contact order parameter, CO, as a
simple empirical measure of the native structure’s geometric complexity [92].
The CO measures the average sequence separation of contacting residue pairs in the
native structure relative to the chain length of the protein and is defined as
$$\mathrm{CO} = \frac{1}{LN}\sum_{i,j}^{N} \Delta_{ij}\,|i-j|$$
where $\Delta_{ij} = 1$ if residues i and j are in contact and is 0 otherwise; N is the total
number of contacts and L is the chain length. The CO can be viewed as a metric of
native geometry because it measures the average sequence separation of contacting
residue pairs in the native structure. In a subsequent study, Plaxco et al. reported a
rather strong correlation (r = 0.92) between the CO parameter and the logarithmic
folding rates of 24 small (~150 amino acids) single-domain, two-state proteins
[93], suggesting that the native state’s geometry could be the major determinant of
two-state folding kinetics.
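A minimal Python sketch of the CO as defined above, under simplifying assumptions: each residue is reduced to a single point (for example its Cα position), and two residues count as being in contact when those points lie within a cutoff distance. The function name, the 6.0 Å default cutoff and the `min_separation` parameter are illustrative choices introduced here, not part of the original definition.

```python
import math
from itertools import combinations


def relative_contact_order(coords, cutoff=6.0, min_separation=1):
    """Relative contact order from one representative point per residue.

    coords         -- list of (x, y, z) tuples, one per residue, in sequence order
    cutoff         -- contact distance threshold (assumed value, same units as coords)
    min_separation -- smallest sequence separation |i - j| counted as a contact
    """
    chain_length = len(coords)
    total_separation = 0
    n_contacts = 0
    # Visit every residue pair once (i < j), so |i - j| is simply j - i.
    for i, j in combinations(range(chain_length), 2):
        if j - i < min_separation:
            continue
        if math.dist(coords[i], coords[j]) <= cutoff:
            total_separation += j - i
            n_contacts += 1
    if n_contacts == 0:
        return 0.0
    # CO = (1 / (L * N)) * sum over contacts of |i - j|
    return total_separation / (chain_length * n_contacts)


# Toy example: four residues on the corners of a unit square.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
co_all = relative_contact_order(square, cutoff=1.1, min_separation=1)
co_nonlocal = relative_contact_order(square, cutoff=1.1, min_separation=2)
```

In the toy example, excluding chain neighbours (`min_separation=2`) leaves only the contact between the first and last residue, which raises the CO; this illustrates why the choice of contact definition has to be stated whenever CO values are compared.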
Shortly after this discovery, several authors proposed alternative properties
(some of which bear some resemblance to the CO) to quantify the geometry of
the native structure (e.g. the long-range order [94] and cliquishness [95], just to
mention two examples) that appeared to correlate equally well with the folding
rates of small, two-state proteins. Clearly, part of the charm of these reductionist
approaches is that they dramatically simplify the solution to the folding puzzle since
the protein’s primary sequence, and all its inherent complexities, is no longer at the
centre stage of the problem. Concretely, Plaxco’s ‘CO-law’ complements
Anfinsen’s folding principle—that the protein’s primary sequence determines the
native structure—by stating that it is the native structure itself, through its geometric
property CO, that determines the folding rate.
A range of theoretical and experimental studies reported over the last decade has
strengthened the idea that the CO is a major driver of
two-state folding kinetics [96–100]. However, in order for the CO-rate dependence
to acquire the status of a fundamental principle of protein folding, it is necessary to
understand its fundamental roots, i.e. to identify the physical mechanism underlying
the correlation. Faísca and Ball were the first to explore the CO-rate dependence in
the context of Monte Carlo simulations of simple lattice models with additive
pairwise interactions, but they only observed moderately high correlations for long
chain lengths and high CO [101]. A subsequent study by Kaya and Chan, also
based on lattice simulations, indicated that the CO-dependent rate may result from
a coupling mechanism between local and non-local interactions
(i.e. long-range interactions involving pairs of residues that are far apart along the sequence)
[102]. Indeed, this coupling renders the folding transition more cooperative (both
kinetically and thermodynamically) but also results in highly dispersed folding
rates, a necessary condition to observe a strong statistical correlation with the CO.
Several other physical mechanisms have been proposed to rationalise the
CO-dependent folding rates. An interesting example is the topomer search model,
which stipulates that the rate-limiting step in folding is a diffusive search within the
unfolded state for a conformer with the correct topology [103]. However, despite
these and more recent insights [104], the physical principles underlying the CO-rate
dependence remain elusive, and a widely accepted physical theory for the CO-rate
dependence is still missing. Furthermore, it is known that the CO alone is not able
to predict the folding rate of larger, multidomain proteins [105, 106]. Additionally,
when recent experimental kinetic data have been carefully selected for
single-domain proteins, taking care to eliminate temperature effects, the correlation
between folding rates and CO does not seem to be so relevant [107, 108].
Researchers have been trying to elucidate the folding mechanism of knotted proteins
over the last 12 years. Insights gathered in the context of molecular simulations,
using different kinds of models and methodologies, have been particularly important
and illuminating (reviewed in [109–111]). On the experimental side, research has
focused mainly on knotted trefoils YibK (PDB ID: 1j85) and YbeA (PDB ID: 1ns5)
[112]. In what follows, we provide a summary of the main results obtained so far.
Although the chaperonin mechanism in YibK and YbeA has not yet been
established, it has been suggested that it should not be limited to steric confinement
and it was proposed that the chaperonin facilitates unfolding of kinetically and
topologically trapped intermediates, or that it stabilises interactions that promote
knotting [121]. These specific effects remain to be elucidated, and, in more general
terms, the active role played by the chaperonin in the folding of knotted proteins
remains to be established.
An important question which also awaits investigation is the relationship
between the knotting mechanism and the nucleation mechanism in knotted model
systems with two-state kinetics. Assuming that there is evolutionary control of the
folding speed, it should have resulted in additional pressure applied on the folding
nucleus [125]. Therefore, an overlap between folding and knotting may imply that
the interactions that nucleate the knot have also been optimised for folding speed
(i.e. the optimisation of the knotting mechanism is a side effect of folding opti-
misation). In line with this conjecture, designed protein 2ouf, embedding a trefoil
knot, folds fast with a two-state transition (with the nucleation of the transition state
and nucleation of the knot being concomitant processes) [126].
The physical principles that govern protein folding remain unaltered in vivo, despite
the increased complexity of the intracellular environment. Indeed, the interior of
cells is a highly crowded environment in which macromolecules
reach very high concentrations, up to 300–400 mg/ml (Fig. 15).
One of the consequences of the macromolecular crowding effect is that very
little free water is present in cells, and this has an important consequence on the
protein hydration layer (see Sect. 1.4) and on the protein folding process. Indeed,
crowding can reduce the yield of correctly folded protein by increasing protein
aggregation through aberrant non-native interactions of newly formed nascent
polypeptides.
Also, macromolecular crowding affects protein folding dynamics and protein
structure so that crowding-induced conformational changes are certainly an
important source of non-native states and aberrantly folded proteins whose func-
tional deficiency and potential toxicity to cells must be avoided at all costs.
To deal with these hazardous effects of the intracellular environment, cells have
developed several regulatory mechanisms, which include:
• regulation of protein folding at the ribosome, achieved by co-translational
folding and, in some instances, helix formation within the ribosome exit
channel, and tertiary folding at the vestibule [128];
4 Protein Misfolding: Why Proteins Misbehave?
Fig. 15 Intracellular molecular crowding. Molecular model of a bacterial cytoplasm with atomistic-level detail highlighted with proteins, tRNA, GroEL and
ribosomes. Reprinted from [127] with permission (CC-BY licence)
Fig. 16 Organisation of chaperone pathways in the cytosol. The number of interacting substrates
with the different chaperones is indicated as a percentage of the total proteome. Reprinted from
[132] with permission
Fig. 17 Protein folding in the GroEL-GroES chaperonin cage. Substrate protein (SP) binding to
the GroEL cavity and subsequent conformational changes triggered by ATP and GroES promote
folding and subsequent release of the encapsulated protein. The structural model is based on PDB
ID: 1AON. Reprinted from [135] with permission
Fig. 18 Protein misfolding landscapes. Energetic funnels depicting cases in which a mutation or
adverse condition (represented by blue glow) results in (a) formation of a destabilised protein,
(b) accumulation of an unstable folding conformer and (c) formation of protein aggregates [35]
The so-called protein misfolding diseases (also known as protein folding diseases
or conformational disorders) are a vast group of pathologies that are related to
faulty protein folding, or to misfolding and aggregation. Defects in the protein
folding process result in disease due to protein loss of function that occurs because
of destabilisation or degradation, or in toxic gain of function due to amyloid formation
and toxic accumulation. In many instances, the intense physiological regulation
of protein folding by molecular chaperones and protein quality control
systems is insufficient to prevent the formation of misfit conformations that result
in destabilised protein folds and/or protein aggregates [139]. Indeed, misfolded
proteins lead to disease by affecting a variety of pathways and cellular processes
(Fig. 20).
Protein misfolding diseases can be organised into three broad groups: (a) dis-
eases resulting from protein misfolding and destabilisation with no aggregation;
(b) diseases resulting from protein misfolding with aggregation (amyloid); and
(c) diseases resulting from defects in molecular chaperones (chaperonopathies).
Although protein misfolding accounts for most of the protein folding diseases, it is
noteworthy that some pathologies within this group also result from defects in
molecular chaperones (Table 2).
The observation that proteins aggregate dates back to the nineteenth century, i.e. to
a time well before the beginning of research in protein folding. The term amyloid
comes from the Latin word amylum, which means starch. This naming reflects the
fact that when it was originally discovered in 1854, amyloid was considered to be a
polysaccharide [142]. But amyloid is not a sugar, and a few years later, it became
clear that amyloid is a material made of proteins. The phenomenon did not attract
considerable attention until recently, when it became clear that there is an association
between the formation and the deposition of amyloid fibrils in vivo and
disease states, as discussed below [139, 143].
The most important hallmark of amyloid, which also constitutes its structural
fingerprint, is the cross-β sheet motif in which stacked β-strands run perpendicularly
to the fibril axis (Fig. 21). Astbury originally proposed the ‘cross-β spine’
motif in 1935 based on X-ray diffraction studies [145].
A low solubility associated with its one-dimensional nature makes amyloid
particularly challenging for X-ray crystallography, and its large size challenges the
use of NMR. As a result, atomic force microscopy, solid-state NMR, and
transmission electron microscopy became important tools in structural studies of
amyloid. Thus, it is perhaps not surprising that 70 years elapsed from Astbury’s
seminal findings until Eisenberg and co-workers reported the first atomically
resolved picture of amyloid based on X-ray diffraction [147]. The success of
Eisenberg and colleagues relied on their ‘reductionist’ approach. Indeed, instead of
focusing on amyloids of a full-length protein chain, they examined amyloids
formed by a soluble fibril-forming peptide of the yeast protein Sup35. The generic cross-beta
spine motif turned out to be a double beta-sheet, with each sheet formed from
parallel segments (the beta-strands) stacked in register. Side chains protruding from
the two sheets form a tight, dry, self-complementing interface termed ‘steric zip-
per’. Several classes of steric zippers have been identified which differ in the
organization of the beta-strands within and between the beta-sheets (parallel and
antiparallel), and in the stacking of the sheets (face-to-face, face-to-back,
back-to-back, etc.) [148]. A network of hydrogen bonds, together with hydrophobic
interactions and π-π stacking interactions, guarantees the stability of amyloid fibrils.
It has been suggested that the geometrical restrictions imposed by π-stacking may
actually accelerate amyloid fibril formation, thus playing a particularly important
role in amyloid assembly [149].
Recent advances in the structural biology of amyloids were made possible using
cryo-electron microscopy approaches which circumvent the need for ordered
crystals for X-ray crystallography. In the last two years, this has allowed the
high-resolution structural determination of amyloids from amyloid β (1-42) [150],
tau [151], α-synuclein [152] and β2-microglobulin [153]. These studies revealed the
existence of structural (or thermodynamic) polymorphisms in amyloids (reviewed
in [154]). The latter correspond to different β-strand arrangements, which
are all compatible with the cross-β spine motif. They result from the packing of
different steric zippers (segmental and packing polymorphisms), different side
chain packing or different supramolecular assemblies of protofilaments (Fase
Arches) (assembly polymorphism). The ability of a protein sequence to fibrillate
into multiple similar energy minima (i.e. different polymorphs) depends on the
employed experimental conditions (temperature, pH, etc.), with the solvent (water)
playing a particularly important role in the structural diversity in fibril assembly
[155]. Interestingly, there is growing evidence that both biological and synthetic
surfaces can not only enhance amyloid assembly, but also lead to different amyloid
polymorphs [156]. This has important implications for cell biology given the large
surface area that is provided by macromolecules and phospholipid bilayers in the
intracellular environment.
Recently, the concept of amyloid polymorphisms was extended beyond struc-
ture, to include the so-called ‘stability’ polymorphism. The latter reflects the fact
that co-polymerisation of different protein variants is able to expand the repertoire of
structural and thermodynamic polymorphs by creating fibrils with different struc-
tural and thermodynamic signatures [157].
As we shall see in the next section, amyloids are frequently associated with
disease states. However, amyloid is not necessarily related to disease. The so-called
functional amyloids relate to a large number of cases in which amyloids play
functional roles in many life forms (from bacteria to fungi and mammals) [158–
160]. For instance, it is known that several microorganisms (e.g. E. coli) use
amyloid structural motifs in extracellular biomaterials with important physiological
roles (e.g. curli fibres that facilitate the formation of biofilms [161]). Amyloid
protects the oocyte and embryo of insects (e.g. the silk moth) and fish [162]. In
secretory granules, peptide and protein hormones are stored in an amyloid-like
aggregation state [163]. Amyloid templates and accelerates the covalent poly-
merisation of reactive small molecules into melanin [164]. These are just a few
examples that illustrate the biological function of amyloids; what was initially
exclusively viewed as a pathological material is now considered to play an
important role in maintaining and protecting the living cells.
The concept of functional amyloid led to the proposal that the term
‘amyloid-like’ should be used for proteins that possess the hallmarks of amyloid
but are not associated with pathological plaques. But is there any real difference
between functional and disease-related amyloid? While a definite answer to this
question is still lacking, results obtained so far indicate that parallel in-register
β-sheet structures (composed of individual polypeptides stacking in-register every
4.7 Å along the fibril axis) are common to many full-length proteins in pathological
Given the importance of amyloid in health, disease and biotechnology, solving the
mechanism that leads to amyloid, and more generally, understanding the mecha-
nism(s) of protein aggregation, is of the utmost importance. But the task of dis-
secting the aggregation routes leading to amyloids is proving even more
challenging than solving the folding mechanism leading to a biologically functional
native state. This is so because protein aggregation is remarkably complex for
several reasons:
• conformational states formed along the aggregation pathway are structurally
highly heterogeneous;
• many of them are only transiently populated, which makes structural charac-
terisation a challenge with routinely used biophysics methods;
• their formation depends critically on the environmental conditions (temperature,
pH, salt, other proteins, membranes, ions, etc.);
• due to the multiple length scales and timescales involved in the process, it is
necessary to employ a wide range of techniques that span these wide-ranging
length and timescales.
Despite these difficulties, significant advances have been made to tackle in vitro
[169] and in vivo [170, 171] the complexity of protein aggregation and
amyloidosis.
The need to understand how proteins aggregate into amyloids led to a renewed
interest in the concept of intermediate states, which was largely set aside in the
1990s when small proteins with two-state kinetics became widely popular models
to study protein folding. Nowadays, it is widely accepted that protein aggregation is
filaments, fibrils and protofibrils that form along the amyloid cascade (Fig. 22).
Moreover, important insights may be gained by defining the rate constants gov-
erning each microscopic step, and determining the manner according to which they
depend on protein sequence and environmental conditions, i.e. the kinetics of
protein aggregation in amyloid formation [182].
A typical aggregation curve (showing the percentage of aggregate vs. time)
exhibits a sigmoidal shape, which is similar to that observed in a two-state folding
transition (Fig. 23a), although not necessarily as steep. Based on the anatomy of
this curve, the mechanism of aggregation is generally divided into three stages: the
lag phase, the growth phase and the final phase (or plateau). Monomers are
dominant during the lag phase, fibrils dominate the final plateau (where the con-
centration of monomers has reached its equilibrium value), while in the growth (or
elongation) phase, their concentrations are similar; the rate of amyloid formation is
largest in this growth phase.
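The sigmoidal shape described above is often summarised with an empirical logistic function. The sketch below is one such illustration, not a mechanistic model of aggregation: the logistic form, the parameter names `t_half` and `k_app`, and the tangent-based definition of the lag time are all assumptions introduced here.

```python
import math


def aggregate_fraction(t, t_half, k_app):
    """Empirical logistic curve: fraction of aggregated protein at time t.

    t_half -- time at which half of the final aggregate has formed (assumed parameter)
    k_app  -- apparent growth rate constant, in 1/time units (assumed parameter)
    """
    return 1.0 / (1.0 + math.exp(-k_app * (t - t_half)))


def lag_time(t_half, k_app):
    """Lag time read off by extending the midpoint tangent down to zero aggregate.

    The slope of the logistic curve at t_half is k_app / 4, so the tangent
    starting at (t_half, 0.5) reaches zero at t_half - 2 / k_app.
    """
    return t_half - 2.0 / k_app


def max_growth_rate(k_app):
    """Steepest slope of the curve, reached at the midpoint of the growth phase."""
    return k_app / 4.0


# Hypothetical parameters: midpoint at 10 h, apparent rate 0.5 per hour.
midpoint_fraction = aggregate_fraction(10.0, t_half=10.0, k_app=0.5)
estimated_lag = lag_time(t_half=10.0, k_app=0.5)
```

With these hypothetical values the curve stays close to zero for roughly the first six hours (the lag phase), rises most steeply around the midpoint (the growth phase) and then levels off (the plateau).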
The lag phase corresponds to the formation of a critical nucleus, i.e. the smallest
aggregates in the process that are stable enough so that further growth by monomer
addition is faster than dissociation into monomers. In thermodynamic terms, nuclei
are the species of highest free energy along the amyloid pathway (as the TS is the
state of highest free energy in a two-state folding transition) (Fig. 23b). In a system
where aggregation starts from a solution of pure monomeric proteins (or peptides),
the formation of the critical nucleus (which can consist of millions of dimers of the
EAP) is the only molecular event contributing to amyloid formation. However,
primary nucleation typically represents 10⁻⁷% of the lag time, which means that
additional fibril-dependent processes (elongation, fragmentation and secondary
nucleation) will also occur within the lag time, and the critical nucleus should
involve the formation of large populations of several species. Indeed, in general,
none of the three phases can be ascribed to a single microscopic process and several
species will co-exist in each phase [182, 183]. The microscopic processes
Fig. 23 Typical amyloid aggregation curve. The three phases of amyloidogenesis (a) and the free
energy of aggregation (projected along a suitable reaction coordinate), showing that the
aggregation kinetics is limited by the process of nucleation, which corresponds to the formation of
a critical number (of the order of millions) of aggregation nuclei
[Figure labels: primary nucleation, secondary nucleation, elongation; species: monomer, oligomer, fibril]
driving the overall formation of fibrils (Fig. 24) can be, however, determined using
kinetics in combination with systematic experimental data sets analysed in a global
manner [184].
An outstanding question in amyloid disease, which is directly related to the
mechanism of amyloid formation, concerns the mechanism(s) of cytotoxicity. The
classical amyloid hypothesis states that the toxic species is the amyloid fibril itself
[185]. However, this idea is gradually evolving into the concept that the oligomers
produced along the amyloid cascade are the primary toxic species, while fibrils may
be inert or even protective [186]. This assumption rests on growing evidence that
pre-fibrillar oligomers have the potential to disrupt the permeability of cellular
membranes (through the formation of ion channels, pores or non-selective per-
meation of lipid bilayers) eventually causing cell death [187]. Other known toxicity
processes involve the seeding of other amyloidogenic proteins,
metal-mediated toxicity through generation of reactive oxygen species (ROS), and
saturation of the proteostasis network through sequestration of molecular chaperones by
sticky amyloids and their precursors [188] (Fig. 25).
The identification and structural characterisation of the several species produced
along the aggregation pathway leading to amyloid is therefore of utmost impor-
tance. However, difficulty in obtaining highly pure samples of non-fibrillar
aggregates that are sufficiently long-lived for biophysical studies has significantly
hindered progress in the field. Molecular simulations are likely to
play an important role in this new challenge [189].
[Figure labels: chaperone, protein, non-amyloid conformer, O2, ROS; toxicity routes: membrane disruption, sequestering of molecular chaperones, amyloid seeding, metal-mediated ROS toxicity]
thermodynamically stable state of any protein is the amyloid state has important
conceptual consequences. In particular, it calls for a re-thinking of the folding space
(which accounts for the native and unfolded states and folding intermediates) to
include amyloid and all relevant structures that pave the way to amyloid formation
[181].
The ability to determine the conformational status of a protein, in order to infer its
folding state and evaluate fractional unfolding or denaturation, is critical in protein
folding studies. To this end, recent years have witnessed an expanding arsenal of
methodologies applied to protein folding, which includes
fluorescence, circular dichroism (CD), nuclear magnetic resonance (NMR) and
Fourier transform infrared spectroscopy (FTIR). These spectroscopic approaches
make it possible to monitor structural and conformational changes in proteins, from multiple
complementary angles, as a function of time (in folding/unfolding kinetic assays),
denaturing conditions (chemical or thermal denaturation) or chemical environment
(solution pH, metal ion, ligand/inhibitor). Several excellent reviews and
methods papers covering advanced applications, advantages and limitations of different
protein spectroscopies are available, to which the interested reader may refer
[201–205]. Here we briefly highlight the value in protein folding studies of
protein fluorescence and circular dichroism, two biophysical spectroscopies commonly
available in biochemistry laboratories.
Protein Fluorescence Aromatic residues, most notably Trp due to its high quan-
tum yield, afford intrinsic fluorescence to proteins which is very useful to evaluate
conformational changes. Trp-fluorescence emission, being extremely sensitive to
the polarity of the side chain environment, is an excellent reporter of the folding
state. Upon excitation at 280 nm, Trp will emit maximally from 345 to 355 nm,
depending on how exposed to water the indole side chain is (Fig. 26a). A buried Trp
will emit closer to 345 nm, whereas the emission maximum of a completely
solvent-exposed Trp is red-shifted towards 355 nm. Since Trp residues are frequently
buried in the protein interior, the intensity-averaged maximum of the
emission band thus reports on the folding state of the protein, from which the
fraction of folded and unfolded protein (F/U) may be determined. For proteins with n Trp
residues, the resulting emission reflects the environments of the n side chains.
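A minimal sketch of how a fraction of unfolded protein could be extracted from such a spectroscopic signal, assuming a two-state transition in which the observed signal is a population-weighted average of a folded and an unfolded baseline. The function and argument names are hypothetical, and the linear relation is itself an approximation (an emission maximum, in particular, is not strictly linear in population); the 345 and 355 nm values in the usage lines are simply the limits quoted above.

```python
def fraction_unfolded(signal, folded_signal, unfolded_signal):
    """Two-state estimate of the unfolded fraction from a spectroscopic signal.

    Assumes signal = f_F * folded_signal + f_U * unfolded_signal with f_F + f_U = 1,
    which rearranges to f_U = (signal - folded_signal) / (unfolded_signal - folded_signal).
    """
    if unfolded_signal == folded_signal:
        raise ValueError("folded and unfolded baselines must differ")
    f_u = (signal - folded_signal) / (unfolded_signal - folded_signal)
    # Noise can push the raw estimate slightly outside [0, 1]; clip it.
    return min(1.0, max(0.0, f_u))


# Illustrative use with the emission maxima (in nm) quoted in the text.
f_u_mid = fraction_unfolded(350.0, folded_signal=345.0, unfolded_signal=355.0)
f_u_native = fraction_unfolded(345.0, folded_signal=345.0, unfolded_signal=355.0)
f_folded_mid = 1.0 - f_u_mid
```

The same expression applies to any signal that distinguishes the two states, for example the CD ellipticity at a fixed wavelength, provided the baselines are measured under comparable conditions.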
Through fluorescence quenching experiments using iodide or acrylamide, one
can determine the relative fraction of emitting and quencher-accessible Trp side
chains, using the Stern-Volmer equation. A variety of extrinsic fluorophores are
also routinely employed to probe protein folding and conformation (Fig. 27). Among the
most useful is 8-anilinonaphthalene-1-sulphonic acid (ANS) which emits upon
5 Methods for Protein Folding 51
[Figure panels (a) and (b); curve labels: 25 °C, 80 °C, 7 M GdmCl]
Fig. 27 Examples of common extrinsic fluorophores used in protein folding and stability analysis
bond whose absorption bands in the far-UV (190–250 nm) result from n → π*
(222 nm) and π → π* (190 nm) electronic transitions. What makes CD spectroscopy
so sensitive to protein secondary
structure is the fact that the arrangement of the groups in distinct conformations (as
in different types of secondary structure) changes the overlap of the molecular
orbitals and their energy levels, and some conformations permit a more constructive
interaction than others (affecting the intensity and allowing discrimination
between different types of secondary structure). Far-UV CD thus makes it possible not only to
identify and quantify the relative contributions of different types of secondary
structural elements in a protein but also to discriminate between folded and
unfolded conformers (Fig. 27).
Proteins that are mostly α-helical have intense spectra with a positive band at
190 nm and a significant double well with minima at 208 and 222 nm. Regular
β-sheet proteins with long and aligned strands (referred to as β-I) exhibit less
intense bands and the maximum at 190 nm is slightly red-shifted. Random (r) and
β-II type proteins (which are β-rich proteins that contain only short strands that are
not rigidly aligned) exhibit a similar CD spectrum, with a negative band at 200 nm
and almost no positive bands (Fig. 28).
Albeit with substantially lower intensities, tryptophan (300–280 nm), tyrosine
(290–270 nm) and phenylalanine residues (270–250 nm) as well as disulphide
bonds (250 nm) also exhibit characteristic CD signatures in the near-UV region,
and thus, CD can also be used to monitor the protein tertiary structure. However,
when compared with Trp-fluorescence, near-UV CD is much less sensitive to ter-
tiary interactions. Therefore, while far-UV CD is the main method to monitor
secondary structures, Trp-fluorescence is much more suitable to evaluate tertiary
structure changes.
Since the 1990s, the use of Monte Carlo simulations of lattice models has been
helping to establish the fundamental principles driving the remarkable biological
process of protein folding. A simple lattice representation reduces the protein to its
backbone structure: amino acids are represented by beads that occupy the vertices
of a (two- or three-dimensional) regular lattice, and the peptide bonds are reduced to
sticks of uniform size (corresponding to the lattice spacing) (Fig. 29a).
Interactions between the amino acids can be modelled by the HP potential [74]
that captures the hydrophobic effect by considering hydrophobic and hydrophilic
amino acids only, by the sequence-based potential, which takes into account the
heterogeneity of interactions resulting from the 20-amino acid alphabet by using the
Miyazawa–Jernigan interaction matrix [208], or by the native-centric (or
structure-based) Go potential, in which the interaction matrix is exclusively dictated
by the native structure of the model protein, i.e. only native interactions contribute
to protein energetics [209]. Lattice models are crude representations of real proteins
that feature the fundamental ingredients of their polymeric nature. They are ade-
quate to explore fundamental aspects of the folding process that do not depend on
specific details of proteins, and computational efficiency allows evaluating folding
thermodynamics and kinetics (including rates) with high accuracy.
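To make the lattice representation concrete, the sketch below evaluates the energy of one conformation under an HP-type potential. It assumes the commonly used convention in which every pair of hydrophobic (H) beads that sit on neighbouring lattice sites without being consecutive along the chain contributes −ε; the function name, the `epsilon` parameter and the toy sequences are illustrative, and no Monte Carlo move set is included.

```python
def hp_energy(sequence, positions, epsilon=1.0):
    """Energy of a lattice conformation under an HP-type contact potential.

    sequence  -- string of 'H' (hydrophobic) and 'P' (polar) beads
    positions -- list of integer lattice coordinates (tuples), one per bead
    epsilon   -- energy gained per non-bonded H-H contact (assumed convention)
    """
    if len(sequence) != len(positions):
        raise ValueError("one lattice position is needed per bead")
    if len(set(positions)) != len(positions):
        raise ValueError("conformation is not self-avoiding")
    # Chain connectivity: consecutive beads must be one lattice spacing apart.
    for a, b in zip(positions, positions[1:]):
        if sum(abs(x - y) for x, y in zip(a, b)) != 1:
            raise ValueError("consecutive beads must be lattice neighbours")

    index_at = {pos: i for i, pos in enumerate(positions)}
    contacts = 0
    for i, pos in enumerate(positions):
        if sequence[i] != "H":
            continue
        # Look only in the +1 direction of each axis so every pair is counted once.
        for dim in range(len(pos)):
            neighbour = list(pos)
            neighbour[dim] += 1
            j = index_at.get(tuple(neighbour))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -epsilon * contacts


# Toy example on a square lattice: a four-bead chain folded into a unit square.
folded = [(0, 0), (1, 0), (1, 1), (0, 1)]
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]
e_folded = hp_energy("HPPH", folded)
e_extended = hp_energy("HPPH", extended)
```

In a Monte Carlo simulation, an energy function of this kind would be evaluated before and after each trial move to decide whether the move is accepted; replacing the H-H rule by a lookup of native contacts would give a Go-type potential instead.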
To address the folding process of specific proteins, researchers developed
another class of coarse-grained models [210], which use an off-lattice representation
of the protein that can be either restricted to Cα atoms (Fig. 29b) or be fully
atomistic (Fig. 29c) [211, 212]. The folding space of off-lattice models is often
Fig. 29 Protein models used in simulations of protein folding, in order of increasing complexity.
The simple (cubic) lattice model (a) is a generic protein representation displaying the fundamental
features of the protein backbone (chain connectivity, excluded volume, etc.). Each bead represents
one of the 20 amino acids, and the sticks connecting the beads represent the peptide bonds. The
Cα model (b) is the simplest off-lattice representation. Like the lattice model, it is a
coarse-grained description of the protein that reduces each amino acid to a sphere centred on the
position of its Cα carbon. However, it is a more realistic representation of the protein that not
only considers the polymeric nature of the protein backbone but also features the specific
three-dimensional native structure of the protein. Finally, in the full atomistic off-lattice
representation (c), all the heavy atoms of the protein are explicitly represented. Figures drawn with
PyMOL (pymol.org)
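The two off-lattice representations in the caption differ only in which atoms of the structure are kept. The sketch below illustrates that reduction on a hypothetical two-residue fragment; the tuple layout `(residue_index, atom_name, element, xyz)`, the function names and the coordinates are assumptions made for illustration and do not come from any particular structure file or library.

```python
def ca_trace(atoms):
    """C-alpha model: keep one bead per residue, placed at its C-alpha carbon.

    The element check avoids confusing a C-alpha carbon (name "CA", element "C")
    with a calcium ion, which may also carry the name "CA".
    """
    return [xyz for (_res, name, element, xyz) in atoms
            if name == "CA" and element == "C"]


def heavy_atoms(atoms):
    """Full atomistic model as described in the caption: drop hydrogens only."""
    return [atom for atom in atoms if atom[2] != "H"]


# Hypothetical fragment with made-up coordinates (in angstroms).
atoms = [
    (1, "N",  "N", (0.0, 0.0, 0.0)),
    (1, "H",  "H", (-0.5, 0.8, 0.0)),
    (1, "CA", "C", (1.5, 0.0, 0.0)),
    (1, "C",  "C", (2.0, 1.4, 0.0)),
    (1, "O",  "O", (1.3, 2.4, 0.0)),
    (2, "N",  "N", (3.3, 1.5, 0.0)),
    (2, "CA", "C", (4.0, 2.8, 0.0)),
    (2, "HA", "H", (3.6, 3.4, 0.8)),
    (2, "C",  "C", (5.5, 2.6, 0.0)),
    (2, "O",  "O", (6.0, 1.5, 0.0)),
]

beads = ca_trace(atoms)     # one coordinate per residue
heavy = heavy_atoms(atoms)  # every non-hydrogen atom
```

For this fragment the Cα model keeps two beads, whereas the heavy-atom model keeps eight atoms, which shows how quickly the number of degrees of freedom grows with the level of detail.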
References
8. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN,
Bourne PE (2000) The protein data bank. Nucleic Acids Res 28:235–242
9. Hou J, Sims GE, Zhang C, Kim S-H (2003) A global representation of the protein fold space.
Proc Natl Acad Sci 100:2386–2390
10. Schaeffer RD, Daggett V (2011) Protein folds and protein folding. Protein Eng Des
Sel PEDS 24:11–19
11. Richardson JS (1977) β-Sheet topology and the relatedness of proteins. Nature
268:495–500
12. Mansfield ML (1994) Are there knots in proteins? Nat Struct Mol Biol 1:213–214
13. Taylor WR (2000) A deeply knotted protein structure and how it might fold. Nature
406:916–919
14. Koniaris K, Muthukumar M (1991) Knottedness in ring polymers. Phys Rev Lett 66:2211–
2214
15. Bölinger D, Sułkowska JI, Hsu H-P, Mirny LA, Kardar M, Onuchic JN, Virnau P (2010) A
Stevedore’s protein knot. PLoS Comput Biol 6:e1000731
16. King NP, Yeates EO, Yeates TO (2007) Identification of rare slipknots in proteins and their
implications for stability and folding. J Mol Biol 373:153–166
17. Jamroz M, Niemyska W, Rawdon EJ, Stasiak A, Millett KC, Sułkowski P, Sułkowska JI
(2015) KnotProt: a database of proteins with knots and slipknots. Nucleic Acids Res 43:
D306–D314
18. Lua RC, Grosberg AY (2006) Statistics of knots, geometry of conformations, and evolution
of proteins. PLoS Comput Biol 2:e45
19. Virnau P, Mirny LA, Kardar M (2006) Intricate knots in proteins: function and evolution.
PLoS Comput Biol 2:e122
20. Sułkowska JI, Rawdon EJ, Millett KC, Onuchic JN, Stasiak A (2012) Conservation of
complex knotting and slipknotting patterns in proteins. Proc Natl Acad Sci 109:E1715–
E1723
21. Soler MA, Nunes A, Faísca PFN (2014) Effects of knot type in the folding of topologically
complex lattice proteins. J Chem Phys 141:025101
22. Nureki O, Shirouzu M, Hashimoto K, Ishitani R, Terada T, Tamakoshi M, Oshima T,
Chijimatsu M, Takio K, Vassylyev DG, Shibata T, Inoue Y, Kuramitsu S, Yokoyama S
(2002) An enzyme with a deep trefoil knot for the active-site architecture. Acta Crystallogr
Sect D 58:1129–1137
23. Jacobs SA, Harp JM, Devarakonda S, Kim Y, Rastinejad F, Khorasanizadeh S (2002) The
active site of the SET domain is constructed on a knot. Nat Struct Mol Biol 9:833–838
24. Sułkowska JI, Sułkowski P, Szymczak P, Cieplak M (2008) Stabilizing effect of knots on
proteins. Proc Natl Acad Sci 105:19714–19719
25. Alam MT, Yamada T, Carlsson U, Ikai A (2002) The importance of being knotted: effects of
the C-terminal knot structure on enzymatic and mechanical properties of bovine carbonic
anhydrase II. FEBS Lett 519:35–40
26. Soler MA, Faísca PFN (2013) Effects of knots on protein folding properties. PLoS ONE 8:
e74755
27. Uversky VN (2014) Intrinsically disordered proteins. Springer, New York
28. Theillet FX, Binolfi A, Frembgen-Kesner T, Hingorani K, Sarkar M, Kyne C, Li C,
Crowley PB, Gierasch L, Pielak GJ, Elcock AH, Gershenson A, Selenko P (2014)
Physicochemical properties of cells and their effects on intrinsically disordered proteins
(IDPs). Chem Rev 114:6661–6714
29. Riback JA, Bowman MA, Zmyslowski AM, Knoverek CR, Jumper JM, Hinshaw JR,
Kaye EB, Freed KF, Clark PL, Sosnick TR (2017) Innovative scattering analysis shows that
hydrophobic disordered proteins are expanded in water. Science 358:238–241
30. van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M,
Gough J, Gsponer J, Jones DT, Kim PM, Kriwacki RW, Oldfield CJ, Pappu RV, Tompa P,
Uversky VN, Wright PE, Babu MM (2014) Classification of intrinsically disordered regions
and proteins. Chem Rev 114:6589–6631
106. Galzitskaya OV, Garbuzynskiy SO, Ivankov DN, Finkelstein AV (2003) Chain length is the
main determinant of the folding rate for proteins with three-state folding kinetics. Proteins
Struct Funct Bioinf 51:162–166
107. Naganathan AN, Muñoz V (2005) Scaling of folding times with protein size. J Am Chem
Soc 127:480–481
108. De Sancho D, Doshi U, Muñoz V (2009) Protein folding rates and stability: how much is
there beyond size? J Am Chem Soc 131:2074–2075
109. Sułkowska JI, Noel JK, Ramírez-Sarmiento CA, Rawdon EJ, Millett KC, Onuchic JN
(2013) Knotting pathways in proteins. Biochem Soc Trans 41:523–527
110. Faísca PFN (2015) Knotted proteins: a tangled tale of structural biology. Comput Struct
Biotechnol J 13:459–468
111. Jackson SE, Suma A, Micheletti C (2017) How to fold intricately: using theory and
experiments to unravel the properties of knotted proteins. Curr Opin Struct Biol 42:6–14
112. Mallam AL, Jackson SE (2007) A comparison of the folding of two knotted proteins: YbeA
and YibK. J Mol Biol 366:650–665
113. Wallin S, Zeldovich KB, Shakhnovich EI (2007) The folding mechanics of a knotted
protein. J Mol Biol 368:884–893
114. Škrbić T, Micheletti C, Faccioli P (2012) The role of non-native interactions in the folding of
knotted proteins. PLoS Comput Biol 8:e1002504
115. Soler MA, Faísca PFN (2012) How difficult is it to fold a knotted protein? In silico insights
from surface-tethered folding experiments. PLoS ONE 7:e52343
116. Beccara S, Škrbić T, Covino R, Micheletti C, Faccioli P (2013) Folding pathways of a
knotted protein with a realistic atomistic force field. PLoS Comput Biol 9:e1003002
117. Sułkowska JI, Sułkowski P, Onuchic J (2009) Dodging the crisis of folding proteins with
knots. Proc Natl Acad Sci 106:3119–3124
118. Noel JK, Sułkowska JI, Onuchic JN (2010) Slipknotting upon native-like loop formation in a
trefoil knot protein. Proc Natl Acad Sci 107:15403–15408
119. Noel JK, Onuchic JN, Sułkowska JI (2013) Knotting a protein in explicit solvent. J Phys
Chem Lett 4:3570–3573
120. Lim NCH, Jackson SE (2015) Mechanistic insights into the folding of knotted proteins
in vitro and in vivo. J Mol Biol 427:248–258
121. Mallam AL, Jackson SE (2012) Knot formation in newly translated proteins is spontaneous
and accelerated by chaperonins. Nat Chem Biol 8:147–153
122. Bustamante A, Sotelo-Campos J, Guerra DG, Floor M, Wilson CAM, Bustamante C, Báez
M (2017) The energy cost of polypeptide knot formation and its folding consequences. Nat
Commun 8:1581
123. Soler MA, Rey A, Faísca PFN (2016) Steric confinement and enhanced local flexibility
assist knotting in simple models of protein folding. Phys Chem Chem Phys 18:26391–26403
124. Niewieczerzal S, Sułkowska JI (2017) Knotting and unknotting proteins in the chaperonin
cage: effects of the excluded volume. PLoS ONE 12:e0176744
125. Mirny L, Shakhnovich E (2001) Evolutionary conservation of the folding nucleus. J Mol
Biol 308:123–129
126. Sułkowska JI, Noel JK, Onuchic JN (2012) Energy landscape of knotted protein folding.
Proc Natl Acad Sci 109:17783–17788
127. Yu I, Mori T, Ando T, Harada R, Jung J, Sugita Y, Feig M (2016) Biomolecular interactions
modulate macromolecular structure and dynamics in atomistic model of a bacterial
cytoplasm. eLife 5:e19274. https://doi.org/10.7554/eLife.19274
128. Bhushan S, Gartmann M, Halic M, Armache J-P, Jarasch A, Mielke T, Berninghausen O,
Wilson DN, Beckmann R (2010) α-Helical nascent polypeptide chains visualized within
distinct regions of the ribosomal exit tunnel. Nat Struct Mol Biol 17:313
129. Chaney JL, Clark PL (2015) Roles for synonymous codon usage in protein biogenesis.
Annu Rev Biophys 44:143–166
130. Labbadia J, Morimoto RI (2015) The biology of proteostasis in aging and disease. Annu Rev
Biochem 84:435–464
131. Jahn TR, Parker MJ, Homans SW, Radford SE (2006) Amyloid formation under
physiological conditions proceeds via a native-like folding intermediate. Nat Struct Mol
Biol 13:195–201
132. Hartl FU, Bracher A, Hayer-Hartl M (2011) Molecular chaperones in protein folding and
proteostasis. Nature 475:324
133. Mogk A, Bukau B, Kampinga HH (2018) Cellular handling of protein aggregates by
disaggregation machines. Mol Cell 69:214–226
134. Horowitz S, Koldewey P, Stull F, Bardwell JC (2018) Folding while bound to chaperones.
Curr Opin Struct Biol 48:1–5
135. Hayer-Hartl M, Bracher A, Hartl FU (2016) The GroEL–GroES chaperonin machine: a
nano-cage for protein folding. Trends Biochem Sci 41:62–76
136. Chiti F (2006) Relative importance of hydrophobicity, net charge, and secondary structure
propensities in protein aggregation. In: Uversky VN, Fink AL (eds) Protein misfolding,
aggregation, and conformational diseases: Part A: Protein aggregation and conformational
diseases. Springer, Boston, pp 43–59
137. Ventura S (2005) Sequence determinants of protein aggregation: tools to increase protein
solubility. Microb Cell Fact 4:11
138. Rousseau F, Schymkowitz J, Serrano L (2006) Protein aggregation and amyloidosis:
confusion of the kinds? Curr Opin Struct Biol 16:118–126
139. Gregersen N, Bross P, Vang S, Christensen JH (2006) Protein misfolding and human
disease. Annu Rev Genomics Hum Genet 7:103–124
140. Stoppini M, Bellotti V (2015) Systemic amyloidosis: lessons from β2-microglobulin. J Biol
Chem 290:9951–9958
141. Chiti F, Dobson CM (2006) Protein misfolding, functional amyloid, and human disease.
Annu Rev Biochem 75:333–366
142. Tanskanen M (2013) “Amyloid”—historical aspects. In: Feng D (ed) Amyloidosis. InTech,
Rijeka, Chap. 1
143. Ross CA, Poirier MA (2004) Protein aggregation and neurodegenerative disease. Nat Med
10:S10
144. Shewmaker F, McGlinchey RP, Wickner RB (2011) Structural insights into functional and
pathological amyloid. J Biol Chem 286:16533–16540
145. Astbury WT, Dickinson S, Bailey K (1935) The X-ray interpretation of denaturation and the
structure of the seed globulins. Biochem J 29:2351–2360
146. Xiao Y, Ma B (2015) Abeta(1–42) fibril structure illuminates self-recognition and replication
of amyloid in Alzheimer’s disease. Nat Struct Mol Biol 22:499–505
147. Nelson R, Sawaya MR, Balbirnie M, Madsen AØ, Riekel C, Grothe R, Eisenberg D (2005)
Structure of the cross-β spine of amyloid-like fibrils. Nature 435:773–778
148. Sawaya MR, Sambashivan S, Nelson R, Ivanova MI, Sievers SA, Apostol MI,
Thompson MJ, Balbirnie M, Wiltzius JJW, McFarlane HT, Madsen AØ, Riekel C,
Eisenberg D (2007) Atomic structures of amyloid cross-β spines reveal varied steric
zippers. Nature 447:453–457
149. Gazit E (2002) A possible role for π-stacking in the self-assembly of amyloid fibrils.
FASEB J 16:77–83
150. Gremer L, Schölzel D, Schenk C, Reinartz E, Labahn J (2017) Fibril structure of
amyloid-beta (1–42) by cryo-electron microscopy. Science 358:116–119
151. Fitzpatrick AWP, Falcon B, He S, Murzin AG, Murshudov G, Garringer HJ, Crowther RA,
Ghetti B, Goedert M, Scheres SHW (2017) Cryo-EM structures of tau filaments from
Alzheimer’s disease. Nature 547:185–190
152. Li B, Ge P, Murray KA, Sheth P, Zhang M, Nair G, Sawaya MR (2018) Cryo-EM of
full-length alpha-synuclein reveals fibril polymorphs with a common structural kernel. Nat
Commun 9:3609
153. Iadanza MG, Silvers R (2018) The structure of a beta2-microglobulin fibril suggests a
molecular basis for its amyloid polymorphism. Nat Commun 9:4517
154. Tycko R (2014) Physical and structural basis for polymorphism in amyloid fibrils. Protein
Sci 23:1528–1539
155. Thirumalai D, Reddy G, Straub JE (2012) Role of water in protein aggregation and amyloid
polymorphism. Acc Chem Res 45:83–92
156. Arce FT, Jang H, Ramachandran S, Landon PB, Nussinov R, Lal R (2011) Polymorphism of
amyloid β peptide in different environments: implications for membrane insertion and pore
formation. Soft Matter 7:5267–5273
157. Sarell CJ, Woods LA, Su Y, Debelouchina GT, Ashcroft AE, Griffin RG, Stockley PG,
Radford SE (2013) Expanding the repertoire of amyloid polymorphs by co-polymerization
of related protein precursors. J Biol Chem
158. Pham CLL, Kwan AH, Sunde M (2014) Functional amyloid: widespread in Nature, diverse
in purpose. Essays Biochem 56:207–219
159. Otzen D (2010) Functional amyloid. Prion 4:256–264
160. Fowler DM, Koulov AV, Balch WE, Kelly JW (2007) Functional amyloid—from bacteria to
humans. Trends Biochem Sci 32:217–224
161. Evans ML, Chapman MR (2014) Curli biogenesis: order out of disorder. Biochim Biophys
Acta Mol Cell Res 1843:1551–1558
162. Iconomidou VA, Vriend G, Hamodrakas SJ (2000) Amyloids protect the silkmoth oocyte
and embryo. FEBS Lett 479:141–145
163. Maji SK, Perrin MH, Sawaya MR, Jessberger S, Vadodaria K, Rissman RA, Singru PS,
Nilsson KPR, Simon R, Schubert D, Eisenberg D, Rivier J, Sawchenko P, Vale W, Riek R
(2009) Functional amyloids as natural storage of peptide hormones in pituitary secretory
granules. Science 325:328–332
164. Fowler DM, Koulov AV, Alory-Jost C, Marks MS, Balch WE, Kelly JW (2005) Functional
amyloid formation within mammalian tissue. PLoS Biol 4:e6
165. Smith JF, Knowles TPJ, Dobson CM, MacPhee CE, Welland ME (2006) Characterization of
the nanoscale properties of individual amyloid fibrils. Proc Natl Acad Sci 103:15806–15811
166. Greenwald J, Riek R (2010) Biology of amyloid: structure, function, and regulation.
Structure 18:1244–1260
167. Knowles TPJ, Mezzenga R (2016) Amyloid fibrils as building blocks for natural and
artificial functional materials. Adv Mater 28:6546–6561
168. Scheibel T, Parthasarathy R, Sawicki G, Lin X-M, Jaeger H, Lindquist SL (2003)
Conducting nanowires built by controlled self-assembly of amyloid fibers and selective
metal deposition. Proc Natl Acad Sci 100:4527–4532
169. Nilsson MR (2004) Techniques to study amyloid fibril formation in vitro. Methods 34:151–
160
170. Alberti S, Halfmann R, Lindquist S (2010) Biochemical, cell biological, and genetic assays
to analyze amyloid and prion aggregation in yeast (Chap. 30). In: Methods in enzymology.
Academic Press, pp 709–734
171. Sleutel M, Van den Broeck I, Van Gerven N, Feuillie C, Jonckheere W, Valotteau C,
Dufrene YF, Remaut H (2017) Nucleation and growth of a bacterial functional amyloid at
single-fiber resolution. Nat Chem Biol 13:902–908
172. Giurleo JT, He X, Talaga DS (2008) β-Lactoglobulin assembles into amyloid through
sequential aggregated intermediates. J Mol Biol 381:1332–1348
173. Thirumalai D, Klimov DK, Dima RI (2003) Emerging ideas on the molecular basis of
protein and peptide aggregation. Curr Opin Struct Biol 13:146–159
174. Kelly JW (1998) The alternative conformations of amyloidogenic proteins and their
multi-step assembly pathways. Curr Opin Struct Biol 8:101–106
175. Mahler H-C, Friess W, Grauschopf U, Kiese S (2008) Protein aggregation: pathways,
induction factors and analysis. J Pharm Sci 98:2909–2934
176. Chiti F, Dobson CM (2009) Amyloid formation by globular proteins under native
conditions. Nat Chem Biol 5:15–22
177. Jahn TR, Parker MJ, Homans SW, Radford SE (2006) Amyloid formation under
physiological conditions proceeds via a native-like folding intermediate. Nat Struct Mol
Biol 13:195–201
178. Estácio SG, Krobath H, Vila-Viçosa D, Machuqueiro M, Shakhnovich EI, Faísca PFN
(2014) A simulated intermediate state for folding and aggregation provides insights into
ΔN6 β2-microglobulin amyloidogenic behavior. PLoS Comput Biol 10:e1003606
179. Neudecker P, Robustelli P, Cavalli A, Walsh P, Lundström P, Zarrine-Afsar A, Sharpe S,
Vendruscolo M, Kay LE (2012) Structure of an intermediate state in protein folding and
aggregation. Science 336:362–366
180. Honda RP, Xu M, Yamaguchi K-I, Roder H, Kuwata K (2015) A native-like intermediate
serves as a branching point between the folding and aggregation pathways of the mouse
prion protein. Structure 23:1735–1742
181. Jahn TR, Radford SE (2008) Folding versus aggregation: polypeptide conformations on
competing pathways. Arch Biochem Biophys 469:100–117
182. Cohen SIA, Vendruscolo M, Dobson CM, Knowles TPJ (2012) From macroscopic
measurements to microscopic mechanisms of protein aggregation. J Mol Biol 421:160–171
183. Buell AK, Dobson CM, Knowles TPJ (2014) The physical chemistry of the amyloid
phenomenon: thermodynamics and kinetics of filamentous protein aggregation. Essays
Biochem 56:11–39
184. Meisl G, Michaels TCT, Linse S, Knowles TPJ (2018) Kinetic analysis of amyloid
formation. Methods Mol Biol 1779:181–196
185. Herrup K (2015) The case for rejecting the amyloid cascade hypothesis. Nat Neurosci
18:794
186. Stefani M (2012) Structural features and cytotoxicity of amyloid oligomers: implications in
Alzheimer’s disease and other diseases with amyloid deposits. Prog Neurobiol 99:226–245
187. Bucciantini M, Rigacci S, Stefani M (2014) Amyloid aggregation: role of biological
membranes and the aggregate-membrane system. J Phys Chem Lett 5:517–527
188. Leal SS, Botelho HM, Gomes CM (2012) Metal ions as modulators of protein conformation
and misfolding in neurodegeneration. Coord Chem Rev 256:2253–2270
189. Morriss-Andrews A, Shea J-E (2015) Computational studies of protein aggregation: methods
and applications. Annu Rev Phys Chem 66:643–666
190. Balchin D, Hayer-Hartl M, Hartl FU (2016) In vivo aspects of protein folding and quality
control. Science 353
191. Michelitsch MD, Weissman JS (2000) A census of glutamine/asparagine-rich regions:
implications for their conserved function and the prediction of novel prions. Proc Natl Acad
Sci 97:11910–11915
192. Bemporad F, Calloni G, Campioni S, Plakoutsi G, Taddei N, Chiti F (2006) Sequence and
structural determinants of amyloid fibril formation. Acc Chem Res 39:620–627
193. De Baets G, Schymkowitz J, Rousseau F (2014) Predicting aggregation-prone sequences in
proteins. Essays Biochem 56:41–52
194. Beerten J, Schymkowitz J, Rousseau F (2012) Aggregation prone regions and gatekeeping
residues in protein sequences. Curr Top Med Chem 12:2470–2478
195. Uversky VN (2010) Targeting intrinsically disordered proteins in neurodegenerative and
protein dysfunction diseases: another illustration of the D(2) concept. Expert Rev Proteomics
7:543–564
196. Fernandez-Escamilla A-M, Rousseau F, Schymkowitz J, Serrano L (2004) Prediction of
sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat
Biotechnol 22:1302
197. Tartaglia GG, Vendruscolo M (2008) The Zyggregator method for predicting protein
aggregation propensities. Chem Soc Rev 37:1395–1401
198. Conchillo-Solé O, de Groot NS, Avilés FX, Vendrell J, Daura X, Ventura S (2007)
AGGRESCAN: a server for the prediction and evaluation of “hot spots” of aggregation in
polypeptides. BMC Bioinf 8:65