
SPRINGER BRIEFS IN MOLECULAR SCIENCE

PROTEIN FOLDING AND STRUCTURE

Cláudio M. Gomes
Patrícia F. N. Faísca

Protein Folding
An Introduction
SpringerBriefs in Molecular Science

Protein Folding and Structure

Series editor
Cláudio M. Gomes, Faculty of Sciences, Biosystems & Integrative Sciences
Institute, University of Lisbon, Lisbon, Portugal
About the Series

Prepared by leading experts, the Springer Briefs
subseries on Protein Folding and Structure contains
diverse types of contributions, from snapshot volumes
that allow fast entry to a general topic to those covering
more specialized aspects in the field of protein folding
and structure. All these Briefs aim to cover
essential concepts, methodologies and ideas in the
context of contemporary research in protein science.
Through these compact volumes, this series serves as a
venue for publication between typical research papers,
review articles and full books, and aims at a broad
audience, from students to researchers in academia and
industry.

About the Editor

Cláudio M. Gomes is Associate Professor at the
Faculty of Sciences, University of Lisboa, where he
heads the Protein Folding and Misfolding Laboratory
within the BioISI Biosystems and Integrative Sciences
Institute. He obtained his Ph.D. in Biochemistry (1999)
from the Universidade Nova de Lisboa, as a graduate
of the Gulbenkian Ph.D. program in Biology and
Medicine, and holds the Habilitation (Agregação) in
Biochemistry (2013). He has extensive publishing and
editorial activities, as a prolific author, a member of
editorial boards and an editor of thematic issues and
books. In collaboration with Springer, he set up the
Springer Briefs subseries on Protein Folding and Structure, which launched its
first volume in 2014.

More information about this series at http://www.springer.com/series/11958


Cláudio M. Gomes
Department of Chemistry and Biochemistry
Faculty of Sciences
Biosystems & Integrative Sciences Institute, University of Lisbon
Lisbon, Portugal

Patrícia F. N. Faísca
Department of Physics
Faculty of Sciences
Biosystems & Integrative Sciences Institute, University of Lisbon
Lisbon, Portugal

ISSN 2191-5407 ISSN 2191-5415 (electronic)
SpringerBriefs in Molecular Science
ISSN 2199-3157 ISSN 2199-3165 (electronic)
Protein Folding and Structure
ISBN 978-3-319-00881-3 ISBN 978-3-319-00882-0 (eBook)
https://doi.org/10.1007/978-3-319-00882-0
Library of Congress Control Number: 2019930273

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2019


This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
In memory of Professor Mark T. Fisher (1954–2018),
a dear friend and expert on protein folding whose
fascination with the GroEL folding chaperonin, one of his
favourite molecular machines, was only surpassed by
his curiosity and keen spirit.
Mark Fisher in Lisbon,
February 2010
Preface

Understanding protein folding is key to understanding Biology, as this self-organising
process is essential for life. Proteins are fascinating macromolecules that
perform a myriad of biological functions, from catalysis to signaling and structure
maintenance, to mention just a few examples. To perform their functional
role, most proteins must fold into a specific three-dimensional structure, the
so-called native state, whose coordinates are exclusively dictated by the protein’s
amino acid sequence, termed the primary structure. Within this statement lies one of the
mind-blowing facts in Protein Science, the realisation that a given linear chain of
amino acids encodes all the information required to fold the polypeptide into the
native structure, as well as one of the major open questions in the field: what are the
rules that dictate such a specific and unique protein structure? And it is all about
Physics and how Biology harnesses it!
This sixth volume of the Springer Briefs series on Protein Folding and Structure,
which is also my inaugural contribution to the series as a co-author,
introduces the reader to the fundamentals of protein folding from its multiple
perspectives. The first three chapters organise current knowledge around basic,
yet complex, questions: How is protein structure maintained? Why is structure
acquired? And how is it acquired? These chapters encompass the fundamental
concepts and provide the reader with a perspective on how knowledge evolved over
the last decades. We then turn to protein folding in vivo, disclosing the biological
perspective of the problem and the influence of the cellular milieu on a process that
is otherwise strictly ruled by the laws of Physics. By examining protein misfolding in
the context of human disease, we highlight the very important biomedical and
societal dimension of protein folding research, which encompasses several amyloid-forming
neurodegenerative diseases such as Alzheimer’s, among numerous others.
The volume closes with a timely chapter on selected methods for protein folding
research.
As Series Editor, my expectation is that this volume will become a valuable
resource for students in the biological and physical sciences, a primer for those
wishing to enter the field, and a key source of references for established researchers.
As an author, I also hope that this book successfully conveys the fascination of a
research topic at the forefront of modern molecular biology and that many of its
younger readers will choose one of the many facets of protein folding as their future
research topic.

Lisbon, Portugal
November 2018
Cláudio M. Gomes
Editor, Springer Briefs series on Protein Folding and Structure
Contents

Protein Folding: An Introduction
1 Protein Structure—How Is Structure Maintained?
1.1 The Dawn of Protein Structural Biology
1.2 The Universe of Protein Structures
1.3 Physical Interactions Stabilising Proteins
1.4 Protein Dynamics and Solvation
2 Protein Folding—Why Is Structure Acquired?
2.1 The Anfinsen Experiments
2.2 The Thermodynamic Hypothesis
2.3 Driving Forces for Protein Folding—Hydrophobic Effect and the Thermodynamics of Protein Folding
3 Folding Kinetics and Mechanisms: How Is Structure Acquired?
3.1 Two-State Cooperativity in Protein Folding
3.2 The Levinthal Paradox and the Timescale of Protein Folding
3.3 Mechanisms of Protein Folding
3.4 The Nucleation Condensation Mechanism of Protein Folding
3.5 Phi-value Analysis and the Structure of the Folding Transition State
3.6 The Energy Landscape and Folding Funnels
3.7 The Importance of Native Geometry as a Determinant of Folding Rates
3.8 The Folding Mechanism of Knotted Proteins
4 Protein Misfolding: Why Proteins Misbehave?
4.1 Protein Folding In Vivo
4.2 Protein Misfolding and Aggregation
4.3 Protein Misfolding Diseases
4.4 The Amyloid State
4.5 Mechanism and Kinetics of Protein Aggregation
4.6 Aggregation Propensity
5 Methods for Protein Folding
5.1 Biophysical Spectroscopies
5.2 Computational Methods
References
About the Authors

The authors are long-lasting friends who share a passion for protein folding. Having had distinct
academic backgrounds and tracks, they are now faculty members at the University of Lisbon, where
they teach and research in a highly collaborative environment.

Cláudio M. Gomes is Associate Professor of Biochemistry at the Department of
Chemistry and Biochemistry of the Faculty of Sciences (DQB-FCUL), University of
Lisbon, where he heads the ‘Protein Folding and Misfolding Laboratory’ within
BioISI (Biosystems and Integrative Sciences Institute). He is an alumnus of the
Gulbenkian Ph.D. program in Biology and Medicine and obtained his Ph.D.
(1999) and Habilitation (2013) in Biochemistry at Universidade Nova de Lisboa. He
is an expert on the structural biology, biochemistry and biophysics of protein stability,
folding and misfolding, with more than 110 published articles (h-index 26). His current
interests focus on mechanisms of protein aggregation in the context of complex
biomedical problems such as those arising in Alzheimer’s neurodegeneration.

Patrícia F. N. Faísca is Assistant Professor of Physics at the Physics Department
of the Faculty of Sciences (DF-FCUL), University of Lisbon, and principal investigator
at BioISI (Biosystems and Integrative Sciences Institute). She received her
Ph.D. (Physics) in 2002 from the University of Warwick (UK), as part of the
Gulbenkian Ph.D. Program in Biology and Medicine. She has a broad interdisciplinary
education covering physics, biology and mathematics. Her research in
computational biophysics is based on molecular simulations, especially of
coarse-grained models. Her current research interests focus on the folding of knotted
proteins and on protein misfolding and aggregation in a disease-related context.
Protein Folding: An Introduction

1 Protein Structure—How Is Structure Maintained?

1.1 The Dawn of Protein Structural Biology

We have come a long way since the term protein was coined and since the early findings
that proteins are charged macromolecules composed of strings of amino acids linked
by peptide bonds. Today, structural biologists have technologies that in many
cases allow an atomic-level understanding of protein structure, dynamics and
folding; protein physics approaches have made substantial contributions to understanding
the intricacies of folding mechanisms and their energetics; biochemists have
developed conceptual frameworks to relate protein structure to biological functions.
Yet, despite the efforts of a vibrant community of protein scientists, many
questions remain to be answered in the field of protein structure and folding.
Without aiming to outline a historical chronology of the field, we nevertheless feel
it is important to start this book by providing the reader with a sense of some of
its scientific landmarks. For an in-depth scholarly perspective, interested readers are
referred to the beautiful account of the history of proteins by Tanford and Reynolds
in their book Nature’s Robots [1].
The term protein was coined in the nineteenth century, following a proposal from
Berzelius in correspondence with Mulder. In a letter written in 1838, discussing the
results of the elemental analysis of albumins and fibrin performed by Mulder,
Berzelius coined the term from the Greek word proteios, which means primary (or of
primary importance), as he considered proteins to be the primitive substances in
animal nutrition. After proteins were named, it took over 100 years for the field of
protein structural biology to emerge. Paradoxically, or not so much, as we shall find
out in this book, the importance of protein structure and folding became evident from
unfolding studies by Anson and Mirsky in the 1930s, showing that protein denaturation
can actually be reversed [2]. These findings prompted subsequent investigations, such
as those carried out by Anfinsen in the late 1950s, aimed at determining how protein
structure is maintained and which essential interactions hold together the structure
of a native protein. The development of methods for determining the structure of
crystals by X-ray diffraction and their application to proteins, pioneered among
others by Astbury with his studies on fibrous proteins in the 1930s [3], was critical
for this outcome [4]. The regular patterns defining the structural features of proteins
became clear when Pauling proposed the structures of the α-helix and β-sheets in
1951 [5], which was soon followed by the complete structure determination of
myoglobin and then haemoglobin, by Kendrew and Perutz respectively, in 1959 [6].

C.M. Gomes and P.F. Faísca, Protein Folding, Protein Folding and Structure
https://doi.org/10.1007/978-3-319-00882-0_1
Interestingly, many of the subsequent questions in the field focused on understanding
protein folding, i.e. the process through which a linear chain of amino
acids acquires a biologically functional three-dimensional structure. The
so-called protein folding problem encapsulates three fundamental questions: (1) to
decipher the physical code according to which the amino acid sequence dictates a
protein’s native structure; (2) to establish the mechanisms that allow proteins to fold
so fast; and (3) to determine how the structure of a protein can be predicted from its
sequence. As pointed out in a recent review on the subject, ‘What began as three
questions of basic science one half-century ago has now grown into the full-fledged
research field of protein physical science’ [7].

1.2 The Universe of Protein Structures

Proteins fold into a variety of three-dimensional structures with diverse topologies
and well-defined structural hierarchies. The primary structure of a protein, consisting
of the sequence of amino acids that form a polypeptide chain, self-assembles into
secondary structural elements such as α-helices and β-sheets, whose interactions define
the protein tertiary structure. Some proteins form either homo- or heteromultimers and
thus acquire quaternary structure; a paradigmatic example of a protein with quaternary
structure is haemoglobin. A protein fold is then defined as the spatial arrangement
of the secondary structural elements relative to each other.
The number of known protein structures has increased massively over the
last decades with the improvement of methods for structure determination: X-ray
crystallography, NMR and, more recently, cryo-electron microscopy. At the end of
2018, the Protein Data Bank (PDB) (rcsb.org) [8] listed an impressive number of
~135,000 protein structures, of which 90% were determined by X-ray
crystallography (Fig. 1).
However, there is considerable structural redundancy, and proteins fold into a
more limited number of unique folds and topologies within the protein universe [9].
Estimates for the number of protein folds have varied greatly, but recent studies
suggest it should converge towards 2000 [10]. The graphical representation of the
different folds according to their structural topology provides an impressive
illustration of the diversity of the protein universe (Fig. 2).
[Figure: annually released structures and total number of protein structures known (y-axis: number of entries; x-axis: year); inset: electron microscopy structures]
Fig. 1 Evolution of the number of known protein structures. Protein-only structures; inset: growth
of structures from 3D electron microscopy experiments released per year. Source: PDB (rcsb.org)

[Figure: protein fold space by structural class. All alpha proteins (α): +46,000 proteins, +280 folds. All beta proteins (β): +48,000 proteins, +170 folds. Alpha and beta proteins (α/β), mainly parallel beta sheets (β-α-β units): +51,000 proteins, +140 folds. Alpha and beta proteins (α+β), mainly antiparallel beta sheets (segregated α and β regions): +53,000 proteins, +376 folds]
Fig. 2 Diversity of protein topologies. Proteins adopt a diversity of structures with distinct
structural classifications, as in the depicted global representation of the protein fold space, which
includes information from the Structural Classification of Proteins (SCOP) database [9, 10]

The structural diversity of proteins is further exemplified by proteins that contain
a physical (or open) knot in their native state. A priori, such arrangements might seem
highly unlikely, but there are in fact several examples of knotted proteins
in the PDB. These are the focus of intense research, as the influence of knots on the
folding, stability and function of these proteins is not yet fully understood (Box 1—Knotted Proteins).

Box 1—Knotted Proteins

Knotted proteins are proteins whose native structure embeds a physical knot.
The first study to report a knotted protein dates back to 1977 [11], but it was
only in 1994 that Mansfield performed the first systematic survey of the PDB
for knotted proteins [12]. Specific methods have been developed
to determine whether a given protein conformation is knotted. An
important example is the algorithm developed by Taylor [13], which represents
an extension of the Koniaris–Muthukumar method [14] and is applicable
to a wider range of protein conformations.
There are knots of different types (Fig. 3). Roughly speaking, knot types
differ in the minimal number of crossings in a planar projection. Although the
trefoil (or 3₁) knot type is the most common, it is possible to find proteins
with the 4₁, 5₂ and even the Stevedore (or 6₁) knot [15]. An interesting
variation among knotted proteins is the slipknot, in which one of the
protein termini adopts a hairpin-like conformation that threads a loop formed
by the remainder of the chain [16]. It is now known that only about 1%
of the available PDB entries correspond to proteins with a knotted topology
(including slipknots) [17], and analytical arguments together with simulation
results indicate that such a small percentage reflects the fact that knotted
proteins are actually statistically rare [18].
Research in the field of knotted proteins has evolved around two major
questions. The first concerns the functional advantage(s) that knots convey
to their carriers. Based on the analysis of specific knotted systems, it has been
suggested that knots (and slipknots) could protect against degradation by sterically
precluding translocation through the proteasome pore [19], provide structural
stability in transporter proteins [20], enhance the structural rigidity of the native
state [21], help shape and form the binding site of enzymes [22, 23], enhance
thermal [16] and mechanical [24] stability, or even alter enzymatic activity [25].
Based on computer simulations, it has also been proposed that a general functional
advantage of knotted backbones is to increase the kinetic stability of their carriers [21, 26].
However, in most cases, it is not possible to determine the structural and/or
functional advantage of knotted proteins; therefore, one cannot rule out
the possibility that in most cases knots convey no structural or
functional advantage at all. A second major question in the field of knotted
proteins concerns the determination of their folding mechanism. We briefly
discuss this amazing folding puzzle in Sect. 3.8.
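The idea that knot types differ in the number of crossings in a planar projection can be made concrete with a short sketch. The Python code below is a hypothetical illustration only. It is neither Taylor's algorithm nor the Koniaris–Muthukumar method. It projects a closed 3D polyline onto the xy-plane and counts proper crossings between non-adjacent segments. The helper names (`count_crossings`, `trefoil`, `circle`) and the parametric trefoil used as input are assumptions made for this example. A single projection only gives an upper bound on the minimal crossing number, so it cannot by itself establish a knot type.

```python
import math

def orientation(p, q, r):
    """Sign (+1, 0, -1) of the 2D cross product (q - p) x (r - p)."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)

def segments_cross(a, b, c, d):
    """True if segments ab and cd properly cross (touching/collinear cases ignored)."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)

def count_crossings(points, closed=True):
    """Count crossings of a 3D polyline after projecting it onto the xy-plane."""
    xy = [(x, y) for x, y, _z in points]          # planar projection: drop z
    segs = list(zip(xy, xy[1:]))
    if closed:
        segs.append((xy[-1], xy[0]))              # close the curve
    m = len(segs)
    crossings = 0
    for i in range(m):
        for j in range(i + 2, m):                 # skip neighbouring segments
            if closed and i == 0 and j == m - 1:
                continue                          # first and last segments share a vertex
            if segments_cross(*segs[i], *segs[j]):
                crossings += 1
    return crossings

def trefoil(n=300):
    """Sample the standard parametric trefoil knot at n points."""
    return [(math.sin(t) + 2 * math.sin(2 * t),
             math.cos(t) - 2 * math.cos(2 * t),
             -math.sin(3 * t))
            for t in (2 * math.pi * k / n for k in range(n))]

def circle(n=100):
    """An unknotted closed loop lying in the xy-plane."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n), 0.0)
            for k in range(n)]

print(count_crossings(trefoil()))
print(count_crossings(circle()))
```

This trefoil parametrisation projects onto a diagram with three crossings, which matches the minimal crossing number of the 3₁ knot. The planar circle, which is an unknot, shows none. Applying the same idea to a protein would start from Cα coordinates. The open chain would first have to be closed, and handling that step robustly is one of the subtleties that dedicated knot-detection methods address.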

Fig. 3 Topological knots and knotted proteins. a A topological (or mathematical) knot whose planar
projection has a minimum of three crossings is termed a trefoil (or 3₁) knot. b Knotted
proteins have their backbones tangled in a physical knot. A physical knot differs from a
topological knot because the curve it forms in space is open instead of closed. c A simple
representation of a slipknot. d Minimal, smoothed and ‘topologically equivalent’ representation of
the backbone of the bacterial protein YibK (PDB ID: 1j85) following the application of Taylor’s
algorithm. Taylor’s algorithm reduces the protein backbone such that the knotted core, i.e. the
minimal segment of the polypeptide chain that contains the knot, is sufficiently far from both chain
ends for the knot type to be well defined. In this case, it is possible to identify a trefoil knot

The biological functions of proteins are in many cases closely tied to their
three-dimensional structure, and this constitutes the so-called structure-function
paradigm of structural biology. This is certainly a generally valid principle for
most globular proteins involved in structural or catalytic functions. In these
cases, disorganisation of the protein structure and loss of tertiary interactions impair
function, which is tightly associated with a given structural scaffold, for example to
accommodate a ligand or substrate or to ensure a particular arrangement of a
catalytic site.
However, in recent years, it has become increasingly evident that many
polypeptides, or segments within polypeptides, occur under physiological conditions
in the cell without folding into a well-defined tertiary structure. These intrinsically
disordered proteins (or IDPs) instead adopt an ensemble of
unstructured or disordered conformations which are nevertheless functional [27,
28], defying the structure-function paradigm. Indeed, rather than being detrimental,
these characteristics will, in some proteins, result in better biological functions. This
is the case for proteins involved in signalling processes, which can accommodate
considerable ‘fuzziness’ within their structure, with increased functional efficiency
as a result.
This is explained by the fact that a signalling protein must engage in multiple
protein–protein interactions, which are favoured by the high mobility of disordered
conformers, allowing more efficient sampling of protein–target interactions. In this
scenario, the fact that a segment within these proteins is unstructured (Box 2—Intrinsically
Disordered Proteins) and can adapt to and fold upon binding to
multiple targets results in a functional advantage, sustaining the emerging
disorder–function paradigm.

Box 2—Intrinsically Disordered Proteins

It is known that intrinsically disordered regions (IDRs) are present in about 40%
of eukaryotic proteins and that they have been evolutionarily selected
because they represent an advantage to their carrier protein by modulating its
interactions with other proteins. The interaction between a disordered segment
and a target protein gives rise to a fuzzy protein (or structural) complex,
because although a large fraction of the disordered polypeptide adopts a
defined structure upon complex formation, some distinct segments may
remain disordered (Fig. 4).
This is very important, as it allows for a ‘dynamical’ interaction in which the
remaining conformational flexibility drives further sampling of productive
interactions. Further, the disordered segment adopts different conformations
upon binding to different interaction partners, increasing functional plasticity.
Interestingly, protein–protein interactions involving disordered segments are
prone to biological regulation by post-translational modifications within the
disordered region; embedded modification sites are thus a strategy to regulate
interaction properties through a change in the chemical composition
of the interacting segment.
In principle, the propensity of a region for intrinsic disorder can be inferred
from physicochemical principles. Intrinsically disordered regions lack sufficient
hydrophobic residues to mediate cooperative folding, and they typically
contain a higher proportion of polar or charged amino acids [27]. However, a
recent study found that even IDPs with low net charge and high
hydrophobicity remain highly expanded in water, challenging the general
view that protein-like sequences collapse in water [29].
The order-promoting residues are in general more hydrophobic and less
flexible (Ile, Val, Leu, Phe, Cys, Trp, Tyr and Asn), while
disorder-promoting residues tend to be less hydrophobic and more flexible
(Arg, Lys, Glu, Pro, Ser, Gln, Gly and Ala). Several computer algorithms are
available to predict disorder propensity within proteins (reviewed in [27]).
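The residue classes listed above lend themselves to a simple composition count. The sketch below assumes one-letter amino acid codes and uses exactly the order- and disorder-promoting sets given in this box. It reports the fraction of residues in each class and a crude net charge per residue. That charge estimate counts Lys/Arg as +1 and Asp/Glu as −1, and it ignores His and the chain termini. The function names and the majority rule in `looks_disorder_prone` are hypothetical simplifications for illustration. They are not one of the disorder predictors reviewed in [27].

```python
# Residue classes as listed in Box 2 (one-letter codes).
ORDER_PROMOTING = set("IVLFCWYN")     # Ile, Val, Leu, Phe, Cys, Trp, Tyr, Asn
DISORDER_PROMOTING = set("RKEPSQGA")  # Arg, Lys, Glu, Pro, Ser, Gln, Gly, Ala

def composition_profile(sequence):
    """Return simple composition indicators for a protein sequence."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    order = sum(aa in ORDER_PROMOTING for aa in seq) / n
    disorder = sum(aa in DISORDER_PROMOTING for aa in seq) / n
    # Crude net charge per residue at neutral pH: Lys/Arg = +1, Asp/Glu = -1
    # (His and terminal charges are deliberately ignored in this sketch).
    net_charge = (sum(aa in "KR" for aa in seq) - sum(aa in "DE" for aa in seq)) / n
    return {"order_fraction": order,
            "disorder_fraction": disorder,
            "net_charge_per_residue": net_charge}

def looks_disorder_prone(sequence):
    """Hypothetical rule of thumb: more disorder- than order-promoting residues."""
    profile = composition_profile(sequence)
    return profile["disorder_fraction"] > profile["order_fraction"]

print(composition_profile("PEPSKGSQAE"))
print(looks_disorder_prone("IVLFWY"))
```

The segment "PEPSKGSQAE" consists entirely of disorder-promoting residues and has a net charge of −0.1 per residue. The hydrophobic "IVLFWY" is classified as order-prone. As the study cited above [29] shows, composition alone does not settle how a chain behaves in water, which is why dedicated predictors combine many more features.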
[Figure: an intrinsically disordered protein undergoing folding upon binding]
Fig. 4 Folding upon binding. Schematic representation of folding upon binding of an IDP to a
target protein, resulting in a fuzzy complex with some regions still disordered (blue shadows)

As a trade-off for the lack of a unique three-dimensional structure, IDPs generally
sample a variety of conformations that are in dynamic equilibrium under
physiological conditions. This led to the recently proposed continuum model of
protein structure, according to which proteins adopt a continuum of conformational
states ranging from highly dynamic, expanded conformational ensembles
with high disorder to highly ordered, compact, dynamically restricted, fully folded
globular states [30].

1.3 Physical Interactions Stabilising Proteins

The three-dimensional structure of a protein is maintained by the
additive contribution of different physical interactions, which altogether act to hold
the protein together. The folding of a polypeptide results in a well-defined
three-dimensional conformation whose energetic stability is determined by the
interactions established between amino acids. In the next chapters, we will address
the thermodynamic and kinetic grounds of the folding process, but for now, we will
start by overviewing the non-covalent interactions that stabilise the structure of a
protein.
The fold of a given protein depends on specific and non-specific non-covalent
interactions. Non-specific interactions are essentially non-polar (hydrophobic) and
van der Waals interactions, which are important to drive the folding process. Specific
interactions are essentially electrostatic and comprise salt bridges and hydrogen
bonds, which are fundamental for protein folding, structure and dynamics [31, 32].
As we will see, despite their low individual energies (Table 1), collectively these
non-covalent interactions ensure the maintenance of the protein fold and its stability.
Non-polar (hydrophobic) interactions established between non-polar amino acid
side chains in the interior of proteins are among the most significant stabilising
interactions of protein structure. These are formed to minimise the exposure of
non-polar regions to the water molecules that surround proteins and provide a
significant stabilising contribution (~60–80 kJ mol−1). As we shall discuss later, the
establishment of non-polar interactions results from the so-called
hydrophobic effect, which is the main driving force for protein folding.

Table 1 Comparison of typical energies of physical interactions in proteins
Interaction                 Typical energy (kJ mol−1)
Van der Waals               1
Hydrogen bonds              8–29
Electrostatic               17–50
Non-polar                   60–80
Disulphide bond (S–S)       250
Single covalent bond        200–500
Adapted from [32, 33]
Van der Waals interactions arise when an atom with a partial charge is near an
uncharged one, causing an instantaneous redistribution of the electron density, which
results in a weak attractive interaction between the neighbouring atoms. These
forces occur in both polar and non-polar groups, involving transiently induced or
permanent electric dipoles. These interactions are individually very weak
(~1 kJ mol−1) but collectively strong, and they are extremely short-ranged, meaning
they are optimised in the tightly packed core of proteins. Thus, van der Waals
interactions are tightly coupled to the hydrophobic packing of a protein.
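The statement that van der Waals interactions are weak and extremely short-ranged can be illustrated with the 12-6 Lennard-Jones potential. This is a common empirical model for these interactions, but its use here is our assumption and not a model given in the text. In the sketch below, the well depth `epsilon` is set to 1 kJ mol−1 to match the order of magnitude in Table 1. The value `sigma` = 0.35 nm is a hypothetical, typical atomic contact distance.

```python
def lennard_jones(r, epsilon=1.0, sigma=0.35):
    """12-6 Lennard-Jones energy (kJ/mol) at separation r (nm).

    epsilon: well depth in kJ/mol (assumed ~1 kJ/mol, the order of magnitude in Table 1)
    sigma:   separation (nm) at which the energy crosses zero (hypothetical value)
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# The energy minimum (-epsilon) lies at r_min = 2**(1/6) * sigma, about 0.393 nm here.
r_min = 2 ** (1 / 6) * 0.35
for factor in (1.0, 1.5, 2.0):
    r = factor * r_min
    print(f"r = {r:.3f} nm  E = {lennard_jones(r):+.3f} kJ/mol")
```

The attractive term decays as r⁻⁶. As a result, at twice the optimal distance the energy has already fallen to 4ε(1/16384 − 1/128) ≈ −0.031ε, about 3% of the well depth. This is why only closely packed atoms contribute appreciably.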
Electrostatic interactions are charge–charge interactions established between
permanently charged groups, and they extend over significant distances. They mostly
involve polar groups from side chains. However, protein backbone atoms are
partially charged and can also be involved in attractive and repulsive interactions;
although these are weaker because the charges are smaller, their cumulative effect can be
significant. Salt bridges often involve residues that are close in the protein sequence and
form within the same secondary structural element or domain, and less frequently in segments
joining flexible hinges [31]. This suggests that these interactions contribute to
protein stability by restraining backbone motions. The electrostatic interactions to
be considered in a protein can be of multiple types. Interactions between polar
amino acids and surrounding water molecules do not affect protein stability but are
rather important for modulating protein solubility. Interactions involving polar groups
located at the protein surface make a marginal contribution to protein stability, as
charges are shielded from each other by water molecules and the strength of the
interaction is weakened to 17–50 kJ mol−1. The stabilising effect of salt bridges in
the protein interior depends on the polarity of the local environment: if it is too high
(e.g. because of other nearby buried charges or water molecules), a salt bridge
becomes weaker because of shielding effects from other charges. Nevertheless, most
of the salt bridges in proteins are organised in clusters that stabilise the protein
structure [31].
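The shielding argument can be sketched with Coulomb's law in a uniform dielectric, E = q₁q₂/(4πε₀ε_r r). This continuum point-charge model is a simplifying assumption, because real protein environments are heterogeneous. The 0.4 nm separation and the interior permittivity ε_r = 4 are hypothetical illustrative values. The resulting numbers are not meant to reproduce the ranges in Table 1. They only show how strongly a high-permittivity medium such as water weakens a given charge pair.

```python
COULOMB_K = 138.935  # 1/(4*pi*eps0) expressed in kJ mol^-1 nm e^-2

def coulomb_energy(q1, q2, r, eps_r):
    """Energy (kJ/mol) of two point charges q1, q2 (units of e) at distance r (nm)
    in a uniform medium of relative permittivity eps_r."""
    return COULOMB_K * q1 * q2 / (eps_r * r)

r = 0.4  # nm, hypothetical separation of an ion pair
for label, eps_r in (("low-polarity interior (assumed eps_r = 4)", 4.0),
                     ("bulk water (eps_r ~ 80)", 80.0)):
    print(f"{label}: {coulomb_energy(+1, -1, r, eps_r):.1f} kJ/mol")
```

The energy scales as 1/ε_r, so moving the same ion pair from the low-polarity medium into water weakens it twentyfold in this model.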

Hydrogen bonds form between two electronegative atoms with a hydrogen atom in
between, covalently bonded to one of them. On the outside of a protein, peptide groups
hydrogen bond to surrounding water, whereas inside a protein, peptide groups
hydrogen bond either to another peptide group or to a buried water molecule. Most
of the backbone CO and NH groups (90%) are H-bonded, favouring internal
organisation and limiting protein conformations. Hydrogen bonds in proteins are
directional, as their strength depends on dipole orientation, ranging from 8 to
29 kJ mol−1. This is a very important characteristic, as it provides specificity for the
interaction. Hydrogen bonds thus contribute to determining the stable tertiary structure
of a folded protein.
Other interactions, specific to some proteins, also play important stabilising roles and thus deserve mention. That is the case of disulphide bonds (S–S), which are covalent bonds formed by oxidation of the thiol groups of two cysteine side chains. Only a few proteins contain S–S bonds, some of which play key structural roles as they effectively stabilise the native state. In proteins that contain more than two cysteines involved in these bonds, such as ribonuclease A, the incorrect formation of disulphides (disulphide scrambling) prevents the native structure from forming, as we will discuss in Sect. 2.1.
Likewise, protein ligands and cofactors are important contributors to the maintenance of a structural domain. Examples include organic molecules (coenzymes), such as flavins, and metal ions or metallic cofactors, such as haem or iron-sulphur clusters, which in most cases have catalytic functions but also play important roles in the energetic stability of proteins [34]. The most prominent example among the latter is zinc binding to the so-called zinc finger domain, one of the most common structural folds in the human proteome, found in many transcription factors.

1.4 Protein Dynamics and Solvation

Proteins are flexible molecules that have an overall shape but ‘wobble’ internally. Protein motions combine rapid local fluctuations with infrequent (slow) changes to a different conformation; these conformational changes themselves occur as rapid transitions between states. Protein motion is thermally driven: interactions of water and solutes with proteins induce vibrations and rocking motions (librations). Rapid local motions are harmonic (symmetric vibrations) and uncorrelated [33].
The rotation of amino acid side chains is influenced by steric effects: bulkier side chains are less symmetrical and rotate more slowly, usually displacing nearby groups, whereas short side chains rotate faster, as steric conflicts are minimised. Protein dynamics is influenced by interactions as well as by ligand binding, which induces order in the segment containing the ligand-binding residues. This is illustrated by cofactor-containing proteins, whose apo forms are frequently much more dynamic than their holo counterparts. The same applies to enzymes in which substrate binding may result in structural ordering at the catalytic site, which decreases overall motion in the protein. Interestingly, these two examples illustrate cases of ligand-induced protein stabilisation, a valuable approach to counteract misfolding in protein conformational diseases [35, 36].
Atoms in proteins fluctuate around an average position. In the packed interior, motions are restricted (<1 Å), whereas at the protein surface mobility increases (up to several Å). There are several instances of proteins with two domains of distinct dynamics, one rigid and the other flexible, usually connected by a linker peptide segment. Control of protein dynamics is an important way of regulating biological function, and this is particularly relevant in signalling proteins that are engaged in multiple interactions and thus need some intrinsic conformational variability, within allowed limits. Whole folded domains rarely undergo large distortions at ordinary temperatures. Transitions from one type of folding motif to another are rare except in pathological cases, such as those involving the formation of amyloids, as we will explain in Sect. 4.1.
Water molecules and protein solvation are extremely important for understanding protein dynamics. The protein hydration layer is the layer of water molecules that surrounds a protein; it is essential for protein folding and function because water molecules provide thermal energy for protein movements, as well as hydrogen bonds to charged groups and backbone atoms accessible from the surface. The dynamics of water within the hydration layer differs from that of bulk water up to a distance of 1 nm. An individual water molecule may remain in contact with the protein surface for less than a nanosecond. In crowded solutions (as in cells), the mass of water is no more than 40% of the total protein mass. Consequently, the hydration layer (no more than two water molecules thick) is essentially all the water available in the ultra-crowded intracellular environment [33]. Water molecules in protein cavities are also important modulators of protein dynamics: a buried water molecule retains some mobility, and the protein can reorganise its hydrogen bonding to accommodate the motions of the water within the cavity. Indeed, buried water molecules are found in cavities close to active sites and to interdomain interfaces, where they facilitate the protein motions that enhance catalysis and domain interactions [33].

2 Protein Folding—Why Is Structure Acquired?

In the 1950s, the American biochemist Christian Anfinsen (1916–1995) made a series of seminal discoveries demonstrating that the information required to fold a polypeptide chain into a well-defined and biologically functional conformation is strictly encoded in its amino acid sequence. These experiments stemmed from the need to expand knowledge of the amino acid structure of proteins, at the birth of the field of protein structural biochemistry. Indeed, the structure of myoglobin, the first protein to have its three-dimensional structure determined by X-ray crystallography, was solved by John Kendrew in Cambridge in 1958.

2.1 The Anfinsen Experiments

In his experiments, Anfinsen set out to address a set of questions that would eventually help establish the protein folding problem. Why does a linear polypeptide chain fold into a specific three-dimensional structure? Does the protein folding process involve other enzymes? Why is a polypeptide chain folded into a unique conformation? These were the questions that triggered the experiments leading to the thermodynamic hypothesis of protein folding.
Anfinsen’s experiments were based on studies of bovine pancreatic ribonuclease (RNase A), an enzyme that catalyses the breakdown of single-stranded RNA and facilitates the DNA-RNA interaction in the pancreatic cells of cows. RNase A is a 124-amino acid protein that contains four disulphide bonds (Fig. 5). Anfinsen’s choice of this enzyme was partly practical: the Armour meat-packing company of Chicago could provide Anfinsen’s laboratory with a ready supply of bovine pancreas, the raw material from which the enzyme could be purified in sufficient amounts for his studies.
RNase A is very stable during purification: it is relatively resistant to degradation and very well behaved once purified, as it remains soluble and does not undergo aggregation or any other form of self-association. That protein denaturation can be reversible was known to protein chemists as early as 1925, following the discoveries by Anson and Mirsky [1], but there was no insight into how a protein folds to become active.
In his experiments, Anfinsen employed reagents that allowed him to manipulate the redox status of RNase A’s disulphide bonds and its denaturation. To denature RNase A, Anfinsen used urea, a chaotropic agent that breaks up non-covalent interactions (e.g. hydrogen bonds and ionic interactions) responsible for maintaining the protein’s secondary structural elements. To disrupt the covalent disulphide bonds that hold together different regions of the protein and contribute to the stabilisation of the native structure, he used a reducing agent called β-mercaptoethanol (β-ME).

Fig. 5 Cartoon representation of bovine RNase A (PDB ID: 1FS3), highlighting the four S–S bonds that hold the protein’s tertiary structure (cysteines C26, C40, C58, C65, C72, C84, C95 and C110 and the N-terminus are labelled)
With these tools, Anfinsen was able to abolish the secondary and tertiary structures of RNase A and to test under which conditions the denatured protein would refold into a biologically competent conformation. This could be easily assessed through an enzymatic assay.
Anfinsen’s experiments can be summarised in a sequence of observations, made over several years, that led to the discoveries reported in his seminal publications [37–42] (Fig. 6).
In the first experiment, highly purified and active RNase A was treated with excess β-mercaptoethanol and 8 M urea. Under these conditions, the protein is denatured and its disulphide linkages are broken by reduction (Fig. 6a). Anfinsen observed that after these treatments RNase A activity was lost, implying that both the protein’s structure and its network of disulphide bonds were important for enzymatic activity.
Next, Anfinsen observed that if both urea and β-mercaptoethanol are slowly and simultaneously removed in the presence of oxygen (e.g. through dialysis or dilution), RNase A activity is regained (Fig. 6b). This indicates that upon removal of the denaturant under oxidative conditions that favour the re-establishment of disulphide bonds, the protein regains its native conformation, which corresponds to the enzymatically active form.

Fig. 6 A schematic representation of Anfinsen’s experiment. (a) Addition of 8 M urea and β-mercaptoethanol converts native RNase A (disulphides 26–84, 40–95, 58–110 and 65–72) into the denatured, reduced protein. (b) Dialysis of 8 M urea and β-mercaptoethanol restores the native protein. (c) Reoxidation under denaturing conditions yields disulphide-scrambled protein. (d) Removal of urea and addition of a minute amount of β-mercaptoethanol converts the scrambled protein back to the native form.
However, when the reduced, denatured RNase A was reoxidised under denaturing conditions, Anfinsen noted that its disulphide bonds became scrambled (Fig. 6c). By scrambling, we mean that a random set of S–S linkages forms among all possible pairings of the eight cysteine side chains. Even when urea was subsequently removed under oxidising conditions, RNase A was only about 1% active. This observation suggested that the functional RNase A conformation can only be achieved if the polypeptide arranges itself so that the specific cysteine pairs that are supposed to form disulphide bonds are brought together through folding. This implied that the formation of RNase A structure is dictated by the primary sequence and that, as Anfinsen put it, ‘the scrambled protein appears to be devoid of the various aspects of structural regularity that characterize the native molecule’ [42]. Indeed, removal of urea from scrambled, inactive RNase A and addition of a minute amount of β-mercaptoethanol resulted in disulphide interchange and in the gradual conversion of the protein into a fully active conformer, indistinguishable from native RNase A (Fig. 6d).
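A short calculation helps put the ~1% figure in context. If, as an idealisation, every way of pairing the eight cysteines were equally likely during reoxidation in urea, only one of the possible sets of four disulphides would be the native one. The sketch below counts those sets with the closed formula (2k)!/(2^k·k!) and cross-checks it by brute-force enumeration of the RNase A cysteines. The function names are illustrative, and purely random pairing is a simplifying assumption, so the result is an order-of-magnitude estimate rather than a prediction.

```python
from math import factorial

# RNase A cysteines and native disulphides, as listed in Fig. 6
CYSTEINES = [26, 40, 58, 65, 72, 84, 95, 110]
NATIVE = {(26, 84), (40, 95), (58, 110), (65, 72)}


def n_pairings(n_cysteines: int) -> int:
    """Number of ways to join n_cysteines (an even number) into disulphides,
    with every cysteine in exactly one bond: (2k)! / (2^k * k!), k = bonds."""
    if n_cysteines < 0 or n_cysteines % 2:
        raise ValueError("need a non-negative, even number of cysteines")
    k = n_cysteines // 2
    return factorial(n_cysteines) // (2 ** k * factorial(k))


def enumerate_pairings(cys):
    """Yield every complete pairing of the sorted list `cys` (brute force)."""
    if not cys:
        yield []
        return
    first, rest = cys[0], cys[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail


if __name__ == "__main__":
    total = n_pairings(len(CYSTEINES))
    all_sets = list(enumerate_pairings(CYSTEINES))
    native_hits = sum(1 for p in all_sets if set(p) == NATIVE)
    print(f"{total} possible disulphide sets ({len(all_sets)} enumerated), "
          f"{native_hits} native; random-pairing native fraction = {100 / total:.2f}%")
```

The count is 105, so random pairing would leave about 1/105 ≈ 0.95% of molecules with the native disulphides, of the same order as the ~1% residual activity observed.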

2.2 The Thermodynamic Hypothesis

A corollary of Anfinsen’s experiments on RNase A was the widely known thermodynamic hypothesis of protein folding. In Anfinsen’s own words, ‘this hypothesis states that the three-dimensional structure of a native protein in its normal physiological milieu (solvent, pH, ionic strength, presence of other components such as metal ions or prosthetic groups, temperature and other) is the one in which the Gibbs free energy of the whole system is lowest; that is, the native conformation is determined by the totality of the interatomic interactions and hence by the amino acid sequence, in a given environment’ [42].
His postulate, which also became known as Anfinsen’s Dogma, states that since protein folding is a spontaneous process at constant temperature and pressure, the native conformation must correspond to the global minimum of the Gibbs free energy of the whole system (i.e. solvent and protein). The native conformation is therefore unique, in the sense that no other biologically active conformation has a comparable (minimal) free energy under the same environmental conditions. Strictly speaking, globular proteins are flexible entities in solution. However, the range of allowed motions is rather limited, and only a small fraction of the total ensemble of native structures can deviate from this (folded) configuration [33].

2.3 Driving Forces for Protein Folding—Hydrophobic Effect and the Thermodynamics of Protein Folding

What, then, are the dominant driving forces for protein folding? In a nutshell, protein folding can be depicted as a process during which the exposure of non-polar side chains to the surrounding aqueous environment is minimised, while packing interactions and hydrogen bonds are optimised. The former describes the so-called hydrophobic effect, which, as we shall see, constitutes the major thermodynamic driving force for protein folding, while the latter refers to energetic contributions from low-magnitude interactions, notably van der Waals interactions, which were discussed in detail in Sect. 1.3.
Thermodynamically, the stability of a protein is given by the free energy difference between its folded and unfolded states. For a simple two-state system in which a protein is in equilibrium between unfolded (U) and folded (F) states,

$$\mathrm{U} \rightleftharpoons \mathrm{F} \qquad (1)$$

the free energy change of the folding process (ΔGfolding) is given by:

$$\Delta G_{\mathrm{folding}} = -RT \ln K_{\mathrm{eq}} = \Delta H - T\,\Delta S_{\mathrm{conf}} \qquad (2)$$

where Keq = [F]/[U] is the equilibrium constant, ΔH the enthalpy change and ΔSconf the conformational entropy change. R is the universal gas constant and T is the absolute temperature.
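Eq. (2) can be used in both directions. The sketch below assumes the two-state model of Eq. (1) and the sign convention of Eq. (2), so a stable protein has ΔGfolding < 0 and Keq > 1. It converts between ΔGfolding and Keq and gives the fraction of unfolded molecules; the function names are illustrative.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def dg_from_keq(keq: float, temp_k: float) -> float:
    """Eq. (2): dG_folding = -RT ln K_eq, with K_eq = [F]/[U]; returns kJ/mol."""
    return -R * temp_k * math.log(keq)


def keq_from_dg(dg_folding: float, temp_k: float) -> float:
    """Inverse of Eq. (2): K_eq = exp(-dG_folding / RT)."""
    return math.exp(-dg_folding / (R * temp_k))


def fraction_unfolded(dg_folding: float, temp_k: float) -> float:
    """Two-state model: [U] / ([U] + [F]) = 1 / (1 + K_eq)."""
    return 1.0 / (1.0 + keq_from_dg(dg_folding, temp_k))


if __name__ == "__main__":
    T = 298.15  # 25 degrees Celsius in kelvin
    # Negative dG_folding means the folded state is favoured (Eq. 2 sign convention)
    for dg in (-20.0, -80.0):
        print(f"dG_folding = {dg:.0f} kJ/mol: K_eq = {keq_from_dg(dg, T):.3g}, "
              f"fraction unfolded = {fraction_unfolded(dg, T):.2e}")
```

For ΔGfolding = −20 kJ mol⁻¹ at 25 °C, Keq is about 3 × 10³, so roughly 3 molecules in 10⁴ are unfolded at any moment; at −80 kJ mol⁻¹ the unfolded fraction becomes vanishingly small.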
The free energy difference between the unfolded and the folded states can be determined experimentally by gradually denaturing the folded protein under equilibrium conditions (Box 3—Protein Denaturation). This can be achieved by applying the linear extrapolation method to analyse chemical denaturation curves [43, 44], or by using calorimetric approaches during protein thermal unfolding [45].
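The linear extrapolation method assumes that, for a two-state transition, the unfolding free energy ΔGU = −ΔGfolding varies linearly with denaturant concentration, ΔGU([D]) = ΔGU(H₂O) − m[D]. ΔGU can therefore be computed within the transition region and extrapolated to zero denaturant. The sketch below is a minimal illustration under stated assumptions: the native and unfolded baselines are flat (denaturant-independent) and known in advance, the data set is noise-free and hypothetical (generated by the illustrative helper `synthetic_curve`), and the parameter values are invented. In practice the baselines are usually fitted together with the transition by nonlinear regression.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def lem_fit(conc, signal, y_native, y_unfolded, temp_k=298.15):
    """Linear extrapolation method (sketch, two-state, flat baselines).

    Returns (dg_water, m_value, c_mid): unfolding free energy in water (kJ/mol),
    m-value (kJ/mol/M) and denaturant midpoint (M), from dG_U = dg_water - m*[D].
    """
    xs, ys = [], []
    for d, y in zip(conc, signal):
        f_u = (y - y_native) / (y_unfolded - y_native)   # fraction unfolded
        if 0.05 < f_u < 0.95:                            # keep well-defined points
            k_u = f_u / (1.0 - f_u)                      # K_U = [U]/[N]
            xs.append(d)
            ys.append(-R * temp_k * math.log(k_u))       # dG_U at this [D]
    if len(xs) < 2:
        raise ValueError("need at least two points inside the transition")
    # Ordinary least-squares straight line through (xs, ys)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    dg_water = mean_y - slope * mean_x   # intercept = extrapolation to [D] = 0
    m_value = -slope
    return dg_water, m_value, dg_water / m_value


def synthetic_curve(conc, dg_water, m_value, y_native=1.0, y_unfolded=0.0, temp_k=298.15):
    """Hypothetical noise-free two-state denaturation curve, for illustration only."""
    out = []
    for d in conc:
        k_u = math.exp(-(dg_water - m_value * d) / (R * temp_k))
        f_u = k_u / (1.0 + k_u)
        out.append(y_native + (y_unfolded - y_native) * f_u)
    return out


if __name__ == "__main__":
    urea = [0.5 * i for i in range(17)]            # 0 to 8 M, hypothetical
    data = synthetic_curve(urea, dg_water=25.0, m_value=6.0)
    dg, m, cm = lem_fit(urea, data, y_native=1.0, y_unfolded=0.0)
    print(f"dG(H2O) = {dg:.1f} kJ/mol, m = {m:.2f} kJ/mol/M, Cm = {cm:.2f} M")
```

With noise-free synthetic input, the fit recovers the parameters used to generate it; the restriction to 5–95% unfolded keeps only points where Keq is well determined by the signal.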

Box 3—Protein Denaturation

Protein denaturation is the process through which a protein loses the interactions that stabilise its native state, owing to physical or chemical changes in the environment. Protein denaturation can be induced in several ways, with different effects on protein stability and structure, which can be determined or monitored using thermodynamic or spectroscopic approaches. A denatured protein loses its tertiary structure and is essentially structurally disorganised. However, unfolding may also result in a partially organised state that retains some secondary structure as a result of the interconversion of alternative secondary structures; such states include molten globules as well as other non-native conformers.

Thermal Denaturation
An increase in temperature destabilises non-covalent interactions, resulting in loss of the native structure. During thermal denaturation, the conformational entropy of the unfolded state also increases, and the entropic difference between the folded and unfolded states becomes large enough to overcome the hydrophobic effect, resulting in protein unfolding.
Chemical Denaturation
The most common form of chemical denaturation is that achieved with chaotropic agents such as urea, guanidinium chloride (GuHCl) or guanidinium thiocyanate (GuSCN) (Fig. 7). While their exact mechanisms of action are still under debate, one can take the general view that these compounds disrupt the network of hydrogen bonds between water molecules, weakening the hydrophobic effect and therefore reducing the stability of the folded state. In other words, by creating disorder in the solvent’s structure (i.e. by increasing its entropy), chaotropic agents facilitate the solvation of non-polar side chains, and water molecules compete with atoms in the protein for intraprotein interactions. After the secondary structure is disrupted, these denaturants also interact directly with polar residues and the protein peptide backbone, stabilising non-native conformations that readily convert into unfolded conformations. The relative efficacy of these chemicals is GuSCN > GuHCl > urea. Organic solvents that interact with non-polar groups in the protein interior stabilise the unfolded states of proteins and may also cause protein denaturation.
pH Denaturation
Protein structure and activity are optimal within a given pH range. Whenever the pH of the solution changes in a way that affects the protonation state of the side chains of charged residues (Lys, Arg, His, Glu, Asp), the stabilising electrostatic interactions involving those groups are weakened.

Fig. 7 Chemical denaturants routinely used in in vitro experiments: guanidinium chloride, urea and guanidinium thiocyanate



In fact, measured values of ΔGfolding are remarkably small: the difference in thermodynamic stability between the folded and the unfolded conformations is as low as 20–80 kJ mol⁻¹. This difference is comparable to the magnitude of some of the stabilising forces that hold proteins together (hydrogen bonds, electrostatic interactions, van der Waals interactions), and much lower than, for example, the dissociation energy of a single covalent bond (200–500 kJ mol⁻¹). The inventory of thermodynamic contributions to protein folding in Sect. 1.3 shows that the net driving force for folding is the small difference between large energetic and entropic contributions with opposing effects. Moreover, proteins are dynamic entities, and their conformational flexibility, which is quintessential for biological function, would be lost if proteins were highly stable. This is why proteins are said to be marginally stable, a property believed to have been positively selected during evolution [46] (Fig. 8).
The unfolded state is stabilised by a high conformational entropy (the −TΔSconf term), which results from the fact that random polypeptides can adopt a multitude of distinct conformations with high mobility. The loss of conformational entropy is thus a major opposing factor in the folding process, and its magnitude is largest when the unfolded state retains the least residual structure. For a completely disorganised unfolded polypeptide, the conformational entropy would be the highest; however, the fact that some proteins retain secondary structure in the unfolded state indicates that the energetic penalty from this component is variable.
The main driving force for the folding process is the hydrophobic effect, which leads to non-polar interactions within the protein core. It illustrates the importance of water molecules, water structure and protein hydration in protein structure and stability. The hydrophobic effect can be interpreted as follows. An unfolded polypeptide exposes a large surface area of non-polar side chains to water molecules, and this disrupts the water hydrogen-bonding network, creating an energetically unfavourable state. To minimise this, water molecules become more ordered around the exposed hydrophobic groups, tightening intrawater hydrogen bonding. Overall, this lowers the energy of the protein/water system but decreases the solvent’s entropy around the protein; this can be counteracted by associating the non-polar groups so that they become separated from the aqueous environment. The consequence of this association is the wrapping of the polypeptide chain around these newly formed hydrophobic cores, which are now in the protein interior, shielded from interactions with the solvent. The packing of hydrophobic regions, which now interact with each other through dispersion forces, disrupts the water networks previously organised around the protein’s non-polar groups. The released water molecules have higher mobility and form fewer hydrogen bonds per molecule, representing a state of higher entropy. Therefore, the hydrophobic effect is essentially entropic in nature: it is the relief of an unfavourable state in which solvent water molecules interact with a polypeptide, and this relief results in folding.

Fig. 8 Thermodynamics of protein folding: ΔGfolding = −RT ln Keq = ΔH − TΔSconf, shown as the balance, between the folded and unfolded states, of the enthalpy of internal interactions (ΔH), the conformational entropy term (−TΔS) and the hydrophobic effect (−TΔS); net: 20–80 kJ mol⁻¹
The driving force for this phenomenon is not an attraction of hydrophobic groups to each other; rather, it arises because water molecules would rather associate with each other than with hydrophobic moieties. The magnitude of the hydrophobic effect is still debated; however, for some proteins, it can be correlated with the amount of surface area buried upon folding and with the reduction of the volume of a given amino acid as it gets buried into the protein interior [32].
Given that the folded state is stabilised by non-polar interactions, and the unfolded state mainly by conformational entropy, the major driver for folding is the hydrophobic effect, i.e. a thermodynamic drive. However, van der Waals interactions and electrostatic interactions (hydrogen bonds and ionic interactions), albeit occurring in both the folded and the unfolded states, have different magnitudes in the two states and are more significant in folded proteins. In summary, the hydrophobic effect drives the collapse of the polypeptide chain, and the low-magnitude interactions discussed in Sect. 1.3 favour the internal organisation of the protein, contributing to its energetic stabilisation. In nature, some proteins have enhanced thermal stability as a result of evolutionarily improved energetic stability (Box 4—Thermostable Proteins).

Box 4—Thermostable Proteins

Thermophilic organisms that thrive in volcanic regions, hot pools and thermal lakes are frequently described as subsisting under extreme temperatures and environments, but this description is rather anthropocentric. The temperatures at which thermophiles (Topt up to 65 °C) and hyperthermophiles (Topt above 80 °C) grow are optimal growth temperatures for these organisms, resulting from evolutionary adaptation to specific habitats and their environmental conditions. Microbes living optimally in habitats where the temperature is well above the so-called mesophilic conditions (20 < T (°C) < 45) do so because they have evolved an impressive portfolio of strategies that allow their biochemical and cellular processes to operate optimally at high temperatures.
These thermoadaptive strategies include, for instance, the preferential synthesis of lipids that result in tighter and less flexible biomembranes, increased DNA protection by histone-like proteins, and the biosynthesis of so-called compatible solutes, which are small molecules that accumulate at very high intracellular concentrations and exert a stabilising effect on biomolecules. These are usually sugar-based compounds, and many (hyper)thermophiles have evolved to synthesise unique compounds, such as mannosylglycerate and glucosylglycerate, that afford impressive extrinsic stabilisation of proteins through osmolyte effects [47–49].
However, (hyper)thermophilic proteomes are intrinsically stable, irrespective of extrinsic stabilising factors: the proteins from these organisms remain folded and functional at the high temperatures at which the cells have evolved to grow optimally. A long-standing question in protein science has thus been ‘What makes a protein thermostable?’ This question reflects a major gap in basic knowledge of the principles of protein structure and folding discussed throughout this volume, but it is also of paramount importance for biotechnological applications of proteins. Indeed, the ability to engineer proteins to make them more thermostable, or to increase the catalytic efficiency of enzymes working at high temperatures, is extremely valuable in a number of industries [50].
Examples include proteases that withstand higher temperatures, increasing the efficiency of environmentally friendly, low-phosphate detergents; thermostable enzymes for the dairy and food industries; and antibodies or other protein-based biologics for biomedical and therapeutic applications, whose greater thermal stability would allow longer storage periods or less stringent low-temperature storage conditions. The hallmark example of a thermophilic protein with massive biotechnological importance and market value is the DNA polymerase from the thermophilic bacterium Thermus aquaticus [51], the well-known Taq polymerase, whose optimal catalytic temperature of around 80 °C makes it perfectly suited to withstand the high temperatures at which the thermal cycling reactions of the polymerase chain reaction (PCR) take place.
The advent of complete-genome sequencing and its widespread adoption from the beginning of the twenty-first century raised the possibility that general rules determining enhanced protein thermal stability could be inferred from global analyses of (hyper)thermophilic genomes. This prospect was further propelled by the fact that the complete genome of the first thermophile, the methanogenic archaeon Methanococcus jannaschii (Topt = 85 °C), had been determined as early as 1996 [52]. A recent account has revealed an impressive number of nearly 250 complete thermophilic genomes, of which around 30% are from hyperthermophiles [53]. However, despite this wealth of genomic data and many comparative studies between mesophilic and thermophilic genomes, as well as between homologous thermophilic and mesophilic protein families, no holy grail of protein thermostability has been discovered. Nevertheless, such studies have established several factors that are important in dictating thermostability, including amino acid composition bias, structural factors, conformational dynamics and the stability-activity trade-off in enzymes. Together, these reflect the fact that multiple strategies, each lifting the stability curve towards higher free energy values (Fig. 9), can confer high stability on thermophilic proteins.
From these, we can define a generic set of hallmark characteristics of thermophilic proteins that can be organised as structural and sequence adaptations [54, 55]:
Structural adaptations
• thermophilic adaptation generally results in increased protein structural rigidity while retaining local flexibility in functionally important regions;
• differences between native and denatured states, with more compact denatured states in thermophiles than in mesophiles, which may still retain residual structure; the effect of entropy on increased stability may also arise from different degrees of compactness in the native structure;
• a decreased number of cavities and buried polar residues (as these are usually destabilising), and extensive hydrogen bonding and secondary structural elements, which generally have a stabilising contribution;
• specific amino acid substitutions that reduce the entropy of the unfolded state owing to different degrees of flexibility;
• a distinct role of electrostatics (charged residues), which has also been linked to enhanced stability (enthalpy gain from ionic interactions).
Sequence adaptations
• an increase in non-polar amino acids, especially hydrophobic and Pro residues, which contribute to hydrophobic interactions;
• an increase in charged amino acids, especially Arg and Glu residues, which contribute to ionic interactions;
• an increase in aromatic amino acids, especially Tyr residues, which contribute to cation-π interactions;
• a decrease in Met and uncharged polar residues, which are thermolabile amino acids;
• at the other extreme, psychrophiles (i.e. organisms living at temperatures below 20 °C) must contain cold-adapted proteins and enzymes, which require a set of features that in a way oppose those observed in thermophilic proteins.
Cold-adapted proteins have evolved to be structurally flexible and catalytically efficient at low temperatures, and this is achieved through small changes in protein secondary and tertiary structures. In comparison with mesophilic counterparts, psychrophilic proteins have fewer disulphide bonds, a decreased net charge in helix-dipole structures, fewer protein-solvent interactions, fewer hydrogen bonds at domain interfaces and, in general, fewer hydrophobic interactions within the protein core [55].
Some enzymes from thermophiles that are very stable at the (high) optimal temperatures of their host have low activities at lower temperatures (e.g. 37 °C). There is therefore a compromise between stability and activity in the structure of the active site of a protein. Several positions in the active site can be mutated to give a more stable but less active protein, and vice versa. This activity-stability trade-off implies that an increase in activity is accompanied by a concomitant decrease in protein stability [56].

Fig. 9 Typical protein stability profiles of proteins from different sources: free energy (ΔG) versus temperature (°C) for thermophilic, mesophilic and psychrophilic proteins
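Stability profiles such as those in Fig. 9 are commonly described by the Gibbs-Helmholtz stability curve, ΔGU(T) = ΔHm(1 − T/Tm) + ΔCp[T − Tm − T ln(T/Tm)]. Here ΔGU = −ΔGfolding is the unfolding free energy, Tm the melting temperature, ΔHm the unfolding enthalpy at Tm and ΔCp the heat-capacity change on unfolding. The sketch below assumes a temperature-independent ΔCp and uses three hypothetical parameter sets, invented only to mimic the shapes in the figure and not taken from it or from the cited studies. It also locates the temperature of maximum stability, where dΔGU/dT = 0.

```python
import math


def stability(temp_k, dh_m, t_m, dcp):
    """Gibbs-Helmholtz stability curve: unfolding free energy dG_U(T) in kJ/mol.

    dh_m: unfolding enthalpy at t_m (kJ/mol); t_m: melting temperature (K);
    dcp: heat-capacity change on unfolding (kJ/mol/K), assumed constant.
    """
    return dh_m * (1.0 - temp_k / t_m) + dcp * (temp_k - t_m - temp_k * math.log(temp_k / t_m))


def temp_max_stability(dh_m, t_m, dcp):
    """Temperature at which dG_U is maximal (where the unfolding entropy is zero)."""
    return t_m * math.exp(-dh_m / (t_m * dcp))


# Hypothetical parameter sets, chosen only to mimic the shapes in Fig. 9
PROTEINS = {
    "psychrophilic-like": dict(dh_m=250.0, t_m=313.15, dcp=8.0),
    "mesophilic-like": dict(dh_m=400.0, t_m=338.15, dcp=10.0),
    "thermophilic-like": dict(dh_m=500.0, t_m=368.15, dcp=8.0),
}

if __name__ == "__main__":
    for name, p in PROTEINS.items():
        t_s = temp_max_stability(**p)
        profile = ", ".join(f"{c} C: {stability(c + 273.15, **p):6.1f}" for c in (20, 40, 60, 80))
        print(f"{name:19s} T_s = {t_s - 273.15:5.1f} C | dG_U (kJ/mol) {profile}")
```

In this model the ΔCp term is never positive, so lowering ΔCp at fixed ΔHm and Tm raises ΔGU at every temperature; this is one of several ways the curve can be lifted.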

3 Folding Kinetics and Mechanisms: How Is Structure Acquired?

3.1 Two-State Cooperativity in Protein Folding

The beginning of research in protein folding can be traced back to the work of Anson and Mirsky in the 1930s, who studied and discussed the reversibility of protein denaturation [57]. Indeed, the refolding observed after complete unfolding of a protein under controlled experimental conditions (of pH, ionic strength, ion concentration, etc.) offers an operational (although not perfectly adequate) model to mimic the process of protein folding inside the living cell (i.e. in vivo).
An interesting feature of reversible (thermal or chemical) protein denaturation curves is their peculiar shape (Fig. 10a). The sigmoidal (or ‘S-shaped’) curve indicates an abrupt change of a measurable protein property (e.g. energy, radius of gyration) from native to denatured state values. Furthermore, a narrow transition region (as quantified by ΔT) indicates that many amino acids are involved in the process. The ‘S-shaped’ curve is thus considered the hallmark of a cooperative process. Protein folding cooperativity comes in two flavours: two-state (first-order or ‘all-or-none’) cooperativity and one-state (or higher-order) cooperativity. When the folding transition can be modelled by a two-state process, there is a temperature (Figs. 10a and 10b), the so-called transition midpoint (or melting temperature) Tm, at which the distribution of molecules over a measurable property is bimodal (Fig. 10c).
This means that at Tm only two states (the native and the denatured) co-exist in thermodynamic equilibrium, and the process resembles a first-order phase transition between homogeneous phases (e.g. the gas-liquid or solid-liquid phase transition). This type of transition is typical of small, single-domain proteins (up to ~150 amino acids). Larger proteins, on the other hand, are likely to fold by populating intermediate states, as indicated by plateaus and shoulders in the transition curves as well as by multiphasic kinetics. Studies on molten globule states have provided some experimental insights into what a common intermediate of the protein folding process might be (Box 5—Molten Globules).

Fig. 10 Protein denaturation transition. a Thermal denaturation curve exhibiting a sigmoidal
shape with ΔT measuring the width of the transition and Tm being the melting temperature (or
transition midpoint). b Heat capacity curve obtained via differential scanning calorimetry, showing
that the heat capacity peaks at Tm. c In a two-state folding transition, the distribution of molecules
over a measurable property is bimodal at Tm. Adapted from fig. 4 in [58]
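The two-state picture above can be made concrete with a short numerical sketch. It rests on a simplifying assumption that is ours and not the text's: the unfolding enthalpy ΔHm is taken as temperature independent (no heat capacity change), so that ΔG(T) = ΔHm(1 − T/Tm). The function names and parameter values are hypothetical and purely illustrative. The sketch shows that half of the molecules are denatured at Tm, and that a larger ΔHm (a more cooperative transition) gives a narrower transition width ΔT.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def fraction_denatured(T, dH_m, Tm):
    """Fraction of denatured molecules at temperature T (K) in a two-state model.

    Assumes dG(T) = dH_m * (1 - T / Tm) with a temperature-independent
    unfolding enthalpy dH_m (J/mol), i.e. no heat capacity change.
    """
    dG = dH_m * (1.0 - T / Tm)   # stability G_D - G_N (J/mol)
    K = math.exp(-dG / (R * T))  # equilibrium constant [D]/[N]
    return K / (1.0 + K)


def transition_width(dH_m, Tm, lo=0.1, hi=0.9):
    """Width (K) of the interval in which the denatured fraction goes from lo to hi."""
    def temperature_at(f):
        # Invert ln K = -(dH_m / R) * (1/T - 1/Tm) for K = f / (1 - f)
        return 1.0 / (1.0 / Tm - R * math.log(f / (1.0 - f)) / dH_m)
    return temperature_at(hi) - temperature_at(lo)


for dH_m in (200e3, 400e3):  # illustrative unfolding enthalpies, J/mol
    print(f"dH_m = {dH_m / 1e3:.0f} kJ/mol: "
          f"f_D(Tm) = {fraction_denatured(330.0, dH_m, 330.0):.2f}, "
          f"10-90% width = {transition_width(dH_m, 330.0):.1f} K")
```

In this approximation ΔT ≈ 2RTm² ln 9/ΔHm. Doubling ΔHm from 200 to 400 kJ/mol therefore roughly halves the 10–90% width, from about 20 K to about 10 K at Tm = 330 K.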

Box 5—Molten Globules


Molten globules are compact, partially folded conformations of proteins that
are thought to be common intermediates in the protein folding process.
Molten globules are also formed when native proteins are exposed to mildly
destabilising conditions, such as low pH or low concentrations of chemical
denaturants (<2 M GuHCl or urea), or upon protein-membrane interactions or
loss of a cofactor. The main characteristics of the molten globule state were
defined by the pioneering work of Ptitsyn on α-lactalbumin in the
1970s [59, 60]. Ptitsyn proposed a model in which the molten
globule is a folding intermediate that preserves the mean
overall structure of the folded protein but differs from it mainly by a looser
packing and a higher mobility at the loops and at the termini of the protein chain
(Fig. 11).
These observations have since been extended to many other
proteins and have resulted in the currently accepted biophysical criteria for
establishing a molten globule state:
• a compact globular state, yet with a radius expanded by about 10% with
respect to that of the native protein;
• a native-like secondary structure, as detected by far-UV circular dichroism (CD);
• loss of tertiary interactions, indicated by the absence of a near-UV CD signal,
quenched Trp fluorescence and NMR peak broadening;
• solvent exposure of hydrophobic patches, caused by the non-specific assembly of
secondary structure and by hydrophobic interactions, detectable by
1-anilinonaphthalene-8-sulphonic acid (ANS) fluorescence enhancement.

Fig. 11 Molten globule model [panels: Native Structure; Molten Globule]

When denaturation is performed via differential scanning calorimetry (DSC), the
heat capacity peaks at Tm (Fig. 10b). The van’t Hoff criterion (Box 6—Thermodynamic
Cooperativity and the van’t Hoff Criterion), introduced by Lumry, Biltonen and
Brandts in the 1960s, is often applied to determine whether the molecule melts as
a single unit, as expected in a two-state transition [61]. When the van’t Hoff
criterion is fulfilled, the protein is said to exhibit thermodynamic or
calorimetric cooperativity.

Box 6—Thermodynamic Cooperativity and the van’t Hoff Criterion


To apply the van’t Hoff criterion, a DSC experiment is carried out in which the
temperature is scanned from below the transition midpoint (Tm) to well above it.
Two independent quantities are then obtained. The first is the calorimetric
enthalpy change, $\Delta H_{cal}$, given by the area under the $C_P$ versus $T$
curve after baseline subtraction; it includes contributions from all processes
(Fig. 10b). The second is the van’t Hoff enthalpy, $\Delta H_{vH}$, which assumes
a two-state process. In protein folding, the latter is generally taken as

$$\Delta H_{vH} = 2T_{MAX}\sqrt{k_B C_{P,MAX}},$$

where $T_{MAX}$ is the temperature at which the heat capacity peaks, $C_{P,MAX}$
is the maximum value of the heat capacity, and $k_B$ is the Boltzmann constant
[58]. When $\Delta H_{vH}/\Delta H_{cal} \approx 1$, the protein is considered
to exhibit thermodynamic (or calorimetric) cooperativity: the molecule melts as
a single unit and the transition is considered two-state. The following
reasoning shows why [58]. From the definition of heat capacity at constant
pressure, $C_P = (\partial H/\partial T)_P$, it follows that
$\Delta H_{cal} = H_U - H_N$, where $H_N$ ($H_U$) is the average enthalpy of the
native (denatured) state. On the other hand, since

$$C_P = \frac{\langle H^2\rangle - \langle H\rangle^2}{k_B T^2} = \frac{\sigma_H^2}{k_B T^2},$$

where $\sigma_H$ is the standard deviation of $H$ (see also Fig. 10c), the van’t
Hoff enthalpy can be rewritten as $\Delta H_{vH} = 2\sigma_{H,MAX}$. In other
words, the van’t Hoff enthalpy equals twice the standard deviation of the
enthalpy at $T_{MAX}$. For the van’t Hoff enthalpy to be approximately equal to
the calorimetric enthalpy, the enthalpy distribution should be bimodal, with one
peak at the average enthalpy of the native state and the other at that of the
denatured state (Fig. 10c). If the two peaks were perfectly sharp and equally
populated, the standard deviation would be exactly $(H_U - H_N)/2$, so that
$\Delta H_{vH} = \Delta H_{cal}$. This does not mean that there are no
conformations at all with intermediate enthalpy values, but the van’t Hoff
criterion does require their population to be very small.
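As a minimal numerical sketch of the criterion, the code below computes both enthalpies from a synthetic DSC curve. Several choices are ours rather than the text's: molar units are used, so the gas constant R replaces kB; each melting unit is modelled as two-state with a temperature-independent enthalpy; and baselines are assumed already subtracted. The function names and parameter values are hypothetical.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1 (molar counterpart of k_B)


def two_state_cp(T, dH, Tm):
    """Excess heat capacity (J mol^-1 K^-1) of a two-state N <-> D transition.

    Assumes a temperature-independent unfolding enthalpy dH (J/mol), no
    heat capacity change between N and D, and baselines already subtracted.
    """
    K = math.exp(-dH / R * (1.0 / T - 1.0 / Tm))  # equilibrium constant [D]/[N]
    return dH ** 2 * K / (R * T ** 2 * (1.0 + K) ** 2)


def dsc_enthalpies(cp, T_lo, T_hi, n=8001):
    """Return (dH_vH, dH_cal) for an excess heat capacity curve cp(T)."""
    step = (T_hi - T_lo) / (n - 1)
    temps = [T_lo + i * step for i in range(n)]
    values = [cp(T) for T in temps]
    # Calorimetric enthalpy: area under the Cp(T) curve (trapezoidal rule)
    dH_cal = step * (sum(values) - 0.5 * (values[0] + values[-1]))
    # van't Hoff enthalpy from the peak: 2 T_max sqrt(R Cp_max)
    i_max = max(range(n), key=values.__getitem__)
    dH_vH = 2.0 * temps[i_max] * math.sqrt(R * values[i_max])
    return dH_vH, dH_cal


# One two-state unit: dH = 400 kJ/mol, Tm = 330 K (illustrative values)
vh, cal = dsc_enthalpies(lambda T: two_state_cp(T, 400e3, 330.0), 280.0, 380.0)
print(f"single unit: dH_vH/dH_cal = {vh / cal:.3f}")

# Two independent units (e.g. domains) melting at 320 K and 340 K
vh2, cal2 = dsc_enthalpies(
    lambda T: two_state_cp(T, 200e3, 320.0) + two_state_cp(T, 200e3, 340.0),
    270.0, 390.0)
print(f"two domains: dH_vH/dH_cal = {vh2 / cal2:.3f}")
```

The single unit gives a ratio very close to 1. The two independently melting units have the same total ΔHcal, but their ratio is only about 0.5, which is the kind of result that flags a transition as not two-state.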

A more stringent criterion for two-state cooperative folding is the observation of
kinetic cooperativity, whose hallmark is the so-called chevron behaviour (Box 7—Two-State
Folding Kinetics and Chevron Plots) (reviewed in [58]). This is because kinetic
cooperativity implies calorimetric cooperativity, whereas the reverse is not
necessarily true.

Box 7—Two-State Folding Kinetics and Chevron Plots


The so-called two-state kinetics is a simple phenomenological model used to
represent folding kinetics when the experimental data are well fitted by a
single-exponential time decay. The kinetics is called two-state because the
fit can be done by assuming that the only populated states are the native
(N) and the denatured (D) states (Fig. 12a). This does not mean that there are
no folding intermediates. Such intermediates could be high-energy conformations
that are only transiently populated and therefore escape detection by
conventional methods.
Two-state folding kinetics is generally studied in the context of transition
state (TS) theory. The folding rate constant, $k_f$, is assumed to have an
Arrhenius-like dependence on the activation energy of folding, i.e. the free
energy difference between the thermodynamically unstable folding transition
state (TS) and D:

$$k_f \propto \exp(-\Delta G_{TS-D}/RT).$$

The unfolding rate constant is likewise determined by the free energy
difference between TS and N:

$$k_u \propto \exp(-\Delta G_{TS-N}/RT),$$

where R is the gas constant and T the absolute temperature.
Experiments show that, for two-state folders, the observed relaxation rate
constant, $k_{obs}$, has a ‘chevron plot’ dependence on the concentration of
denaturant (e.g. [GuHCl]) (Fig. 12b). More precisely, a plot of $\ln k_{obs}$
versus [denaturant] gives a V-shaped kinetic curve. In the chevron plot,

$$k_{obs} = k_f + k_u \quad (\mathrm{s}^{-1}),$$

with

$$k_f = k_f^{H_2O}\exp(-m_{k_f}[\mathrm{denaturant}]), \qquad k_u = k_u^{H_2O}\exp(m_{k_u}[\mathrm{denaturant}]),$$

where $k_{u(f)}^{H_2O}$ is the value extrapolated to the absence of denaturant and
$m_k$ is a (positive) constant of proportionality [62]. The observation of a
chevron plot is therefore often used as a signature of two-state folding
kinetics.

To construct a chevron plot, one performs both folding and unfolding experiments
and measures $k_{obs}$ under different denaturing conditions:
• In folding experiments, a sample of unfolded protein in high-denaturant
conditions is rapidly mixed with an excess of buffer, giving a low overall
denaturant concentration.
• In unfolding experiments, the protein is rapidly mixed with an excess of
denaturant solution, producing a jump to a high denaturant concentration.
In both cases, folding (or unfolding) progress is monitored by recording structural
changes with an optical probe (e.g. CD or fluorescence, see Sect. 5.1), and
$k_{obs}$ is measured at selected denaturant concentrations.

In folding experiments, $k_{obs}$ approximates $k_f$ at low denaturant
concentration; in unfolding experiments, it approximates $k_u$ at high denaturant
concentration. The chevron plot has its characteristic V shape because the
protein folds more slowly and unfolds more rapidly in the presence of denaturant
than in pure buffer. $\ln k_{u(f)}^{H_2O}$ can be determined by extrapolating
$\ln k_{obs}$ back to zero-denaturant conditions. The folding transition is
considered to exhibit two-state cooperativity when its kinetics shows chevron
behaviour. Chevron ‘rollovers’, by contrast, typically indicate the presence of
folding intermediates, which can be either on- or off-pathway. In a rollover,
the chevron plot flattens out at very low denaturant concentration, or the
folding rate even decreases as the denaturant concentration approaches zero
(Fig. 12b).
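The chevron relations in Box 7 can be explored with the minimal sketch below. The rate constants and m-values are hypothetical, as are the function names. The stability estimate ΔG = RT ln(kf/ku) follows from dividing the two Arrhenius-like expressions, which assumes (our assumption) that folding and unfolding share the same pre-exponential factor.

```python
import math

R = 8.314e-3  # gas constant, kJ mol^-1 K^-1


def chevron_ln_kobs(d, kf_w, ku_w, mf, mu):
    """ln k_obs (k_obs in s^-1) at denaturant concentration d (M), two-state model.

    kf_w, ku_w: folding/unfolding rate constants extrapolated to zero denaturant;
    mf, mu: kinetic m-values (M^-1), both taken as positive numbers.
    """
    kf = kf_w * math.exp(-mf * d)  # folding slows down as denaturant is added
    ku = ku_w * math.exp(mu * d)   # unfolding speeds up as denaturant is added
    return math.log(kf + ku)


def chevron_midpoint(kf_w, ku_w, mf, mu):
    """Denaturant concentration at which kf == ku (where the two arms cross)."""
    return math.log(kf_w / ku_w) / (mf + mu)


def stability_from_rates(kf_w, ku_w, T=298.15):
    """Two-state stability G_D - G_N = RT ln(kf/ku), in kJ/mol.

    Assumes the same pre-exponential factor for folding and unfolding.
    """
    return R * T * math.log(kf_w / ku_w)


# Hypothetical protein: kf = 100 s^-1, ku = 0.01 s^-1, mf = 1.5 M^-1, mu = 1.0 M^-1
params = (100.0, 0.01, 1.5, 1.0)
for d in range(9):
    print(f"[D] = {d} M  ln k_obs = {chevron_ln_kobs(d, *params):6.2f}")
print(f"arms cross at {chevron_midpoint(*params):.2f} M")
print(f"dG(water) = {stability_from_rates(100.0, 0.01):.1f} kJ/mol")
```

Plotted against [D], the printed values trace the V of Fig. 12b. The left arm has a slope close to −mf and the right arm a slope close to +mu. At the crossing point kf = ku, so the equilibrium constant is 1; for a two-state protein this point therefore coincides with the equilibrium denaturation midpoint.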

Fig. 12 Two-state folding kinetics. The two-state model of folding kinetics is generally addressed
in the context of transition state theory, where a free energy barrier (on top of which lies the
transition state) separates the denatured and the native states (a). Dependence of the observed rate
constant on denaturant concentration, showing typical linear chevron behaviour (b). The deviation
from linear behaviour, the so-called chevron rollovers (grey line), indicates the presence of
intermediate states

In general, a cooperative process is expected to involve interactions that do not
behave independently of each other: the establishment of one interaction should
make the formation of additional interactions more favourable. A fundamental
question that has attracted considerable attention concerns the identification of
the physical mechanism behind two-state cooperativity in protein folding. Since it
is not yet possible to modulate folding cooperativity in experiments in vitro, this
question has been addressed mainly by theoretical studies based on molecular
simulations (Sect. 5.1).
The general picture that emerges from these studies is that two-state folding
cooperativity results from non-trivial energetics based on many-body interactions.
Although the atomistic origins of many-body interactions are not well understood,
desolvation effects and side chain packing are likely contributors [63]. A recent
study also emphasised the importance of N- to C-termini coupling in protein folding.
It provided compelling evidence that native interactions between the terminal
regions of the polypeptide chain (a structural feature shared by small,
single-domain proteins [64]) are a major determinant of the height of the free
energy barrier that separates the folded from the denatured state in a two-state
transition. They are therefore a critical modulator of protein folding rates and
thermodynamic cooperativity [65]. Interestingly, molecular simulations [66, 67]
have recently shown that single-point mutations leading to unstructured termini
detached from the protein’s core (i.e. mutations that weaken the coupling between
protein termini) are associated with the population of aggregation-prone
intermediate states (Sect. 4.1), some of which resemble a molten globule (Box 5—Molten
Globules) [66].

3.2 The Levinthal Paradox and the Timescale of Protein Folding

The two-state model of protein folding, which became widely adopted in the 1960s,
led Cyrus Levinthal to formulate the so-called Levinthal paradox [68]. He argued
that if folding is really a two-state process, without intermediates, a protein must
randomly explore all accessible unfolded conformations in order to find the native
one, which is a global free energy minimum according to Anfinsen’s thermodynamic
hypothesis [42] (see Sect. 2.1). If the search is unbiased (i.e. all unfolded
conformations are equally probable), a simple counting argument leads to an absurd
estimate of the folding time. Assume that each amino acid within a protein can adopt
only two different conformations, and that amino acids can switch conformations in
just one picosecond (the timescale of a thermal vibration). Then a small protein
with 100 amino acids would have access to a total of 2^100 (~10^30) conformations,
and would therefore require about 2^100 ps (i.e. of the order of 10^10 years) to
find the native one. Yet proteins typically fold on the timescale of milliseconds
to seconds (exceptions include model systems where proline isomerisation or other
specific factors slow down folding considerably). To
bypass this paradox, Levinthal proposed that protein folding should occur under
kinetic control rather than thermodynamic control. What this means is that instead
of folding to the structure that is the most thermodynamically stable, as implied by
Anfinsen’s dogma, a protein must fold to a metastable structure (i.e. a local energy
minimum) that is accessible through the fastest folding pathway and is biologically
active. In Levinthal’s view, a folding pathway is a well-defined sequence of events
‘which follows one another so as to carry the protein from the unfolded random coil
to a uniquely folded metastable state’ [68]. According to Levinthal, the goal of
achieving a global free energy minimum is not compatible with doing it fast, and ‘if
the final folded state turned out to be the one of lowest conformational energy, it
would be a consequence of biological evolution not of physical chemistry’. The
Levinthal paradox is an important landmark in the history of protein science
because it stimulated research on the kinetics and mechanisms of protein folding:
how do proteins acquire their native structure on a biologically relevant timescale?
Not surprisingly, the search for mechanisms based on the formation of intermediate
states dominated the folding field for quite a while [69]. Eventually, in 1996, a model
for a kinetically controlled folding pathway, as envisioned by Levinthal, was suggested
for the serpin plasminogen activator inhibitor 1 (PAI-1). This protein forms a
metastable active structure that slowly converts to the more stable, but
low-activity, ‘latent’ conformation [70].
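The arithmetic behind Levinthal’s counting argument can be checked directly. The short sketch below reuses the assumptions stated in the text (two conformations per residue, one conformational change per picosecond, an exhaustive unbiased search). The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_time(n_residues, states_per_residue=2, step_s=1e-12):
    """Time (s) for an unbiased search that visits every conformation once.

    Follows Levinthal's counting argument: states_per_residue conformations
    per residue and one conformational change every step_s seconds.
    """
    n_conformations = states_per_residue ** n_residues
    return n_conformations * step_s


t = levinthal_search_time(100)
print(f"{2 ** 100:.2e} conformations -> {t / SECONDS_PER_YEAR:.1e} years")
# prints: 1.27e+30 conformations -> 4.0e+10 years
```

The exact figure, about 4 × 10^10 years, sits comfortably within the order-of-magnitude estimate quoted above. It exceeds observed folding times by more than twenty orders of magnitude.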

3.3 Mechanisms of Protein Folding

Solving the mechanism of protein folding is a problem of paramount importance in
protein science, but unravelling its solution has been far from straightforward. In
part, this difficulty stems from the fact that proteins do not appear to fold by means
of a unique mechanism, and, over the years, several phenomenological models have
been proposed for protein folding. Levinthal himself suggested that protein folding
would be sped up by what he termed ‘nucleation points’: local interactions
between amino acids that initiate the folding process by forming stably and
rapidly, allowing the subsequent formation of larger structural elements that
eventually coalesce into the native structure [71]. Implicit in Levinthal’s suggestion
was the core idea of what later became known as the framework model. This model
envisions protein folding as a hierarchical process, in which the formation of
hydrogen-bonded secondary structural elements (e.g. α-helices and β-sheets)
precedes and drives the formation of the tertiary structure [72]. The diffusion–collision
model, proposed in the 1970s, also belongs to the class of framework models. It
assumes that the core of the folding process is based on diffusive encounters (i.e.
collisions) between secondary structural elements (and other metastable regions of
the structure). These encounters lead to intermediate states of higher stability,
which eventually coalesce to give rise to the tertiary structure [73]. The
hydrophobic collapse model offers a completely different view of the folding
process: it assumes that the chain initially collapses as a result of the
hydrophobic effect and subsequently rearranges locally to form the correct
secondary structural elements [74]. In this model, therefore, the tertiary
structure forms first, whereas in the framework model this step is always
preceded by the formation of secondary structural elements.
Research carried out in the 1980s and 1990s by several laboratories worldwide
emphasised the idea that small, single-domain proteins (~150 amino acids),
epitomised by chymotrypsin inhibitor 2 (CI2), fold fast, on timescales of
microseconds to seconds, without the need to populate (stable) intermediate states
[75]. Jackson and Fersht showed that the folding kinetics of CI2 is remarkably
well fitted by a single-exponential model [76]. Following these findings, a model of
two-state folding framed on Eyring’s transition state theory became widely
adopted, and the folding transition state came into the spotlight of protein folding
research. If transition state theory applies to protein folding, both the formation
of the native structure and the folding rate should be limited by the formation of
the thermodynamically unstable transition state. What does the folding transition
state look like? How can it be studied experimentally? These questions attracted
considerable attention in the 1990s and early 2000s.
To help answer these questions, researchers started using molecular simulations
(Sect. 5.1) to study protein folding, taking advantage of the considerable increase
in computer power that began in the 1990s. Indeed, computer simulations of
simplified protein models, combined with theoretical studies framed in statistical
mechanics, played a decisive role in unravelling this remarkable biological puzzle.
This work led to two landmark concepts in protein folding. One is the folding
funnel, developed in the context of the free energy landscape theory; the other is
the nucleation condensation (NC) mechanism of protein folding. Interestingly, the
NC mechanism was predicted in Monte Carlo lattice simulations at about the same
time as it was first supported by experiments in vitro. Given their importance in
the development of a theory of protein folding, we address them briefly.

3.4 The Nucleation Condensation Mechanism of Protein Folding

Although a model of folding limited by nucleation had been originally
proposed by Baldwin and co-workers in the early 1970s [77], it was only in the
1990s that Shakhnovich and co-workers reported the first detailed microscopic
study supporting it. Based on Monte Carlo simulations of a simple lattice model,
this study supported the hypothesis that a nucleation mechanism, akin to the
nucleation-growth mechanism of first-order phase transitions, lies at the heart
of the folding process [78]. Indeed, Shakhnovich and co-workers observed that,
in the folding of the lattice model system, the rate-limiting step is the
formation of a specific set of native interactions, termed the folding nucleus
(FN), which predominantly involves residues far apart in the sequence. Once the
FN is formed, the native state is promptly (and reproducibly) achieved. Since the
formation of the FN is rate limiting, it should coincide with the formation of the
TS in a two-state folding transition. Nucleation and the TS therefore became
inextricably linked topics in protein folding.

3.5 Phi-value Analysis and the Structure of the Folding Transition State

Shortly after Shakhnovich’s discovery, the extensive use of a protein engineering
method developed by Fersht, termed phi-value analysis, provided the first
microscopic characterisation of the structure of the TS of CI2 [79]. The phi-value,
$\phi$, is obtained by measuring the effect of a single-site mutation on the folding
rate and on the folding stability, namely

$$\phi = \frac{RT\ln(k_{WT}/k_{mut})}{\Delta\Delta G_{N-D}},$$

where $k_{mut}$ and $k_{WT}$ are the folding rates of the mutant and wild-type (WT)
proteins, respectively, and $\Delta\Delta G_{N-D}$ is the change in the free energy
of folding upon mutation, taken as positive for a destabilising mutation [62]. For
a conservative mutation (one that causes only a small perturbation of the folding
process), $RT\ln(k_{WT}/k_{mut})$ measures the change in the activation energy of
folding, $\Delta\Delta G_{TS-D}$, and therefore

$$\phi = \frac{\Delta\Delta G_{TS-D}}{\Delta\Delta G_{N-D}}.$$

For two-state folding proteins, a phi-value near unity means that the mutation
perturbs the TS energetically as much as it perturbs the native state. This can be
interpreted as the mutated residue being fully native in the TS (i.e. having all
its native interactions established). A phi-value near zero, on the other hand,
can be taken as evidence that the residue is as unstructured in the TS as it is in
the denatured ensemble. Fractional phi-values may indicate the existence of
multiple folding pathways, or of a unique transition state with genuinely weakened
interactions [62]. Moreover, the interpretation of so-called nonclassical
phi-values ($\phi > 1$ and $\phi < 0$) is not straightforward either, and alternative
models for the phi-value have been proposed [80].
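A minimal sketch of the phi-value calculation follows. It assumes two-state folding, a conservative mutation, and the sign convention in which ΔΔG(N−D) is the (positive) destabilisation caused by the mutation. The rate constants, stabilities and function name are hypothetical.

```python
import math

R = 8.314e-3  # gas constant, kJ mol^-1 K^-1


def phi_value(kf_wt, kf_mut, ddG_nd, T=298.15):
    """Phi-value from folding rate constants and the equilibrium destabilisation.

    ddG_nd: destabilisation (kJ/mol) of N relative to D caused by the mutation,
    taken positive for a destabilising mutation. Assumes two-state folding and
    a conservative mutation.
    """
    ddG_tsd = R * T * math.log(kf_wt / kf_mut)  # change in folding activation free energy
    return ddG_tsd / ddG_nd


# Hypothetical mutants of a protein folding at 50 s^-1, each destabilised by RT ln 10
ddG = R * 298.15 * math.log(10)                    # about 5.7 kJ/mol
print(phi_value(50.0, 5.0, ddG))                   # 10-fold slower folding
print(phi_value(50.0, 50.0, ddG))                  # folding rate unchanged
print(phi_value(50.0, 50.0 / math.sqrt(10), ddG))  # intermediate slowdown
```

The three hypothetical mutants give φ ≈ 1 (residue native-like in the TS), φ = 0 (residue as unstructured in the TS as in the denatured state) and φ ≈ 0.5 (a fractional value), respectively.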
According to Fersht and co-workers, the picture of the TS that emerges from the
phi-value analysis is compatible with CI2 folding via a nucleation mechanism like
that reported for lattice proteins. The lack of tertiary structure in the TS of CI2 was
taken as evidence that secondary and tertiary structures form concomitantly in a
process that is triggered by the formation of the FN, a set of local interactions
stabilised by a few long-range interactions which are mainly associated with the
residues displaying the highest phi-values. Such a process was coined the nucle-
ation–condensation (NC) mechanism of protein folding [81]. Subsequent studies,
focusing on other target proteins, provided further evidence that the NC mechanism
is common among small, single-domain proteins (reviewed in [82]).

3.6 The Energy Landscape and Folding Funnels

In the early 1990s, Wolynes, Onuchic and co-workers developed the free energy
landscape theory of protein folding, building on simple statistical mechanics
models of polymers, the theory of spin glasses and computer simulations
[83–87].
The landscape theory, its underlying concepts and its subtleties have been discussed
in depth by Dill and Chan in a series of pedagogical papers [88–90].
In what follows, we provide a brief summary of the theory’s pivotal concepts
following Refs. [88, 89].
The free energy landscape is a multidimensional representation of the folding
process in which the vertical axis represents the internal free energy of a single
conformation, while the multiple lateral axes represent the conformational
coordinates (e.g. the dihedral bond angles φ1, ψ1, φ2, ψ2, …). The internal free
energy of a conformation accounts for everything (i.e. hydrogen bonds, salt bridges,
torsion angle energies, hydrophobic and solvation free energies, etc.) except the
conformational entropy. It is called a free energy (and not merely an energy) because
of the solvation terms, which can involve entropic contributions due to water
ordering.
The internal free energy also does not represent the macroscopic free energy that
would be measured in a folding experiment in vitro, because it describes a single
chain only, and not the ensemble average over all chain conformations. The
interested reader is referred to [91] for an illuminating discussion of the
relation between the internal free energy and the macroscopic free energy.
A point in the free energy landscape represents a conformation, and geometrically
similar conformations are close to each other on the landscape. In this framework,
folding is viewed as a succession of random conformational transitions starting
from an arbitrary unfolded conformation. In its search for the native state, a
protein lowers its internal free energy by burying its hydrophobic residues in the
core, increasing its hydrogen bond content, increasing the number of salt bridges,
etc., while becoming progressively more compact. The funnelled shape of the energy
landscape reflects this behaviour, i.e. the concomitant decrease in the chain’s
internal free energy and conformational entropy that occurs during folding
(Fig. 13a).
Roughly speaking, the conformational entropy measures (the logarithm of) the number
of conformations with a given value of internal free energy; it is represented by
the funnel’s width. In the landscape framework, it does not make sense to think of a
folding pathway in the Levinthal sense, i.e. a succession of transitions between
specific conformations leading to the native state in a directed manner. Rather, in
the landscape view, protein folding can be pictured metaphorically as water flowing
down a mountain. In this scenario, one should think in terms of ensembles of
conformations rather than specific conformations. The denatured state, at the top of
the funnel, is the largest ensemble, corresponding to the largest conformational
entropy, and the native state, at the bottom, is the smallest. The transition state
is the ensemble of ‘bottleneck’ conformations located somewhere near the bottom of
the funnel. The width of the funnel shrinks as folding progresses, reflecting the
fact that the number of accessible conformations narrows down to one as the chain
approaches the native state.

Fig. 13 Protein folding funnels. Pictorial representation of a perfectly funnel-shaped free energy
landscape representing a fast two-state folding transition (a). The Levinthal paradox can be framed
in terms of a ‘golf-course’-shaped free energy landscape featuring only the denatured and the
native states. In the absence of an energetic bias, the protein needs to randomly explore the
ensemble of denatured conformations until it finds one that gives access to the native state (b)

A chain attains its native state, starting from an arbitrary unfolded conformation,
by many possible routes (and not only by one specific pathway). In this way, it
reaches the global free energy minimum required by Anfinsen’s thermodynamic
hypothesis in a directed and rapid manner. The folding funnel is, therefore, a
conceptual insight that reconciles Anfinsen’s thermodynamic hypothesis with the
biological timescale of protein folding. In particular, fast single-exponential
kinetics is described by a funnel-shaped landscape with no significant kinetic
traps (i.e. a smooth landscape). As pointed out by Dill and Chan [89], the folding
funnel shows that the Levinthal paradox is not a ‘real’ problem. It is instead an
artefact of framing the folding problem in terms of a golf-course energy landscape,
where all the unfolded conformations share the same internal free energy
(Fig. 13b).
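The contrast between a golf-course landscape and a funnelled one can be illustrated with a toy kinetic model, in the spirit of the one analysed by Zwanzig, Szabo and Bagchi (1992). The model, its rates and the function name are our illustrative assumptions and are not part of the text.

Each residue is either ‘correct’ (native-like) or ‘incorrect’. With no energetic bias the landscape is flat, like a golf course. A small per-residue bias towards the native state plays the role of the funnel. The number of incorrect residues then performs a birth-death walk, whose mean first-passage time to the native state can be computed exactly by a simple recursion.

```python
import math


def mean_folding_time(n_residues, bias_kT):
    """Mean first-passage time, in units of 1/k0, from all-incorrect to all-correct.

    Toy model: every residue is either 'correct' (native-like) or 'incorrect'.
    An incorrect residue becomes correct with rate k0 = 1, while a correct one
    becomes incorrect with rate exp(-bias_kT). bias_kT = 0 is the flat
    'golf-course' landscape; bias_kT > 0 tilts the landscape towards the native state.
    The state n (number of incorrect residues) performs a birth-death walk.
    """
    k_break = math.exp(-bias_kT)
    tau_above = 0.0  # mean time to step down from n + 1 to n (zero above n = N)
    total = 0.0
    for n in range(n_residues, 0, -1):
        fix_rate = float(n)                      # n -> n - 1
        break_rate = (n_residues - n) * k_break  # n -> n + 1
        # Mean time to go from n to n - 1, allowing excursions to n + 1 and back
        tau_n = (1.0 + break_rate * tau_above) / fix_rate
        total += tau_n
        tau_above = tau_n
    return total


for bias in (0.0, math.log(10)):
    steps = mean_folding_time(100, bias)
    print(f"bias {bias:.2f} kT: {steps:.1e} steps (~{steps * 1e-12:.1e} s at 1 ps per step)")
```

For 100 residues and no bias, the mean time grows roughly like 2^N steps, reproducing Levinthal’s astronomical estimate. A modest bias of about 2.3 kT per residue brings it down to the order of 10^3 steps, i.e. nanoseconds at one picosecond per step.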

3.7 The Importance of Native Geometry as a Determinant of Folding Rates

In 1998, Plaxco et al. proposed the relative contact order parameter, CO, as a
simple empirical measure of the native structure’s geometric complexity [92].
The CO measures the average sequence separation of contacting residue pairs in the
native structure, relative to the chain length of the protein, and is defined as

$$\mathrm{CO} = \frac{1}{L N}\sum_{i<j} \Delta_{ij}\,|i - j|,$$

where $\Delta_{ij} = 1$ if residues i and j are in contact and 0 otherwise, N is the
total number of contacts and L is the chain length. In a subsequent study, Plaxco
et al. reported a rather strong correlation (r = 0.92) between the CO parameter and
the logarithm of the folding rates of 24 small (~150 amino acids), single-domain,
two-state proteins [93]. This suggested that the native state’s geometry could be
the major determinant of two-state folding kinetics.
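The CO definition above translates directly into a few lines of code. In this minimal sketch, the contact list is assumed to be given: how a ‘contact’ is defined (for example, a distance cutoff between atoms) is left to the caller. The function name and toy contact maps are hypothetical.

```python
def relative_contact_order(contacts, chain_length):
    """Relative contact order CO = sum(|i - j|) / (L * N) over native contacts.

    contacts: iterable of (i, j) residue-index pairs that are in contact in the
    native structure; chain_length: L, the number of residues.
    How a 'contact' is defined (e.g. a distance cutoff) is left to the caller.
    """
    pairs = {(min(i, j), max(i, j)) for i, j in contacts if i != j}  # unique pairs
    if not pairs:
        raise ValueError("at least one contact is required")
    total_separation = sum(j - i for i, j in pairs)
    return total_separation / (chain_length * len(pairs))


# Toy 10-residue chain: a hairpin-like (non-local) versus a helix-like (local) contact map
hairpin = [(1, 10), (2, 9), (3, 8), (4, 7)]
helix = [(i, i + 4) for i in range(1, 7)]
print(relative_contact_order(hairpin, 10))  # separations 9, 7, 5, 3 -> 0.6
print(relative_contact_order(helix, 10))    # all separations equal to 4 -> 0.4
```

The hairpin-like map, dominated by contacts between residues far apart in sequence, has a higher CO than the helix-like map made of local (i, i + 4) contacts. According to the CO-rate correlation, the hairpin-like map would therefore be expected to fold more slowly.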
Shortly after this discovery, several authors proposed alternative properties
(some of them resembling the CO) to quantify the geometry of the native
structure, such as the long-range order [94] and cliquishness [95]. These
properties appeared to correlate equally well with the folding rates of small,
two-state proteins. Part of the charm of these reductionist approaches is clearly
that they dramatically simplify the solution to the folding puzzle, since the
protein’s primary sequence, with all its inherent complexities, is no longer at the
centre of the problem. In a sense, Plaxco’s ‘CO-law’ complements Anfinsen’s
folding principle (that the protein’s primary sequence determines the native
structure) by stating that it is the native structure itself, through its geometric
property CO, that determines the folding rate.
A number of theoretical and experimental studies carried out over the last decade
have helped strengthen the idea that the CO is a major driver of two-state folding
kinetics [96–100]. However, for the CO-rate dependence to acquire the status of a
fundamental principle of protein folding, its roots must be understood, i.e. the
physical mechanism underlying the correlation must be identified. Faísca and Ball
were the first to explore the CO-rate dependence with Monte Carlo simulations of
simple lattice models with additive pairwise interactions, but they observed only
moderately high correlations, and only for long chain lengths and high CO [101]. A
subsequent study by Kaya and Chan, also based on lattice simulations, indicated
that the CO-dependent rate may result from a coupling mechanism between local and
non-local (i.e. long-range) interactions, the latter involving pairs of residues
that are far apart along the sequence [102]. Such coupling renders the folding
transition more cooperative (both kinetically and thermodynamically). It also
results in highly dispersed folding rates, a necessary condition for observing a
strong statistical correlation with the CO.
Several other physical mechanisms have been proposed to rationalise the
CO-dependent folding rates. An interesting example is the topomer search model,
which stipulates that the rate-limiting step in folding is a diffusive search within the
unfolded state for a conformer with the correct topology [103]. However, despite
these and more recent insights [104], the physical principles underlying the CO-rate
dependence remain elusive, and a widely accepted physical theory for the CO-rate
dependence is still missing. Furthermore, the CO alone cannot predict the folding
rate of larger, multidomain proteins [105, 106]. Additionally, when recent
experimental kinetic data for single-domain proteins are carefully selected to
eliminate temperature effects, the correlation between folding rates and CO does
not appear to be as strong [107, 108].

3.8 The Folding Mechanism of Knotted Proteins

Researchers have been trying to elucidate the folding mechanism of knotted proteins
over the last 12 years. Insights gathered from molecular simulations, using
different kinds of models and methodologies, have been particularly important
and illuminating (reviewed in [109–111]). On the experimental side, research has
focused mainly on the trefoil-knotted proteins YibK (PDB ID: 1j85) and YbeA
(PDB ID: 1ns5) [112]. In what follows, we summarise the main results obtained so far.
According to the current picture, the folding of knotted proteins is a highly
ordered process, in which the formation of the so-called knotting loop precedes the
threading step through which the protein becomes knotted. Molecular simulations
have led to two main proposals for the threading step:
• in direct threading, the chain terminus closer to the knotted core (i.e. the
shortest knot tail) threads the knotting loop directly (see Fig. 14a)
[21, 113–116];
• in slipknotting, the chain terminus arranges itself into a hairpin that threads
the knotting loop, transiently forming a slipknotted conformation (see Fig. 14b)
[117–119].
According to simulations, which threading route is followed appears to depend on
how strongly non-native interactions (i.e. interactions not established in the
native state) contribute to the protein energetics [110].
The structural complexity associated with the knotting process typically leads to slow folding of knotted proteins, both in simulations and in experiments in vitro. The low folding rate results in part from backtracking, i.e. the breaking and re-establishment of specific native contacts. Backtracking necessarily occurs when an incorrect sequence of events leads to malformed knots or other topologically trapped conformations. However, experiments mimicking in vivo conditions showed that, in the presence of the chaperonin GroEL-GroES (Sect. 4.1), the folding rate starting from unfolded (and unknotted) conformations becomes at least 20-fold higher. In contrast, GroEL-GroES has no effect on the folding rate starting from denatured (and still knotted) conformations. Since neither protein populates misfolded species during or after translation, these results suggest that the chaperonin is likely to play a specific and active role in assisting the knotting step [120, 121]. Recent molecular simulations suggest that this acceleration results in part from an increased knotting frequency due to steric confinement, which decreases the entropic barrier to knotting [122], and in part from reduced backtracking [123, 124].

Fig. 14 Knotting mechanisms: Direct threading and slipknotting



Although the mechanism of chaperonin action on YibK and YbeA has not yet been established, it has been suggested that it is not limited to steric confinement. The chaperonin may instead facilitate the unfolding of kinetically and topologically trapped intermediates, or stabilise interactions that promote knotting [121]. These specific effects remain to be elucidated and, more generally, the active role played by the chaperonin in the folding of knotted proteins remains to be established.
Another important question that still awaits investigation is the relationship between the knotting mechanism and the nucleation mechanism in knotted model systems with two-state kinetics. If folding speed is under evolutionary control, this control should have exerted additional selective pressure on the folding nucleus [125]. Therefore, an overlap between folding and knotting may imply that the interactions that nucleate the knot have also been optimised for folding speed (i.e. the optimisation of the knotting mechanism is a side effect of folding optimisation). In line with this conjecture, the designed protein 2ouf, which embeds a trefoil knot, folds fast with a two-state transition, and the nucleation of the transition state and the nucleation of the knot are concomitant processes [126].

4 Protein Misfolding: Why Proteins Misbehave?

4.1 Protein Folding In Vivo

The physical principles that govern protein folding remain unaltered in vivo, despite the increased complexity of the intracellular environment. Indeed, the interior of cells is highly crowded, with macromolecules reaching very high concentrations, up to 300–400 mg/ml (Fig. 15).
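To give a sense of scale, the sketch below converts these mass concentrations into the fraction of volume occupied by macromolecules. It assumes a partial specific volume of about 0.73 ml/g, a typical value for proteins. It ignores hydration water and non-protein macromolecules such as RNA, so it is a rough estimate only.

```python
# Assumption: ~0.73 ml/g is a typical partial specific volume for proteins;
# hydration water and non-protein macromolecules (e.g. RNA) are ignored.
TYPICAL_PROTEIN_SPECIFIC_VOLUME_ML_PER_G = 0.73


def occupied_volume_fraction(
        concentration_mg_per_ml: float,
        specific_volume_ml_per_g: float = TYPICAL_PROTEIN_SPECIFIC_VOLUME_ML_PER_G) -> float:
    """Fraction of the solution volume occupied by the dissolved macromolecules."""
    grams_per_ml = concentration_mg_per_ml / 1000.0
    return grams_per_ml * specific_volume_ml_per_g


if __name__ == "__main__":
    for conc in (300, 400):
        print(f"{conc} mg/ml -> {occupied_volume_fraction(conc):.0%} of the volume")
```

Under these assumptions, 300–400 mg/ml corresponds to roughly 22–29% of the volume being occupied by macromolecules. Bound hydration water would make the effectively excluded volume even larger.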
One consequence of macromolecular crowding is that very little free water is present in cells. This has important implications for the protein hydration layer (see Sect. 1.4) and for the protein folding process. Indeed, crowding can reduce the yield of correctly folded protein by increasing aggregation through aberrant non-native interactions of nascent polypeptides.
Macromolecular crowding also affects protein folding dynamics and protein structure. As a result, crowding-induced conformational changes are an important source of non-native states and aberrantly folded proteins, whose functional deficiency and potential toxicity to cells must be avoided.
To deal with these hazardous effects of the intracellular environment, cells have
developed several regulatory mechanisms, which include:
• regulation of protein folding at the ribosome, achieved by co-translational
folding and, in some instances, helix formation within the ribosome exit
channel, and tertiary folding at the vestibule [128];

Fig. 15 Intracellular molecular crowding. Molecular model of a bacterial cytoplasm with atomistic-level detail highlighted with proteins, tRNA, GroEL and
ribosomes. Reprinted from [127] with permission (CC-BY licence)

• control of translation rates through the usage of synonymous rare codons: by slowing down protein synthesis, this pausing mechanism allows correct protein folding to occur [129]; and
• protein quality control systems, which employ enzymatic and protein degradation machineries that together regulate protein homeostasis (proteostasis), i.e. the global balance between protein synthesis, folding and degradation [130].
The cellular protein quality control systems comprise folding catalysts, such as enzymes and molecular chaperones, as well as degradation machineries. For instance, enzymes with prolyl isomerase activity are very important because they interconvert the cis and trans isomers of peptide bonds involving proline residues. For these bonds, the energy difference between the cis configuration and the more common trans configuration is relatively small, so a non-negligible fraction of them can adopt the cis form. Because cis-trans isomerisation is intrinsically slow, non-native prolyl isomers slow down folding and may lead to aberrant conformers. An example is β2-microglobulin, the causative agent of dialysis-related amyloidosis, which populates an aggregation-prone intermediate state characterised by a non-native trans isomer of Pro32 [131]. Prolyl isomerases accelerate cis-trans isomerisation, which would otherwise be rate limiting for folding. Another example is protein disulphide isomerase, which catalyses the formation and breakage of disulphide bonds in proteins. This enzyme prevents the formation of incorrect disulphide bonds, so that disulphide-scrambled conformers are avoided.
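A two-state Boltzmann equilibrium makes the point about a small cis/trans energy difference quantitative. The sketch below assumes a simple two-state equilibrium with ΔG = G(cis) − G(trans). The ΔG values in the usage loop are hypothetical, chosen for illustration, and are not measured values for any particular protein.

```python
import math

GAS_CONSTANT_KJ = 8.314462618e-3  # R in kJ mol^-1 K^-1


def cis_fraction(delta_g_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """Equilibrium fraction of the cis isomer for a two-state cis/trans bond.

    delta_g is G(cis) - G(trans); positive values favour trans.
    K = [cis]/[trans] = exp(-delta_g / RT), so the cis fraction is K / (1 + K).
    """
    k_eq = math.exp(-delta_g_kj_per_mol / (GAS_CONSTANT_KJ * temperature_k))
    return k_eq / (1.0 + k_eq)


if __name__ == "__main__":
    # Illustrative (hypothetical) free energy differences, not measured values.
    for dg in (2.5, 5.0, 10.0, 20.0):
        print(f"dG = {dg:4.1f} kJ/mol -> {cis_fraction(dg):.1%} cis")
```

The smaller the energy difference, the larger the population of molecules carrying a cis bond that must slowly isomerise before folding can be completed. This is the step that prolyl isomerases accelerate.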
Molecular chaperones are among the most prominent cellular systems for protein quality control. A molecular chaperone is any protein that interacts with, stabilises or assists other proteins in acquiring their native functional conformation [132]. Chaperones act as catalysts, accelerating the folding process without determining the shape of the final structure, thus leaving Anfinsen’s dogma intact. Small proteins (~150 amino acids), which typically fold fast with two-state kinetics, reach their native fold without assistance. This is not the case for larger, multidomain proteins, whose single domains of up to 300 amino acids fold co-translationally as independent folding units with catalytic mediation from molecular chaperones. Both prokaryotes and eukaryotes have evolved a myriad of molecular chaperone systems (Fig. 16), and a number of excellent reviews provide an in-depth account of the subject [132–134].
Briefly, chaperones can be categorised according to the stage of protein synthesis and folding at which they intervene:
• factors that bind close to the ribosome exit channel and bind hydrophobic chain segments (e.g. trigger factor in bacteria and HSP70 in eukaryotes);
• factors from the HSP70 group that are not ribosome-bound and that mediate co- or post-translational folding (e.g. DnaK in bacteria and HSC70 in eukaryotes); and
• downstream chaperones (e.g. GroEL in bacteria and TRiC in eukaryotes), which accommodate folding polypeptides handed over by the previous group.

Fig. 16 Organisation of chaperone pathways in the cytosol. The number of substrates interacting with the different chaperones is indicated as a percentage of the total proteome. Reprinted from [132] with permission

The GroEL-GroES chaperonin is a particularly interesting example of the evolution of an effective molecular chaperone whose function benefits from high intracellular macromolecular crowding. Chaperonins are large (~850 kDa) oligomeric proteins that act by encapsulating partially folded polypeptides in their interior (Fig. 17).

Fig. 17 Protein folding in the GroEL-GroES chaperonin cage. Substrate protein (SP) binding to the GroEL cavity and subsequent conformational changes triggered by ATP and GroES binding promote folding and subsequent release of the encapsulated protein. The structural model is based on PDB ID: 1AON. Reprinted from [135] with permission

The exact mechanism through which the GroEL-GroES chaperonin assists protein folding has been a matter of debate and some controversy, with the different views coalescing into three general models [135].
The passive cage model postulates that encapsulating the (mis)folding protein in the cavity accelerates folding by isolating it from the complex intracellular milieu. This isolation minimises the unwanted intermolecular contacts that lead to aggregation, and for this reason the model has been coined Anfinsen’s cage model.
The active cage model suggests that folding catalysis results from steric confinement, which prevents folding intermediates from forming dead-end, hydrophobically collapsed states and leads to folding with full yield upon a single encapsulation. Interestingly, this seems to be the operative mechanism in proteins with complex topologies, such as knotted proteins (see Box 1 in Sect. 5.1).
Finally, the iterative annealing model proposes that the substrate protein reaches its fully folded state through multiple rounds of binding and release. These rounds prevent stalling of the folding reaction through active, ATP-driven unfolding of kinetically trapped intermediates.
High intracellular macromolecular crowding also increases the effectiveness of large chaperones such as GroEL. In crowded environments, the degree of protein association is orders of magnitude higher than in dilute aqueous solution and higher-order oligomers are favoured, resulting in increased folding efficiency.

4.2 Protein Misfolding and Aggregation

The tight physiological regulation of protein folding by protein quality control systems and molecular chaperones is often insufficient to prevent the formation of misfolded conformations that result in destabilised protein folds and/or protein aggregates. This may occur because of genetic mutations or because of changes in the physical and chemical conditions in the cell that perturb the folding landscape and result in protein misfolding.
Protein misfolding events can be depicted using pictorial folding landscapes
illustrating one of the three possible scenarios: formation of a destabilised protein,
accumulation of an unstable folding conformer or formation of protein aggregates
(Fig. 18).
Protein Destabilisation The formation of a destabilised protein leads to broad-
ening of the native basin in the folding funnel with many native-like conformers
separated by small free energy barriers. Consequently, a large ensemble of conformers with different stabilities and folding properties is formed, impairing protein
function or favouring degradation or faulty interactions. For some proteins, these
structural fluctuations, and possible local unfolding, may even trigger aggregation
phenomena (Fig. 18a).

Fig. 18 Protein misfolding landscapes. Energetic funnels depicting cases in which a mutation or adverse condition (represented by blue glow) results in (a) formation of a destabilised protein, (b) accumulation of an unstable folding conformer and (c) formation of protein aggregates [35]

Protein Misfolding A partially folded conformation (i.e. an intermediate state) stabilised by non-native interactions can act as a kinetic trap, leading to protein misfolding (Fig. 18b).
Protein Aggregation Finally, protein misfolding may lead to protein aggregation, a process whereby monomeric proteins self-associate to form higher-order oligomers (i.e. dimers, trimers, tetramers, etc.). Through different mechanisms, protein conformers prone to substantial intermolecular interactions build up. This propensity promotes a competing aggregation pathway that gains prominence over the folding pathway, which is why protein aggregation is viewed as a side effect of protein misfolding. The association between misfolding and aggregation stems from the recognition that misfolded species typically expose hydrophobic patches, which make them prone to self-association (reviewed in [35, 136, 137]) (Fig. 18c). Sometimes the final products of protein aggregation are the so-called amyloid fibrils (discussed in the following sections), and the process leading to amyloids is specifically called amyloidogenesis [138]. More often, protein aggregation leads to amorphous aggregates with a granular appearance (lacking amyloid fibrils), to protofibrils (including annular oligomeric aggregates) and to other oligomeric aggregated states.

Destabilized protein
• Low-stability conformational state
• May have locally altered secondary structure or poor tertiary interactions
• Usually retains some biological function
• More susceptible to proteolysis than the native protein

Protein aggregate
• Amorphous
• Insoluble protein conglomerate
• Results from intermolecular hydrophobic interactions between misfolded proteins
• Readily resolubilized in buffer under native-like conditions

Amyloid fibril
• Highly ordered
• Insoluble, fibril-shaped protein aggregates
• Formed from β-sheet stacking in a cross-β motif
• Highly resistant to proteolysis
• Requires high concentrations of denaturant or detergent to resolubilize

Fig. 19 Physical and structural characteristics of misfolded proteins

In summary, a defect in the protein folding process results in multiple misfolded protein states with different characteristics (Fig. 19).

4.3 Protein Misfolding Diseases

The so-called protein misfolding diseases (also known as protein folding diseases or conformational disorders) form a vast group of pathologies related to faulty protein folding, or to misfolding and aggregation. Defects in the protein folding process cause disease in two ways. One is loss of protein function, owing to destabilisation or degradation. The other is toxic gain of function, owing to amyloid formation and toxic accumulation. In many instances, the intense physiological regulation of protein folding by molecular chaperones and protein quality control systems is insufficient to prevent the formation of misfolded conformations that result in destabilised protein folds and/or protein aggregates [139]. Indeed, misfolded proteins lead to disease by affecting a variety of pathways and cellular processes (Fig. 20).
Protein misfolding diseases can be organised into three broad groups: (a) diseases resulting from protein misfolding and destabilisation with no aggregation; (b) diseases resulting from protein misfolding with aggregation (amyloid); and (c) diseases resulting from defects in molecular chaperones (chaperonopathies). Although protein misfolding accounts for most protein folding diseases, it is noteworthy that some pathologies within this group result from defects in molecular chaperones (Table 2).

Fig. 20 Misfolded proteins: multiple pathways to disease

Among the amyloid-forming folding diseases, well-known examples include several neurodegenerative disorders (e.g. Alzheimer’s disease, Parkinson’s disease and amyotrophic lateral sclerosis), type II diabetes, and the transmissible spongiform encephalopathies, also known as prion diseases (e.g. Creutzfeldt-Jakob disease (CJD)). A less known example, affecting people undergoing haemodialysis, is dialysis-related amyloidosis. In this disease, the deposition of amyloid plaques of β2-microglobulin in the osteoarticular system eventually leads to bone destruction and death [140]. The list keeps growing: more than 40 diseases have now been associated with amyloidosis [141].

4.4 The Amyloid State

The observation that proteins aggregate dates back to the nineteenth century, i.e. to
a time well before the beginning of research in protein folding. The term amyloid
comes from the Latin word amylum, which means starch. This naming reflects the
fact that when it was originally discovered in 1854, amyloid was considered to be a
polysaccharide [142]. But amyloid is not a sugar, and a few years later, it became
clear that amyloid is a material made of proteins. The phenomenon did not attract considerable attention until recently, when it became clear that the formation and deposition of amyloid fibrils in vivo are associated with disease states, as discussed below [139, 143].

Table 2 Classification of protein folding diseases

Disease | Protein
Diseases resulting from protein misfolding and destabilisation with no aggregation
Phenylketonuria (PKU) | Phenylalanine hydroxylase (PAH)
Gaucher’s disease | Glucosylceramidase
Cystic fibrosis | Cystic fibrosis transmembrane conductance regulator (CFTR)
Glycogen storage disease type II (or Pompe’s disease) | Acid α-glucosidase
Fabry disease | α-Galactosidase (GLA)
MCAD deficiency | Medium-chain acyl-CoA dehydrogenase (MCAD)
Homocystinuria | Cystathionine β-synthase (CBS)
Multiple acyl-CoA dehydrogenase deficiency (or glutaric aciduria type II) | Electron transfer flavoprotein (ETF); ETF:ubiquinone oxidoreductase (ETF:QO)
Diseases resulting from protein misfolding with aggregation (amyloid)
Alzheimer’s disease (AD) | Amyloid β-peptide
Transmissible spongiform encephalopathies (TSE) | Prion protein
Parkinson’s disease (PD) | α-Synuclein
Amyotrophic lateral sclerosis (ALS) | Superoxide dismutase 1
Huntington’s disease (HD) | Huntingtin (polyQ)
Lysozyme amyloidosis | Lysozyme
Type II diabetes | Amylin
Familial amyloidotic polyneuropathy (FAP) | Transthyretin
Light-chain amyloidosis | Immunoglobulin light chains
Haemodialysis-related amyloidosis | β2-Microglobulin
Diseases resulting from defects in molecular chaperones (chaperonopathies)
Hereditary spastic paraplegia | Hsp60
Spastic ataxia of Charlevoix-Saguenay | Sacsin (DnaJC29)
Cataracts; myofibrillar myopathy | αB-Crystallin
Adapted from [35]

The current definition of amyloid (and we stress current because the concept is evolving) is based on its structural and biophysical properties. The following characteristics constitute the main criteria traditionally used to classify a substance as amyloid [144]:
• yellow-green birefringence upon Congo red binding, revealed by polarised light microscopy;
• intense fluorescence upon binding thioflavin-T; and
• a fibrillar submicroscopic structure (i.e. bundles of straight, rigid fibrils ranging in width from 60 to 130 Å and in length from 1000 to 16,000 Å), revealed by electron microscopy.

Fig. 21 Three-dimensional structure of amyloid. Structural model of the threefold-symmetric repeat unit as viewed from the fibril growth axis, derived from solid-state NMR (a), and fibril view (PDB ID: 2MXU) showing the cross-β unit of the amyloid fibril with three β-strand regions connected by a short coil (b) [146]

The most important hallmark of amyloid, which also constitutes its structural fingerprint, is the cross-β sheet motif, in which stacked β-strands run perpendicular to the fibril axis (Fig. 21). Astbury originally proposed the ‘cross-β spine’ motif in 1935 based on X-ray diffraction studies [145].
The low solubility and one-dimensional nature of amyloid make it particularly challenging for X-ray crystallography, while its large size challenges the use of solution NMR. As a result, atomic force microscopy, solid-state NMR and transmission electron microscopy became important tools in structural studies of amyloid. It is therefore perhaps not surprising that 70 years elapsed between Astbury’s seminal findings and the first atomically resolved picture of amyloid, reported by Eisenberg and co-workers from X-ray diffraction [147]. Their success relied on a ‘reductionist’ approach. Instead of focusing on amyloids of a full-length protein chain, they studied amyloids formed by a short, soluble fibril-forming peptide from the yeast protein Sup35. The generic cross-β spine motif turned out to be a pair of β-sheets, each formed from parallel segments (the β-strands) stacked in register. Side chains protruding from the two sheets form a tight, dry, self-complementary interface termed a ‘steric zipper’. Several classes of steric zippers have been identified [148]. They differ in the organisation of the β-strands within and between the β-sheets (parallel or antiparallel) and in the stacking of the sheets (face-to-face, face-to-back, back-to-back, etc.). A network of hydrogen bonds, together with hydrophobic interactions and π-π stacking interactions, ensures the stability of amyloid fibrils. It has been suggested that the geometrical restrictions imposed by π-stacking may actually accelerate amyloid fibril formation, thus playing a particularly important role in amyloid assembly [149].

Recent advances in the structural biology of amyloids were made possible by cryo-electron microscopy, which circumvents the need for the ordered crystals required by X-ray crystallography. In the last two years, this has allowed the high-resolution structure determination of amyloids formed by amyloid β (1-42) [150], tau [151], α-synuclein [152] and β2-microglobulin [153]. These studies revealed the existence of structural (or thermodynamic) polymorphisms in amyloids (reviewed in [154]). The latter correspond to different β-strand arrangements, all compatible with the cross-β spine motif. They arise from the packing of different steric zippers (segmental and packing polymorphisms), from different side-chain packing, or from different supramolecular assemblies of protofilaments (Fase Arches) (assembly polymorphism). A protein sequence can fibrillate into multiple structures of similar free energy (i.e. different polymorphs). Which polymorphs form depends on the experimental conditions employed (temperature, pH, etc.), with the solvent (water) playing a particularly important role in the structural diversity of fibril assembly [155]. Interestingly, there is growing evidence that both biological and synthetic surfaces can not only enhance amyloid assembly but also lead to different amyloid polymorphs [156]. This has important implications for cell biology, given the large surface area provided by macromolecules and phospholipid bilayers in the intracellular environment.
Recently, the concept of amyloid polymorphism was extended beyond structure to include the so-called ‘stability’ polymorphism. This reflects the fact that co-polymerisation of different protein variants can expand the repertoire of structural and thermodynamic polymorphs by creating fibrils with different structural and thermodynamic signatures [157].
As we shall see in the next section, amyloids are frequently associated with disease states. However, amyloid is not necessarily related to disease. The so-called functional amyloids comprise a large number of cases in which amyloids play functional roles in many life forms, from bacteria to fungi and mammals [158–160]. For instance, several microorganisms (e.g. E. coli) are known to use amyloid structural motifs in extracellular biomaterials with important physiological roles (e.g. curli fibres, which facilitate the formation of biofilms [161]). Amyloid protects the oocyte and embryo of insects (e.g. the silk moth) and fish [162]. In secretory granules, peptide and protein hormones are stored in an amyloid-like aggregation state [163]. Amyloid also templates and accelerates the covalent polymerisation of reactive small molecules into melanin [164]. These are just a few examples of the biological functions of amyloids. What was initially viewed exclusively as a pathological material is now considered to play an important role in maintaining and protecting living cells.
The concept of functional amyloid led to the proposal that the term ‘amyloid-like’ should be used for proteins that possess the hallmarks of amyloid but are not associated with pathological plaques. But is there any real difference between functional and disease-related amyloid? A definite answer is still lacking, but the results obtained so far point to a structural distinction. Parallel in-register β-sheet structures, composed of individual polypeptides stacking in register every 4.7 Å along the fibril axis, are common to many full-length proteins in pathological amyloids. In contrast, β-helices, composed of a single polypeptide wrapping around an axis to form intramolecular parallel β-sheets, are likely the structural basis of functional amyloids [144].
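The 4.7 Å spacing, combined with the fibril lengths quoted in the electron microscopy criterion above (1000 to 16,000 Å), gives a rough idea of how many chains stack along a fibril. The sketch below assumes one polypeptide per 4.7 Å layer (parallel in-register stacking) and a single protofilament. Both are simplifying assumptions.

```python
# Assumptions: one polypeptide per layer, layers spaced 4.7 Å along the fibril
# axis (parallel in-register stacking), and a single protofilament.
STRAND_SPACING_ANGSTROM = 4.7


def layers_along_fibril(length_angstrom: float,
                        spacing_angstrom: float = STRAND_SPACING_ANGSTROM) -> int:
    """Number of complete stacked β-strand layers that fit along a fibril."""
    if length_angstrom < 0 or spacing_angstrom <= 0:
        raise ValueError("length must be non-negative and spacing positive")
    return int(length_angstrom // spacing_angstrom)


if __name__ == "__main__":
    for length in (1000, 16000):  # fibril lengths from the EM criterion
        print(f"{length} Å -> about {layers_along_fibril(length)} layers")
```

This yields roughly 200 to 3400 stacked chains per protofilament. Fibrils made of several protofilaments contain correspondingly more.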
Amyloid is also interesting because of its unusual physical properties. It displays remarkable proteolytic and thermal resistance, with a melting temperature above 130 °C. Amyloids might be used as nanomaterials, as they are comparable to steel in strength and to silk in mechanical stiffness [165]. These remarkable features, together with the functional repertoire of amyloid, have led researchers to speculate that amyloid may be the archetypal protein state [166]. More importantly, they make amyloid particularly attractive in materials science, where its natural propensity for nanoscale organisation is exploited in de novo material design [167]. A well-known example is the fabrication of conductive nanowires, in which self-assembling peptide fibrils act as templates for the deposition of metals on their outer surface, yielding electrically conducting wires [168].

4.5 Mechanism and Kinetics of Protein Aggregation

Given the importance of amyloid in health, disease and biotechnology, it is essential to solve the mechanism that leads to amyloid and, more generally, to understand the mechanism(s) of protein aggregation. Dissecting the aggregation routes leading to amyloids is proving even more challenging than solving the folding mechanism leading to a biologically functional native state. This is because protein aggregation is remarkably complex, for several reasons:
• conformational states formed along the aggregation pathway are significantly structurally heterogeneous;
• many of them are only transiently populated, which makes their structural characterisation with routinely used biophysical methods a challenge;
• their formation depends critically on the environmental conditions (temperature, pH, salt, other proteins, membranes, ions, etc.); and
• because of the multiple length scales and timescales involved in the process, a wide range of complementary techniques is needed to span them.
Despite these difficulties, significant advances have been made in tackling the complexity of protein aggregation and amyloidosis in vitro [169] and in vivo [170, 171].
The need to understand how proteins aggregate into amyloids led to a renewed interest in the concept of intermediate states. This concept had been somewhat set aside in the 1990s, when small proteins with two-state kinetics became widely popular models for studying protein folding. Nowadays, it is widely accepted that protein aggregation is often triggered by the formation of an aggregation-prone intermediate state (Fig. 22), which can be considerably nativelike. This intermediate, termed the early amyloid precursor (EAP), is prone to self-association because it exposes hydrophobic patches [173].
Classically, the EAP is considered to originate from an unfolding transition of the native state involving the crossing of a major free energy barrier [174]. This view is supported by two observations. First, partially folded conformations formed under non-physiological (or stress) conditions (high temperature, acidic pH, high ionic strength, metals, etc.) are more prone to aggregation than their natively folded precursors. Second, single-point mutations typically destabilise the native state, promoting an unfolding transition [175]. Revealing the identity and generic structural features of this amyloid-competent conformation is of paramount importance, as it may be a target for effective therapeutic strategies.
However, there is evidence that the EAP may be a locally unfolded nativelike state that becomes accessible through thermal fluctuations under physiological conditions [176]. The existence of such an intermediate was originally hypothesised by Thirumalai et al. [173], and its role in the aggregation mechanism of several model systems has been highlighted [66, 177–180]. The recognition that the EAP can also be a nativelike intermediate state en route to folding opened new vistas on the relation between the folding and aggregation mechanisms, which can be viewed as directly competing processes [181].

Fig. 22 Complexity of aggregation pathways. The mechanism of protein aggregation is remarkably complex, involving monomers, oligomers, protofibrils, fibrils and other higher-order structures [172]
A mechanistic understanding of protein aggregation requires the identification of all microscopic steps leading to mature fibrils. Ideally, it should be possible to determine the size distribution and structures of the oligomeric assemblies, filaments, fibrils and protofibrils that form along the amyloid cascade (Fig. 22). Moreover, important insights may be gained by defining the rate constants governing each microscopic step and determining how they depend on protein sequence and environmental conditions, i.e. the kinetics of protein aggregation in amyloid formation [182].
A typical aggregation curve (showing the percentage of aggregate vs. time) exhibits a sigmoidal shape similar to that observed in a two-state folding transition (Fig. 23a), although not necessarily as steep. Based on the anatomy of this curve, the aggregation process is generally divided into three stages: the lag phase, the growth phase and the final phase (or plateau). Monomers dominate the lag phase, and fibrils dominate the final plateau, where the monomer concentration has reached its equilibrium value. In the growth (or elongation) phase, the concentrations of monomers and fibrils are comparable, and the rate of amyloid formation is at its largest.
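A common empirical way to extract characteristic times from such a curve is sketched below. It fits a logistic function, y(t) = 1/(1 + exp[−(t − t50)/τ]), and defines the lag time as t50 − 2τ. This is a descriptive convention, not a mechanistic model and not necessarily the analysis used in the works cited here. The curve parameters in the usage example are hypothetical. The sketch estimates t50 by interpolating the half-completion point, and τ from the steepest segment, since a normalised logistic rises at most at a rate of 1/(4τ).

```python
import math
from typing import List, Sequence, Tuple


def logistic(t: float, t50: float, tau: float) -> float:
    """Normalised empirical sigmoid: fraction of protein aggregated at time t."""
    return 1.0 / (1.0 + math.exp(-(t - t50) / tau))


def analyse_curve(times: Sequence[float],
                  fractions: Sequence[float]) -> Tuple[float, float, float]:
    """Estimate (t50, tau, lag time) from a normalised, rising aggregation curve.

    t50: time at which the curve first crosses 0.5 (linear interpolation)
    tau: from the steepest segment, since a logistic rises at most 1/(4*tau)
    lag: t50 - 2*tau (an empirical convention, not a mechanistic quantity)
    """
    points = list(zip(times, fractions))
    segments = list(zip(points, points[1:]))
    t50 = None
    for (t0, y0), (t1, y1) in segments:
        if y0 < 0.5 <= y1:
            t50 = t0 + (0.5 - y0) * (t1 - t0) / (y1 - y0)
            break
    if t50 is None:
        raise ValueError("curve never reaches half-completion")
    max_slope = max((y1 - y0) / (t1 - t0) for (t0, y0), (t1, y1) in segments)
    tau = 1.0 / (4.0 * max_slope)
    return t50, tau, t50 - 2.0 * tau


if __name__ == "__main__":
    # Synthetic, noise-free curve with hypothetical parameters (hours).
    ts: List[float] = [0.1 * i for i in range(1001)]
    ys = [logistic(t, t50=40.0, tau=5.0) for t in ts]
    t50, tau, lag = analyse_curve(ts, ys)
    print(f"t50 = {t50:.2f} h, tau = {tau:.2f} h, lag time = {lag:.2f} h")
```

For a logistic curve, the steepest point coincides with t50, which lies in the growth phase. Real data are noisy, so in practice a least-squares fit of the logistic would replace the finite-difference estimate used here.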
The lag phase corresponds to the formation of a critical nucleus, i.e. the smallest aggregate in the process that is stable enough for further growth by monomer addition to be faster than dissociation into monomers. In thermodynamic terms, nuclei are the species of highest free energy along the amyloid pathway, just as the TS is the state of highest free energy in a two-state folding transition (Fig. 23b). Consider a system where aggregation starts from a solution of pure monomeric proteins (or peptides). There, the formation of the critical nucleus (which can consist of millions of dimers of the EAP) is initially the only molecular event contributing to amyloid formation. However, primary nucleation typically represents 10⁻⁷% of the lag time. This means that additional fibril-dependent processes (elongation, fragmentation and secondary nucleation) also occur within the lag time, and the critical nucleus should involve the formation of large populations of several species. Indeed, in general, none of the three phases can be ascribed to a single microscopic process, and several species co-exist in each phase [182, 183]. The microscopic processes driving the overall formation of fibrils (Fig. 24) can nevertheless be determined by combining kinetic analysis with systematic experimental data sets analysed in a global manner [184].

Fig. 23 Typical amyloid aggregation curve. (a) The three phases of amyloidogenesis. (b) The free energy of aggregation (projected along a suitable reaction coordinate), showing that the aggregation kinetics is limited by nucleation, which corresponds to the formation of a critical number (of the order of millions) of aggregation nuclei

[Figure: scheme linking monomer, oligomer and fibril through primary nucleation, elongation and secondary nucleation, with positive feedback from fibrils]

Fig. 24 Microscopic processes in amyloid aggregation. Scheme of the aggregation process highlighting the primary nucleation, elongation and secondary nucleation events and the corresponding microscopic rate constants: kn, primary nucleation; k+, elongation; k2, secondary nucleation
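Global fitting relies on rate laws that couple the microscopic steps of Fig. 24. The sketch below is minimal. It assumes a widely used moment description of these rate laws, in which P(t) is the fibril number concentration, M(t) the fibril mass concentration and m(t) = m_tot - M(t) the free monomer concentration: dP/dt = kn·m^nc + k2·M·m^n2 and dM/dt = 2·k+·m·P. Fragmentation is neglected. The reaction orders nc and n2, the dimensionless parameter values and the function names are assumptions chosen for illustration. Integrating the equations with and without secondary nucleation shows how the fibril-dependent pathway shortens the lag phase.

```python
def simulate_aggregation(m_tot, k_n, k_plus, k_2, n_c=2, n_2=2,
                         t_end=20.0, dt=0.01):
    """Integrate the moment equations with a fixed-step 4th-order Runge-Kutta scheme.

    P: fibril number concentration, M: fibril mass concentration,
    m = m_tot - M: free monomer concentration.
        dP/dt = k_n * m**n_c + k_2 * M * m**n_2   (primary + secondary nucleation)
        dM/dt = 2 * k_plus * m * P                 (elongation at both fibril ends)
    Fragmentation is neglected. Returns (times, M / m_tot).
    """
    def rhs(P, M):
        m = max(m_tot - M, 0.0)  # free monomer cannot become negative
        return (k_n * m**n_c + k_2 * M * m**n_2, 2.0 * k_plus * m * P)

    P, M = 0.0, 0.0  # start from pure monomer
    times, fraction = [0.0], [0.0]
    for step in range(1, int(round(t_end / dt)) + 1):
        a1 = rhs(P, M)
        a2 = rhs(P + 0.5 * dt * a1[0], M + 0.5 * dt * a1[1])
        a3 = rhs(P + 0.5 * dt * a2[0], M + 0.5 * dt * a2[1])
        a4 = rhs(P + dt * a3[0], M + dt * a3[1])
        P += dt / 6.0 * (a1[0] + 2 * a2[0] + 2 * a3[0] + a4[0])
        M += dt / 6.0 * (a1[1] + 2 * a2[1] + 2 * a3[1] + a4[1])
        times.append(step * dt)
        fraction.append(M / m_tot)
    return times, fraction


def half_time(times, fraction):
    """First time at which half of the monomer has been converted (None if never)."""
    for t, f in zip(times, fraction):
        if f >= 0.5:
            return t
    return None


if __name__ == "__main__":
    # Dimensionless, hypothetical parameters chosen only to show the qualitative effect.
    t, f = simulate_aggregation(1.0, k_n=1e-4, k_plus=1.0, k_2=1.0)
    t0, f0 = simulate_aggregation(1.0, k_n=1e-4, k_plus=1.0, k_2=0.0)
    print("t1/2 with secondary nucleation:", half_time(t, f))
    print("fraction at t_end without secondary nucleation:", round(f0[-1], 3))
```

With secondary nucleation switched on, newly formed fibrils catalyse the formation of further nuclei (the positive feedback in Fig. 24). This produces a much sharper sigmoid than primary nucleation alone.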
An outstanding question in amyloid disease, directly related to the mechanism of amyloid formation, concerns the mechanism(s) of cytotoxicity. The classical amyloid hypothesis states that the toxic species is the amyloid fibril itself [185]. However, this idea is gradually giving way to the concept that the oligomers produced along the amyloid cascade are the primary toxic species, while fibrils may be inert or even protective [186]. This view rests on growing evidence that pre-fibrillar oligomers can disrupt the permeability of cellular membranes (through the formation of ion channels or pores, or through non-selective permeation of lipid bilayers), eventually causing cell death [187]. Other known toxicity mechanisms include:

- seeding of other amyloidogenic proteins;
- metal-mediated toxicity through the generation of reactive oxygen species (ROS);
- saturation of the proteostasis network, through the sequestration of molecular chaperones by sticky amyloids and their precursors [188] (Fig. 25).
The identification and structural characterisation of the several species produced along the aggregation pathway leading to amyloid is therefore of utmost importance. However, the difficulty of obtaining highly pure samples of non-fibrillar aggregates that are sufficiently long-lived for biophysical studies has significantly hindered progress in the field. Molecular simulations are likely to play an important role in meeting this challenge [189].
[Figure: from the native protein through amyloid precursors (including a non-amyloid conformer) to the amyloid fibril, with the toxicity routes: membrane disruption, sequestering of molecular chaperones, amyloid seeding and metal-mediated ROS toxicity]

Fig. 25 Toxicity of amyloid and its precursors

4.6 Aggregation Propensity

Another important question related to the aggregation mechanism concerns the identification of which protein sequences will adopt the thermodynamically stable amyloid state. In many cases an underlying genetic defect facilitates or triggers aggregation, yet the great majority of amyloid diseases are sporadic. Moreover, these pathologies are typically associated with old age. A common view is therefore that they arise from a decreased efficiency of the cell's 'housekeeping' mechanisms (molecular chaperones and the unfolded protein response), which naturally leads to an increased tendency of proteins to become misfolded or damaged [190].
However, in the last decade, research in the field of protein folding has established some intrinsic physicochemical sequence properties as important determinants of protein aggregation [191–195]. These include a low content of hydrophobic amino acid residues, a high net charge at neutral pH, intrinsically disordered regions, gatekeeping residues and a high content of glutamine/asparagine residues. In line with these findings, several algorithms have been developed to predict aggregation propensity based on general sequence characteristics [196–199]. Experiments have also shown that homopolymers can arrange themselves into amyloid. Based on these experiments, it has been suggested that any protein sequence will form amyloid if the appropriate conditions (pH, ionic strength, temperature, metals, etc.) are met [200]. Of course, these conditions can be so extreme that they are no longer relevant from a physiological standpoint. Nevertheless, the very possibility that the amyloid state is the most thermodynamically stable state of any protein has important conceptual consequences. In particular, it calls for a rethinking of the folding space (which accounts for the native and unfolded states and folding intermediates) to include amyloid and all relevant structures that pave the way to amyloid formation [181].
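The published aggregation-propensity predictors [196–199] use their own scales and statistical models, which are not reproduced here. As a purely illustrative toy, a sliding window can flag the stretches of a sequence that combine high hydropathy with a low net charge, two of the physicochemical properties listed above. The sketch adopts a common heuristic: such stretches are treated as the more aggregation-prone ones. It also makes several assumptions: the Kyte–Doolittle hydropathy scale, an arbitrary charge penalty, His treated as neutral, and a hypothetical test sequence.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Charges at neutral pH; His is treated as neutral (an assumption)
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def window_scores(seq, window=5, charge_penalty=0.5):
    """Toy score per window: mean hydropathy minus a penalty on |net charge|."""
    scores = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        net_charge = sum(CHARGE.get(aa, 0) for aa in segment)
        scores.append(hydropathy - charge_penalty * abs(net_charge))
    return scores


def most_aggregation_prone_window(seq, window=5):
    """Return (start index, segment) of the highest-scoring window."""
    scores = window_scores(seq, window)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, seq[best:best + window]


if __name__ == "__main__":
    print(most_aggregation_prone_window("DEKRDEKRVIVFLDEKRDEKR"))
```

Real predictors also weigh β-strand propensity, gatekeeping residues and structural context, so this toy score should not be used for actual predictions.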

5 Methods for Protein Folding

5.1 Biophysical Spectroscopies

Determining the conformational status of a protein is critical in protein folding studies. It makes it possible to infer the folding state and to evaluate fractional unfolding or denaturation. To this end, recent years have witnessed what has been called an expanding arsenal of methodologies applied to protein folding. These include fluorescence, circular dichroism (CD), nuclear magnetic resonance (NMR) and Fourier transform infrared spectroscopy (FTIR). These spectroscopic approaches allow one to monitor structural and conformational changes in proteins from multiple complementary angles. The changes can be followed as a function of:

- time (in folding/unfolding kinetic assays);
- denaturing conditions (chemical or thermal denaturation);
- the chemical environment (solution pH, metal ions, ligands/inhibitors).

Several excellent reviews and methods papers are available that cover advanced applications, advantages and limitations of the different protein spectroscopies; the interested reader may refer to them [201–205]. Here we briefly highlight the value of protein fluorescence and circular dichroism in protein folding studies. Both are biophysical spectroscopies commonly available in biochemistry laboratories.
Protein Fluorescence Aromatic residues confer intrinsic fluorescence on proteins, which is very useful for evaluating conformational changes; the most notable is Trp, owing to its high quantum yield. Trp-fluorescence emission is extremely sensitive to the polarity of the side chain environment, which makes it an excellent reporter of the folding state. Upon excitation at 280 nm, Trp emits maximally between 345 and 355 nm, depending on how exposed to water the indole side chain is (Fig. 26a). A buried Trp emits closer to 345 nm, whereas the emission maximum of a completely solvent-exposed Trp is red-shifted towards 355 nm. Trp residues are frequently buried in the protein interior. The intensity-averaged maximum of the emission band therefore reports on the folded state of the protein, and the fraction of folded (F) and unfolded (U) protein may be determined from it. For proteins with n Trp residues, the resulting emission reflects the environments of all n side chains.
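The folded/unfolded fraction can be estimated from the emission maximum with a simple two-state interpolation. This is a minimal sketch. It assumes that the intensity-weighted average wavelength, ⟨λ⟩ = Σ I(λ)·λ / Σ I(λ), varies linearly between a folded baseline λF and an unfolded baseline λU. This linear interpolation is an approximation introduced here for illustration, and the baseline values and function names are hypothetical.

```python
def intensity_weighted_wavelength(wavelengths, intensities):
    """Intensity-weighted average emission wavelength: sum(I*lambda) / sum(I)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no positive intensity")
    return sum(w * i for w, i in zip(wavelengths, intensities)) / total


def fraction_unfolded(lam_obs, lam_folded, lam_unfolded):
    """Two-state estimate of the unfolded fraction by linear interpolation
    between the folded and unfolded baselines (clipped to [0, 1])."""
    fu = (lam_obs - lam_folded) / (lam_unfolded - lam_folded)
    return min(max(fu, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical baselines taken from the 345-355 nm range quoted in the text.
    lam_f, lam_u = 345.0, 355.0
    for lam in (345.0, 347.5, 350.0, 355.0):
        print(lam, fraction_unfolded(lam, lam_f, lam_u))
```

If the folded and unfolded states have different quantum yields, the intensity-weighted wavelength is not strictly linear in the unfolded fraction. In that case, the estimate is biased towards the brighter state.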
Fluorescence quenching experiments using iodide or acrylamide, analysed with the Stern-Volmer equation, allow one to determine the relative fraction of emitting and quencher-accessible Trp side chains. A variety of extrinsic fluorophores are also routinely employed to probe protein folding and conformation (Fig. 27). Among the most useful is 8-anilinonaphthalene-1-sulphonic acid (ANS), which emits upon interacting with solvent-exposed hydrophobic patches. Such patches are exposed in folding intermediates, such as molten globules, and in misfolded proteins. Thioflavin-T (ThT) is another fluorophore whose fluorescence is strongly enhanced only when it binds to the cross-β structure of protein amyloids. This makes ThT a very useful tool to monitor amyloid formation [207]. SYPRO Orange and Nile red are also frequently employed in differential scanning fluorimetry applications to assay thermal shift and protein stability [203].

[Figure: (a) Trp-fluorescence and (b) far-UV CD spectra, with conditions labelled 25 °C, 80 °C and 7 M GdmCl]

Fig. 26 Spectroscopic analysis of protein folding states. Spectroscopic characterisation of human frataxin in different conformational states using Trp-fluorescence spectroscopy (a) and far-UV CD spectroscopy (b). Adapted from [206]

[Figure panels: 8-anilinonaphthalene-1-sulfonic acid (ANS), Thioflavin-T (ThT), SYPRO Orange, Nile Red]

Fig. 27 Examples of common extrinsic fluorophores used in protein folding and stability analysis
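The Stern-Volmer analysis of quenching data reduces to linear least-squares fits. This is a minimal sketch. The classical form, F0/F = 1 + KSV[Q], gives the quenching constant KSV. The modified (Lehrer) form, F0/ΔF = 1/(fa·Ka·[Q]) + 1/fa, gives the fraction fa of quencher-accessible fluorescence. The choice of the Lehrer form, the function names and the synthetic data in the usage note are assumptions for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def stern_volmer_ksv(q, f, f0):
    """Classical Stern-Volmer plot: F0/F versus [Q]; the slope is Ksv."""
    slope, _ = linear_fit(q, [f0 / fi for fi in f])
    return slope


def modified_stern_volmer(q, f, f0):
    """Lehrer plot: F0/(F0 - F) versus 1/[Q].

    Intercept = 1/fa and slope = 1/(fa * Ka), so fa = 1/intercept and
    Ka = intercept/slope. [Q] must be non-zero.
    """
    slope, intercept = linear_fit([1.0 / qi for qi in q],
                                  [f0 / (f0 - fi) for fi in f])
    return 1.0 / intercept, intercept / slope
```

For example, suppose data are synthesised as F = F0·(0.4 + 0.6/(1 + 5[Q])), i.e. 60% of the emission is accessible. Then `modified_stern_volmer` recovers fa ≈ 0.6 and Ka ≈ 5. If a fraction of the Trp emission is inaccessible to the quencher, the classical plot curves downward, which is the signature that calls for the modified analysis.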
Circular Dichroism Circular dichroism (CD) is the differential absorption of the left- and right-circularly polarised components of plane-polarised electromagnetic radiation. It is only observed for chiral chromophores (light-absorbing groups) or for chromophores in a chiral environment. This makes the technique particularly sensitive to protein secondary structure, because the most important chromophore in peptides is the amide bond. Its absorption bands in the far-UV (190–250 nm) result from the n → π* (222 nm) and π → π* (190 nm) electronic transitions. Different arrangements of these groups, as in distinct types of secondary structure, change the overlap of the molecular orbitals and their energy levels. Some conformations permit a more constructive interaction than others, which affects the signal intensity and allows different types of secondary structure to be discriminated. Far-UV CD thus allows one not only to identify and quantify the relative contributions of different types of secondary structural elements in a protein but also to discriminate between folded and unfolded conformers (Fig. 26b).
Proteins that are mostly α-helical have intense spectra with a positive band at 190 nm and a characteristic double minimum at 208 and 222 nm. Regular β-sheet proteins with long and aligned strands (referred to as β-I) exhibit less intense bands, and the maximum at 190 nm is slightly red-shifted. Random-coil (r) and β-II proteins (β-rich proteins that contain only short strands that are not rigidly aligned) exhibit similar CD spectra, with a negative band at 200 nm and almost no positive bands (Fig. 28).
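Before secondary structure is estimated, far-UV CD spectra are usually normalised to the mean residue ellipticity, [θ]. This is a minimal sketch with three assumptions. Ellipticity is in mdeg, path length in cm and concentration in mg/mL. The mean residue weight is taken as M/(N - 1). Helical content is estimated with a simple two-state estimate from [θ] at 222 nm. The reference values for a fully helical (-36,000 deg cm² dmol⁻¹) and a fully unordered (+3,000 deg cm² dmol⁻¹) chain are commonly used approximations, adopted here as assumptions. Full deconvolution methods are considerably more elaborate.

```python
def mean_residue_ellipticity(theta_mdeg, mw_da, n_residues, path_cm, conc_mg_ml):
    """Convert observed ellipticity (mdeg) to mean residue ellipticity
    [theta] in deg cm^2 dmol^-1, using MRW = M / (N - 1)."""
    mrw = mw_da / (n_residues - 1)  # mean residue weight (per peptide bond)
    return theta_mdeg * mrw / (10.0 * path_cm * conc_mg_ml)


def helix_fraction(mre_222, mre_helix=-36000.0, mre_coil=3000.0):
    """Two-state helical fraction from [theta] at 222 nm, clipped to [0, 1].
    The reference values are assumed, commonly used approximations."""
    f = (mre_222 - mre_coil) / (mre_helix - mre_coil)
    return min(max(f, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical measurement: 11 kDa, 101-residue protein, 0.1 cm cell, 0.1 mg/mL
    mre = mean_residue_ellipticity(-30.0, 11000.0, 101, 0.1, 0.1)
    print(round(mre), round(helix_fraction(mre), 2))
```

A single-wavelength estimate ignores the contribution of β-structure and aromatic side chains at 222 nm. It is best used to follow changes in helicity (e.g. during denaturation) rather than as an absolute measure.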
Albeit with substantially lower intensities, tryptophan (300–280 nm), tyrosine
(290–270 nm) and phenylalanine residues (270–250 nm) as well as disulphide
bonds (250 nm) also exhibit characteristic CD signatures in the near-UV region,
and thus, CD can also be used to monitor the protein tertiary structure. However,
when compared with Trp-fluorescence, near-UV CD is much less sensitive to ter-
tiary interactions. Therefore, while far-UV CD is the main method to monitor
secondary structures, Trp-fluorescence is much more suitable to evaluate tertiary
structure changes.

Fig. 28 Circular dichroism signatures of secondary structures in proteins
5.2 Computational Methods

Since the 1990s, Monte Carlo simulations of lattice models have helped establish the fundamental principles driving the remarkable biological process of protein folding. A simple lattice representation reduces the protein to its backbone structure. Amino acids are represented by beads that occupy the vertices of a (two- or three-dimensional) regular lattice, and the peptide bond is reduced to sticks of uniform length (corresponding to the lattice spacing) (Fig. 29a).

Interactions between the amino acids can be modelled by one of three potentials:

- the HP potential [74], which captures the hydrophobic effect by considering only hydrophobic and hydrophilic (polar) amino acids;
- a sequence-based potential, which uses the Miyazawa–Jernigan interaction matrix [208] to account for the heterogeneity of interactions arising from the 20-amino acid alphabet;
- the native-centric (or structure-based) Go potential, in which the interaction matrix is exclusively dictated by the native structure of the model protein, i.e. only native interactions contribute to protein energetics [209].

Lattice models are crude representations of real proteins that retain the fundamental ingredients of their polymeric nature. They are adequate to explore fundamental aspects of the folding process that do not depend on specific details of proteins. Their computational efficiency also allows folding thermodynamics and kinetics (including rates) to be evaluated with high statistical accuracy.
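A lattice simulation can be sketched in a few lines. This is a minimal two-dimensional HP model. Each non-bonded H–H contact between lattice neighbours contributes one unit of attraction (-1). Conformations are explored by Metropolis Monte Carlo with pivot moves, which rotate the chain segment beyond a randomly chosen bead. The move set, the temperature (in units of the contact energy, with kB = 1), the square lattice, the example sequence and the function names are all assumptions. They are not the specific protocol of the studies cited.

```python
import math
import random

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))
ROTATIONS = (
    lambda dx, dy: (-dy, dx),   # rotate +90 degrees
    lambda dx, dy: (-dx, -dy),  # rotate 180 degrees
    lambda dx, dy: (dy, -dx),   # rotate -90 degrees
)


def hp_energy(seq, coords):
    """-1 for every pair of H beads that are lattice neighbours but not chain neighbours."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in NEIGHBOURS:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":  # count each pair once
                energy -= 1
    return energy


def pivot_move(coords, rng):
    """Rotate the segment after a random pivot bead; return None if it overlaps."""
    k = rng.randrange(1, len(coords) - 1)
    rotate = rng.choice(ROTATIONS)
    px, py = coords[k]
    moved = []
    for x, y in coords[k + 1:]:
        dx, dy = rotate(x - px, y - py)
        moved.append((px + dx, py + dy))
    trial = coords[:k + 1] + moved
    return trial if len(set(trial)) == len(trial) else None  # excluded volume


def metropolis_hp(seq, n_steps=5000, temperature=0.5, seed=1):
    """Metropolis MC from an extended chain; return the lowest energy found and its conformation."""
    rng = random.Random(seed)
    coords = [(i, 0) for i in range(len(seq))]
    energy = hp_energy(seq, coords)
    best_energy, best_coords = energy, list(coords)
    for _ in range(n_steps):
        trial = pivot_move(coords, rng)
        if trial is None:
            continue  # self-overlapping conformation: move rejected
        delta = hp_energy(seq, trial) - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            coords, energy = trial, energy + delta
            if energy < best_energy:
                best_energy, best_coords = energy, list(coords)
    return best_energy, best_coords


if __name__ == "__main__":
    seq = "HPHPPHHPHPPHPHHPPHPH"  # hypothetical 20-mer
    e, conf = metropolis_hp(seq)
    print("lowest energy found:", e)
```

Pivot moves make large conformational changes cheaply. More local move sets (end, corner and crankshaft moves) are usually preferred when the Monte Carlo dynamics are meant to mimic folding kinetics.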
To address the folding process of specific proteins, researchers developed another class of coarse-grained models [210] based on an off-lattice representation of the protein. Off-lattice representations can be restricted to the Cα atoms (Fig. 29b) or be fully atomistic (Fig. 29c) [211, 212]. The folding space of off-lattice models is often explored with Monte Carlo (MC) simulation methods or Molecular Dynamics (MD) schemes (discrete MD, Langevin dynamics, etc.). Off-lattice models are free from the severe restrictions imposed by the lattice, which impair a correct description of the conformational entropy. In general, these off-lattice representations are combined with Go (or Go-like) interaction potentials. However, other structure-based models have also been developed, with more sophisticated intramolecular potentials that incorporate important aspects of protein energetics (e.g. hydrogen bonding [213] and electrostatic interactions [116]). These models broaden the spectrum of questions that can be tackled with simulations. In an interesting recent study, Holzgräfe and Wallin combined a Cα representation with an interaction potential based on a three-letter amino acid alphabet to study the intriguing phenomenon of protein fold switching [214].

Fig. 29 Protein models used in simulations of protein folding, in order of increasing complexity. The simple (cubic) lattice model (a) is a generic protein representation displaying the fundamental features of the protein backbone (chain connectivity, excluded volume, etc.). Each bead represents one of the 20 amino acids, and beads are connected by sticks representing the peptide bond. The Cα model (b) is the simplest off-lattice representation. Like the lattice model, it is a coarse-grained description of the protein that reduces each amino acid to a sphere centred on the position of its Cα atom. However, it is a more realistic representation. It considers not only the polymeric nature of the protein backbone but also the specific three-dimensional native structure of the protein. Finally, in the full atomistic off-lattice representation (c), all the heavy atoms of the protein are explicitly represented. Figures drawn with PyMOL (pymol.org)
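A Go potential rewards only the contacts present in the native structure. This is a minimal Cα sketch based on a simple square-well form, with three assumptions. Native contacts are bead pairs at least three residues apart and closer than 6.5 Å in the native structure. A contact counts as formed when its current distance is within 1.2 times its native value. The fraction of native contacts, Q, a common reaction coordinate in such simulations, is computed alongside the energy. The cutoffs, the hypothetical hairpin coordinates and the function names are all illustrative assumptions. Published Go-type potentials typically use smoother functional forms (e.g. Lennard-Jones-type terms).

```python
import math


def native_contacts(native, cutoff=6.5, min_seq_sep=3):
    """Pairs (i, j, d_native) of C-alpha beads closer than `cutoff` (in Å) in the native structure."""
    contacts = []
    for i in range(len(native)):
        for j in range(i + min_seq_sep, len(native)):
            d = math.dist(native[i], native[j])
            if d < cutoff:
                contacts.append((i, j, d))
    return contacts


def go_energy_and_q(coords, contacts, epsilon=1.0, tolerance=1.2):
    """Square-well Go energy (-epsilon per formed native contact) and fraction Q.

    Non-native pairs contribute nothing, which is the defining feature of a Go model.
    """
    formed = sum(1 for i, j, d0 in contacts
                 if math.dist(coords[i], coords[j]) <= tolerance * d0)
    return -epsilon * formed, formed / len(contacts)


if __name__ == "__main__":
    # Hypothetical 8-bead hairpin: two strands 4.8 Å apart, 3.8 Å bead spacing.
    native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
              (11.4, 4.8, 0.0), (7.6, 4.8, 0.0), (3.8, 4.8, 0.0), (0.0, 4.8, 0.0)]
    contacts = native_contacts(native)
    extended = [(3.8 * i, 0.0, 0.0) for i in range(8)]
    print(go_energy_and_q(native, contacts), go_energy_and_q(extended, contacts))
```

Because only native contacts are rewarded, the native structure is by construction the global energy minimum. This is the property that makes Go models so convenient for studying folding mechanisms.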
At the top of the hierarchy of complexity, one finds full atomistic representations combined with realistic force fields (e.g. AMBER and GROMOS are popular choices among researchers), which are explored with classical MD simulations [215]. Apart from providing realistic energetics, this approach allows one to simulate folding in explicit water. Its major advantage is the possibility of directly comparing simulation data with data from in vitro experiments [216]. A fundamental problem of classical MD is the accuracy of the force fields [217].
Another (less important) drawback is the need to use very small time steps to integrate the equations of motion. This constraint severely limits the total simulation time that can be reached. It also makes the systematic application of classical MD to protein folding (and to other dynamical processes involving large-scale conformational changes) non-trivial. For this reason, smart sampling methods [218] and sophisticated distributed computing schemes [219, 220] have been developed to conduct classical MD simulations of protein folding. Novel machine architectures have also been created to execute MD simulations orders of magnitude faster than was previously possible [220]. A paradigmatic example of the latter is the Anton machine developed by Shaw and co-workers [221, 222].
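The time-step constraint can be seen on the simplest bonded term, a harmonic oscillator integrated with the velocity Verlet scheme. Velocity Verlet is a common MD integrator; the choice of integrator and of reduced units here is ours. For this system the scheme is stable only if ω·Δt < 2, where ω is the oscillation frequency. In atomistic simulations the fastest motions are bond vibrations involving hydrogen atoms, which is why time steps of the order of 1–2 fs are typical.

```python
import math


def velocity_verlet_harmonic(x0, v0, k, m, dt, n_steps):
    """Integrate m*x'' = -k*x with velocity Verlet; return the total energy at each step."""
    x, v = x0, v0
    a = -k * x / m
    energies = [0.5 * m * v * v + 0.5 * k * x * x]
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt   # position update
        a_new = -k * x / m                # force at the new position
        v += 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
        energies.append(0.5 * m * v * v + 0.5 * k * x * x)
    return energies


if __name__ == "__main__":
    k, m = 1.0, 1.0
    omega = math.sqrt(k / m)  # stability requires omega * dt < 2
    for dt in (0.01, 0.5, 2.1):
        e = velocity_verlet_harmonic(1.0, 0.0, k, m, dt, 50)
        print(f"omega*dt = {omega * dt:.2f}  final/initial energy = {e[-1] / e[0]:.3g}")
```

With ω·Δt well below 2 the energy is conserved to high accuracy. Just above the limit, the trajectory diverges within a few steps. This is why the fastest vibration in the system, rather than the slow folding motions of interest, dictates the time step.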

References

1. Tanford C, Reynolds J (2001) Nature's robots: a history of proteins. Oxford University Press, Oxford
2. Anson ML, Mirsky AE (1930) The reversibility of protein coagulation. J Phys Chem
35:185–193
3. Astbury WT, Woods HJ (1930) The X-ray interpretation of the structure and elastic
properties of hair keratin. Nature 126:913
4. Cohen C (1998) Why fibrous proteins are romantic. J Struct Biol 122:3–16
5. Eisenberg D (2003) The discovery of the α-helix and β-sheet, the principal structural
features of proteins. Proc Natl Acad Sci 100:11207–11210
6. Strandberg B (2009) Building the ground for the first two protein structures: myoglobin and
haemoglobin (Chap. 1). J Mol Biol 392:2–10
7. Dill KA, MacCallum JL (2012) The protein-folding problem, 50 years on. Science
338:1042–1046
8. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN,
Bourne PE (2000) The protein data bank. Nucleic Acids Res 28:235–242
9. Hou J, Sims GE, Zhang C, Kim S-H (2003) A global representation of the protein fold space.
Proc Natl Acad Sci 100:2386–2390
10. Schaeffer RD, Daggett V (2011) Protein folds and protein folding. Protein Eng Des Sel
24:11–19
11. Richardson JS (1977) β-Sheet topology and the relatedness of proteins. Nature 268:495–500
12. Mansfield ML (1994) Are there knots in proteins? Nat Struct Mol Biol 1:213–214
13. Taylor WR (2000) A deeply knotted protein structure and how it might fold. Nature
406:916–919
14. Koniaris K, Muthukumar M (1991) Knottedness in ring polymers. Phys Rev Lett 66:2211–
2214
15. Bölinger D, Sułkowska JI, Hsu H-P, Mirny LA, Kardar M, Onuchic JN, Virnau P (2010) A
Stevedore’s protein knot. PLoS Comput Biol 6:e1000731
16. King NP, Yeates EO, Yeates TO (2007) Identification of rare slipknots in proteins and their
implications for stability and folding. J Mol Biol 373:153–166
17. Jamroz M, Niemyska W, Rawdon EJ, Stasiak A, Millett KC, Sułkowski P, Sulkowska JI
(2015) KnotProt: a database of proteins with knots and slipknots. Nucleic Acids Res 43:
D306–D314
18. Lua RC, Grosberg AY (2006) Statistics of knots, geometry of conformations, and evolution
of proteins. PLoS Comput Biol 2:e45
19. Virnau P, Mirny LA, Kardar M (2006) Intricate knots in proteins: function and evolution.
PLoS Comput Biol 2:e122
20. Sułkowska JI, Rawdon EJ, Millett KC, Onuchic JN, Stasiak A (2012) Conservation of
complex knotting and slipknotting patterns in proteins. Proc Natl Acad Sci 109:E1715–
E1723
21. Soler MA, Nunes A, Faísca PFN (2014) Effects of knot type in the folding of topologically
complex lattice proteins. J Chem Phys 141:025101
22. Nureki O, Shirouzu M, Hashimoto K, Ishitani R, Terada T, Tamakoshi M, Oshima T,
Chijimatsu M, Takio K, Vassylyev DG, Shibata T, Inoue Y, Kuramitsu S, Yokoyama S
(2002) An enzyme with a deep trefoil knot for the active-site architecture. Acta Crystallogr
Sect D 58:1129–1137
23. Jacobs SA, Harp JM, Devarakonda S, Kim Y, Rastinejad F, Khorasanizadeh S (2002) The
active site of the SET domain is constructed on a knot. Nat Struct Mol Biol 9:833–838
24. Sułkowska JI, Sułkowski P, Szymczak P, Cieplak M (2008) Stabilizing effect of knots on
proteins. Proc Natl Acad Sci 105:19714–19719
25. Alam MT, Yamada T, Carlsson U, Ikai A (2002) The importance of being knotted: effects of
the C-terminal knot structure on enzymatic and mechanical properties of bovine carbonic
anhydrase II. FEBS Lett 519:35–40
26. Soler MA, Faísca PFN (2013) Effects of knots on protein folding properties. PLoS ONE 8:
e74755
27. Uversky VN (2014) Intrinsically disordered proteins. Springer, New York
28. Theillet FX, Binolfi A, Frembgen-Kesner T, Hingorani K, Sarkar M, Kyne C, Li C,
Crowley PB, Gierasch L, Pielak GJ, Elcock AH, Gershenson A, Selenko P (2014)
Physicochemical properties of cells and their effects on intrinsically disordered proteins
(IDPs). Chem Rev 114:6661–6714
29. Riback JA, Bowman MA, Zmyslowski AM, Knoverek CR, Jumper JM, Hinshaw JR,
Kaye EB, Freed KF, Clark PL, Sosnick TR (2017) Innovative scattering analysis shows that
hydrophobic disordered proteins are expanded in water. Science 358:238–241
30. van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M,
Gough J, Gsponer J, Jones DT, Kim PM, Kriwacki RW, Oldfield CJ, Pappu RV, Tompa P,
Uversky VN, Wright PE, Babu MM (2014) Classification of intrinsically disordered regions
and proteins. Chem Rev 114:6589–6631
31. Kumar S, Nussinov R (2002) Close-range electrostatic interactions in proteins. ChemBioChem 3:604–617
32. Kessel A, Ben-Tal N (2011) Introduction to proteins: structure, function, and motion. CRC;
London: Taylor & Francis [distributor], Boca Raton
33. Williamson MP (2012) How proteins work. Garland science, London: Taylor & Francis
[distributor], New York
34. Gomes CM, Wittung-Stafshede P (2011) Protein folding and metal ions: mechanisms,
biology and disease. CRC Press, Boca Raton
35. Gomes CM (2012) Protein misfolding in disease and small molecule therapies. Curr Top
Med Chem 12:2460–2469
36. Leandro P, Gomes CM (2008) Protein misfolding in conformational disorders: rescue of
folding defects and chemical chaperoning. Mini Rev Med Chem 8:901–911
37. Anfinsen CB, Haber E, Sela M, White FH Jr (1961) The kinetics of formation of native
ribonuclease during oxidation of the reduced polypeptide chain. Proc Natl Acad Sci USA
47:1309–1314
38. Anfinsen CB, Sela M, Cooke JP (1962) The reversible reduction of disulphide bonds in
polyalanyl ribonuclease. J Biol Chem 237:1825–1831
39. Sela M, Anfinsen CB (1957) Some spectrophotometric and polarimetric experiments with
ribonuclease. Biochim Biophys Acta 24:229–235
40. Sela M, Anfinsen CB, Harrington WF (1957) The correlation of ribonuclease activity with
specific aspects of tertiary structure. Biochim Biophys Acta 26:502–512
41. Sela M, White FH Jr, Anfinsen CB (1957) Reductive cleavage of disulphide bridges in
ribonuclease. Science 125:691–692
42. Anfinsen CB (1973) Principles that govern the folding of protein chains. Science 181:223–
230
43. Pace CN, Shaw KL (2000) Linear extrapolation method of analyzing solvent denaturation
curves. Proteins Suppl 4:1–7
44. Shaw KL, Scholtz JM, Pace CN, Grimsley GR (2009) Determining the conformational
stability of a protein using urea denaturation curves. Methods Mol Biol 490:41–55
45. Johnson CM (2013) Differential scanning calorimetry as a tool for protein folding and
stability. Arch Biochem Biophys 531:100–109
46. Taverna DM, Goldstein RA (2002) Why are proteins marginally stable? Proteins 46:105–
109
47. Stetter KO (1996) Hyperthermophilic procaryotes. FEMS Microbiol Rev 18:149–158
48. Madigan MT, Oren A (1999) Thermophilic and halophilic extremophiles. Curr Opin
Microbiol 2:265–269
49. Empadinhas N, da Costa MS (2006) Diversity and biosynthesis of compatible solutes in
hyper/thermophiles. Int Microbiol 9:199–206
50. Mehta R, Singhal P, Singh H, Damle D, Sharma AK (2016) Insight into thermophiles and
their wide-spectrum applications. 3 Biotech 6:81–81
51. Marx V (2016) PCR: the price of infidelity. Nat Meth 13:475–479
52. Bult CJ, White O, Olsen GJ, Zhou L, Fleischmann RD, Sutton GG, Blake JA,
FitzGerald LM, Clayton RA, Gocayne JD, Kerlavage AR, Dougherty BA, Tomb J-F,
Adams MD, Reich CI, Overbeek R, Kirkness EF, Weinstock KG, Merrick JM, Glodek A,
Scott JL, Geoghagen NSM, Weidman JF, Fuhrmann JL, Nguyen D, Utterback TR,
Kelley JM, Peterson JD, Sadow PW, Hanna MC, Cotton MD, Roberts KM, Hurst MA,
Kaine BP, Borodovsky M, Klenk H-P, Fraser CM, Smith HO, Woese CR, Venter JC (1996)
Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii.
Science 273:1058
53. Land M, Hauser L, Jun S-R, Nookaew I, Leuze MR, Ahn T-H, Karpinets T, Lund O,
Kora G, Wassenaar T, Poudel S, Ussery DW (2015) Insights from 20 years of bacterial
genome sequencing. Funct Integr Genomics 15:141–161
54. Radestock S, Gohlke H (2011) Protein rigidity and thermophilic adaptation. Proteins
79:1089–1108
55. Scandurra R, Consalvi V, Chiaraluce R, Politi L, Engel PC (2000) Protein stability in
extremophilic archaea. Front Biosci 5:D787–D795
56. Siddiqui KS (2017) Defying the activity-stability trade-off in enzymes: taking advantage of
entropy to enhance activity and thermostability. Crit Rev Biotechnol 37:309–322
57. Anson ML (1945) Protein denaturation and the properties of protein groups. Adv Protein
Chem 2:361–386
58. Chan HS, Shimizu S, Kaya H (2004) Cooperativity principles in protein folding. Methods
Enzymol 380:350–379
59. Ptitsyn OB (1995) Molten globule and protein folding. Adv Protein Chem 47:83–229
60. Ptitsyn OB (1995) How the molten globule became. Trends Biochem Sci 20:376–379
61. Lumry R, Biltonen R, Brandts JF (1966) Validity of the “two-state” hypothesis for
conformational transitions of proteins. Biopolymers 4:917–944
62. Fersht A (1999) Structure and mechanism in protein science: a guide to enzyme catalysis and
protein folding. W. H. Freeman, New York
63. Chan HS, Zhang Z, Wallin S, Liu Z (2011) Cooperativity, local-nonlocal coupling, and
nonnative interactions: principles of protein folding from coarse-grained models. Annu Rev
Phys Chem 62:301–326
64. Krishna MMG, Englander SW (2005) The N-terminal to C-terminal motif in protein folding
and function. Proc Natl Acad Sci USA 102:1053–1058
65. Krobath H, Rey A, Faisca PFN (2015) How determinant is N-terminal to C-terminal
coupling for protein folding? Phys Chem Chem Phys 17:3512–3524
66. Krobath H, Estácio SG, Faísca PFN, Shakhnovich EI (2012) Identification of a conserved
aggregation-prone intermediate state in the folding pathways of Spc-SH3 amyloidogenic
variants. J Mol Biol 422:705–722
67. Loureiro RJS, Vila-Viçosa D, Machuqueiro M, Shakhnovich EI, Faísca PFN (2017) A tale
of two tails: the importance of unstructured termini in the aggregation pathway of
β2-microglobulin. Proteins Struct Funct Bioinf 85:2045–2057
68. Levinthal C (1968) Are there pathways for protein folding? J Chim Phys 65:44–45
69. Baldwin RL (1975) Intermediates in protein folding reactions and the mechanism of protein
folding. Annu Rev Biochem 44:453–475
70. Wang Z, Mottonen J, Goldsmith EJ (1996) Kinetically controlled folding of the serpin
plasminogen activator inhibitor 1. Biochemistry 35:16443–16448
71. Levinthal C (1969) How to fold graciously. In: Debrunner P, Tsibris JCM, Münck E (eds) Mössbauer
spectroscopy in biological systems: proceedings of a meeting held at Allerton House,
Monticello, Illinois. University of Illinois Press
72. Wetlaufer DB (1973) Nucleation, rapid folding, and globular intrachain regions in proteins.
Proc Natl Acad Sci 70:697–701
73. Karplus M, Weaver DL (1979) Diffusion–collision model for protein folding. Biopolymers
18:1421–1437
74. Dill KA (1985) Theory for the folding and stability of globular proteins. Biochemistry
24:1501–1509
75. Jackson SE (1998) How do small single-domain proteins fold? Fold Des 3:R81–R91
76. Jackson SE, Fersht AR (1991) Folding of chymotrypsin inhibitor 2. 1. Evidence for a two-state
transition. Biochemistry 30:10428–10435
77. Tsong TY, Baldwin RL, McPhie P, Elson EL (1972) A sequential model of
nucleation-dependent protein folding: kinetic studies of ribonuclease A. J Mol Biol
63:453–469
78. Abkevich VI, Gutin AM, Shakhnovich EI (1994) Specific nucleus as the transition state for
protein folding: evidence from the lattice model. Biochemistry 33:10026–10036
79. Itzhaki LS, Otzen DE, Fersht AR (1995) The structure of the transition state for folding of
chymotrypsin inhibitor 2 analysed by protein engineering methods: evidence for a
nucleation-condensation mechanism for protein folding. J Mol Biol 254:260–288
80. Weikl TR, Dill KA (2007) Transition-states in protein folding kinetics: the structural
interpretation of Φ values. J Mol Biol 365:1578–1586
81. Fersht AR (1995) Optimization of rates of protein folding: the nucleation-condensation
mechanism and its implications. Proc Natl Acad Sci 92:10869–10873
82. Faísca PFN (2009) The nucleation mechanism of protein folding: a survey of computer
simulation studies. J Phys Condens Matter 21:373102
83. Bryngelson JD, Wolynes PG (1987) Spin glasses and the statistical mechanics of protein
folding. Proc Natl Acad Sci 84:7524–7528
84. Leopold PE, Montal M, Onuchic JN (1992) Protein folding funnels: a kinetic approach to the
sequence-structure relationship. Proc Natl Acad Sci 89:8721–8725
85. Onuchic JN, Luthey-Schulten Z, Wolynes PG (1997) Theory of protein folding: the energy
landscape perspective. Ann Rev Phys Chem 48:545–600
86. Bryngelson JD, Onuchic JN, Socci ND, Wolynes PG (1995) Funnels, pathways, and the
energy landscape of protein folding: a synthesis. Proteins Struct Funct Bioinf 21:167–195
87. Onuchic JN, Wolynes PG (2004) Theory of protein folding. Curr Opin Struct Biol 14:70–75
88. Dill KA, Bromberg S, Yue K, Chan HS, Fiebig KM, Yee DP, Thomas PD (1995) Principles
of protein folding: a perspective from simple exact models. Protein Sci 4:561–602
89. Dill KA, Chan HS (1997) From Levinthal to pathways to funnels. Nat Struct Mol Biol 4:10–19
90. Chan HS, Dill KA (1998) Protein folding in the landscape perspective: chevron plots and
non-arrhenius kinetics. Proteins Struct Funct Bioinf 30:2–33
91. Dill KA (1999) Polymer principles and protein folding. Protein Sci 8:1166–1180
92. Plaxco KW, Simons KT, Baker D (1998) Contact order, transition state placement and the
refolding rates of single domain proteins (Edited by Wright PE). J Mol Biol 277:985–994
93. Plaxco KW, Simons KT, Ruczinski I, Baker D (2000) Topology, stability, sequence, and
length: defining the determinants of two-state protein folding kinetics. Biochemistry
39:11177–11183
94. Gromiha MM, Selvaraj S (2001) Comparison between long-range interactions and contact
order in determining the folding rate of two-state proteins: application of long-range order to
folding rate prediction (Edited by Wright PE). J Mol Biol 310:27–32
95. Micheletti C (2003) Prediction of folding rates and transition-state placement from
native-state geometry. Proteins Struct Funct Bioinf 51:74–84
96. Chiti F, Taddei N, White PM, Bucciantini M, Magherini F, Stefani M, Dobson CM (1999)
Mutational analysis of acylphosphatase suggests the importance of topology and contact
order in protein folding. Nat Struct Mol Biol 6:1005–1009
97. Riddle DS, Grantcharova VP, Santiago JV, Alm E, Ruczinski I, Baker D (1999) Experiment
and theory highlight role of native state topology in SH3 folding. Nat Struct Mol Biol
6:1016–1024
98. Lindorff-Larsen K, Vendruscolo M, Paci E, Dobson CM (2004) Transition states for protein
folding have native topologies despite high structural variability. Nat Struct Mol Biol
11:443–449
99. Jewett AI, Pande VS, Plaxco KW (2003) Cooperativity, smooth energy landscapes and the
origins of topology-dependent protein folding rates. J Mol Biol 326:247–253
100. Paci E, Lindorff-Larsen K, Dobson CM, Karplus M, Vendruscolo M (2005) Transition state
contact orders correlate with protein folding rates. J Mol Biol 352:495–500
101. Faisca PFN, Ball RC (2002) Topological complexity, contact order, and protein folding
rates. J Chem Phys 117:8587–8591
102. Kaya H, Chan HS (2003) Contact order dependent protein folding rates: kinetic
consequences of a cooperative interplay between favorable nonlocal interactions and local
conformational preferences. Proteins Struct Funct Bioinf 52:524–533
103. Makarov DE, Plaxco KW (2003) The topomer search model: a simple, quantitative theory of
two-state protein folding kinetics. Protein Sci 12:17–26
104. Faísca PFN, Travasso RDM, Parisi A, Rey A (2012) Why do protein folding rates correlate
with metrics of native topology? PLoS ONE 7:e35599
105. Ivankov DN, Garbuzynskiy SO, Alm E, Plaxco KW, Baker D, Finkelstein AV (2003)
Contact order revisited: influence of protein size on the folding rate. Protein Sci 12:2057–2062
106. Galzitskaya OV, Garbuzynskiy SO, Ivankov DN, Finkelstein AV (2003) Chain length is the
main determinant of the folding rate for proteins with three-state folding kinetics. Proteins
Struct Funct Bioinf 51:162–166
107. Naganathan AN, Muñoz V (2005) Scaling of folding times with protein size. J Am Chem
Soc 127:480–481
108. De Sancho D, Doshi U, Muñoz V (2009) Protein folding rates and stability: how much is
there beyond size? J Am Chem Soc 131:2074–2075
109. Sułkowska JI, Noel JK, Ramírez-Sarmiento CA, Rawdon EJ, Millett KC, Onuchic JN (2013)
Knotting pathways in proteins. Biochem Soc Trans 41:523–527
110. Faísca PFN (2015) Knotted proteins: a tangled tale of structural biology. Comput Struct
Biotechnol J 13:459–468
111. Jackson SE, Suma A, Micheletti C (2017) How to fold intricately: using theory and
experiments to unravel the properties of knotted proteins. Curr Opin Struct Biol 42:6–14
112. Mallam AL, Jackson SE (2007) A comparison of the folding of two knotted proteins: YbeA
and YibK. J Mol Biol 366:650–665
113. Wallin S, Zeldovich KB, Shakhnovich EI (2007) The folding mechanics of a knotted
protein. J Mol Biol 368:884–893
114. Škrbić T, Micheletti C, Faccioli P (2012) The role of non-native interactions in the folding of
knotted proteins. PLoS Comput Biol 8:e1002504
115. Soler MA, Faísca PFN (2012) How difficult is it to fold a knotted protein? In silico insights
from surface-tethered folding experiments. PLoS ONE 7:e52343
116. Beccara S, Škrbić T, Covino R, Micheletti C, Faccioli P (2013) Folding pathways of a
knotted protein with a realistic atomistic force field. PLoS Comput Biol 9:e1003002
117. Sułkowska JI, Sułkowski P, Onuchic J (2009) Dodging the crisis of folding proteins with
knots. Proc Natl Acad Sci 106:3119–3124
118. Noel JK, Sułkowska JI, Onuchic JN (2010) Slipknotting upon native-like loop formation in a
trefoil knot protein. Proc Natl Acad Sci 107:15403–15408
119. Noel JK, Onuchic JN, Sułkowska JI (2013) Knotting a protein in explicit solvent. J Phys
Chem Lett 4:3570–3573
120. Lim NCH, Jackson SE (2015) Mechanistic insights into the folding of knotted proteins
in vitro and in vivo. J Mol Biol 427:248–258
121. Mallam AL, Jackson SE (2012) Knot formation in newly translated proteins is spontaneous
and accelerated by chaperonins. Nat Chem Biol 8:147–153
122. Bustamante A, Sotelo-Campos J, Guerra DG, Floor M, Wilson CAM, Bustamante C, Báez
M (2017) The energy cost of polypeptide knot formation and its folding consequences. Nat
Commun 8:1581
123. Soler MA, Rey A, Faísca PFN (2016) Steric confinement and enhanced local flexibility
assist knotting in simple models of protein folding. Phys Chem Chem Phys 18:26391–26403
124. Niewieczerzał S, Sułkowska JI (2017) Knotting and unknotting proteins in the chaperonin
cage: effects of the excluded volume. PLoS ONE 12:e0176744
125. Mirny L, Shakhnovich E (2001) Evolutionary conservation of the folding nucleus. J Mol
Biol 308:123–129
126. Sułkowska JI, Noel JK, Onuchic JN (2012) Energy landscape of knotted protein folding.
Proc Natl Acad Sci 109:17783–17788
127. Yu I, Mori T, Ando T, Harada R, Jung J, Sugita Y, Feig M (2016) Biomolecular interactions
modulate macromolecular structure and dynamics in atomistic model of a bacterial
cytoplasm. eLife 5:e19274. https://doi.org/10.7554/eLife.19274
128. Bhushan S, Gartmann M, Halic M, Armache J-P, Jarasch A, Mielke T, Berninghausen O,
Wilson DN, Beckmann R (2010) α-Helical nascent polypeptide chains visualized within
distinct regions of the ribosomal exit tunnel. Nat Struct Mol Biol 17:313
129. Chaney JL, Clark PL (2015) Roles for synonymous codon usage in protein biogenesis.
Annu Rev Biophys 44:143–166
130. Labbadia J, Morimoto RI (2015) The biology of proteostasis in aging and disease. Annu Rev
Biochem 84:435–464
131. Jahn TR, Parker MJ, Homans SW, Radford SE (2006) Amyloid formation under
physiological conditions proceeds via a native-like folding intermediate. Nat Struct Mol
Biol 13:195–201
132. Hartl FU, Bracher A, Hayer-Hartl M (2011) Molecular chaperones in protein folding and
proteostasis. Nature 475:324
133. Mogk A, Bukau B, Kampinga HH (2018) Cellular handling of protein aggregates by
disaggregation machines. Mol Cell 69:214–226
134. Horowitz S, Koldewey P, Stull F, Bardwell JC (2018) Folding while bound to chaperones.
Curr Opin Struct Biol 48:1–5
135. Hayer-Hartl M, Bracher A, Hartl FU (2016) The GroEL–GroES chaperonin machine: a
nano-cage for protein folding. Trends Biochem Sci 41:62–76
136. Chiti F (2006) Relative importance of hydrophobicity, net charge, and secondary structure
propensities in protein aggregation. In: Uversky VN, Fink AL (eds) Protein misfolding,
aggregation, and conformational diseases: Part A: Protein aggregation and conformational
diseases. Springer, Boston, pp 43–59
137. Ventura S (2005) Sequence determinants of protein aggregation: tools to increase protein
solubility. Microb Cell Fact 4:11
138. Rousseau F, Schymkowitz J, Serrano L (2006) Protein aggregation and amyloidosis:
confusion of the kinds? Curr Opin Struct Biol 16:118–126
139. Gregersen N, Bross P, Vang S, Christensen JH (2006) Protein misfolding and human
disease. Annu Rev Genomics Hum Genet 7:103–124
140. Stoppini M, Bellotti V (2015) Systemic amyloidosis: lessons from β2-microglobulin. J Biol
Chem 290:9951–9958
141. Chiti F, Dobson CM (2006) Protein misfolding, functional amyloid, and human disease.
Annu Rev Biochem 75:333–366
142. Tanskanen M (2013) “Amyloid”—historical aspects. In: Feng D (ed) Amyloidosis. InTech,
Rijeka, Ch. 1
143. Ross CA, Poirier MA (2004) Protein aggregation and neurodegenerative disease. Nat Med
10:S10
144. Shewmaker F, McGlinchey RP, Wickner RB (2011) Structural insights into functional and
pathological amyloid. J Biol Chem 286:16533–16540
145. Astbury WT, Dickinson S, Bailey K (1935) The X-ray interpretation of denaturation and the
structure of the seed globulins. Biochem J 29:2351–2360
146. Xiao Y, Ma B (2015) Aβ(1–42) fibril structure illuminates self-recognition and replication
of amyloid in Alzheimer’s disease. Nat Struct Mol Biol 22:499–505
147. Nelson R, Sawaya MR, Balbirnie M, Madsen AØ, Riekel C, Grothe R, Eisenberg D (2005)
Structure of the cross-β spine of amyloid-like fibrils. Nature 435:773–778
148. Sawaya MR, Sambashivan S, Nelson R, Ivanova MI, Sievers SA, Apostol MI,
Thompson MJ, Balbirnie M, Wiltzius JJW, McFarlane HT, Madsen AO, Riekel C,
Eisenberg D (2007) Atomic structures of amyloid cross-β spines reveal varied steric
zippers. Nature 447:453–457
149. Gazit E (2002) A possible role for π-stacking in the self-assembly of amyloid fibrils.
FASEB J 16:77–83
150. Gremer L, Schölzel D, Schenk C, Reinartz E, Labahn J (2017) Fibril structure of
amyloid-beta (1–42) by cryo-electron microscopy. Science 358:116–119
151. Fitzpatrick AWP, Falcon B, He S, Murzin AG, Murshudov G, Garringer HJ, Crowther RA,
Ghetti B, Goedert M, Scheres SHW (2017) Cryo-EM structures of tau filaments from
Alzheimer’s disease. Nature 547:185–190
152. Li B, Ge P, Murray KA, Sheth P, Zhang M, Nair G, Sawaya MR (2018) Cryo-EM of
full-length alpha-synuclein reveals fibril polymorphs with a common structural kernel. Nat
Commun 9:3609
153. Iadanza MG, Silvers R (2018) The structure of a β2-microglobulin fibril suggests a
molecular basis for its amyloid polymorphism. 9:4517
154. Tycko R (2014) Physical and structural basis for polymorphism in amyloid fibrils. Protein
Sci 23:1528–1539
155. Thirumalai D, Reddy G, Straub JE (2012) Role of water in protein aggregation and amyloid
polymorphism. Acc Chem Res 45:83–92
156. Arce FT, Jang H, Ramachandran S, Landon PB, Nussinov R, Lal R (2011) Polymorphism of
amyloid β peptide in different environments: implications for membrane insertion and pore
formation. Soft Matter 7:5267–5273
157. Sarell CJ, Woods LA, Su Y, Debelouchina GT, Ashcroft AE, Griffin RG, Stockley PG,
Radford SE (2013) Expanding the repertoire of amyloid polymorphs by co-polymerization
of related protein precursors. J Biol Chem
158. Pham CLL, Kwan AH, Sunde M (2014) Functional amyloid: widespread in Nature, diverse
in purpose. Essays Biochem 56:207–219
159. Otzen D (2010) Functional amyloid. Prion 4:256–264
160. Fowler DM, Koulov AV, Balch WE, Kelly JW (2007) Functional amyloid—from bacteria to
humans. Trends Biochem Sci 32:217–224
161. Evans ML, Chapman MR (2014) Curli biogenesis: order out of disorder. Biochim Biophys
Acta 1843:1551–1558
162. Iconomidou VA, Vriend G, Hamodrakas SJ (2000) Amyloids protect the silkmoth oocyte
and embryo. FEBS Lett 479:141–145
163. Maji SK, Perrin MH, Sawaya MR, Jessberger S, Vadodaria K, Rissman RA, Singru PS,
Nilsson KPR, Simon R, Schubert D, Eisenberg D, Rivier J, Sawchenko P, Vale W, Riek R
(2009) Functional amyloids as natural storage of peptide hormones in pituitary secretory
granules. Science 325:328–332
164. Fowler DM, Koulov AV, Alory-Jost C, Marks MS, Balch WE, Kelly JW (2005) Functional
amyloid formation within mammalian tissue. PLoS Biol 4:e6
165. Smith JF, Knowles TPJ, Dobson CM, MacPhee CE, Welland ME (2006) Characterization of
the nanoscale properties of individual amyloid fibrils. Proc Natl Acad Sci 103:15806–15811
166. Greenwald J, Riek R (2010) Biology of amyloid: structure, function, and regulation.
Structure 18:1244–1260
167. Knowles TPJ, Mezzenga R (2016) Amyloid fibrils as building blocks for natural and
artificial functional materials. Adv Mater 28:6546–6561
168. Scheibel T, Parthasarathy R, Sawicki G, Lin X-M, Jaeger H, Lindquist SL (2003)
Conducting nanowires built by controlled self-assembly of amyloid fibers and selective
metal deposition. Proc Natl Acad Sci 100:4527–4532
169. Nilsson MR (2004) Techniques to study amyloid fibril formation in vitro. Methods 34:151–160
170. Alberti S, Halfmann R, Lindquist S (2010) Biochemical, cell biological, and genetic assays
to analyze amyloid and prion aggregation in yeast (Chap. 30). In: Methods in enzymology.
Academic Press, pp 709–734
171. Sleutel M, Van den Broeck I, Van Gerven N, Feuillie C, Jonckheere W, Valotteau C,
Dufrene YF, Remaut H (2017) Nucleation and growth of a bacterial functional amyloid at
single-fiber resolution. Nat Chem Biol 13:902–908
172. Giurleo JT, He X, Talaga DS (2008) β-Lactoglobulin assembles into amyloid through
sequential aggregated intermediates. J Mol Biol 381:1332–1348
173. Thirumalai D, Klimov DK, Dima RI (2003) Emerging ideas on the molecular basis of
protein and peptide aggregation. Curr Opin Struct Biol 13:146–159
174. Kelly JW (1998) The alternative conformations of amyloidogenic proteins and their
multi-step assembly pathways. Curr Opin Struct Biol 8:101–106
175. Mahler H-C, Friess W, Grauschopf U, Kiese S (2008) Protein aggregation: pathways,
induction factors and analysis. J Pharm Sci 98:2909–2934
176. Chiti F, Dobson CM (2009) Amyloid formation by globular proteins under native
conditions. Nat Chem Biol 5:15–22
177. Jahn TR, Parker MJ, Homans SW, Radford SE (2006) Amyloid formation under
physiological conditions proceeds via a native-like folding intermediate. Nat Struct Mol
Biol 13:195–201
178. Estácio SG, Krobath H, Vila-Viçosa D, Machuqueiro M, Shakhnovich EI, Faísca PFN
(2014) A simulated intermediate state for folding and aggregation provides insights into
ΔN6 β2-microglobulin amyloidogenic behavior. PLoS Comput Biol 10:e1003606
179. Neudecker P, Robustelli P, Cavalli A, Walsh P, Lundström P, Zarrine-Afsar A, Sharpe S,
Vendruscolo M, Kay LE (2012) Structure of an intermediate state in protein folding and
aggregation. Science 336:362–366
180. Honda RP, Xu M, Yamaguchi K-I, Roder H, Kuwata K (2015) A native-like intermediate
serves as a branching point between the folding and aggregation pathways of the mouse
prion protein. Structure 23:1735–1742
181. Jahn TR, Radford SE (2008) Folding versus aggregation: polypeptide conformations on
competing pathways. Arch Biochem Biophys 469:100–117
182. Cohen SIA, Vendruscolo M, Dobson CM, Knowles TPJ (2012) From macroscopic
measurements to microscopic mechanisms of protein aggregation. J Mol Biol 421:160–171
183. Buell AK, Dobson CM, Knowles TPJ (2014) The physical chemistry of the amyloid
phenomenon: thermodynamics and kinetics of filamentous protein aggregation. Essays
Biochem 56:11–39
184. Meisl G, Michaels TCT, Linse S, Knowles TPJ (2018) Kinetic analysis of amyloid
formation. Methods Mol Biol 1779:181–196
185. Herrup K (2015) The case for rejecting the amyloid cascade hypothesis. Nat Neurosci
18:794
186. Stefani M (2012) Structural features and cytotoxicity of amyloid oligomers: implications in
Alzheimer’s disease and other diseases with amyloid deposits. Prog Neurobiol 99:226–245
187. Bucciantini M, Rigacci S, Stefani M (2014) Amyloid aggregation: role of biological
membranes and the aggregate-membrane system. J Phys Chem Lett 5:517–527
188. Leal SS, Botelho HM, Gomes CM (2012) Metal ions as modulators of protein conformation
and misfolding in neurodegeneration. Coord Chem Rev 256:2253–2270
189. Morriss-Andrews A, Shea J-E (2015) Computational studies of protein aggregation: methods
and applications. Annu Rev Phys Chem 66:643–666
190. Balchin D, Hayer-Hartl M, Hartl FU (2016) In vivo aspects of protein folding and quality
control. Science 353
191. Michelitsch MD, Weissman JS (2000) A census of glutamine/asparagine-rich regions:
implications for their conserved function and the prediction of novel prions. Proc Natl Acad
Sci 97:11910–11915
192. Bemporad F, Calloni G, Campioni S, Plakoutsi G, Taddei N, Chiti F (2006) Sequence and
structural determinants of amyloid fibril formation. Acc Chem Res 39:620–627
193. De Baets G, Schymkowitz J, Rousseau F (2014) Predicting aggregation-prone sequences in
proteins. Essays Biochem 56:41–52
194. Beerten J, Schymkowitz J, Rousseau F (2012) Aggregation prone regions and gatekeeping
residues in protein sequences. Curr Top Med Chem 12:2470–2478
195. Uversky VN (2010) Targeting intrinsically disordered proteins in neurodegenerative and
protein dysfunction diseases: another illustration of the D² concept. Expert Rev Proteomics
7:543–564
196. Fernandez-Escamilla A-M, Rousseau F, Schymkowitz J, Serrano L (2004) Prediction of
sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat
Biotechnol 22:1302
197. Tartaglia GG, Vendruscolo M (2008) The Zyggregator method for predicting protein
aggregation propensities. Chem Soc Rev 37:1395–1401
198. Conchillo-Solé O, de Groot NS, Avilés FX, Vendrell J, Daura X, Ventura S (2007)
AGGRESCAN: a server for the prediction and evaluation of “hot spots” of aggregation in
polypeptides. BMC Bioinf 8:65
199. Zambrano R, Jamroz M, Szczasiuk A, Pujols J, Kmiecik S, Ventura S (2015)
AGGRESCAN3D (A3D): server for prediction of aggregation properties of protein
structures. Nucleic Acids Res 43:W306–W313
200. Fändrich M, Dobson CM (2002) The behaviour of polyamino acids reveals an inverse side
chain effect in amyloid structure formation. EMBO J 21:5682–5690
201. Bartlett AI, Radford SE (2009) An expanding arsenal of experimental methods yields an
explosion of insights into protein folding mechanisms. Nat Struct Mol Biol 16:582–588
202. Cristovao JS, Henriques BJ, Gomes CM (2019) Biophysical and spectroscopic methods for
monitoring protein misfolding and amyloid aggregation. Methods Mol Biol 1873:3–18
203. Lucas TG, Gomes CM, Henriques BJ (2019) Thermal shift and stability assays of
disease-related misfolded proteins using differential scanning fluorimetry. Methods Mol Biol
1873:255–264
204. Kelly SM, Jess TJ, Price NC (2005) How to study proteins by circular dichroism. Biochim
Biophys Acta 1751:119–139
205. Barth A (2007) Infrared spectroscopy of proteins. Biochim Biophys Acta 1767:1073–1101
206. Correia AR, Adinolfi S, Pastore A, Gomes CM (2006) Conformational stability of human
frataxin and effect of Friedreich’s ataxia-related mutations on protein folding. Biochem J
398:605–611
207. Gade Malmos K, Blancas-Mejia LM, Weber B, Buchner J, Ramirez-Alvarado M, Naiki H,
Otzen D (2017) ThT 101: a primer on the use of thioflavin T to investigate amyloid
formation. Amyloid 24:1–16
208. Miyazawa S, Jernigan RL (1985) Estimation of effective interresidue contact energies from
protein crystal structures: quasi-chemical approximation. Macromolecules 18:534–552
209. Taketomi H, Ueda Y, Gō N (1975) Studies on protein folding, unfolding and fluctuations by
computer simulation. Int J Pept Protein Res 7:445–459
210. Kmiecik S, Gront D, Kolinski M, Wieteska L, Dawid AE, Kolinski A (2016) Coarse-grained
protein models and their applications. Chem Rev 116:7898–7936
211. Tozzini V (2005) Coarse-grained models for proteins. Curr Opin Struct Biol 15:144–150
212. Kmiecik S, Gront D, Kolinski M, Wieteska L, Dawid AE, Kolinski A (2016) Coarse-grained
protein models and their applications. Chem Rev 116:7898–7936
213. Enciso M, Rey A (2010) A refined hydrogen bond potential for flexible protein models.
J Chem Phys 132:235102
214. Holzgräfe C, Wallin S (2014) Smooth functional transition along a mutational pathway with
an abrupt protein fold switch. Biophys J 107:1217–1225
215. Ponder JW, Case DA (2003) Force fields for protein simulations. Adv Protein Chem 66:27–85
216. Snow CD, Nguyen H, Pande VS, Gruebele M (2002) Absolute comparison of simulated and
experimental protein-folding dynamics. Nature 420:102–106
217. Bottaro S, Lindorff-Larsen K (2018) Biophysical experiments and biomolecular simulations:
a perfect match? Science 361:355–360
218. Sugita Y, Okamoto Y (1999) Replica-exchange molecular dynamics method for protein
folding. Chem Phys Lett 314:141–151
219. Bowman GR, Voelz VA, Pande VS (2011) Taming the complexity of protein folding. Curr
Opin Struct Biol 21:4–11
220. Lane TJ, Shukla D, Beauchamp KA, Pande VS (2013) To milliseconds and beyond:
challenges in the simulation of protein folding. Curr Opin Struct Biol 23:58–65
221. Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, Bank JA,
Jumper JM, Salmon JK, Shan Y, Wriggers W (2010) Atomic-level characterization of the
structural dynamics of proteins. Science 330:341–346
222. Dror RO, Young C, Shaw DE (2011) Anton, a special-purpose molecular simulation
machine. In: Padua D (ed) Encyclopedia of parallel computing. Springer, Boston, pp 60–71
