6-Protein Structure Analysis PDF
AND TERMINOLOGY
Pathway for folding a linear chain of amino acids into a three-dimensional
protein structure.
The polypeptide chain is first assembled on the ribosome. The resulting linear chain
forms secondary structures through the formation of hydrogen bonds between
amino acids in the chain. Through further interactions among amino acid side
groups, these secondary structures then fold into a 3-D structure.
Chaperone proteins and membranes may assist with this process. Further,
processing of the protein by cleavage or chemical modification may also occur.
Protein structures include a core region comprising secondary structural elements
packed in close proximity in a hydrophobic environment. Specific interactions
between the amino acid side chains occur within this core structure. At a given
position in a given core, the allowed amino acid substitutions are limited by space
and available contacts with other nearby amino acids.
Outside of the core are loops and structural elements in contact with water and
other molecules. Substitutions in these regions are not as restricted as in the core.
SECONDARY STRUCTURE AND
FOLDING CLASSES
A database search for a new sequence may not provide a hit against a
known protein or the secondary structure of the hit may not be
available.
The secondary structure information may be important for the biologist.
Secondary structure prediction methods rely on observations made from
groups of proteins whose three-dimensional structure has been
experimentally determined.
Some protein sequences have distinct amino acid motifs that always
form a characteristic structure. Prediction of these structures from
sequence is quite achievable using presently available methods.
For most proteins, however, the accuracy of secondary structure
prediction is approximately 70-75%. Methods for matching sequence to
three-dimensional structure have been formulated, but they are not yet
very reliable.
However, great forward strides have been made, and there is a very
active community of structural biochemists and bioinformaticians
working on improvements. The need for such an effort is revealed by
the rapid increases in the number of protein sequences and structures.
The structure of two amino acids in a polypeptide chain. The R group is different for
each of the 20 amino acids. Neighboring amino acids are joined by a peptide bond
between the C=O and NH groups. The N-Cα-C sequence is repeated, forming the
backbone of the three-dimensional structure. The bonds on each side of the Cα atom
are quite free to rotate, with certain restrictions due to spatial constraints from the R
group and neighboring positions in the chain. The conformation of the protein backbone
is determined by the angles of these bonds: Φ for the bond between the N and Cα atoms
and Ψ for the bond between the Cα and the C of the C=O group. The angle ω of the peptide
bond joining the C=O and NH groups is nearly always 180°.
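The backbone angles described above can be computed from atomic coordinates. The following is a minimal sketch, assuming NumPy is available and that coordinates are supplied as (x, y, z) triples; the function names `dihedral`, `phi`, and `psi` are illustrative and do not come from the slides.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points.

    The angle is measured about the central bond p1-p2. It is 0 for a cis
    arrangement and +/-180 for a trans arrangement.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of the outer bonds perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def phi(c_prev, n, ca, c):
    """Phi: rotation about the N-C-alpha bond (atoms C(i-1), N, CA, C)."""
    return dihedral(c_prev, n, ca, c)

def psi(n, ca, c, n_next):
    """Psi: rotation about the C-alpha-C bond (atoms N, CA, C, N(i+1))."""
    return dihedral(n, ca, c, n_next)
```

The first and last residues of a chain lack one of the four atoms, so Φ is undefined at the N-terminus and Ψ at the C-terminus.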
Rotatable bonds in peptide backbone:
Conformational possibilities
Ramachandran Plot: The permitted values of Φ-Ψ angles
Gly:1-4
Ala: 2-4
Val,Ile: 4
Others:3-4
The α-helix and β-sheets of protein secondary structure. The backbone of the
chain is shown in red; the Cα atoms and the C=O and NH groups are shown in
blue, yellow, and green, respectively.
Alpha-helix
• Corkscrew
• Main chain forms backbone, side chains project out
• Hydrogen bonds between CO group at n and NH group at n+4
• Helix-formers: Ala, Glu, Leu, Met
• Helix-breaker: Pro
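The n to n+4 hydrogen-bonding pattern and the helix-former and helix-breaker residues listed above can be written out as a short sketch. The function names are hypothetical, the helix is treated as idealized, and the residue sets contain only the amino acids named in the bullets; this is not a prediction method.

```python
# Residues named in the bullets above, in one-letter code
HELIX_FORMERS = set("AELM")   # Ala, Glu, Leu, Met
HELIX_BREAKERS = set("P")     # Pro

def helix_hbond_pairs(start, end):
    """Pairs (n, n+4): the C=O of residue n bonds to the N-H of residue n+4.

    start and end are the first and last residue numbers of an idealized
    helix, inclusive. Helices shorter than five residues yield no pairs.
    """
    return [(n, n + 4) for n in range(start, end - 3)]

def annotate_helix_propensity(seq):
    """Mark each residue: H = helix-former, B = helix-breaker, - = neither."""
    marks = []
    for aa in seq.upper():
        if aa in HELIX_FORMERS:
            marks.append("H")
        elif aa in HELIX_BREAKERS:
            marks.append("B")
        else:
            marks.append("-")
    return "".join(marks)
```

For a helix spanning residues 1 to 8, `helix_hbond_pairs(1, 8)` gives the four pairs (1, 5), (2, 6), (3, 7), and (4, 8).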
Beta-strand
• Stabilization through hydrogen bonding
• Parallel or antiparallel
• Variant: beta-turn
Regular Conformations of Polypeptides on
Ramachandran Plot
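One way to relate a measured (Φ, Ψ) pair to the regular conformations is to find the nearest idealized point on the plot. The sketch below is only an illustration: the canonical angles are approximate textbook values that are not given in the slides, and the 40° cutoff and all names are assumptions chosen for the example.

```python
import math

# Approximate idealized (phi, psi) values in degrees; assumed, not from the slides
CANONICAL = {
    "alpha-helix": (-57.0, -47.0),
    "parallel beta-strand": (-119.0, 113.0),
    "antiparallel beta-strand": (-139.0, 135.0),
}

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_conformation(phi, psi, max_dist=40.0):
    """Name of the closest canonical conformation, or 'other' if none is near.

    max_dist is an arbitrary illustrative cutoff in degrees.
    """
    best_name, best_dist = "other", float("inf")
    for name, (p, s) in CANONICAL.items():
        dist = math.hypot(angle_diff(phi, p), angle_diff(psi, s))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_dist else "other"
```

Real allowed regions are irregular areas rather than circles around a point, so this nearest-point rule is a simplification.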
How can existing knowledge help in
protein structure prediction?
Protein structure is largely specified by amino acid sequence, but how one set of
interactions of the many possible occurs is not yet fully understood (Branden and
Tooze 1991).
Initial estimates indicated that there are approximately 1,000 protein families
composed of members that share detectable sequence similarity (Dayhoff et al.
1978; Chothia 1992). Thus, new protein sequences are expected to share structural
features with these proteins.
A new structure has often been found to fold into α-helical and β-sheet structural
elements in the same order and spatial configuration as one or more structures
already in the structural database.
Whether this low number of structure families represents physical restraints in
folding the polypeptide chain into a three-dimensional structure or merely the
selection of certain classes of three-dimensional structure by evolution has yet to be
discovered (Gibrat et al. 1996).
The sequence alignment, motif-finding, block-finding, and database similarity search
methods may be used to discover these familial relationships. Understanding these
relationships can greatly assist with structural predictions.
Information from amino acid substitutions at a particular sequence position, as
obtained from a multiple sequence alignment, has been found to significantly
increase the accuracy of secondary structure prediction from protein sequences.
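The per-position substitution information in a multiple sequence alignment can be summarized in several ways. As one illustration, the sketch below computes the Shannon entropy of each alignment column, with gaps ignored. It is an assumed, simplified measure with hypothetical function names, not the scheme used by any particular prediction program.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column.

    Gap characters ('-') are ignored. 0 means fully conserved.
    """
    counts = Counter(c for c in column if c != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def alignment_entropies(aligned_seqs):
    """Entropy for every column of a list of equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    return [column_entropy(col) for col in zip(*aligned_seqs)]
```

Columns with low entropy tolerate few substitutions, as expected for core positions. Columns with high entropy are more typical of the less restricted loop regions.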
Classes of Protein Structure
Class α comprises a bundle of α helices connected by loops on the surface
of the proteins.
Class β comprises antiparallel β sheets, usually two sheets in close contact
forming a sandwich. Alternatively, a sheet can twist into a barrel with the
first and last strands touching.
Class α/β comprises mainly parallel β sheets with intervening α helices,
but may also have mixed β sheets. In addition to forming a sheet in some
proteins in this class, in others parallel β strands may form into a barrel
structure that is surrounded by α helices. This class of proteins includes
many metabolic enzymes.
Class α+β comprises mainly segregated α helices and antiparallel
β sheets.
Multi-domain (α and β) proteins comprise domains representing more
than one of the above four classes.
Membrane and cell-surface proteins and peptides excluding proteins of the
immune system comprise this class.
Folding Classes
α, β, α+β, α/β
Human albumin: All α (PDB code 1ao6)
Structure of α-class proteins. (A) Diagram showing α-helical
pattern of this class; α-helices are red cylinders, and black lines
are loops. (B) Example of the class, hemoglobin, PDB file 3hhb
displayed using Rasmol, using ribbons display and group color.
Engrailed homeodomain: All α (PDB code: 1enh)
Structure of β-class proteins. (A) Diagram showing typical
arrangement of the antiparallel β strands (blue arrows) joined by
loops (black lines) in a β sheet. (B) Example of protein in this class,
T-cell receptor CD8, PDB file 1cd8, image from
http://expasy.hcuge.ch/pub/Graphics/IMAGES/.
α-Amylase inhibitor: All β (PDB code: 1hoe)
Pilin (α+β) (PDB code: 2pil)
Structure of α+β class proteins. (A) Diagram showing
arrangement of typical motif of antiparallel β strands (blue
arrows) in a β sheet, segregated from the α helix (red cylinder), and
showing loops (black lines). (B) Example of protein in this class,
G-specific endonuclease complex with deoxy-dinucleotide
inhibitor.
Structure of α/β class proteins. (A) Diagram showing one
possible configuration of parallel β strands (blue arrows) in a
β sheet and an intervening α helix (red cylinder), joined by loops
(black lines). (B) Example of protein in this class, tryptophan
synthase subunit.
Structure of membrane proteins: α helices are of a particular length
range and have a high content of hydrophobic amino acids traversing a
membrane, features that make this class readily identifiable by scanning a
sequence for these hydrophobic regions. (A) Diagram showing typical
arrangement of membrane-traversing, hydrophobic α helices (red). Membrane
bilayer shown as green lines. (B) Example of protein in this class, integral
membrane light-harvesting complex, PDB file 1kzu viewed with Rasmol.
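Scanning a sequence for hydrophobic regions, as mentioned in the caption above, is commonly done with a sliding-window hydropathy average. The sketch below assumes the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6. The slides do not specify a scale, window, or threshold, so all three are assumptions, and the function names are illustrative.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not given in the slides)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every window of the given length along seq.

    Raises KeyError for characters outside the 20 standard amino acids.
    Returns an empty list if seq is shorter than the window.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

def candidate_tm_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy >= threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, avg in enumerate(profile) if avg >= threshold]
```

Runs of consecutive start positions point to a candidate membrane-spanning segment. A real analysis would merge overlapping windows and check the length of each segment.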
Main Web sites for protein structural analysis
Methods for Classification of Protein Structures
SCOP
• Structural Classification Of Proteins
• Augmented manual classification: class, fold, superfamily and family classification
Murzin et al (1995) J. Mol. Biol. 247 536-540
VAST
• Vector Alignment Search Tool
• Complete PDB and representative structure comparison, structure alignments, structure superposition tool
Gibrat and Bryant (1996) Curr. Opinion in Struct. Biol. 6 377-385
CATH
• Class, Architecture, Topology and Homologous superfamily - a hierarchical classification of protein domain structures
• Complete PDB, fold classification by domain
Orengo et al (1997) Structure 5(8) 1093-1108
CE
• Combinatorial Extension of the optimal path
• Complete PDB and representative structure comparison, structure alignments, structure superposition tool
Shindyalov & Bourne (1998) Protein Engineering 11(9) 739-747
FSSP
• Fold classification based on Structure-Structure alignment of Proteins
• Complete PDB, fold tree, domain dictionary, sequence neighbors, structure superposition
Holm and Sander (1998) Nucl. Acids Res. 26 316-319
SCOP: Hierarchical Classification of Protein Structures
686 folds
1,073 superfamilies
1,827 families
5,741 domains
Constructed manually by visual inspection, comparison of
structure/sequence, and automated tools
Ricin Chain B
VAST Structure Comparison
Step 2: Optimally align structure element vectors
Step 3: Refine residue-by-residue alignment using Monte Carlo
Protein Structure Comparison by Alignment of
Distance Matrices
[Figure: 3-D coordinates of residues a, b, c in one structure and a', b', c' in a second structure, each converted to a distance matrix for comparison.]
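The distance-matrix representation can be sketched in a few lines. The example below assumes NumPy and one coordinate per residue (for instance the Cα atom); the function names are illustrative. It shows only how the matrices are built and compared for two fragments of equal length, not the alignment search that a full structure-comparison method performs.

```python
import numpy as np

def distance_matrix(coords):
    """Pairwise Euclidean distances between residues.

    coords is an (N, 3) array-like of one point per residue.
    Returns an (N, N) symmetric matrix with zeros on the diagonal.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def matrix_difference(coords_a, coords_b):
    """Mean absolute difference between two fragments' distance matrices.

    Both fragments must contain the same number of residues.
    """
    da = distance_matrix(coords_a)
    db = distance_matrix(coords_b)
    if da.shape != db.shape:
        raise ValueError("fragments must have the same number of residues")
    return float(np.abs(da - db).mean())
```

Because internal distances do not change when a structure is rotated or translated, two copies of the same fragment in different orientations give a difference of zero without any prior superposition.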