
REVIEW OF PROTEIN STRUCTURE AND TERMINOLOGY
Pathway for folding a linear chain of amino acids into a three-dimensional
protein structure.
 The polypeptide chain is first assembled on the ribosome. The resulting linear chain
forms secondary structures through the formation of hydrogen bonds between
amino acids in the chain. Through further interactions among amino acid side
groups, these secondary structures then fold into a 3-D structure.
 Chaperone proteins and membranes may assist with this process. Further,
processing of the protein by cleavage or chemical modification may also occur.
 Protein structures include a core region comprising secondary structural elements
packed in close proximity in a hydrophobic environment. Specific interactions
between the amino acid side chains occur within this core structure. At a given
position in a given core, the allowed amino acid substitutions are limited by space
and available contacts with other nearby amino acids.
 Outside of the core are loops and structural elements in contact with water and
other molecules. Substitutions in these regions are not as restricted as in the core.
SECONDARY STRUCTURE AND FOLDING CLASSES
 A database search for a new sequence may not provide a hit against a
known protein or the secondary structure of the hit may not be
available.
 The secondary structure information may be important for the biologist.
 Secondary structure prediction methods rely on observations made from
groups of proteins whose three-dimensional structure has been
experimentally determined.
 Some protein sequences have distinct amino acid motifs that always
form a characteristic structure. Prediction of these structures from
sequence is quite achievable using presently available methods.
 For most proteins, however, the accuracy of secondary structure
prediction is approximately 70-75%. Methods for matching sequence to
three-dimensional structure have been formulated, but they are not yet
very reliable.
 However, great forward strides have been made, and there is a very
active community of structural biochemists and bioinformaticians
working on improvements. The need for such an effort is revealed by
the rapid increases in the number of protein sequences and structures.
The structure of two amino acids in a polypeptide chain. The R group is different for
each of the 20 amino acids. Neighboring amino acids are joined by a peptide bond
between the C=O and NH groups. The N-Cα-C sequence is repeated, forming the
backbone of the three-dimensional structure. The bonds on each side of the Cα atom
are quite free to rotate, with certain restrictions due to spatial constraints from the R
group and neighboring positions in the chain. The conformation of the protein backbone
is determined by the angles of these bonds: Φ for the bond between the N and Cα atoms
and Ψ for the bond between the Cα and the C of the C=O group. The angle ω of the peptide
bond joining the C=O and NH groups is nearly always 180°.
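The backbone angles described above are torsion (dihedral) angles, each defined by four consecutive backbone atoms. The following is a minimal Python sketch of that calculation, not a reference implementation. The function names and the test coordinates are illustrative assumptions. For Φ of residue i the four atoms would be C(i-1), N(i), Cα(i), C(i); for Ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p1, p0)          # first bond vector
    b1 = _sub(p2, p1)          # central bond (rotation axis)
    b2 = _sub(p3, p2)          # last bond vector
    n1 = _cross(b0, b1)        # normal of the plane p0, p1, p2
    n2 = _cross(b1, b2)        # normal of the plane p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: four points forming a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying `dihedral` to every residue of a structure gives the (Φ, Ψ) pairs that are plotted on a Ramachandran plot.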
Rotatable bonds in peptide backbone:
Conformational possibilities
Ramachandran Plot: The permitted values of the Φ and Ψ angles

Gly:1-4
Ala: 2-4
Val,Ile: 4
Others:3-4
The α-helix and β-sheets of protein secondary structure. The backbone of the
chain is shown in red; the Cα atoms and the C=O and NH groups are shown in
blue, yellow and green, respectively.

(A) In the α-helix, each C=O group at position n is hydrogen-bonded with the NH group
at position n+4. There are 3.6 residues per turn. The helix is usually right-handed,
but short sections of 3-5 amino acids of left-handed helices occur occasionally. The
average Φ and Ψ angles in the right-handed helix are approximately −60° and −40°,
respectively. The R side chains of the amino acids are on the outside of the helix.
(B) The β-sheet is made up of β-strands that are portions of the protein chain. The
strands may run in the same (parallel) or opposite (antiparallel) chemical directions
(or a mixture of the two), and the pattern of hydrogen bonds is different in each
case and also varies in antiparallel strands.
Arrangement in α-helix
Alpha-helix

• Corkscrew
• Main chain forms backbone, side
chains project out
• Hydrogen bonds between CO
group at n and NH group at n+4
• Helix-formers: Ala, Glu, Leu, Met
• Helix-breaker: Pro
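The helix rules listed above (C=O at n bonded to NH at n+4, 3.6 residues per turn, Pro as helix-breaker) can be turned into a small Python sketch. All function names are hypothetical, residue numbering is assumed to be 1-based, and the code is only an illustration of the stated rules, not a structure prediction method.

```python
RESIDUES_PER_TURN = 3.6          # value quoted in the text
HELIX_FORMERS = set("AELM")      # Ala, Glu, Leu, Met (from the list above)
HELIX_BREAKERS = set("P")        # Pro (from the list above)

def helix_hbond_pairs(n_residues):
    """Pairs (n, n+4): C=O of residue n bonded to NH of residue n+4."""
    return [(n, n + 4) for n in range(1, n_residues - 3)]

def degrees_per_residue():
    """Rotation about the helix axis contributed by each residue."""
    return 360.0 / RESIDUES_PER_TURN

def angular_position(index):
    """Angle (degrees) of residue `index` around the axis; residue 1 is at 0."""
    return ((index - 1) * degrees_per_residue()) % 360.0

def breaker_positions(sequence):
    """1-based positions of helix-breaking residues in a one-letter sequence."""
    return [i for i, aa in enumerate(sequence.upper(), start=1)
            if aa in HELIX_BREAKERS]

print(helix_hbond_pairs(10))
print(breaker_positions("AELPM"))   # [4]
```

With 3.6 residues per turn, each residue advances about 100° around the axis, which is why residues n, n+3/n+4 and n+7 end up on roughly the same face of the helix.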
Beta-strand

• Extended structure (pleated)


• Peptide bonds point in
opposite directions
• Side chains point in
opposite directions
• No hydrogen bonding within
strand
Beta-sheet

• Stabilization
through hydrogen
bonding
• Parallel or
antiparallel
• Variant: beta-turn
Regular Conformations of Polypeptides on
Ramachandran Plot
How can existing knowledge help in
protein structure prediction?
 Protein structure is largely specified by amino acid sequence, but how one set of
interactions of the many possible occurs is not yet fully understood (Branden and
Tooze 1991).
 Initial estimates indicated that there are approximately 1,000 protein families
composed of members that share detectable sequence similarity (Dayhoff et al.
1978; Chothia 1992). Thus, new protein sequences are expected to share structural
features with these proteins.
 A new structure has often been found to fold into α-helical and β-sheet structural
elements in the same order and spatial configuration as one or more structures
already in the structural database.
 Whether this low number of structure families represents physical restraints in
folding the polypeptide chain into a three-dimensional structure or merely the
selection of certain classes of three-dimensional structure by evolution has yet to be
discovered (Gibrat et al. 1996).
 The sequence alignment, motif-finding, block-finding, and database similarity search
methods may be used to discover these familial relationships. Understanding these
relationships can greatly assist with structural predictions.
 Information from amino acid substitutions at a particular sequence position as
obtained from a multiple sequence alignment has been found to significantly
improve the prediction of secondary structures from protein sequences.
Classes of Protein Structure
 Class α comprises a bundle of α helices connected by loops on the surface
of the proteins.
 Class β comprises antiparallel β sheets, usually two sheets in close contact
forming a sandwich. Alternatively, a sheet can twist into a barrel with the
first and last strands touching.
 Class α/β comprises mainly parallel β sheets with intervening α helices,
but may also have mixed β sheets. In some proteins in this class the parallel
β strands form a sheet; in others they form a barrel
structure that is surrounded by α helices. This class of proteins includes
many metabolic enzymes.
 Class α+β comprises mainly segregated α helices and antiparallel β
sheets.
 Multi-domain (α and β) proteins comprise domains representing more
than one of the above four classes.
 A further class comprises membrane and cell-surface proteins and peptides,
excluding proteins of the immune system.
Folding Classes

α, β, α+β, α/β
Human albumin: All α (PDB code 1ao6)
Structure of α-class proteins. (A) Diagram showing α-helical
pattern of this class; α-helices are red cylinders, and black lines
are loops. (B) Example of the class, hemoglobin, PDB file 3hhb,
displayed using Rasmol, using ribbons display and group color.
Engrailed homeodomain: All α (PDB code: 1enh)
Structure of β class proteins. (A) Diagram showing typical
arrangement of the antiparallel β strands (blue arrows) joined by
loops (black lines) in a β sheet. (B) Example of protein in this class, T-cell
receptor CD8, PDB file 1cd8, image from
http://expasy.hcuge.ch/pub/Graphics/IMAGES/.
α-Amylase inhibitor: All β (PDB code: 1hoe)
Pilin (α+β) (PDB code: 2pil)
Structure of α+β class proteins. (A) Diagram showing
arrangement of typical motif of antiparallel β strands (blue
arrows) in a β sheet, segregated from an α helix (red cylinder), and
showing loops (black lines). (B) Example of protein in this class,
G-specific endonuclease complex with deoxy-dinucleotide
inhibitor.
Structure of α/β class proteins. (A) Diagram showing one
possible configuration of parallel β strands (blue arrows) in a β
sheet and an intervening α helix (red cylinder), joined by loops
(black lines). (B) Example of protein in this class, tryptophan
synthase  subunit.
Structure of membrane proteins: α helices are of a particular length
range and have a high content of hydrophobic amino acids traversing a
membrane, features that make this class readily identifiable by scanning a
sequence for these hydrophobic regions. (A) Diagram showing typical
arrangement of membrane-traversing, hydrophobic α helices (red). Membrane
bilayer shown as green lines. (B) Example of protein in this class, integral
membrane light-harvesting complex, PDB file 1kzu, viewed with Rasmol.
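Scanning a sequence for hydrophobic regions, as mentioned above, is commonly done with a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle hydropathy scale; the window of 19 residues and the threshold of 1.6 are commonly quoted settings for membrane-spanning segments and should be treated as assumptions here, since the source does not name a scale or parameters. Function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=19):
    """Average hydropathy of every window of `window` residues."""
    # Unknown symbols (e.g. X) are scored as 0.0; this is an assumption.
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in sequence.upper()]
    if len(scores) < window:
        return []
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def candidate_membrane_windows(sequence, window=19, threshold=1.6):
    """0-based start positions of windows whose average meets the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, score in enumerate(profile) if score >= threshold]

# Made-up test sequence: a 19-residue Leu stretch flanked by Asp residues.
toy = "D" * 10 + "L" * 19 + "D" * 10
print(candidate_membrane_windows(toy))
```

Windows that pass the threshold are only candidates; a real analysis would merge overlapping windows and check the length range of each predicted helix.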
Main Web sites for protein structural analysis
Methods for Classification of Protein Structures

SCOP
• Structural Classification Of Proteins
• Augmented manual classification, class, fold, superfamily and
family classification
Murzin et al (1995) J. Mol. Biol. 247 536-540

VAST
• Vector Alignment Search Tool
• Complete PDB and representative structure comparison,
structure alignments, structure superposition tool
Gibrat and Bryant (1996) Curr. Opinion in Struct. Biol. 6 377-385
CATH
• Class, Architecture, Topology and Homologous superfamily - a
hierarchical classification of protein domain structures
• Complete PDB, fold classification by domain
Orengo et al (1997) Structure 5(8) 1093-1108

CE
• Combinatorial Extension of the optimal path
• Complete PDB and representative structure comparison,
structure alignments, structure superposition tool
Shindyalov & Bourne (1998) Protein Engineering 11(9) 739-747

FSSP
• Fold classification based on Structure-Structure alignment of
Proteins
• Complete PDB, fold tree, domain dictionary, sequence
neighbors, structure superposition
Holm and Sander (1998) Nucl. Acids Res. 26 316-319
SCOP: Hierarchical Classification of Protein Structures

686 folds
1073 superfamilies
1827 families
5,741 domains
Constructed manually by visual inspection, comparison of
structure/sequence, and automated tools.

• Basic unit is domain: 1 protein = 1-many domains
• Family: Clear evolutionary relationship: high sequence /
structure similarity.
• Superfamily: Probable common evolutionary origin based
on moderate structure similarity and common functional
features.
• Fold: Gross structural similarity (could arise from common
origin or physics / chemistry of protein folding).
• Short descriptive names given to fold, superfamily, family;
example: α/β hydrolase fold.
Murzin et al (1995) J. Mol. Biol. 247, 536
VAST Structure Comparison (Vector Alignment Search Tool)

Step 1: Construct vectors for secondary structure elements (figure example: Ricin Chain B)
Step 2: Optimally align structure element vectors
Step 3: Refine residue-by-residue alignment using Monte Carlo
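Step 1 above reduces each secondary structure element to a vector. The sketch below shows one simple way to do that from Cα coordinates, by joining the centroid of the first few atoms to the centroid of the last few. This is an illustrative assumption and not the procedure used by VAST itself; the function names, the choice of four atoms per end, and the coordinates are all hypothetical.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def sse_vector(ca_coords, k=4):
    """Return (start_point, direction) approximating the element's axis."""
    k = min(k, len(ca_coords))
    start = centroid(ca_coords[:k])
    end = centroid(ca_coords[-k:])
    return start, tuple(e - s for s, e in zip(start, end))

def angle_between(u, v):
    """Angle in degrees between two direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))

# Two made-up strands, one along x and one along y.
strand_x = [(3.3 * i, 0.0, 0.0) for i in range(6)]
strand_y = [(0.0, 3.3 * i, 5.0) for i in range(6)]
_, vx = sse_vector(strand_x)
_, vy = sse_vector(strand_y)
print(angle_between(vx, vy))
```

Once every element is a vector, two structures can be compared by the relative positions and orientations of their vectors, which is far cheaper than comparing all atoms.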
Protein Structure Comparison by Alignment of
Distance Matrices

Step 1: 3-D to 2-D transformation


Holm and Sander, J. Mol. Biol. (1993) 233, 123-138


Construction of a distance matrix

Coordinates:
        x        y        z
C1   13.220   44.968   51.871
C2    0.601   47.326   50.349
C3   -0.876   46.155   47.073

Distance matrix:
        C1       C2       C3
C1            12.93    14.94
C2                      3.78
C3


Algorithm: Equivalent to BLAST

• Size of distance matrix scales as n²
• Number of comparisons scales as n² m²
• Highly redundant comparisons
• Compare segments of hexapeptides
• Adjacent homologous segments can be merged to obtain, say, a helix
• Compare distances between the equivalent segments.
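The segment comparison listed above can be sketched as a comparison of 6 x 6 submatrices taken from two distance matrices. The code below scores a pair of submatrices by their mean absolute difference, which is a deliberately simple stand-in and not the similarity score of Holm and Sander. It assumes n and m are the lengths of the two proteins; all names and coordinates are hypothetical.

```python
import math

def distance_matrix(points):
    """Symmetric matrix of pairwise Euclidean distances."""
    return [[math.dist(p, q) for q in points] for p in points]

def submatrix(dm, i, j, size=6):
    """Distances between the hexapeptide starting at i and the one at j."""
    return [row[j:j + size] for row in dm[i:i + size]]

def mean_abs_difference(sub_a, sub_b):
    """Average absolute difference between two equally sized submatrices."""
    cells = [abs(x - y)
             for row_a, row_b in zip(sub_a, sub_b)
             for x, y in zip(row_a, row_b)]
    return sum(cells) / len(cells)

def n_submatrix_pairs(n, m, size=6):
    """Number of submatrix pairs for proteins of length n and m."""
    return ((n - size + 1) ** 2) * ((m - size + 1) ** 2)

# Made-up chain of 10 points, and a translated copy of it.
protein_1 = [(float(i), float((i * i) % 7), 0.5 * i) for i in range(10)]
protein_2 = [(x + 5.0, y - 3.0, z + 2.0) for x, y, z in protein_1]
dm1 = distance_matrix(protein_1)
dm2 = distance_matrix(protein_2)

score = mean_abs_difference(submatrix(dm1, 0, 4), submatrix(dm2, 0, 4))
print(score)                      # close to zero for identical shapes
print(n_submatrix_pairs(10, 10))  # 625
```

The pair count grows as the product of the squared lengths, which is why the slide calls the comparisons highly redundant and why matching segments are merged rather than enumerated exhaustively.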
Comparison of Distance Matrices
Hypothesis: Similar 3-D structure; similar distance matrices
[Figure: Protein 1 (segments a, b, c) and Protein 2 (segments a', b', c'), each shown as 3-D coordinates and as the corresponding distance matrix]
