Protein Structure Modeling

András Fiser
Department of Biochemistry and Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, New York, USA
Why is it useful to know the structure of a protein, not only its sequence?
The 3D structure is more informative than the sequence because patterns in space are frequently more recognizable than patterns in sequence.
Evolution tends to conserve function, and function depends more directly on structure than on sequence, so structure is more conserved in evolution than sequence.
Why Protein Structure Prediction?
Year 2005: 2,300,000 sequences, 29,000 structures
We know the experimental 3D structure for ~1% of the protein sequences
Principles of Protein Structure

Figure: a sequence (GFCHIKAYTRLIMVG) is related to structure along two routes. Folding: ab initio prediction. Evolution: fold recognition and comparative modeling. Structures shown: Desulfovibrio vulgaris, Anacystis nidulans, Chondrus crispus, Anabaena 7120.
Protein structure modeling
Ab initio prediction
Applicable to any sequence
Not very accurate (>4 Å RMSD)
Attempted for proteins of <100 residues
Accuracy and applicability are limited by our understanding of the protein folding problem

Comparative modeling
Applicable only to sequences that share recognizable similarity to a template structure
Fairly accurate (<3 Å RMSD), typically comparable to a low-resolution X-ray experiment
Not limited by size
Accuracy and applicability are limited by the number of known folds
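
The accuracy figures on this slide are quoted as RMSD (root-mean-square deviation) between equivalent atoms. A minimal sketch of that measure follows, assuming the two structures are already superposed and that atom i in one list corresponds to atom i in the other. The function name and the toy coordinates are made up for illustration and are not real protein data.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superposed and that point i in
    coords_a is equivalent to point i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy coordinates in angstroms (invented for illustration).
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (3.8, 3.0, 0.0), (7.6, 0.0, 4.0)]
print(f"RMSD = {rmsd(model, reference):.2f} A")
```

In practice the two structures are first superposed optimally (for example with the Kabsch algorithm), which this sketch leaves out.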
What makes comparative modeling possible
I. A small difference in the sequence makes a small difference in the structure
II. Protein structures are clustered into fold families
Structural Genomics
Characterize most protein sequences (red) based on related known structures (green). The number of families is much smaller than the number of proteins.
Structural Genomics
Definition: The aim of structural genomics is to put every protein sequence within modeling distance of a known protein structure.
Size of the problem: There are a few thousand domain fold families. There are ~20,000 sequence families (at 30% sequence identity).
Solution: Determine protein structures for as many different families as possible. Model the rest of the family members using comparative modeling.
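
The strategy above (one experimental structure per family, models for the rest) can be sketched as a greedy grouping of sequences at a 30% identity cutoff. This is only an illustration under strong assumptions: `percent_identity` is a toy, gap-free comparison standing in for a real alignment, `greedy_families` is a hypothetical helper, and the sequences are invented.

```python
def percent_identity(a, b):
    """Toy identity: percent of matching positions over the shorter length.

    Assumption: sequences are compared position by position with no gaps;
    a real pipeline would align them first.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n

def greedy_families(sequences, threshold=30.0):
    """Group sequences into families; the first member of each is its representative."""
    families = []
    for seq in sequences:
        for family in families:
            if percent_identity(seq, family[0]) >= threshold:
                family.append(seq)
                break
        else:
            # No existing representative is similar enough: start a new family.
            families.append([seq])
    return families

# Invented sequences, for illustration only.
seqs = ["ACDEFGHIKL", "ACDEFGHIKV", "WWWWWWWWWW", "ACDEAAAAAA"]
families = greedy_families(seqs)
# One target per family for experimental structure determination;
# the remaining members would be handled by comparative modeling.
targets = [family[0] for family in families]
print(len(families), "families; targets:", targets)
```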
Comparative Protein Structure Modeling

Figure: the flavodoxin family (Anacystis nidulans, Anabaena 7120, Chondrus crispus, Desulfovibrio vulgaris, Clostridium mp.) with a target sequence (KIGIFFSTSTGNTTEVA) built by comparative modeling. Scale: Cα RMSD (% EQV) of 2 (50), 1 (80), 0 (100); axis ticks 20, 50, 100.
Steps in Comparative Protein Structure Modeling

START
TARGET: ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPERASFQWMNDK
1. Template Search
2. Target-Template Alignment
   TARGET   ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE
   TEMPLATE MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE
3. Model Building
4. Model Evaluation
OK? No: repeat the cycle. Yes: END
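
The target-template alignment on this slide can be summarized by its percent sequence identity. The sketch below counts identical residues over columns where neither sequence has a gap. That denominator is one of several conventions in use, and the function name is an assumption made for this example.

```python
def alignment_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0

# The two aligned rows shown on the slide.
target = "ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE"
template = "MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE"
print(f"{alignment_identity(target, template):.1f}% identity")
```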
Steps in Comparative Protein Structure Modeling

START
1. Template Search
   Pattern recognition, heuristic searches (e.g. BLAST, FASTA)
   Profile and iterative alignment methods (e.g. HMMs, PSI-BLAST)
   Structure-based threading (e.g. THREADER, FUGUE, 3D-PSSM)
2. Target-Template Alignment
3. Model Building
4. Model Evaluation
OK? No: repeat the cycle. Yes: END

Steps in Comparative Protein Structure Modeling

START
1. Template Search
2. Target-Template Alignment
   Dynamic Programming, Pairwise Alignment
   Multiple Alignments, Profiles, HMMs
   Structure-based approaches (Threading)
3. Model Building
4. Model Evaluation
OK? No: repeat the cycle. Yes: END
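
The dynamic-programming pairwise alignment named in the alignment step can be illustrated with a small global alignment in the Needleman-Wunsch style. This is a sketch under simplifying assumptions: a flat match/mismatch score replaces a substitution matrix, the gap penalty is linear, and the parameter values are arbitrary.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming.

    Returns (score, aligned_a, aligned_b) for one optimal alignment.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

best_score, row_a, row_b = needleman_wunsch("ACGT", "AGT")
print(best_score)
print(row_a)
print(row_b)
```

Profile and HMM methods listed on the same slide replace the flat residue scores with position-specific ones, but the recurrence has the same shape.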

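Steps in Comparative Protein Structure Modeling

START
1. Template Search
2. Target-Template Alignment
3. Model Building
   Rigid Body Assembly (COMPOSER)
   Segment Matching (SEGMOD, 3D-PSSM)
   Satisfaction of Spatial Restraints (MODELLER)
   Integrated (NEST)
   Loop modeling, side chain modeling
4. Model Evaluation
OK? No: repeat the cycle. Yes: END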
Steps in Comparative Protein Structure Modeling

START
1. Template Search
2. Target-Template Alignment
3. Model Building
4. Model Evaluation
   Stereochemistry (PROCHECK, WHATCHECK)
   Environment (Profiles3D, Verify3D)
   Statistical potentials based methods (PROSA)
   Is the model reliable? A model is reliable when it is based on a correct template and on an approximately correct alignment.
OK? No: repeat the cycle. Yes: END
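
The cycle drawn on these step slides (search, align, build, evaluate, and repeat if the model is not acceptable) can be sketched as a simple loop. Every callable here is a hypothetical placeholder supplied by the caller. None of them corresponds to a real tool's API, and the retry limit is an assumption added to keep the loop finite.

```python
def comparative_model(target, search, align, build, evaluate, max_rounds=5):
    """Run search -> align -> build -> evaluate until a model passes.

    Hypothetical callables supplied by the caller:
      search(target, round_index) -> template
      align(target, template)     -> alignment
      build(alignment, template)  -> model
      evaluate(model)             -> True if the model is acceptable
    Returns the first acceptable model, or None if every round fails.
    """
    for round_index in range(max_rounds):
        template = search(target, round_index)
        alignment = align(target, template)
        model = build(alignment, template)
        if evaluate(model):
            return model  # "OK? Yes" branch: END
        # "OK? No" branch: try again with the next template
    return None

# Toy stand-ins so the loop can be exercised; none of these is a real tool.
templates = ["tmplA", "tmplB"]
result = comparative_model(
    "TARGETSEQ",
    search=lambda tgt, k: templates[k % len(templates)],
    align=lambda tgt, tmpl: (tgt, tmpl),
    build=lambda aln, tmpl: {"template": tmpl, "alignment": aln},
    evaluate=lambda mdl: mdl["template"] == "tmplB",
)
print(result)
```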
Typical Errors in Comparative Models

Incorrect template
Misalignment
Region without a template
Distortion in correctly aligned regions
Side chain packing
Figure legend: MODEL, X-RAY, TEMPLATE
Comparing accuracies of experimental and theoretical approaches
Some Models Can Be Surprisingly Accurate (in Some Regions)
Figure panels: 24% sequence identity, YJL001W; 25% sequence identity, YGL203C; structures 1rypH and 1ac5; labeled residues His 488, Ser 176, Asp 383.
Modeling structural consequences of a point mutation (Ser-Pro) in Zebrafish forkhead transcription factor Foxi1
RMSD by comparison:
Re-modelled wild type segments (6 and 7 aa) and NMR: 1.78 and 1.82
Modelled mutated segments with each other (6 and 7 aa): 1.19
Wild type and mutated segments (6 and 7 aa): 3.65 and 3.75
Altered subunit communication in subfamilies of dUTPases

Predicting features that are not present in the template
1. Active form usually is a trimer; each active site is formed by all three monomers.
2. Comparison of models and X-ray structures reveals two subclasses of dUTPases with different types of subunit interfaces.
3. Altered character of subunit interfaces correlates with the suggested different functional mechanism: a polar/charged surface is better adjusted for allosterism.
Figure labels: H. sapiens, Drosophila m., E. coli, Eq. inf. virus
Convergent evolution of Trichomonas vaginalis lactate dehydrogenase from malate dehydrogenase.

Designing new enzyme specificity with the aid of comparative models
1. Sequences were identified from the Trichomonas genome project.
2. Mutations were designed using the constructed 3D model to switch specificity.
Core histones of the amitochondriate protist, Giardia lamblia
Confirming fold by energy evaluation of comparative models
G. lamblia   H2A           H2B           H3            H4
1aoiC/G      -4.74         -1.15         -1.35         -2.29
1aoiD/H      -3.42         -4.34         -0.61         -2.82
1aoiA/E      -0.64         -0.41         -2.38         -0.26
1aoiB/F      -2.77         -1.70         -0.41         -4.79
X-ray        -5.41/-5.09   -3.98/-4.05   -2.74/-2.39   -5.23/-4.42
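
One way to read the table is to pick, for each G. lamblia histone, the template chain pair that gives the lowest score. The sketch below does that with the values from the slide. It assumes that rows are template chain pairs, that columns are the modeled histones, and that a lower (more negative) score is more favorable; the slide does not state these points explicitly, and the X-ray row is left out.

```python
# Scores copied from the slide: scores[template][histone].
scores = {
    "1aoiC/G": {"H2A": -4.74, "H2B": -1.15, "H3": -1.35, "H4": -2.29},
    "1aoiD/H": {"H2A": -3.42, "H2B": -4.34, "H3": -0.61, "H4": -2.82},
    "1aoiA/E": {"H2A": -0.64, "H2B": -0.41, "H3": -2.38, "H4": -0.26},
    "1aoiB/F": {"H2A": -2.77, "H2B": -1.70, "H3": -0.41, "H4": -4.79},
}

def best_template(table, histone):
    """Return the template with the lowest score for one histone column."""
    return min(table, key=lambda template: table[template][histone])

best = {h: best_template(scores, h) for h in ("H2A", "H2B", "H3", "H4")}
for histone, template in best.items():
    print(histone, "->", template, scores[template][histone])
```

Under these assumptions each histone scores best on a different template chain pair, which is the diagonal pattern visible in the table.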
