
Protein structure modeling

András Fiser
Department of Biochemistry and Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, New York, USA

Why is it useful to know the structure of a protein, not only its sequence?
The 3D structure is more informative than the sequence because patterns in space are frequently more recognizable than patterns in sequence.

Evolution tends to conserve function, and function depends more directly on structure than on sequence; structure is therefore more conserved in evolution than sequence.

Why Protein Structure Prediction?

Year 2005: 2,300,000 sequences, 29,000 structures

We know the experimental 3D structure for ~1% of the protein sequences

Principles of Protein Structure


GFCHIKAYTRLIMVG
Folding: Ab initio prediction
Evolution: Fold Recognition, Comparative Modeling
[Figure: structures from Desulfovibrio vulgaris, Anacystis nidulans, Chondrus crispus, and Anabaena 7120]

Protein structure modeling

Ab initio prediction
Applicable to any sequence
Not very accurate (>4 Å RMSD)
Attempted for proteins of <100 residues
Accuracy and applicability are limited by our understanding of the protein folding problem

Comparative Modeling
Applicable only to those sequences that share recognizable similarity to a template structure
Fairly accurate (<3 Å RMSD), typically comparable to a low-resolution X-ray experiment
Not limited by size
Accuracy and applicability are rather limited by the number of known folds
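
The accuracy figures quoted above are RMSD values in Å. As a minimal sketch of how such a number is obtained, the function below computes the RMSD between two equal-length lists of Cα coordinates. It assumes the two structures have already been superposed and that the atoms are paired one to one; the coordinates are hypothetical toy values, not data from the slides.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumption: the structures are already optimally superposed,
    so no rotation or translation is applied here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates in Å (three C-alpha atoms each).
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
xray = [(0.0, 0.0, 0.0), (3.8, 3.0, 0.0), (7.6, 0.0, 4.0)]

print(f"C-alpha RMSD: {ca_rmsd(model, xray):.2f} Å")
```

In practice an optimal superposition step would normally precede this calculation; it is omitted here to keep the sketch short.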

What makes comparative modeling possible

I. A small difference in the sequence makes a small difference in the structure

II. Protein structures are clustered into fold families

Structural Genomics

Characterize most protein sequences (red) based on related known structures (green). The number of families is much smaller than the number of proteins.

Structural Genomics

Definition: The aim of structural genomics is to put every protein sequence within modeling distance of a known protein structure.

Size of the problem: There are a few thousand domain fold families. There are ~20,000 sequence families (30% sequence identity).

Solution: Determine protein structures for as many different families as possible. Model the rest of the family members using comparative modeling.

Comparative Protein Structure Modeling


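[Figure: Flavodoxin family. Comparative modeling of the sequence KIGIFFSTSTGNTTEVA, with structures from Anacystis nidulans, Anabaena 7120, Chondrus crispus, Desulfovibrio vulgaris, and Clostridium mp. Axis: Cα RMSD (% EQV) with ticks 2 (50), 1 (80), 0 (100); further axis ticks 20, 50, 100.]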

Steps in Comparative Protein Structure Modeling

START -> Template Search -> Target-Template Alignment -> Model Building -> Model Evaluation -> OK?
OK? No: revisit the earlier steps
OK? Yes: END

TARGET
ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPERASFQWMNDK

Target-template alignment
TARGET    ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE
TEMPLATE  MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE
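
The target-template alignment on this slide can be summarized by its sequence identity. The sketch below counts identical residues over the aligned, non-gap positions of the two rows. The function name and the choice of denominator (aligned positions rather than full alignment length) are assumptions made for illustration; other conventions exist.

```python
def alignment_identity(row_a, row_b, gap="-"):
    """Return (identical, aligned, percent) for two aligned rows.

    Positions where either row has a gap are skipped, so the
    percentage is taken over aligned residue pairs only.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = 0
    aligned = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1
    percent = 100.0 * identical / aligned if aligned else 0.0
    return identical, aligned, percent


# The two alignment rows as they appear on the slide.
target = "ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE"
template = "MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE"

identical, aligned, percent = alignment_identity(target, template)
print(f"{identical} identical of {aligned} aligned positions ({percent:.1f}%)")
```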

Steps in Comparative Protein Structure Modeling


Template Search
Pattern recognition, heuristic searches (e.g. BLAST, FastA)
Profile and iterative alignment methods (e.g. HMMs, PSI-BLAST)
Structure-based threading (e.g. THREADER, FUGUE, 3DPSSM)

Steps in Comparative Protein Structure Modeling


Target-Template Alignment
Dynamic Programming, Pairwise Alignment
Multiple Alignments, Profiles, HMMs
Structure-based approaches (Threading)
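
As an illustration of dynamic programming for pairwise alignment, here is a minimal sketch of global alignment in the style of Needleman-Wunsch. The scoring scheme (match +1, mismatch -1, linear gap penalty -1) is an assumption chosen for simplicity; real alignment tools typically use substitution matrices and more elaborate gap penalties. The two fragments in the usage lines are taken from the example alignment shown earlier in the slides.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions, not a substitution matrix.
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # align the two residues
                score[i - 1][j] + gap,    # gap in seq_b
                score[i][j - 1] + gap,    # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Fragments of the target and template sequences from the slides.
best, row_a, row_b = needleman_wunsch("PLVPHISAQNVCLKIDD", "PIVTADLVCLKIDE")
print(best)
print(row_a)
print(row_b)
```

The table fill takes time and memory proportional to the product of the two sequence lengths, which is why heuristic searches are used for scanning large databases.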

Steps in Comparative Protein Structure Modeling


Model Building
Rigid Body Assembly (COMPOSER)
Segment Matching (SEGMOD, 3DPSSM)
Satisfaction of Spatial Restraints (MODELLER)
Integrated (NEST)
Loop modeling, side chain modeling
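
To give a feel for model building by satisfaction of spatial restraints, the toy sketch below moves a few points by plain gradient descent until their pairwise distances match target values. This is only an illustration of the general idea under strong assumptions (distance restraints only, a squared-error penalty, a fixed step size); it is not the objective function or optimizer of MODELLER or any other program named above, and the coordinates and targets are hypothetical.

```python
import math


def restraint_violation(coords, restraints):
    """Sum of squared deviations from the target distances."""
    return sum(
        (math.dist(coords[i], coords[j]) - target) ** 2
        for i, j, target in restraints
    )


def satisfy_restraints(coords, restraints, step=0.05, iterations=500):
    """Gradient descent on the squared-error penalty (toy optimizer)."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, target in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points
            # d/dx_i of (d - target)^2 = 2 (d - target) (x_i - x_j) / d
            factor = 2.0 * (d - target) / d
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for point, grad in zip(coords, grads):
            for k in range(3):
                point[k] -= step * grad[k]
    return [tuple(p) for p in coords]


# Hypothetical starting positions and target distances (in Å).
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
restraints = [(0, 1, 3.8), (1, 2, 3.8), (0, 2, 3.8)]

final = satisfy_restraints(start, restraints)
print(f"violation before: {restraint_violation(start, restraints):.3f}")
print(f"violation after:  {restraint_violation(final, restraints):.6f}")
```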

Steps in Comparative Protein Structure Modeling

Model Evaluation
Stereochemistry (PROCHECK, WHATCHECK)
Environment (Profiles3D, Verify3D)
Statistical potentials based methods (PROSA)

Is the model reliable? A model is reliable when it is based on a correct template and on an approximately correct alignment.
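
A stereochemistry check can be illustrated with one very simple geometric test: consecutive Cα atoms joined by a trans peptide bond lie about 3.8 Å apart, so larger deviations hint at a chain break or a distorted region. The sketch below flags such pairs. The tolerance value and the coordinates are assumptions for illustration, and the programs listed above perform far more extensive checks than this.

```python
import math


def ca_distance_outliers(ca_coords, ideal=3.8, tolerance=0.2):
    """Flag consecutive C-alpha pairs whose distance is far from `ideal`.

    Returns a list of (index_i, index_j, distance) tuples.
    The tolerance is an illustrative assumption.
    """
    outliers = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - ideal) > tolerance:
            outliers.append((i, i + 1, d))
    return outliers


# Hypothetical C-alpha trace in Å; the last step is deliberately too long.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (12.6, 0.0, 0.0)]

for i, j, d in ca_distance_outliers(trace):
    print(f"residues {i}-{j}: C-alpha distance {d:.2f} Å")
```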

Typical Errors in Comparative Models


Incorrect template
Misalignment
Region without a template
Distortion in correctly aligned regions
Side chain packing
[Figure legend: MODEL, X-RAY, TEMPLATE]

Comparing accuracies of experimental and theoretical approaches

Some Models Can Be Surprisingly Accurate (in Some Regions)

YJL001W, 1rypH: 24% sequence identity
YGL203C, 1ac5: 25% sequence identity
His 488, Ser 176, Asp 383

Modeling structural consequences of a point mutation (Ser-Pro) in Zebrafish forkhead transcription factor Foxi1

Comparison (segments of 6 and 7 aa): RMSD
Re-modelled wild-type segments and NMR: 1.78 and 1.82
Modelled mutated segments with each other: 1.19
Wild-type and mutated segments: 3.65 and 3.75

Altered subunit communication in subfamilies of dUTPases


Predicting features that are not present in the template

1. Active form usually is a trimer; each active site is formed by all three monomers.
2. Comparison of models and X-ray structures reveals two subclasses of dUTPases with different types of subunit interfaces.
3. Altered character of subunit interfaces correlates with the suggested different functional mechanism: polar/charged surface is better adjusted for allosterism.

[Figure panels: H. sapiens, Drosophila m., E. coli, Eq. inf. virus]

Convergent evolution of Trichomonas vaginalis lactate dehydrogenase from malate dehydrogenase.


Designing new enzyme specificity with the aid of comparative models

1. Sequences were identified from the Trichomonas genome project.
2. Mutations were designed using the constructed 3D model to switch specificity.

Core histones of the amitochondriate protist, Giardia lamblia

Confirming fold by energy evaluation of comparative models

G. lamblia    H2A      H2B      H3       H4       X-ray
1aoiC/G      -4.74    -1.15    -1.35    -2.29    -5.41/-5.09
1aoiD/H      -3.42    -4.34    -0.61    -2.82    -3.98/-4.05
1aoiA/E      -0.64    -0.41    -2.38    -0.26    -2.74/-2.39
1aoiB/F      -2.77    -1.70    -0.41    -4.79    -5.23/-4.42
