Computational Methods For Protein Structure Prediction
Ying Xu
2010/1/19 1
Outline
introduction to protein structures
protein threading
Protein Sequence, Structure and Function
Protein sequence
>1MBN:_ MYOGLOBIN (154 AA)
MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHL
KTEAEMKASEDLKKAGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKI
PIKYLEFISEAIIHVLHSRHPGNFGADAQGAMNKALELFRKDIAAKYKEL
GYQG
Protein structure
Protein function: oxygen storage
Protein Structure
A protein sequence folds into a “unique” shape (“structure”) that minimizes its free energy.
Protein Structures
Primary sequence
MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
Secondary structure
α-helix
β-sheet: anti-parallel, parallel
Protein Structures
Tertiary structure
Quaternary structure
Protein Structures
Backbone versus all-atom structures
Protein Structure Prediction
Problem: Given the amino acid sequence of a protein, computationally predict its 3-dimensional shape.
MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHL
KTEAEMKASEDLKKAGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKI
PIKYLEFISEAIIHVLHSRHPGNFGADAQGAMNKALELFRKDIAAKYKEL
GYQG
Secondary Structure Prediction
Rough categories: H (helix), E (strand), C (coil)
Secondary Structure Prediction
CEEEEE
EEEEECCC
CCEEEEE
EEEEECCC
CCCHHHHHH
HHHHHHCCCCCC
Secondary Structure Prediction
Secondary structure propensities:
Calculate the propensity for a given amino acid to adopt a certain secondary structure (ss) type
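The slide does not give the formula, so the sketch below assumes the common Chou-Fasman style ratio: the frequency of an amino acid within one ss type divided by its overall frequency. The function name and the input format (one ss string per sequence, same length, using letters such as H, E, C) are illustrative assumptions.

```python
from collections import Counter


def propensities(sequences, ss_strings):
    """Return {(amino_acid, ss_type): propensity}.

    Assumed definition (not stated on the slide):
        propensity = P(amino acid | ss type) / P(amino acid)
    A value above 1 means the residue is over-represented in that ss type.
    """
    aa_counts = Counter()
    ss_counts = Counter()
    pair_counts = Counter()
    for seq, ss in zip(sequences, ss_strings):
        if len(seq) != len(ss):
            raise ValueError("sequence and ss string must have equal length")
        for aa, state in zip(seq, ss):
            aa_counts[aa] += 1
            ss_counts[state] += 1
            pair_counts[(aa, state)] += 1

    total = sum(aa_counts.values())
    result = {}
    for (aa, state), n in pair_counts.items():
        freq_in_state = n / ss_counts[state]
        freq_overall = aa_counts[aa] / total
        result[(aa, state)] = freq_in_state / freq_overall
    return result
```

Pairs that never occur in the training data are simply absent from the result; a real parameter set would be estimated from a large set of known structures.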
Secondary Structure Prediction
Predict helix (and predict strand)
1. Scan each window of 6 residues; if score > 4, predict helix (if score > 3, predict strand)
Resolving Conflicts
For overlapping regions, decide according to propensity parameters
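The window scan and the conflict rule can be sketched as follows. The slide does not define the window score, so this sketch assumes it is the sum of per-residue weights in the window; the weight tables and function names are hypothetical, and overlapping regions are resolved by comparing the summed helix and strand weights over the overlap.

```python
def scan(seq, weights, window=6, threshold=4.0):
    """Mark every residue covered by a window whose summed weight exceeds threshold."""
    marked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        score = sum(weights.get(aa, 0.0) for aa in seq[start:start + window])
        if score > threshold:
            for i in range(start, start + window):
                marked[i] = True
    return marked


def predict(seq, helix_w, strand_w):
    """Return a string of H/E/C labels, one per residue."""
    helix = scan(seq, helix_w, threshold=4.0)    # score > 4: helix
    strand = scan(seq, strand_w, threshold=3.0)  # score > 3: strand
    labels = []
    for h, e in zip(helix, strand):
        labels.append("?" if h and e else "H" if h else "E" if e else "C")

    # Resolve each overlapping region as a whole (assumed rule: larger sum wins).
    i = 0
    while i < len(labels):
        if labels[i] != "?":
            i += 1
            continue
        j = i
        while j < len(labels) and labels[j] == "?":
            j += 1
        h_sum = sum(helix_w.get(aa, 0.0) for aa in seq[i:j])
        e_sum = sum(strand_w.get(aa, 0.0) for aa in seq[i:j])
        labels[i:j] = ["H" if h_sum >= e_sum else "E"] * (j - i)
        i = j
    return "".join(labels)
```

With toy weights such as `{"A": 1.0}` for helix and `{"V": 1.0}` for strand, a run of alanines is labeled H and a run of valines E.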
Key ideas
using sequence profiles generated by PSI-BLAST, rather than individual sequences, for secondary structure prediction
combining multiple predictors using a neural network
accuracy reaches ~76%
Secondary Structure Prediction
Using larger training sets, the prediction accuracy can reach ~80%. So how far can we push this further?
Non-locality: secondary structure is influenced by long-range interactions
Some segments can have multiple structure types (chameleon sequences)
homology modeling
similar sequences, similar structures
practically very useful, but needs homologues
protein threading
many proteins share the same structural fold
a folding problem becomes a fold recognition problem
Ab initio Structure Prediction
An energy function to describe the protein
o bond energy
o bond angle energy
o dihedral angle energy
o van der Waals energy
o electrostatic energy
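The slide names the energy terms but not their functional forms. The sketch below assumes the forms commonly used in molecular mechanics force fields (harmonic bonds and angles, a cosine dihedral term, Lennard-Jones 12-6, Coulomb); all function names and parameters are illustrative, not taken from the slides.

```python
import math


def bond_energy(r, r0, k):
    """Harmonic bond stretching around the equilibrium length r0."""
    return k * (r - r0) ** 2


def angle_energy(theta, theta0, k):
    """Harmonic bond-angle bending around the equilibrium angle theta0 (radians)."""
    return k * (theta - theta0) ** 2


def dihedral_energy(phi, k, n, delta):
    """Periodic dihedral term with multiplicity n and phase delta (radians)."""
    return k * (1.0 + math.cos(n * phi - delta))


def vdw_energy(r, epsilon, sigma):
    """Lennard-Jones 12-6 potential; minimum of -epsilon at r = 2**(1/6) * sigma."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def electrostatic_energy(r, q1, q2, coulomb_constant=332.06):
    """Coulomb term; the default constant is approximately kcal*Angstrom/(mol*e^2)."""
    return coulomb_constant * q1 * q2 / r


def total_energy(bonds, angles, dihedrals, vdw_pairs, charge_pairs):
    """Sum all terms; each argument is a list of parameter tuples for one term."""
    return (
        sum(bond_energy(*b) for b in bonds)
        + sum(angle_energy(*a) for a in angles)
        + sum(dihedral_energy(*d) for d in dihedrals)
        + sum(vdw_energy(*v) for v in vdw_pairs)
        + sum(electrostatic_energy(*c) for c in charge_pairs)
    )
```

An ab initio method would search conformations for the one that minimizes this total; the search procedure itself is outside this sketch.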
Homology Modeling
Observation: proteins with similar sequences tend to fold into
similar structures.
Programs:
Modeller http://salilab.org/modeller/
Swiss-Model http://swissmodel.expasy.org//SWISS-MODEL.html
Protein Threading
Basic premise
Protein Threading
The goal: find the “correct” sequence-structure alignment between a target sequence and its native-like fold in PDB
MTYKLILN …. NGVDGEWTYTE
Protein Threading – four basic components
Structure database
Energy function
Protein Threading – structure database
FSSP (Families of Structurally Similar Proteins) (http://www.bioinfo.biocenter.helsinki.fi:8080/dali/index.html)
SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)
PDB-Select (http://www.sander.embl-heidelberg.de/pdbsel/)
Pisces (http://www.fccc.edu/research/labs/dunbrack/pisces/)
Protein Threading – energy function
MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
how preferable it is to put two particular residues nearby: E_p
how well a residue fits a structural environment: E_s
alignment gap penalty: E_g
total energy: E_p + E_s + E_g
Protein Threading – energy function
singleton energy term
Protein Threading – energy function
ALA -140
ARG 268 -18
ASN 105 -85 -435
ASP 217 -616 -417 17
CYS 330 67 106 278 -1923
GLN 27 -60 -200 67 191 -115
GLU 122 -564 -136 140 122 10 68
GLY 11 -80 -103 -267 88 -72 -31 -288
HIS 58 -263 61 -454 190 272 -368 74 -448
ILE -114 110 351 318 154 243 294 179 294 -326
LEU -182 263 358 370 238 25 255 237 200 -160 -278
LYS 123 310 -201 -564 246 -184 -667 95 54 194 178 122
MET -74 304 314 211 50 32 141 13 -7 -12 -106 301 -494
PHE -65 62 201 284 34 72 235 114 158 -96 -195 -17 -272 -206
PRO 174 -33 -212 -28 105 -81 -102 -73 -65 369 218 -46 35 -21 -210
SER 169 -80 -223 -299 7 -163 -212 -186 -133 206 272 -58 193 114 -162 -177
THR 58 60 -231 -203 372 -151 -211 -73 -239 109 225 -16 158 283 -98 -215 -210
TRP 51 -150 -18 104 52 -12 157 -69 -212 -18 81 29 -5 31 -432 129 95 -20
TYR 53 -132 53 268 62 -90 269 58 34 -163 -93 -312 -173 -5 -81 104 163 -95 -6
VAL -105 171 298 431 196 180 235 202 204 -232 -218 269 -50 -42 46 267 73 101 107 -324
ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL
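A lower-triangular table like the one above can be loaded into a symmetric lookup. This is a minimal sketch: only the first four rows are copied from the table to keep it short, and the names `ROWS`, `parse_lower_triangle`, `PAIR_TABLE`, and `pair_energy` are hypothetical.

```python
# First four rows of the table above (excerpt only; the full table has 20 rows).
ROWS = """\
ALA -140
ARG 268 -18
ASN 105 -85 -435
ASP 217 -616 -417 17
"""


def parse_lower_triangle(text):
    """Parse rows of 'NAME v1 v2 ...' where row k holds k values (columns follow row order)."""
    names = []
    table = {}
    for line in text.strip().splitlines():
        fields = line.split()
        name, values = fields[0], [int(v) for v in fields[1:]]
        names.append(name)
        if len(values) != len(names):
            raise ValueError(f"row {name} should have {len(names)} values")
        for other, value in zip(names, values):
            # frozenset makes the key order-independent: (A, B) == (B, A)
            table[frozenset((name, other))] = value
    return table


def pair_energy(table, a, b):
    """Look up the pairwise term for residue types a and b, in either order."""
    return table[frozenset((a, b))]


PAIR_TABLE = parse_lower_triangle(ROWS)
```

For example, `pair_energy(PAIR_TABLE, "ARG", "ASP")` and `pair_energy(PAIR_TABLE, "ASP", "ARG")` both return the ASP/ARG entry of the table.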
Protein Threading – energy function
FDSK---THRGHR    FDSK-T--HRGHR
:.:    :: :::    :.:  :  : :::
FESYWTCTH-GHR    FESYWTCTH-GHR
Threading Parameter Optimization
How to determine the weights of the different energy terms?
E_total = s E_singleton + p E_pairwise + g E_gap
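The slide poses the weighting question without giving a procedure. One simple possibility, shown here only as an assumption, is a grid search that picks the weights (s, p, g) for which the native alignment has lower total energy than all decoy alignments in as many training cases as possible. The data layout and function names are hypothetical.

```python
from itertools import product


def total_energy(terms, weights):
    """terms = (E_singleton, E_pairwise, E_gap); weights = (s, p, g)."""
    return sum(w * e for w, e in zip(weights, terms))


def count_native_wins(cases, weights):
    """Count training cases in which the native alignment beats every decoy.

    Each case is (native_terms, [decoy_terms, ...]).
    """
    wins = 0
    for native, decoys in cases:
        e_native = total_energy(native, weights)
        if all(e_native < total_energy(d, weights) for d in decoys):
            wins += 1
    return wins


def grid_search(cases, grid=(0.0, 0.5, 1.0, 2.0)):
    """Try every (s, p, g) combination from the grid and keep the best one."""
    return max(product(grid, repeat=3), key=lambda w: count_native_wins(cases, w))
```

A coarse grid is easy to inspect but scales poorly; it is used here only because it keeps the sketch short.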
Protein Threading -- algorithm
Dynamic programming
Heuristic algorithms for pair-wise interactions
Frozen approximation algorithm (A. Godzik et al.)
Double dynamic programming (D. Jones et al.)
Monte Carlo sampling (S.H. Bryant et al.)
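When the pairwise term is left out, the alignment energy decomposes position by position and plain dynamic programming finds the optimum. The sketch below is such a simplified case, not any of the cited heuristic algorithms: it uses only a singleton term and a linear gap penalty, and the environment labels, energy table, and function name are hypothetical.

```python
def thread(query, template_envs, singleton, gap=1.0):
    """Globally align a query sequence to template positions, minimizing energy.

    template_envs: one environment label per template position (e.g. "b"/"e").
    singleton: {(amino_acid, environment): energy}; missing pairs count as 0.
    Returns (energy, alignment), where alignment is a list of
    (query_index or None, template_index or None) pairs.
    """
    n, m = len(query), len(template_envs)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0], back[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        dp[0][j], back[0][j] = j * gap, "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            fit = singleton.get((query[i - 1], template_envs[j - 1]), 0.0)
            dp[i][j], back[i][j] = min(
                (dp[i - 1][j - 1] + fit, "diag"),  # residue placed at position j
                (dp[i - 1][j] + gap, "up"),        # query residue left unaligned
                (dp[i][j - 1] + gap, "left"),      # template position skipped
            )

    # Trace back from the bottom-right corner to recover the alignment.
    alignment = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            alignment.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "up":
            alignment.append((i - 1, None))
            i -= 1
        else:
            alignment.append((None, j - 1))
            j -= 1
    alignment.reverse()
    return dp[n][m], alignment
```

Pairwise interactions break this decomposition, because the energy at one position then depends on which residues occupy other positions; that is why the heuristics listed above are needed.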
Fold Recognition
MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
Score = -1500    Score = -720    Score = -1120    Score = -900
Fold Recognition
Query sequence: AAAA
Better template?
Fold Recognition
Threading 100,000 sequences against a template structure provides the baseline information about the background scores of the template.
By locating where the threading score of a particular query sequence falls in this background distribution, one can decide how significant the score, and hence the threading result, is.
(Figure: distribution of background threading scores, with regions marked "not significant" and "significant")
Fold Recognition
Z-score = (score - average) / standard deviation
Randomly shuffle the query sequence and calculate the alignment score
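The Z-score and the shuffling step can be sketched as follows. The threading score itself is passed in as `score_fn`, a placeholder for whatever alignment scoring is used; the function names, the number of shuffles, and the use of the sample standard deviation are assumptions of this sketch.

```python
import random
import statistics


def z_score(score, background):
    """(score - average) / standard deviation of the background scores."""
    return (score - statistics.mean(background)) / statistics.stdev(background)


def shuffled_background(query, score_fn, n_shuffles=100, seed=0):
    """Score n_shuffles random permutations of the query sequence.

    score_fn is a hypothetical callable mapping a sequence string to its
    alignment score against the template.
    """
    rng = random.Random(seed)  # fixed seed so the background is reproducible
    residues = list(query)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        scores.append(score_fn("".join(residues)))
    return scores
```

Shuffling preserves the amino acid composition of the query, so the background reflects what a sequence of the same composition but random order would score on the template.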
State of the Art