Structure Prediction-Comparative Model

Manu Madhavan

Lecture 14

Manu Madhavan ISC 211 Lecture 14 1 / 29


Outline

Homology Modeling
Refer: chapter 7 of Krane & Raymer [Kra02]



Issues in existing models

None of the algorithms discussed so far predicts structures with high
accuracy for long protein sequences
Applications such as drug discovery and protein-ligand identification
require highly accurate structural detail
Use homology: when the tertiary structure of one or more proteins
similar in primary structure to a target protein is known, the target
protein can be modeled using comparative modeling



Comparative Model

Also known as homology modeling
Predict the structure of the target protein by comparison with the
structures of related proteins
Compare against the known structures of proteins with homologous
sequences



Comparative Model-steps

Step-1: Identify a set of protein structures related to the target
protein
    BLAST and FASTA are used to identify related structures based on
    sequence similarity
    These structures serve as templates for structure modeling
Step-2: Align the sequence of the target with the sequences of the
template proteins
    Multiple sequence alignment identifies conserved regions of
    similarity
    CLUSTALW is a commonly used tool
    Manual adjustment may be needed, since MSA tools can introduce a
    small percentage of errors
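
As a rough illustration of Steps 1 and 2, the sketch below ranks candidate templates by percent identity once each has been aligned to the target. It is only a toy stand-in: real searches rely on BLAST or FASTA scores and E-values, and the function names, template names, and sequences here are all made up for the example.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # gap columns are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def rank_templates(alignments):
    """Sort template names by decreasing identity to the target.

    `alignments` maps a (hypothetical) template name to a pair:
    (aligned target sequence, aligned template sequence).
    """
    scored = [(name, percent_identity(t, s)) for name, (t, s) in alignments.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up pairwise alignments, for illustration only.
example = {
    "template_B": ("MKTAYIAK", "MRT-YLAK"),
    "template_A": ("MKTAYIAK", "MKTAYIAK"),
}
ranking = rank_templates(example)
for name, identity in ranking:
    print(f"{name}: {identity:.1f}% identity")
```

Percent identity ignores conservative substitutions, which is one reason substitution-matrix scores are preferred in practice.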


Comparative Model-steps

Step-3: Construct the model
    Superimpose the template structures and find the structurally
    conserved regions
    The backbone of the target protein is then aligned to these
    conserved regions, forming the core of the model
Step-4: Model the loops
    Select the best loop from a database of known loop conformations


Comparative Model-steps

Step-5: Model the side chains
    Identify the positions of the side-chain atoms
    Methods involve library search and computation based on molecular
    dynamics
Step-6: Evaluate the model
    PROCHECK, WHAT_CHECK, Verify-3D, ...
    Check for anomalies such as disallowed (ϕ, ψ) backbone angles
    Problems that are found are usually corrected by hand, where
    possible
    Automating many of these steps is an active research area
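
To make the (ϕ, ψ) check in Step-6 concrete, here is a minimal sketch of how a backbone torsion angle can be computed from four atom positions. For residue i, ϕ uses the atoms C(i-1), N(i), CA(i), C(i), and ψ uses N(i), CA(i), C(i), N(i+1). The `flag_suspicious` rule (positive ϕ in a non-glycine residue) is a deliberately crude assumption for illustration; tools such as PROCHECK use empirically derived Ramachandran regions instead.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) defined by four 3-D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the central bond
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def flag_suspicious(residues):
    """Return indices of residues worth a manual look.

    `residues` is a list of (name, phi, psi) tuples. The rule used here
    (phi > 0 for anything but glycine) is an illustrative assumption only.
    """
    return [i for i, (name, phi, _psi) in enumerate(residues)
            if name != "GLY" and phi > 0]


# Made-up angles for three residues.
model = [("ALA", -60.0, -45.0), ("GLY", 80.0, 10.0), ("SER", 75.0, 30.0)]
print(flag_suspicious(model))
```

In practice the atom coordinates would come from the model's PDB file, and flagged residues would be inspected and corrected by hand as described above.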



RNA Secondary Structure Prediction

Manu Madhavan

Lecture 14b

Manu Madhavan ISC 211 Lecture 14b 9 / 29


Various types of RNA

messenger RNA (mRNA)
transfer RNA (tRNA)
ribosomal RNA (rRNA)
small interfering RNA (siRNA)
micro RNA (miRNA)
small nuclear RNA (snRNA)
small nucleolar RNA (snoRNA)
guide RNA (gRNA)
efference RNA (eRNA)



Non-coding RNA

RNA that isn’t translated into protein
Includes: tRNA, rRNA, snRNA, snoRNA, miRNA, gRNA, eRNA,
pRNA, tmRNA
mRNA contains untranslated regions (5’ UTR, 3’ UTR), but UTRs are
not considered ncRNA



RNA Basics

RNA bases: A, C, G, U
Watson-Crick pairs
    A-U: two hydrogen bonds (~2 kcal/mol)
    G-C: three hydrogen bonds (~3 kcal/mol)
Wobble pair
    G-U (~1 kcal/mol)
Non-canonical pairs (modified suitably)
Each base can pair with at most one other base
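
The pairing rules above can be sketched as a small lookup table. The energy values are the approximate per-pair magnitudes quoted on the slide, not a real thermodynamic model, and the function names are invented for this example.

```python
# Approximate stabilisation per pair in kcal/mol, as quoted on the slide.
PAIR_ENERGY = {
    frozenset("AU"): 2.0,  # Watson-Crick
    frozenset("GC"): 3.0,  # Watson-Crick
    frozenset("GU"): 1.0,  # wobble
}


def pair_energy(x, y):
    """Return the slide's energy for bases x and y, or None if they cannot pair."""
    return PAIR_ENERGY.get(frozenset((x, y)))


def total_stabilization(seq, pairs):
    """Sum pair energies, enforcing that each base pairs at most once."""
    used = set()
    total = 0.0
    for i, j in pairs:
        if i == j or i in used or j in used:
            raise ValueError(f"position reused in pair ({i}, {j})")
        energy = pair_energy(seq[i], seq[j])
        if energy is None:
            raise ValueError(f"{seq[i]}-{seq[j]} is not an allowed pair")
        used.update((i, j))
        total += energy
    return total


# A made-up hairpin: positions are 0-based.
print(total_stabilization("GGGAAAUCC", [(0, 8), (1, 7), (2, 6)]))
```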



RNA Structure



RNA Secondary Structure



RNA Motifs



Why predict secondary structure?

Knowing the shape of a biomolecule is invaluable in drug design and
in understanding disease mechanisms
Experimental methods (X-ray crystallography, NMR) are expensive and
time-consuming
Goal: predict the shape from the sequence of bases
Four basic structural elements: helices, loops, bulges and junctions



RNA Secondary structure representation



Definition



Base-pair maximization



DP- Table



Traceback

To determine the structure of the folded RNA by traceback, we first
create an empty list of pairs P. We initialize with i = 1, j = n
(1-based positions in a sequence S of length n). Then, we follow one
of three scenarios.
1. If j ≤ i, the procedure stops.
2. If M(i, j) = M(i, j − 1), then set j = j − 1 (i unchanged) and continue.
3. Otherwise, find a k with i ≤ k < j such that S_k and S_j are
complementary and M(i, j) = M(i, k − 1) + M(k + 1, j − 1) + 1. Append
(k, j) to P, then trace back both subproblems: (i, k − 1) and
(k + 1, j − 1).
When the traceback finishes, P contains all of the paired bases.
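
The table fill and traceback can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the lecture's own code: it uses 0-based indices (so the traceback starts at i = 0, j = n − 1), it takes the recurrence implied by the traceback conditions above, M(i, j) = max(M(i, j − 1), max over k of M(i, k − 1) + M(k + 1, j − 1) + 1), and it allows A-U, G-C and G-U pairs. All function names are made up for the example.

```python
PAIRS = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def can_pair(a, b):
    return frozenset((a, b)) in PAIRS


def _get(M, i, j):
    """M(i, j), treated as 0 whenever j <= i (empty or single-base range)."""
    return M[i][j] if 0 <= i < j < len(M) else 0


def nussinov_fill(seq):
    """Fill the base-pair maximization table for seq."""
    n = len(seq)
    M = [[0] * n for _ in range(n)]
    for span in range(1, n):              # shorter subsequences first
        for i in range(n - span):
            j = i + span
            best = _get(M, i, j - 1)      # case: j is unpaired
            for k in range(i, j):         # case: j pairs with some k
                if can_pair(seq[k], seq[j]):
                    best = max(best, _get(M, i, k - 1) + _get(M, k + 1, j - 1) + 1)
            M[i][j] = best
    return M


def nussinov_traceback(seq, M):
    """Recover one optimal list of pairs, following the three scenarios."""
    pairs = []
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j <= i:                                    # scenario 1
            continue
        if _get(M, i, j) == _get(M, i, j - 1):        # scenario 2
            stack.append((i, j - 1))
            continue
        for k in range(i, j):                         # scenario 3
            if can_pair(seq[k], seq[j]) and \
               _get(M, i, j) == _get(M, i, k - 1) + _get(M, k + 1, j - 1) + 1:
                pairs.append((k, j))
                stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return sorted(pairs)


def to_dot_bracket(n, pairs):
    """Render pairs as a dot-bracket string of length n."""
    out = ["."] * n
    for i, j in pairs:
        out[i], out[j] = "(", ")"
    return "".join(out)


seq = "GGGAAAUCC"
table = nussinov_fill(seq)
pairs = nussinov_traceback(seq, table)
print(table[0][len(seq) - 1], pairs, to_dot_bracket(len(seq), pairs))
```

An explicit stack replaces the recursive "traceback both" step, so long sequences do not hit Python's recursion limit. When several k satisfy scenario 3, this sketch takes the first one, so it returns one optimal structure out of possibly many.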



Limitation

Base-pair maximization does not necessarily lead to the most stable
structure.
It may create structures with many interior loops or hairpins, which
are energetically unfavorable.
The results are comparable to aligning sequences with scattered
matches: not biologically reasonable.



References

[Kra02] Dan E. Krane and Michael L. Raymer, Fundamental Concepts of
Bioinformatics, Pearson Education India, 2002.
