Lecture 14b

Structure Prediction-Comparative Model
Manu Madhavan
Lecture 14
Manu Madhavan ISC 211 Lecture 14 1 / 29

Outline
Homology Modeling
Refer: chapter 7 of Krane & Raymer [Kra02]

Issues in existing models
None of the algorithms discussed so far predicts exact structures with high accuracy for long protein sequences
Applications such as drug discovery and protein-ligand identification require highly accurate structural detail
Use homology: when the tertiary structure of one or more proteins similar in primary structure to the target protein is known, the target protein can be modeled using comparative modeling

Comparative Model
aka Homology modeling

Predict the structure of the target protein by comparison with the structures of related proteins.
Compare the known structures of homologous sequences

Comparative Model-steps
Step-1: Identify a set of protein structures related to the target protein
BLAST and FASTA are used to identify related structures based on sequence similarity
These structures serve as templates for structure modeling
Step-2: Align the sequence of the target with the sequences of the template proteins
Multiple sequence alignment: conserved regions of similarity can be identified
CLUSTALW
Manual adjustment may be involved (since MSA tools may introduce a small percentage of errors)
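
As a rough illustration of how candidate templates might be compared once they are aligned to the target, the sketch below computes percent identity over the ungapped columns of an alignment and ranks templates by it. The function names and the toy sequences are hypothetical; real pipelines rely on BLAST/FASTA scores and CLUSTALW alignments rather than this simplified measure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # gap columns are not counted
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def rank_templates(aligned_target: str, aligned_templates: dict) -> list:
    """Return (template name, percent identity) pairs, best match first."""
    scores = [
        (name, percent_identity(aligned_target, seq))
        for name, seq in aligned_templates.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical, already-aligned toy sequences.
    print(rank_templates("ACDEF", {"t1": "ACDEF", "t2": "AC-EY"}))
```

Skipping gap columns is only one of several conventions for percent identity; other definitions divide by the alignment length or by the shorter sequence length.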


Step-3: Construct the model
Superimpose the template structures and find the structurally conserved regions
The backbone of the template structure is then aligned to these conserved regions, forming the core of the model
Step-4: Model the loops
Select the best loop from a database of known loop conformations
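
One simple way to picture "structurally conserved regions" is as runs of residues whose equivalent Cα atoms stay close together after the templates have been superimposed. The sketch below assumes the superposition has already been done and that the two coordinate lists are residue-by-residue equivalent; the distance cutoff and minimum run length are illustrative assumptions, not values from the lecture.

```python
import math


def conserved_regions(coords_a, coords_b, cutoff=1.0, min_len=3):
    """Return inclusive (start, end) index ranges where equivalent C-alpha
    atoms of two superimposed templates lie within `cutoff` angstroms of each
    other for at least `min_len` consecutive residues."""
    regions = []
    start = None
    n = min(len(coords_a), len(coords_b))
    for idx in range(n):
        close = math.dist(coords_a[idx], coords_b[idx]) <= cutoff
        if close and start is None:
            start = idx  # a new candidate run begins
        elif not close and start is not None:
            if idx - start >= min_len:
                regions.append((start, idx - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and n - start >= min_len:
        regions.append((start, n - 1))
    return regions


if __name__ == "__main__":
    # Hypothetical coordinates: identical except for one displaced residue.
    a = [(float(i), 0.0, 0.0) for i in range(8)]
    b = list(a)
    b[3] = (3.0, 5.0, 0.0)
    print(conserved_regions(a, b))
```

The residues between the reported ranges are the variable segments, which is where the loop modeling of Step-4 would apply.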


Step-5: Model the side chains
Identify the positions of the side-chain atoms
Methods involve library search and computation based on molecular dynamics
Step-6: Evaluate the model
PROCHECK, WHATCHECK, Verify-3D, ...
Check for validation anomalies (e.g. disallowed ϕ, ψ angles)
Problems that are identified are usually corrected by hand (if possible)
Automating many of these steps is an active area of research
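
The ϕ and ψ angles checked in Step-6 are backbone dihedral angles: ϕ is defined by the atoms C(i−1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows how one such dihedral can be computed from four 3D points. It is a minimal illustration, not how PROCHECK or the other listed tools are implemented, and the function name is hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, in [-180, 180]) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical points giving a quarter turn about the z axis.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

A validator would compute (ϕ, ψ) for each residue in this way and compare the pair against the allowed regions of the Ramachandran plot.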


RNA Secondary Structure Prediction
Manu Madhavan
Lecture 14b
Manu Madhavan ISC 211 Lecture 14b 9 / 29

Various types of RNA
messenger RNA (mRNA)
transfer RNA (tRNA)
ribosomal RNA (rRNA)
small interfering RNA (siRNA)
micro RNA (miRNA)
small nuclear RNA (snRNA)
small nucleolar RNA (snoRNA)
guide RNA (gRNA)
efference RNA (eRNA)

Non-coding RNA
RNA that isn’t translated into protein

Includes: tRNA, rRNA, snRNA, snoRNA, miRNA, gRNA, eRNA,
pRNA, tmRNA
mRNA contains untranslated regions (5’UTR, 3’UTR), but UTRs are
not considered ncRNA

RNA Basics
RNA bases: A, C, G, U
Watson-Crick pairs
A-U: two hydrogen bonds (about 2 kcal/mol)
G-C: three hydrogen bonds (about 3 kcal/mol)
Wobble pair
G-U: two hydrogen bonds (about 1 kcal/mol)
Non-canonical pairs (modified suitably)
Each base can pair with at most one other base
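
The pairing rules listed above can be captured in a small lookup table. The sketch below is only an illustration: the names are hypothetical, and the energies are the approximate values quoted on the slide rather than a real thermodynamic model.

```python
# Approximate stabilizing energies in kcal/mol, as quoted in the list above.
PAIR_ENERGY = {
    frozenset("AU"): 2.0,  # Watson-Crick
    frozenset("GC"): 3.0,  # Watson-Crick
    frozenset("GU"): 1.0,  # wobble
}


def pair_energy(a: str, b: str):
    """Return the approximate energy of the pair a-b, or None if they do not pair."""
    return PAIR_ENERGY.get(frozenset((a.upper(), b.upper())))


def can_pair(a: str, b: str) -> bool:
    """True for A-U, G-C and G-U pairs, in either order."""
    return pair_energy(a, b) is not None


if __name__ == "__main__":
    print(can_pair("G", "U"), can_pair("A", "G"), pair_energy("C", "G"))
```

Using a `frozenset` as the key makes the lookup order-independent, so G-U and U-G are treated as the same pair.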

RNA Structure

RNA Secondary Structure

RNA Motifs

Why predict secondary structure?
Knowing the shape of a biomolecule is invaluable in drug design and in understanding disease mechanisms
Current physical methods (X-ray crystallography, NMR) are expensive and time-consuming
Predict the shape from the sequence of bases
Four basic structures: helices, loops, bulges and junctions

RNA Secondary structure representation

Definition

Base-pair maximization
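
The recurrence for this slide did not survive in the text. The form below is the one implied by the traceback conditions given later in this lecture, where M(i, j) is the maximum number of base pairs in the subsequence S_i ... S_j. Treat it as a reconstruction under that assumption rather than the slide's exact wording.

```latex
M(i,j) =
\begin{cases}
0, & j \le i, \\[4pt]
\max\Bigl( M(i,j-1),\;
  \max\limits_{\substack{i \le k < j \\ S_k,\, S_j \text{ complementary}}}
  \bigl[ M(i,k-1) + M(k+1,j-1) + 1 \bigr] \Bigr), & j > i.
\end{cases}
```

The first option leaves S_j unpaired; the second pairs S_j with some S_k and adds the optimal scores of the two independent subsequences on either side of S_k.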

DP- Table

Traceback
To determine the structure of the folded RNA by traceback, first create an empty list of pairs P. Initialize i = 0 and j = n, the first and last positions of the sequence S. Then apply one of three cases:
1. If j ≤ i, the procedure stops.
2. If M(i, j) = M(i, j − 1), set j = j − 1 and continue.
3. Otherwise, find a k with i ≤ k < j such that S_k and S_j are complementary and M(i, j) = M(i, k − 1) + M(k + 1, j − 1) + 1. Append (k, j) to P, then trace back both subproblems: (i, k − 1) and (k + 1, j − 1).
When the traceback finishes, P contains all of the paired bases.
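
The procedure can be sketched in Python as follows. This is a minimal illustration, assuming 0-based indexing over a sequence of length n (so the traceback starts at j = n − 1), A-U, G-C and G-U as the complementary pairs, and no minimum loop length. The table is filled with the recurrence implied by the traceback conditions, and the dot-bracket output is a common convention added here for readability. All function names are hypothetical.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def cell(M, i, j):
    """M(i, j), defined as 0 whenever j <= i (covers k - 1 < i and k + 1 > j - 1)."""
    return M[i][j] if i < j else 0


def fill_table(seq):
    """Fill M so that M[i][j] is the maximum number of pairs in seq[i..j]."""
    n = len(seq)
    M = [[0] * n for _ in range(n)]
    for span in range(1, n):  # shorter subsequences are solved first
        for i in range(n - span):
            j = i + span
            best = cell(M, i, j - 1)  # case: S_j is unpaired
            for k in range(i, j):  # case: S_j pairs with S_k
                if (seq[k], seq[j]) in PAIRS:
                    best = max(best, cell(M, i, k - 1) + cell(M, k + 1, j - 1) + 1)
            M[i][j] = best
    return M


def traceback(seq, M):
    """Recover one optimal set of pairs using the three traceback cases."""
    pairs = []
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j <= i:
            continue  # case 1: nothing left to pair
        if M[i][j] == cell(M, i, j - 1):
            stack.append((i, j - 1))  # case 2: S_j is unpaired
            continue
        for k in range(i, j):  # case 3: find the partner of S_j
            if (seq[k], seq[j]) in PAIRS and M[i][j] == (
                cell(M, i, k - 1) + cell(M, k + 1, j - 1) + 1
            ):
                pairs.append((k, j))
                stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return sorted(pairs)


def dot_bracket(n, pairs):
    """Render pairs as a dot-bracket string of length n."""
    out = ["."] * n
    for k, j in pairs:
        out[k], out[j] = "(", ")"
    return "".join(out)


if __name__ == "__main__":
    seq = "GGGAAAUCC"
    M = fill_table(seq)
    P = traceback(seq, M)
    print(M[0][len(seq) - 1], P, dot_bracket(len(seq), P))
```

Taking the first k that satisfies case 3 yields a single optimal structure; other optimal structures with the same number of pairs may exist.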

Limitation
Base-pair maximization will not necessarily lead to the most stable structure.
It may create structures with many interior loops or hairpins, which are energetically unfavorable.
The results are comparable to aligning sequences with scattered matches: not biologically reasonable.

References
[Kra02] Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics, Pearson Education India, 2002.
