
Algorithms in Bioinformatics

Lecture 9
Advanced Topics

Outline

‹ RNA and Protein Secondary Structure

‹ Bio-networks

‹ Networks Analyses and Modeling


RNA and Protein Secondary Structure

‹ RNA Secondary Structure Modeling
  • Context-Free Grammar (CFG)
‹ RNA Secondary Structure Prediction
  • Nussinov Algorithm
‹ Protein Structure

RNA and Protein Structure
‹ Primary Structure: Sequence
‹ Secondary Structure: Pairing
‹ Tertiary Structure: 3D Shape
‹ Quaternary Structure (Protein)

[Figure: secondary structure and tertiary structure]

RNA Secondary Structure


‹ RNA is typically single-stranded
‹ Folding is determined in large part by base pairing
  A-U and C-G are the canonical base pairs
‹ The base-paired structure is referred to as the secondary structure of RNA
‹ Related RNAs often have homologous secondary structure without significant sequence similarity

Four Types of Protein Structure

Three forms of protein secondary structure:
‹ α helix
‹ β sheet
‹ Turn, coil, or loop

RNA sequence modeled by grammar


‹ For instance, the sequence ACGGAGUGCCCGU can be modeled by the following grammar:
‹ S → a W1 u
‹ W1 → c W2 g
‹ W2 → g W3 c
‹ W3 → g L c
‹ L → agugc

Derivation: S → aW1u → acW2gu → acgW3cgu → acggLccgu → acggagugcccgu

This is exactly a context-free grammar.
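The derivation on this slide can be replayed mechanically. The following is a minimal sketch, assuming each nonterminal has exactly one production (true for this grammar only); the names `RULES` and `derive` are illustrative and not part of the lecture.

```python
# Productions of the stem-loop grammar, each right-hand side as a list of symbols.
RULES = {
    "S": ["a", "W1", "u"],
    "W1": ["c", "W2", "g"],
    "W2": ["g", "W3", "c"],
    "W3": ["g", "L", "c"],
    "L": ["a", "g", "u", "g", "c"],
}


def derive(start="S"):
    """Return every sentential form of the leftmost derivation from `start`."""
    form = [start]
    steps = ["".join(form)]
    while any(symbol in RULES for symbol in form):
        # Find the leftmost nonterminal and rewrite it with its single rule.
        idx = next(i for i, symbol in enumerate(form) if symbol in RULES)
        form = form[:idx] + RULES[form[idx]] + form[idx + 1:]
        steps.append("".join(form))
    return steps


if __name__ == "__main__":
    print(" → ".join(derive()))
```

The last sentential form contains only terminals and, in upper case, equals the sequence ACGGAGUGCCCGU.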

A Context-Free Grammar

S → AB
A → aAc | a
B → bBd | b

Nonterminals: S, A, B
Terminals: a, b, c, d

Derivation:

S → AB → aAcB → … → aaaacccB → aaaacccbBd → … → aaaacccbbbbddd

Produces all strings $a^{i+1} c^{i} b^{j+1} d^{j}$, for $i, j \ge 0$
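As a quick check of the language described above, here is a small sketch that builds a string by applying the recursive rule a chosen number of times and then tests membership by counting symbols. The function names (`derive_A`, `derive_B`, `generate`, `in_language`) are hypothetical helpers introduced for illustration.

```python
import re


def derive_A(i: int) -> str:
    """Apply A → aAc i times, then finish with A → a."""
    form = "A"
    for _ in range(i):
        form = form.replace("A", "aAc")
    return form.replace("A", "a")


def derive_B(j: int) -> str:
    """Apply B → bBd j times, then finish with B → b."""
    form = "B"
    for _ in range(j):
        form = form.replace("B", "bBd")
    return form.replace("B", "b")


def generate(i: int, j: int) -> str:
    """String derived from S → AB with i and j recursive steps."""
    return derive_A(i) + derive_B(j)


def in_language(s: str) -> bool:
    """True if s has the form a^(i+1) c^i b^(j+1) d^j with i, j >= 0."""
    match = re.fullmatch(r"(a+)(c*)(b+)(d*)", s)
    if match is None:
        return False
    n_a, n_c, n_b, n_d = (len(group) for group in match.groups())
    return n_a == n_c + 1 and n_b == n_d + 1


if __name__ == "__main__":
    print(generate(3, 3), in_language(generate(3, 3)))
```

With i = j = 3 the generated string is the one reached at the end of the derivation on the slide.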

Context-free grammars for RNA


‹ Generally, four types of production rules are used in context-free grammars for RNA modeling:
  • S → aSa: base pairings in RNA (the two terminals form a pair)
  • S → aS, S → Sa: unpaired bases
  • S → SS: branched secondary structures
  • S → S: used in the context of multiple alignments

Example: modeling a stem loop
‹ S → a W1 u
‹ W1 → c W2 g
‹ W2 → g W3 c
‹ W3 → g L c
‹ L → agugc

[Figure: the stem loop (stem ACGG paired with CCGU, loop AGUGC) and the parse tree S, W1, W2, W3 over the sequence A C G G A G U G C C C G U]

Example: modeling a stem loop


S → a W1 u | g W1 u
W1 → c W2 g
W2 → g W3 c | g W3 u
W3 → g L c | a L u
L → agugc | agcgc | cugugc

[Figure: three example stem loops generated by this grammar: ACGG-AGUGC-CCGU, GCGA-AGCGC-UCGU, and GCGA-CUGUGC-UUGU]

Context-free grammars for RNA

As with HMMs, RNA sequence modeling has to capture the dependence structure of the sequence. For this we use stochastic context-free grammars (see the appendix).

Stochastic Context Free Grammars


By analogy with HMMs, we can assign probabilities to the production rules.

Given a grammar

$X_1 \to s_{11} \mid \dots \mid s_{1n}$
…
$X_m \to s_{m1} \mid \dots \mid s_{mn}$

we can assign a probability to each rule such that, for every $i$,

$P(X_i \to s_{i1}) + \dots + P(X_i \to s_{in}) = 1$

Example
S → aSb : ½ | a : ¼ | b : ¼

Probability distribution over all strings x:

If $x = a^{n} b^{n+1}$, then $P(x) = 2^{-n} \times \tfrac{1}{4} = 2^{-(n+2)}$

If $x = a^{n+1} b^{n}$, the same: $P(x) = 2^{-(n+2)}$

Otherwise: $P(x) = 0$
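The probabilities in this example can be verified with a few lines of code. This is a minimal sketch, relying on the fact that every string has at most one parse under this grammar; `string_probability` and the constant names are illustrative, not from the lecture. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Rule probabilities from the example: S → aSb (1/2) | a (1/4) | b (1/4)
P_WRAP = Fraction(1, 2)
P_A = Fraction(1, 4)
P_B = Fraction(1, 4)


def string_probability(x: str) -> Fraction:
    """Probability that the grammar generates x."""
    if x == "a":
        return P_A
    if x == "b":
        return P_B
    # The only remaining rule wraps a shorter string as a ... b.
    if len(x) >= 3 and x[0] == "a" and x[-1] == "b":
        return P_WRAP * string_probability(x[1:-1])
    return Fraction(0)


if __name__ == "__main__":
    for n in range(4):
        x = "a" * n + "b" * (n + 1)
        print(x, string_probability(x))
```

Summing the two families of strings for n = 0, …, N-1 gives 1 - 2^{-N}, so the total probability approaches 1 as N grows.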

RNA Secondary Structure Prediction: The Nussinov Algorithm

Problem: Find the RNA structure with the maximum (weighted) number of nested pairings

[Figure: predicted secondary structure of the sequence below]

ACCACGCUUAAGACACCUAGCUUGUGUCCUGGAGGUCUAUAAGUCAGACCGCGAGAGGGAAGACUCGUAUAAGCG

Base-Pair Maximization
‹ Find the structure with the most base pairs
‹ An efficient dynamic programming approach to this problem was introduced by Nussinov (1970s)
‹ Four ways to get the best structure between positions i and j from the best structures of the smaller subsequences:
  • add the pair i, j onto the best structure found for subsequence i+1, j-1
  • add unpaired position i onto the best structure for subsequence i+1, j
  • add unpaired position j onto the best structure for subsequence i, j-1
  • combine two optimal structures for i, k and k+1, j
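The four cases above translate directly into a dynamic programming table. The sketch below is one possible implementation under stated assumptions: only the canonical pairs A-U and C-G count, each pair scores 1, and no minimum loop length is enforced (so adjacent bases may pair). The name `nussinov_max_pairs` is illustrative.

```python
# Canonical base pairs only (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def nussinov_max_pairs(seq: str) -> int:
    """Maximum number of nested base pairs in seq."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs in seq[i..j]; single bases score 0.
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Cases 2 and 3: leave position i or position j unpaired.
            score = max(best[i + 1][j], best[i][j - 1])
            # Case 1: pair i with j around the best inner structure.
            if (seq[i], seq[j]) in PAIRS:
                inner = best[i + 1][j - 1] if i + 1 <= j - 1 else 0
                score = max(score, inner + 1)
            # Case 4: combine optimal structures for i..k and k+1..j.
            for k in range(i + 1, j):
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(nussinov_max_pairs("GGGAAAUCC"))
```

The table has O(n²) entries and the bifurcation case scans O(n) split points per entry, so the running time is O(n³).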


Alignment scores for parses!


We can assign a score to each rule X → s, where s is a string.

Example:

W → a W' u : 1
W → g W' c : 1
W → g W' u : 1
W → x W' z : 0, when (x, z) is not an a/u, g/c, or g/u pair
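With these rule scores, the score of a parse is the sum of the scores of the pairing rules it uses. The sketch below makes two assumptions that are not on the slide: the parse is given in dot-bracket notation (matching parentheses mark paired positions, dots mark unpaired ones), and pairs are treated as unordered, so u/a scores the same as a/u. `PAIR_SCORE` and `parse_score` are hypothetical names.

```python
# Scores for the pairing rules in the example; any other pair scores 0.
PAIR_SCORE = {
    frozenset("AU"): 1,
    frozenset("GC"): 1,
    frozenset("GU"): 1,
}


def parse_score(seq: str, structure: str) -> int:
    """Sum the rule scores over all base pairs of a dot-bracket structure."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper()
    open_positions = []
    total = 0
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced structure")
            partner = open_positions.pop()
            pair = frozenset((seq[partner], seq[pos]))
            total += PAIR_SCORE.get(pair, 0)
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if open_positions:
        raise ValueError("unbalanced structure")
    return total


if __name__ == "__main__":
    print(parse_score("ACGGAGUGCCCGU", "((((.....))))"))
```

For the stem loop ACGGAGUGCCCGU with a four-pair stem, every pair is a/u or g/c, so the parse scores 4.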

Bio-networks

  • Interaction network: protein-protein interaction
  • Regulatory network: transcription factor/microRNA
  • Signal transduction network: pathway
  • Metabolic network
  • Others: genetic interactions

Interaction networks in molecular biology

‹ Protein-protein interactions
‹ Protein-DNA interactions
‹ Protein-RNA interactions

Protein-protein interaction network

Transcriptional Regulatory Network

‹ Protein-DNA interaction
‹ High-throughput
  • ChIP-chip

Mapping transcription factor binding sites

Harbison C., Gordon B., et al. Nature 2004


ChIP-chip measurement of protein-DNA interactions

Simon et al., Cell 2001

Signal transduction network


‹ B cell receptor pathway

Metabolic network

Other bio-networks
‹ Genetic interaction
‹ Drug-action network: drug → gene
‹ Disease network: symptom → disease; phenotype → disease

Statistical analysis of topological features

Spanning tree: several methods

Cutting: several methods

Intermediary node analysis

Hub node analysis

Building a probabilistic network

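[Figure: weighted network on nodes n1 to n5; the edge weights are the off-diagonal entries of the matrix below]

$$
\begin{pmatrix}
1 & 0.6 & 0 & -0.4 & 0.1 \\
0.6 & 1 & 0.9 & 0 & 0.3 \\
0 & 0.9 & 1 & 0.8 & 0.2 \\
-0.4 & 0 & 0.8 & 1 & -0.4 \\
0.1 & 0.3 & 0.2 & -0.4 & 1
\end{pmatrix}
$$

(rows and columns ordered n1, …, n5)

This corresponds to a joint Gaussian distribution over five variables.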
