CSCI1950ZSpring09 Lecture9
CSCI1950‐Z
Computational Methods for Biology
Lecture 9
Ben Raphael
February 23, 2009
http://cs.brown.edu/courses/csci1950-z/
Outline
Searching Through Trees
1. Branch‐swapping: NNI, SPR, TBR.
2. MCMC
Consensus Trees and Supertrees
2/25/09
Heuristic Search
1. Start with an arbitrary tree T.
2. Check “neighbors” of T.
3. Move to a neighbor if it provides the best
improvement in parsimony/likelihood score.
Caveat:
The search can get stuck in a local optimum
and never reach the global optimum.
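The three steps above can be sketched as a greedy hill climb. This is only an illustration under stated assumptions: the names hill_climb, line_neighbors, and toy_score are hypothetical, lower scores are treated as better (as with parsimony), and a short list of integers stands in for tree space.

```python
def hill_climb(start, neighbors, score):
    """Greedy search: move to the best-scoring neighbor until none improves.

    Lower scores are better, as with a parsimony score.
    """
    current = start
    while True:
        candidates = list(neighbors(current))
        if not candidates:
            return current
        best = min(candidates, key=score)
        if score(best) >= score(current):
            return current  # no neighbor improves: a (possibly local) optimum
        current = best


# Toy landscape standing in for tree space: state i has score landscape[i].
landscape = [5, 3, 4, 6, 2, 0, 1, 7]


def line_neighbors(i):
    """Neighbors of state i are the adjacent positions on the line."""
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]


def toy_score(i):
    return landscape[i]


stuck = hill_climb(0, line_neighbors, toy_score)  # stops at state 1 (score 3)
lucky = hill_climb(3, line_neighbors, toy_score)  # reaches state 5 (score 0)
```

Starting from state 0 the search stops at a local optimum, while starting from state 3 it reaches the global one, which is the caveat stated above.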
Trees and Splits
Given a set X, a split is a partition of X into two non‐empty
subsets A and B, written X = A | B.
For a phylogenetic tree T with leaves L, each edge e
defines a split Le = A | B, where A and B are the
leaves in the two subtrees obtained by removing e.
[Figure: removing edge e separates the tree into subtrees with leaf sets A and B.]
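As a small sketch of this definition, the splits of a tree can be listed by walking over its edges. The nested-tuple tree encoding and the function names leaf_set and splits are assumptions made for this example, not part of the lecture.

```python
def leaf_set(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return all splits A | B induced by the edges of the tree.

    Each split is stored as a frozenset {A, B} of two frozensets, so
    A | B and B | A are the same object.
    """
    all_leaves = leaf_set(tree)
    found = set()

    def visit(node):
        below = leaf_set(node)
        other = all_leaves - below
        if other:  # the top-level tuple has no edge above it
            found.add(frozenset([below, other]))
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return found


# Hypothetical 5-leaf tree: ((a,b),(c,(d,e)))
example_tree = (("a", "b"), ("c", ("d", "e")))
example_splits = splits(example_tree)
```

The two children of the top-level tuple describe the same edge of the unrooted tree, so storing splits in a set keeps that edge from being counted twice.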
Computing the Splits Metric
A phylogenetic tree T defines a collection of
splits Σ(T) = { Le | e is an edge in T }.
Theorem:
ρ(T1, T2) = | Σ(T1) \ Σ(T2) | + |Σ(T2) \ Σ(T1) |
= |Σ(T1)| + |Σ(T2)| ‐ 2 |Σ(T1)∩Σ(T2)|
Proof: (whiteboard)
Notation: A \ B = {x: x ∈ A, x ∉ B}
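The formula in the theorem translates directly into set operations. The sketch below assumes trees are written as nested tuples of leaf labels; the names splits and splits_metric are hypothetical helpers for this example.

```python
def leaf_set(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return Σ(T) as a set of splits, each a frozenset {A, B}."""
    all_leaves = leaf_set(tree)
    found = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        below = leaf_set(node)
        if below != all_leaves:  # skip the top-level tuple itself
            found.add(frozenset([below, all_leaves - below]))
        if not isinstance(node, str):
            stack.extend(node)
    return found


def splits_metric(t1, t2):
    """ρ(T1, T2) = |Σ(T1) \\ Σ(T2)| + |Σ(T2) \\ Σ(T1)|."""
    s1, s2 = splits(t1), splits(t2)
    return len(s1 - s2) + len(s2 - s1)


t1 = (("a", "b"), ("c", ("d", "e")))
t2 = (("a", "c"), ("b", ("d", "e")))
distance = splits_metric(t1, t2)

# Second form of the theorem: |Σ(T1)| + |Σ(T2)| - 2 |Σ(T1) ∩ Σ(T2)|
s1, s2 = splits(t1), splits(t2)
same_distance = len(s1) + len(s2) - 2 * len(s1 & s2)
```

Here t1 and t2 share the split de | abc and all trivial splits, and each has one split the other lacks, so both forms of the formula give 2.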
Nearest Neighbor Interchange
Rearrange four subtrees
defined by one
internal edge
Claim: The number of NNI neighbors of a binary tree with n leaves is 2(n‐3).
Proof: (whiteboard)
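One way to see where the count 2(n‐3) comes from is to enumerate the rearrangements: each of the n‐3 internal edges gives two alternative groupings of its four subtrees. The sketch below assumes the tree is stored as an adjacency dictionary with string node names; nni_neighbors and the example tree are hypothetical.

```python
def nni_neighbors(adj):
    """Yield every tree obtained from `adj` by one NNI.

    `adj` maps each node to the set of its neighbors. Internal nodes of
    a binary tree have degree 3 and leaves have degree 1.
    """
    for u in adj:
        if len(adj[u]) != 3:
            continue
        for v in adj[u]:
            # Visit each internal edge {u, v} exactly once.
            if len(adj[v]) != 3 or not u < v:
                continue
            a, b = sorted(adj[u] - {v})
            c, d = sorted(adj[v] - {u})
            for x in (c, d):  # swap subtree b with c, then with d
                new = {node: set(nbrs) for node, nbrs in adj.items()}
                new[u].remove(b)
                new[b].remove(u)
                new[v].remove(x)
                new[x].remove(v)
                new[u].add(x)
                new[x].add(u)
                new[v].add(b)
                new[b].add(v)
                yield new


# Hypothetical tree with n = 5 leaves (a..e) and internal nodes u1..u3.
tree = {
    "a": {"u1"}, "b": {"u1"}, "c": {"u2"}, "d": {"u3"}, "e": {"u3"},
    "u1": {"a", "b", "u2"},
    "u2": {"u1", "c", "u3"},
    "u3": {"u2", "d", "e"},
}
neighbors = list(nni_neighbors(tree))
```

For this 5-leaf tree there are two internal edges, so the generator yields 2(5‐3) = 4 rearranged trees.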
Subtree Pruning and Regrafting
(SPR)
1. Remove a branch.
2. Reconnect incident vertex by
subdividing a branch
Claim: The number of SPR neighbors of a binary tree with n leaves is
2(n‐3)(2n‐7).
Proof: (whiteboard)
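To get a feel for these neighborhood sizes, the short sketch below evaluates the NNI and SPR counts from the two claims and compares them with the total number of unrooted binary trees on n leaves. The total, (2n‐5)!!, is a standard count that is not stated on these slides, and the function names are hypothetical.

```python
def nni_neighborhood_size(n):
    """Number of NNI neighbors of a binary tree with n leaves."""
    return 2 * (n - 3)


def spr_neighborhood_size(n):
    """Number of SPR neighbors of a binary tree with n leaves."""
    return 2 * (n - 3) * (2 * n - 7)


def num_unrooted_binary_trees(n):
    """(2n-5)!! = 1 * 3 * 5 * ... * (2n-5), for n >= 3."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (5, 10, 20):
    print(n, nni_neighborhood_size(n), spr_neighborhood_size(n),
          num_unrooted_binary_trees(n))
```

Both neighborhoods grow only polynomially in n while the number of trees grows much faster, which is why a search explores just a small part of tree space at each step.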
Tree Bisection and Reconnection
(TBR)
1. Remove a branch.
2. Reconnect the subtrees by adding a
new branch that subdivides
a branch in each subtree.
Relationship between Operations
• Every NNI is an SPR and every SPR is a TBR.
• Every TBR is a single SPR or a composition of
two SPRs.
• All three types of operations are invertible:
If α takes T to T’ (T → T’), then α⁻¹ takes T’ to T (T’ → T).
Theorem: For all T and T’ in B(n), there is a sequence
of NNI (or SPR or TBR) operations that transforms T
into T’.
Heuristic Search
1. Start with an arbitrary tree T.
2. Check “neighbors” of T.
3. Move to a neighbor if it provides the best
improvement in parsimony/likelihood score.
PAUP* (a widely used phylogenetic package)
includes the command:
hsearch nreps=num swap=type
where type = NNI, SPR, or TBR.
From Likelihood to Bayesian
Given data X = (x1, …, xn), we found the tree T and
branch lengths t* that maximized likelihood
Pr[X | T, t*].
What about other trees?
Could we compute Pr[T, t* | X]?
Back to Coin Flipping
Flip coin with
p = Pr[heads] unknown.
Earlier we computed the maximum
likelihood estimate of p.
L(p) = Pr[ D | p].
Pr[p | D] = Pr[ p, D]/Pr[D]
= Pr[D|p]Pr[p] / Pr[D]
[Figure: prior and posterior densities for p after 11 tosses (5 heads) and after 44 tosses (20 heads).]
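The coin-flip posterior can be approximated numerically by applying Bayes' rule on a grid of values of p. This is a sketch under assumptions: the prior is taken to be uniform (the slide does not say which prior it uses), and grid_posterior and posterior_sd are hypothetical helper names.

```python
def grid_posterior(heads, tosses, grid_size=1001):
    """Approximate Pr[p | D] on an evenly spaced grid of p values."""
    ps = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size  # assumed uniform prior Pr[p]
    # Pr[D | p] up to the binomial coefficient, which cancels below.
    likelihood = [p ** heads * (1 - p) ** (tosses - heads) for p in ps]
    unnormalized = [l * q for l, q in zip(likelihood, prior)]
    evidence = sum(unnormalized)  # plays the role of Pr[D]
    return ps, [u / evidence for u in unnormalized]


def posterior_sd(ps, posterior):
    """Standard deviation of p under the grid posterior."""
    mean = sum(p * w for p, w in zip(ps, posterior))
    return sum(w * (p - mean) ** 2 for p, w in zip(ps, posterior)) ** 0.5


ps, post_11 = grid_posterior(heads=5, tosses=11)
_, post_44 = grid_posterior(heads=20, tosses=44)
sd_11 = posterior_sd(ps, post_11)
sd_44 = posterior_sd(ps, post_44)
```

Both data sets have the same fraction of heads, so the two posteriors peak at nearly the same p, but the posterior after 44 tosses is narrower. For a coin the denominator is an easy sum over the grid, unlike the sum over all trees and branch lengths.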
Bayesian Methods
Pr[T, t* | X] = Pr[X, T, t*] / Pr[X]
= Pr[X | T, t*] Pr[T, t*] / Pr[X]   (Bayes' theorem)
= Pr[X | T, t*] Pr[T, t*] / Σ_{T’, t’} Pr[X | T’, t’] Pr[T’, t’]
Here Pr[T, t* | X] is the posterior and Pr[T, t*] is the prior.
Problem: Cannot compute denominator.
Solution: Use the power of Markov chains to
draw (“sample”) trees according to the
distribution Pr[T, t* | X].
Markov Chain Monte Carlo
To sample from a distribution π:
Define a Markov chain with equilibrium distribution π.
Simulate the chain through many transitions.
After many transitions (e.g. ~10,000), the chain will be close to equilibrium
π (“burn‐in”).
Output every n‐th state (n ~ 50).
Jukes‐Cantor model of DNA
[Figure: Markov chain on the four nucleotides A, C, G, T.]
Equilibrium distribution:
qA = qC = qG = qT = 1/4
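The equilibrium of the Jukes‐Cantor chain can be checked numerically by applying the transition rule repeatedly to any starting distribution. The sketch below assumes a discrete-time version of the model in which a base changes to each of the other three bases with the same probability per step; jukes_cantor_step and the value 0.05 are hypothetical choices for the example.

```python
def jukes_cantor_step(dist, change_prob):
    """Apply one transition of a discrete-time Jukes-Cantor chain.

    `dist` holds the probabilities of A, C, G, T. Each base changes to
    each of the other three bases with probability `change_prob`.
    """
    total = sum(dist)
    stay = 1 - 3 * change_prob
    return [stay * x + change_prob * (total - x) for x in dist]


# Start with all probability on A and run the chain forward.
dist = [1.0, 0.0, 0.0, 0.0]
for _ in range(200):
    dist = jukes_cantor_step(dist, change_prob=0.05)
```

After enough steps the distribution is essentially (1/4, 1/4, 1/4, 1/4) no matter where the chain started, which is what running a chain to equilibrium relies on.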
MCMC on Trees
1. Define a Markov chain:
• States are trees T.
• Equilibrium distribution is the posterior Pr[T, t* | X].
2. Simulate the Markov chain for many steps (burn‐in).
3. Output T from every n‐th (e.g. n = 50) step.
[Figure: NNI neighborhood for trees with 5 leaves.]
For transitions, can use NNI, SPR, TBR, or other
operations.
Can define* the transition probabilities of this
Markov chain without computing Z = Σ_{T’, t’} Pr[X |
T’, t’] Pr[T’, t’] (Metropolis algorithm).
*“involves burning of incense, casting of chicken bones, use of magical incantations, and invoking the
opinions of more prestigious colleagues.” ‐‐Felsenstein
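The reason Z is not needed can be seen in a small Metropolis sketch: the acceptance step uses only the ratio of two unnormalized weights, so the normalizing constant cancels. Everything here is an assumption made for illustration: five states on a ring stand in for trees, ring_weight stands in for Pr[X | T, t] Pr[T, t], and the proposal is symmetric because every state has the same number of neighbors (as every binary tree has the same number of NNI neighbors).

```python
import random


def metropolis(weight, neighbors, start, steps, burn_in, thin, seed=0):
    """Sample states with probability proportional to weight(state).

    Assumes a symmetric proposal: every state has the same number of
    neighbors and one of them is proposed uniformly at random.
    """
    rng = random.Random(seed)
    state = start
    samples = []
    for step in range(steps):
        proposal = rng.choice(neighbors(state))
        # Accept with probability min(1, w(proposal) / w(state)).
        # Only the ratio is used, so the normalizing constant Z cancels.
        if rng.random() < weight(proposal) / weight(state):
            state = proposal
        if step >= burn_in and (step - burn_in) % thin == 0:
            samples.append(state)
    return samples


def ring_neighbors(i):
    """Each of the 5 toy states has exactly two neighbors."""
    return [(i - 1) % 5, (i + 1) % 5]


def ring_weight(i):
    """Hypothetical unnormalized posterior weights 1, 2, 3, 4, 5."""
    return i + 1


samples = metropolis(ring_weight, ring_neighbors, start=0,
                     steps=60000, burn_in=10000, thin=50)
```

The burn-in and thinning values follow the numbers quoted on the slides. With enough steps, the fraction of samples in state i approaches (i + 1)/15 even though the sum 15 is never computed by the sampler.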
How Many Times Did Wings Evolve?
• Previous studies had shown loss of wings:
winged → wingless transitions.
• Gain of wings (wingless → winged transition)
appears to be much more complicated.
Phylogeny of Insects
(Nature 2003)
Build a phylogeny of winged and
wingless stick insects.
Used data from:
18S ribosomal DNA (~1,900 base
pairs (bp))
28S rDNA (2,250 bp)
Portion of histone 3 (H3, 372 bp)
Used multiple tree reconstruction
techniques.
Most Parsimonious Evolutionary Tree of
Winged and Wingless Insects
• All most parsimonious
reconstructions gave a
wingless ancestor.
• All required multiple
transitions between winged
and wingless.
Will Wingless Insects Fly Again?
• All most parsimonious reconstructions
required the re‐invention of wings.
• It is likely that wing developmental pathways
are conserved in wingless stick insects.
Next Questions
• How to combine/merge trees?
• How to determine “confidence” in a particular
tree/branch?
Multiple Trees?
Consensus Trees
Strict Consensus Tree
Strict Consensus
No non‐trivial splits in common!
Strict consensus tree is unresolved.
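In terms of splits, the strict consensus keeps only the splits that every input tree contains. The sketch below computes that intersection; the nested-tuple tree encoding, the three example trees, and the function names are assumptions for illustration.

```python
def leaf_set(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the splits of a tree, each stored as a frozenset {A, B}."""
    all_leaves = leaf_set(tree)
    found = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        below = leaf_set(node)
        if below != all_leaves:  # skip the top-level tuple itself
            found.add(frozenset([below, all_leaves - below]))
        if not isinstance(node, str):
            stack.extend(node)
    return found


def strict_consensus_splits(trees):
    """Splits present in every input tree."""
    return set.intersection(*(splits(t) for t in trees))


def nontrivial(split_set):
    """Keep only splits with at least two leaves on each side."""
    return {s for s in split_set if all(len(side) > 1 for side in s)}


t1 = (("a", "b"), ("c", ("d", "e")))
t2 = (("a", "c"), ("b", ("d", "e")))
t3 = (("a", "d"), ("b", ("c", "e")))

shared_by_two = nontrivial(strict_consensus_splits([t1, t2]))
shared_by_all = nontrivial(strict_consensus_splits([t1, t2, t3]))
```

The first two trees share the split de | abc, but once t3 is added no non-trivial split is common to all three, so their strict consensus is the unresolved star tree.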
Splits Equivalence Theorem
A phylogenetic tree T defines a collection of splits Σ(T) =
{ Le | e is an edge in T }.
Splits A1 | B1 and A2 | B2 are pairwise compatible if at
least one of A1∩A2, A1∩B2, B1∩A2, and B1∩B2 is the
empty set.
Splits Equivalence Theorem: Let Σ be a collection of
splits. There is a phylogenetic tree T such that Σ(T) = Σ if
and only if the splits in Σ are pairwise compatible.
The Pairwise Compatibility Theorem (for binary
characters) follows from this theorem.
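The compatibility condition is easy to test directly from the definition. In this sketch a split is a pair (A, B) of Python sets; the function names and the example splits are hypothetical.

```python
from itertools import combinations


def compatible(split1, split2):
    """True if at least one of the four pairwise intersections is empty."""
    a1, b1 = split1
    a2, b2 = split2
    return any(not (x & y) for x in (a1, b1) for y in (a2, b2))


def pairwise_compatible(split_collection):
    """True if every pair of splits in the collection is compatible."""
    return all(compatible(s, t)
               for s, t in combinations(split_collection, 2))


ab_cde = ({"a", "b"}, {"c", "d", "e"})
de_abc = ({"d", "e"}, {"a", "b", "c"})
ac_bde = ({"a", "c"}, {"b", "d", "e"})

fits_one_tree = pairwise_compatible([ab_cde, de_abc])        # True
no_single_tree = pairwise_compatible([ab_cde, de_abc, ac_bde])  # False
```

By the Splits Equivalence Theorem, the first collection can be displayed by a single tree, while the second cannot, because ab | cde and ac | bde overlap in all four intersections.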
Majority Consensus Tree
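Under the usual definition, which this slide leaves to the figure, the majority consensus tree is built from the splits that occur in more than half of the input trees. The sketch below counts splits across trees under that assumption; the nested-tuple encoding, the example trees, and the function names are hypothetical.

```python
from collections import Counter


def leaf_set(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the splits of a tree, each stored as a frozenset {A, B}."""
    all_leaves = leaf_set(tree)
    found = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        below = leaf_set(node)
        if below != all_leaves:  # skip the top-level tuple itself
            found.add(frozenset([below, all_leaves - below]))
        if not isinstance(node, str):
            stack.extend(node)
    return found


def majority_splits(trees):
    """Splits that occur in more than half of the input trees."""
    counts = Counter(s for t in trees for s in splits(t))
    return {s for s, c in counts.items() if c > len(trees) / 2}


t1 = (("a", "b"), ("c", ("d", "e")))
t2 = (("a", "c"), ("b", ("d", "e")))
t3 = (("a", "d"), ("b", ("c", "e")))
majority = majority_splits([t1, t2, t3])
```

The split de | abc appears in two of the three trees and is kept, so the majority consensus is more resolved here than the strict consensus of the same trees would be.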