Biological Sequence Analysis
Catherine Matias
Part I
Introduction to sequence analysis
Biological sequences
What kind of sequences?
Examples of repositories
Recap on probability
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Book references
R. Durbin, S. Eddy, A. Krogh, and G. Mitchison.
Biological sequence analysis: probabilistic models of proteins
and nucleic acids.
Cambridge University Press, Cambridge, UK, 1998.
Z. Yang.
Computational Molecular Evolution.
Oxford Series in Ecology and Evolution. Oxford University
Press, 2006.
J. Felsenstein.
Inferring phylogenies.
Sinauer Associates, 2004.
G. Deleage and M. Gouy.
Bioinformatique - Cours et cas pratique
Dunod, 2013.
Part II
Motif detection
Outline Part 2
Markov chains (order 1)
Higher order Markov chains
Motif detection with Markov chains
Hidden Markov models (HMMs)
Parameter estimation in HMMs
Sequence segmentation with HMMs
Motif detection with HMMs
Modeling a sequence
A biological sequence may be viewed as a sequence of random variables $X_1, \ldots, X_n$ (also denoted $X_{1:n}$) with values in a finite alphabet $A$.
- In general, the observed dinucleotide frequencies differ from the products of the letter frequencies:
$$\frac{N(ab)}{n-1} \neq f_a f_b = \frac{N(a)}{n} \cdot \frac{N(b)}{n},$$
so successive letters are not independent; this motivates Markov chain models.
Mathematical formulation
Let $\{X_n\}_{n \geq 1}$ be a sequence of random variables with values in a finite or countable space $A$, such that for all $i \geq 1$ and $x_{1:i+1} \in A^{i+1}$,
$$P(X_{i+1} = x_{i+1} \mid X_{1:i} = x_{1:i}) = P(X_{i+1} = x_{i+1} \mid X_i = x_i) := p(x_i, x_{i+1}).$$
$p$ is the transition of the chain. When $A$ is finite, this is a stochastic matrix: it has non-negative entries and its rows sum to one, i.e. $p(x, x') \geq 0$ and $\sum_{x' \in A} p(x, x') = 1$ for all $x \in A$.
Example I
Example of a transition matrix on the state space $A = \{A, C, G, T\}$ (rows and columns in this order; the $C$ row is omitted):
$$p = \begin{pmatrix} 0.7 & 0.1 & 0.1 & 0.1 \\ \cdot & \cdot & \cdot & \cdot \\ 0.25 & 0.25 & 0.25 & 0.25 \\ 0.05 & 0.25 & 0.4 & 0.3 \end{pmatrix}.$$
In particular,
- when $X_k = A$, then $X_{k+1} = A$ with probability 0.7 and $C$, $G$ or $T$ each with probability 0.1;
- when $X_k = G$, then $X_{k+1}$ is drawn uniformly on $A$.
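To make the mechanism concrete, here is a minimal simulation sketch (not from the course); since the $C$ row of the matrix above is unspecified, an arbitrary illustrative row is used for it.

```python
import numpy as np

alphabet = np.array(list("ACGT"))
p = np.array([
    [0.70, 0.10, 0.10, 0.10],   # row A (as in the example)
    [0.20, 0.30, 0.30, 0.20],   # row C (hypothetical, for illustration only)
    [0.25, 0.25, 0.25, 0.25],   # row G: next letter drawn uniformly
    [0.05, 0.25, 0.40, 0.30],   # row T (as in the example)
])
pi = np.full(4, 0.25)           # initial distribution (uniform here)

rng = np.random.default_rng(0)
n = 50
states = [rng.choice(4, p=pi)]
for _ in range(n - 1):
    # draw the next state from the row of the current state
    states.append(rng.choice(4, p=p[states[-1]]))
print("".join(alphabet[states]))
```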
Example II
Automaton description
(Figure: automaton over the four states A, C, G, T, with edges labelled by the transition probabilities of $p$.)
Example III
Remarks
- The distribution of the chain is characterized by its initial distribution $\pi$, with $\pi(x) = P(X_1 = x)$, and its transition $p$:
$$P(X_{1:n} = x_{1:n}) = \pi(x_1) \prod_{i=2}^{n} p(x_{i-1}, x_i). \qquad (1)$$
Proof.
$P(X_{1:n} = x_{1:n})$
$= P(X_n = x_n \mid X_{1:n-1} = x_{1:n-1})\, P(X_{1:n-1} = x_{1:n-1})$
$= P(X_n = x_n \mid X_{n-1} = x_{n-1})\, P(X_{1:n-1} = x_{1:n-1})$ (Markov property)
$= p(x_{n-1}, x_n)\, P(X_{1:n-1} = x_{1:n-1})$
$= \ldots$ (induction)
Consequence (log-likelihood):
$$\log P(X_{1:n} = x_{1:n}) = \log \pi(x_1) + \sum_{x,y \in A} N(xy) \log p(x, y), \qquad (2)$$
where $N(xy)$ is the number of occurrences of the dinucleotide $xy$ in the sequence.
Proof.
According to (1),
$$\log P(X_{1:n} = x_{1:n}) = \log \pi(x_1) + \sum_{i=2}^{n} \log p(x_{i-1}, x_i) = \log \pi(x_1) + \sum_{x,y \in A} N(xy) \log p(x, y),$$
grouping the terms of the sum according to the value of the pair $(x_{i-1}, x_i)$.
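Formula (2) says the log-likelihood depends on the sequence only through its first letter and its dinucleotide counts. A small sketch, assuming $\pi$ and $p$ are stored as dictionaries indexed by letters (the uniform values below are purely illustrative):

```python
import math
from collections import Counter

def markov_loglik(seq, pi, p):
    """Formula (2): log pi(x1) + sum over dinucleotides xy of N(xy) log p(x, y)."""
    counts = Counter(zip(seq, seq[1:]))        # N(xy) for each observed dinucleotide
    return math.log(pi[seq[0]]) + sum(
        n_xy * math.log(p[x][y]) for (x, y), n_xy in counts.items()
    )

pi = {a: 0.25 for a in "ACGT"}                      # uniform initial distribution
p = {a: {b: 0.25 for b in "ACGT"} for a in "ACGT"}  # uniform transitions
print(markov_loglik("GATTACA", pi, p))              # = 7 * log(0.25) here
```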
Probability of state $X_n$
Let $A = \{1, \ldots, Q\}$, $\pi = (\pi(1), \ldots, \pi(Q))$ viewed as a row vector and $p = (p(i,j))_{1 \leq i,j \leq Q}$ the transition matrix. Then
$$P(X_n = x) = (\pi p^{n-1})(x), \quad x \in A.$$
Proof.
By induction: let $\nu_n$ be the row vector containing the probabilities $P(X_n = x)$. Then
$$\nu_n(x) = P(X_n = x) = \sum_{y \in A} P(X_{n-1} = y, X_n = x) = \sum_{y \in A} \nu_{n-1}(y)\, p(y, x),$$
i.e. $\nu_n = \nu_{n-1} p$, and since $\nu_1 = \pi$ we obtain $\nu_n = \pi p^{n-1}$.
Theorem
For a finite state space $A$, whenever there exists some $m \geq 1$ such that $p^m(x, y) > 0$ for all $x, y \in A$, then a stationary distribution $\pi^\star$ exists and is unique. Moreover, we have the convergence
$$p^n(x, y) \xrightarrow[n \to +\infty]{} \pi^\star(y), \quad \text{for all } x, y \in A.$$
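Numerically, this convergence can be observed by raising $p$ to a large power; a short sketch, reusing the (partly hypothetical) matrix from the earlier example:

```python
import numpy as np

p = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.30, 0.30, 0.20],   # hypothetical C row, as before
    [0.25, 0.25, 0.25, 0.25],
    [0.05, 0.25, 0.40, 0.30],
])
pn = np.linalg.matrix_power(p, 50)
print(pn)             # all rows are (numerically) identical: pi*
pi_star = pn[0]
print(pi_star @ p)    # pi* p = pi*: stationarity
```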
Parameter estimation I
Consider a sequence of observations $X_{1:n}$ following a Markov chain. We want to fit a transition matrix to this sequence. The maximum likelihood estimators are
$$\hat{p}(x, y) = \frac{N(xy)}{N(x\cdot)} \quad \text{and} \quad \hat{\pi}(x) = N(x)/n,$$
where $N(x\cdot) = \sum_{y \in A} N(xy)$ is the number of occurrences of $x$ followed by another letter.
Consequence: the dinucleotide counts in the observed sequence give estimators for the transition probabilities.
Parameter estimation II
Proof.
According to (2), we want to maximise $\sum_{x,y \in A} N(xy) \log p(x, y)$ with respect to $\{p(x, y),\; x, y \in A\}$ under the constraints $\sum_{y \in A} p(x, y) = 1$. Introducing a Lagrange multiplier $\lambda_x$ for each constraint $\sum_{y \in A} p(x, y) - 1 = 0$, we want
$$\sup_{\{\lambda_x,\, p(x,y)\}} \; \sum_{x,y \in A} N(xy) \log p(x, y) - \sum_{x \in A} \lambda_x \Big( \sum_{y \in A} p(x, y) - 1 \Big).$$
Setting the partial derivatives to zero gives $p(x, y) = N(xy)/\lambda_x$, and the constraint then yields $\lambda_x = N(x\cdot)$, hence $\hat{p}(x, y) = N(xy)/N(x\cdot)$.
Example
$X_{1:20} = \mathrm{CCCACGACGTATATTTCGAC}$
$$\hat{p} = \begin{pmatrix} 0 & 3/5 & 0 & 2/5 \\ 1/6 & 2/6 & 3/6 & 0 \\ 2/3 & 0 & 0 & 1/3 \\ 2/5 & 1/5 & 0 & 2/5 \end{pmatrix},$$
e.g. $\hat{p}(A, C) = \dfrac{N(AC)}{N(A\cdot)} = \dfrac{3}{5}$.
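A minimal sketch reproducing this estimation from the raw sequence (counting dinucleotides and normalising each row):

```python
import numpy as np

seq = "CCCACGACGTATATTTCGAC"
alphabet = "ACGT"
idx = {a: i for i, a in enumerate(alphabet)}

counts = np.zeros((4, 4))
for x, y in zip(seq, seq[1:]):
    counts[idx[x], idx[y]] += 1          # dinucleotide counts N(xy)

p_hat = counts / counts.sum(axis=1, keepdims=True)
print(p_hat[idx["A"], idx["C"]])         # 0.6 = 3/5, as on the slide
```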
Outline Part 2
Markov chains (order 1)
Higher order Markov chains
Motif detection with Markov chains
Hidden Markov models (HMMs)
Parameter estimation in HMMs
Sequence segmentation with HMMs
Motif detection with HMMs
Distribution
- An order-2 Markov chain on $A = \{A, C, G, T\}$ is specified by a $16 \times 4$ transition matrix $p$: rows are indexed by the contexts $(X_{k-1}, X_k)$, in lexicographic order $AA, AC, \ldots, TT$, and columns by the next letter $X_{k+1}$.
(Numerical example of a $16 \times 4$ transition matrix.)
In particular,
- $p(7, 3) = P(X_{k+1} = G \mid X_{k-1} = C, X_k = G)$.
Remarks
- The maximum likelihood estimator of the transition of an order-$r$ chain is
$$\hat{p}(x_{1:r}, y) = \frac{N(x_{1:r}\, y)}{N(x_{1:r}\, \cdot)}, \quad x_{1:r} \in A^r,\; y \in A,$$
where $N(x_{1:r}\, y)$ counts the occurrences of the word $x_{1:r}\, y$.
- The maximised log-likelihood equals $\sum_{x_{1:r} \in A^r,\, y \in A} N(x_{1:r}\, y) \log \frac{N(x_{1:r}\, y)}{N(x_{1:r}\, \cdot)}$, and the BIC criterion penalises it by $\frac{N_r}{2} \log n$, where $N_r = |A|^r (|A| - 1)$ is the number of free parameters of the order-$r$ model.
Theorem ([CS00])
Let $X_{1:n}$ be a sequence following an $r^\star$-order Markov chain, where $r^\star$ is (minimal and) unknown. Then
$$\hat{r}_n = \operatorname{Argmax}_{r \geq 1} \mathrm{BIC}(r) = \operatorname{Argmax}_{r \geq 1} \Big\{ \log \hat{P}_r(X_{1:n}) - \frac{N_r}{2} \log n \Big\}$$
is a consistent estimator of $r^\star$, namely $\lim_{n \to +\infty} \hat{r}_n = r^\star$ almost surely.
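A sketch of this order-selection procedure; the repeated toy sequence is only for illustration, real data would be used in practice.

```python
import math
from collections import Counter

def bic(seq, r, alphabet="ACGT"):
    """BIC score of an order-r Markov chain fitted to seq: maximised
    log-likelihood minus (N_r / 2) log n, with N_r = |A|^r (|A| - 1)."""
    n = len(seq)
    ctx = Counter(seq[i:i + r] for i in range(n - r))        # N(x_{1:r} .)
    full = Counter(seq[i:i + r + 1] for i in range(n - r))   # N(x_{1:r} y)
    loglik = sum(c * math.log(c / ctx[w[:-1]]) for w, c in full.items())
    n_params = len(alphabet) ** r * (len(alphabet) - 1)
    return loglik - 0.5 * n_params * math.log(n)

seq = "CCCACGACGTATATTTCGAC" * 20        # toy sequence, illustration only
r_hat = max(range(1, 5), key=lambda r: bic(seq, r))
print(r_hat)
```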
Markov smoothing I
Zero counts
- As the order $r$ grows, many words $x_{1:r}\, y$ are never observed, so the maximum likelihood estimator assigns zero probability to unseen transitions, which is problematic when scoring new sequences.
Markov smoothing II
Markov smoothing
Different strategies have been developed [CG98, KN95]; the simplest adds pseudo-counts:
$$\hat{p}(x_{1:r}, y) = \frac{N(x_{1:r}\, y) + c}{\sum_{y' \in A} \big( N(x_{1:r}\, y') + c \big)}, \quad c > 0.$$
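A sketch of the pseudo-count strategy above (the function name and the count layout are illustrative choices; $c = 1$ is Laplace smoothing):

```python
import numpy as np

def smooth(counts, c=1.0):
    """counts: array of shape (|A|**r, |A|) holding the N(x_{1:r} y) values.
    Adds c to every count before normalising, so no probability is zero."""
    counts = np.asarray(counts, dtype=float) + c
    return counts / counts.sum(axis=1, keepdims=True)
```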
VLMC principle
- Variable length Markov chains [BW99] adapt the context length to the data: the order used to predict the next letter depends on the observed past.
FIG. 2. Triplet tree representation of the estimated minimal state space for exon sequence. The triplets are denoted in reverse order; for example, the terminal node with concatenation ggt.gtt describes the context $x_0 = g$, $x_{-1} = g$, $x_{-2} = t$, $x_{-3} = g$, $x_{-4} = t$, $x_{-5} = t$ for the variable $x_1$. (Source [BW99].)
Outline Part 2
Markov chains (order 1)
Higher order Markov chains
Motif detection with Markov chains
Hidden Markov models (HMMs)
Parameter estimation in HMMs
Sequence segmentation with HMMs
Motif detection with HMMs
Context
- Fit a Markov chain of suitable order to the observed sequence, as a background model.
As a consequence,
- one can compare the observed count of a given motif (word) with its expected count under the background model, and assess its significance through approximations of the count distribution, e.g. compound Poisson approximations [RS07, Nue11].
Outline Part 2
Markov chains (order 1)
Higher order Markov chains
Motif detection with Markov chains
Hidden Markov models (HMMs)
Parameter estimation in HMMs
Sequence segmentation with HMMs
Motif detection with HMMs
Heterogeneity in sequences
- Biological sequences are heterogeneous: different regions (e.g. coding vs non-coding) may follow different letter distributions.
(Figure: density plot.)
Advantages
- A simple model is a mixture: each $X_k$ is drawn independently from the mixture distribution $\sum_{q=1}^{Q} \lambda_q f_q$, so that $P(X_{1:n}) = \prod_{k=1}^{n} \sum_{q=1}^{Q} \lambda_q f_q(X_k)$.
Graphical representation
(Figure: hidden states $S_{k-1}, S_k, S_{k+1}$ with emissions $X_{k-1}, X_k, X_{k+1}$, each $X_k$ drawn from $f_{S_k}$; in the mixture case the $S_k$ are independent, in the HMM case they form a Markov chain.)
For the HMM, the complete likelihood factorizes as
$$P(S_{1:n}, X_{1:n}) = \pi(S_1) \prod_{k=2}^{n} p(S_{k-1}, S_k) \prod_{k=1}^{n} f_{S_k}(X_k).$$
Mixtures vs HMMs
Similarities/Differences
- In both models, each observation $X_k$ is drawn from one of $Q$ emission distributions $f_q$ selected by a hidden state $S_k$. In a mixture the $S_k$ are i.i.d., whereas in an HMM they form a Markov chain, which induces dependence along the observed sequence.
Methods
Outline Part 2
Markov chains (order 1)
Higher order Markov chains
Motif detection with Markov chains
Hidden Markov models (HMMs)
Parameter estimation in HMMs
Sequence segmentation with HMMs
Motif detection with HMMs
HMM likelihood
Likelihood of the observations
Model parameter $\theta = (\pi, p, \{f_q\}_{1 \leq q \leq Q})$.
$$\ell_n(\theta) := \log P_\theta(X_{1:n}) = \log \sum_{s_1, \ldots, s_n} P_\theta(X_{1:n}, S_{1:n} = s_{1:n}).$$
- The sum contains $Q^n$ terms, so the likelihood cannot be maximised by direct enumeration.
EM algorithm [DLR77]
Principle
- Starting from an initial value, iterate the two following steps:
Expectation-step: compute $Q(\theta, \theta_k) := E_{\theta_k}(\log P_\theta(S_{1:n}, X_{1:n}) \mid X_{1:n})$.
Maximization-step: compute $\theta_{k+1} := \operatorname{Argmax}_\theta\, Q(\theta, \theta_k)$.
Heuristics
- Each iteration increases the observed log-likelihood ($\ell_n(\theta_{k+1}) \geq \ell_n(\theta_k)$), but the algorithm may converge to a local maximum only, so several initialisations are used in practice.
EM algo. in practice
In practice
- The complete log-likelihood decomposes as
$$\log P_\theta(S_{1:n}, X_{1:n}) = \sum_{q=1}^{Q} 1_{S_1 = q} \log \pi(q) + \sum_{i=2}^{n} \sum_{1 \leq q,l \leq Q} 1_{S_{i-1} = q,\, S_i = l} \log p(q, l) + \sum_{i=1}^{n} \sum_{q=1}^{Q} 1_{S_i = q} \log f_q(X_i),$$
so that
$$Q(\theta, \theta_k) = \sum_{q=1}^{Q} P_{\theta_k}(S_1 = q \mid X_{1:n}) \log \pi(q) + \sum_{i=2}^{n} \sum_{1 \leq q,l \leq Q} P_{\theta_k}(S_{i-1} = q, S_i = l \mid X_{1:n}) \log p(q, l) + \sum_{i=1}^{n} \sum_{q=1}^{Q} P_{\theta_k}(S_i = q \mid X_{1:n}) \log f_q(X_i).$$
Algorithm
- The E-step quantities are obtained from the forward-backward recursions, with $\alpha_k(q) := P_\theta(S_k = q, X_{1:k})$ and $\beta_k(q) := P_\theta(X_{k+1:n} \mid S_k = q)$.
Initialisation: $\beta_n(\cdot) := 1$.
E-step quantities:
$$P(S_k = q \mid X_{1:n}) \propto \alpha_k(q) \beta_k(q)$$
and
$$P(S_{k-1} = q, S_k = l \mid X_{1:n}) \propto \alpha_{k-1}(q)\, p(q, l)\, f_l(X_k)\, \beta_k(l).$$
Factorized distributions
Let $V = \{V_i\}_{1 \leq i \leq N}$ be a set of random variables and $G = (V, E)$ a DAG. A distribution $P$ on $V$ factorizes according to $G$ if $P(V) = P(V_{1:N}) = \prod_{i=1}^{N} P(V_i \mid \mathrm{pa}(V_i, G))$, where $\mathrm{pa}(V_i, G)$ is the set of parents of $V_i$ in $G$.
ex: HMM
(Figure: DAG of the HMM, with hidden chain $S_1 \to \cdots \to S_{k-1} \to S_k \to S_{k+1} \to \cdots$ and an edge $S_k \to X_k$ for each emission.)
$$P(S_{1:n}, X_{1:n}) = \pi(S_1) \prod_{k=2}^{n} p(S_{k-1}, S_k) \prod_{k=1}^{n} f_{S_k}(X_k).$$
Independence properties
Let $I, J, K$ be subsets of $\{1, \ldots, N\}$.
- Conditional independences $V_I \perp V_J \mid V_K$ can be read off the graph (d-separation).
Forward equations
$$\alpha_k(l) := P_\theta(S_k = l, X_{1:k}) = \sum_{q=1}^{Q} P_\theta(S_{k-1} = q, S_k = l, X_{1:k}) = \sum_{q=1}^{Q} P_\theta(X_k \mid S_k = l)\, P_\theta(S_k = l \mid S_{k-1} = q)\, P_\theta(S_{k-1} = q, X_{1:k-1}) = f_l(X_k) \sum_{q=1}^{Q} p(q, l)\, \alpha_{k-1}(q),$$
where the second equality uses the conditional independences read off the DAG (hidden chain $S_{k-1} \to S_k \to S_{k+1}$, emissions $S_k \to X_k$).
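A sketch of the forward recursion; the rescaling at each step is the standard numerical safeguard against underflow and is not part of the recursion itself. Shapes and names are illustrative.

```python
import numpy as np

def forward(x, pi, p, f):
    """Scaled forward recursion. x: observations coded as integers;
    pi (Q,), p (Q, Q), f (Q, |A|) with f[q, a] = P(X = a | S = q).
    Returns the rescaled alphas and log P(X_{1:n})."""
    n, Q = len(x), len(pi)
    alpha = np.zeros((n, Q))
    scale = np.zeros(n)
    a = pi * f[:, x[0]]                       # alpha_1(q) = pi(q) f_q(X_1)
    for k in range(n):
        if k > 0:
            # alpha_k(l) = f_l(X_k) * sum_q alpha_{k-1}(q) p(q, l)
            a = (alpha[k - 1] @ p) * f[:, x[k]]
        scale[k] = a.sum()
        alpha[k] = a / scale[k]               # rescale at each step
    return alpha, np.log(scale).sum()

pi = np.array([0.5, 0.5])
p = np.array([[0.9, 0.1], [0.2, 0.8]])
f = np.array([[0.8, 0.2], [0.3, 0.7]])
alpha, loglik = forward(np.array([0, 1, 1, 0]), pi, p, f)
```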
M-step updates:
$$p_{k+1}(q, l) = \frac{\sum_{i=2}^{n} P_{\theta_k}(S_{i-1} = q, S_i = l \mid X_{1:n})}{\sum_{i=2}^{n} \sum_{l'=1}^{Q} P_{\theta_k}(S_{i-1} = q, S_i = l' \mid X_{1:n})}, \qquad f_{q,k+1}(x) = \frac{\sum_{i=1}^{n} P_{\theta_k}(S_i = q \mid X_{1:n})\, 1_{X_i = x}}{\sum_{i=1}^{n} P_{\theta_k}(S_i = q \mid X_{1:n})},$$
$$\pi_{k+1}(q) = \frac{1}{n} \sum_{i=1}^{n} P_{\theta_k}(S_i = q \mid X_{1:n}).$$
Outline Part 2
Markov chains (order 1)
Higher order Markov chains
Motif detection with Markov chains
Hidden Markov models (HMMs)
Parameter estimation in HMMs
Sequence segmentation with HMMs
Motif detection with HMMs
Sequence segmentation I
We now want to reconstruct the sequence of regimes $\{S_k\}$.
Viterbi algorithm
- Compute the most probable hidden path $\hat{s}_{1:n} = \operatorname{Argmax}_{s_{1:n}} P_\theta(S_{1:n} = s_{1:n} \mid X_{1:n})$ by dynamic programming, relying on the recursion
$$V_k(l) = f_l(X_k) \max_{1 \leq q \leq Q} V_{k-1}(q)\, p(q, l). \qquad (3)$$
Sequence segmentation II
Alternative solution
At the end of the EM algorithm, one has access to $\hat{P}(S_k = q \mid X_{1:n}) \propto \alpha_k(q) \beta_k(q)$. Thus, one may consider
$$\hat{S}_k = \operatorname{Argmax}_{1 \leq q \leq Q} \hat{P}(S_k = q \mid X_{1:n}).$$
Consequences
At the end of the algorithm, one recovers an estimate of $P_{\hat\theta}(S_i = \cdot \mid X_{1:n})$: either consider the MAP (maximum a posteriori), or simulate the variables under this distribution.
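A sketch of the Viterbi recursion (3), computed in log space to avoid underflow; shapes and the toy parameters are illustrative.

```python
import numpy as np

def viterbi(x, pi, p, f):
    """Most probable hidden path. x: observations coded as integers;
    pi (Q,), p (Q, Q), f (Q, |A|)."""
    n, Q = len(x), len(pi)
    logp = np.log(p)
    logV = np.log(pi) + np.log(f[:, x[0]])       # V_1(q) in log scale
    back = np.zeros((n, Q), dtype=int)
    for k in range(1, n):
        cand = logV[:, None] + logp              # V_{k-1}(q) p(q, l)
        back[k] = cand.argmax(axis=0)            # best predecessor of each l
        logV = cand.max(axis=0) + np.log(f[:, x[k]])
    path = [int(logV.argmax())]
    for k in range(n - 1, 0, -1):                # backtracking
        path.append(int(back[k][path[-1]]))
    return path[::-1]

pi = np.array([0.5, 0.5])
p = np.array([[0.9, 0.1], [0.2, 0.8]])
f = np.array([[0.8, 0.2], [0.3, 0.7]])
print(viterbi([0, 0, 1, 1, 1], pi, p, f))        # [0, 0, 1, 1, 1]
```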
HMM
People regularly use Markov chains with Markov regimes (and call them HMMs). Namely, conditional on $\{S_i\}_{i \geq 1}$, the sequence of observations $\{X_i\}_{i \geq 1}$ is an order-$k$ Markov chain, and the distribution of each $X_i$ depends on $S_i$ and $X_{i-k:i-1}$.
Ex: $k = 1$
(Figure: DAG with edges $S_{k-1} \to S_k \to S_{k+1}$, $S_k \to X_k$ and $X_{k-1} \to X_k$.)
Outline Part 2
Markov chains (order 1)
Higher order Markov chains
Motif detection with Markov chains
Hidden Markov models (HMMs)
Parameter estimation in HMMs
Sequence segmentation with HMMs
Motif detection with HMMs
Underlying idea
Coding sequences follow a letter distribution that should differ from that of non-coding sequences: thus, running an HMM with two states (coding/non-coding) should make it possible to detect genes in a sequence.
(Figure: annotated genome region between positions 3 425 000 and 3 500 000, showing predicted genes yvrD, yvrA, yvrB, yvrC, yvrK, yvrE, yvrG, yvrH, yvrI, fhuD, fhuC, fhuG, fhuB, ..., araR, eno, pgm, tpi, pgk, gap, ...)
(Figure: structure of a transcription unit: transcription start, -35 box, -10 box, RBS, coding sequence.)
Ideas
- Constrain your HMM so that it detects structures;
- Use hidden semi-Markov models (HSMMs), which generalize HMMs to the case where homogeneous regions do not have geometric lengths (as implied in the HMM case).
(Figure: structured HMM for promoter detection: start state, -35 box, spacer (15:16), -10 box, absorbing state.)
Part II - References I
[BW99] P. Bühlmann and A.J. Wyner.
Variable length Markov chains.
The Annals of Statistics, 27(2):480–513, 1999.
[CG98] S.F. Chen and J. Goodman.
An empirical study of smoothing techniques for language
modeling.
Technical Report TR-10-98, Center for Research in
Computing Technology (Harvard University), 1998.
[CS00] I. Csiszar and P. C. Shields.
The consistency of the BIC Markov order estimator.
Ann. Statist., 28(6):1601–1619, 2000.
Part II - References II
[DEKM98] R. Durbin, S. Eddy, A. Krogh, and
G. Mitchison.
Biological sequence analysis: probabilistic models of proteins
and nucleic acids.
Cambridge University Press, Cambridge, UK, 1998.
[DLR77] A. P. Dempster, N. M. Laird, and D. B. Rubin.
Maximum likelihood from incomplete data via the EM
algorithm.
J. Roy. Statist. Soc. Ser. B, 39(1):1–38, 1977.
[KN95] R. Kneser and H. Ney.
Improved backing-off for m-gram language modeling.
In Proceedings of the IEEE International Conference on
Acoustics, Speech, and Signal Processing, volume 1, pages
181–184, 1995.
Part II - References IV
[Nue11] G. Nuel.
Significance score of motifs in biological sequences.
In Mahmood A. Mahdavi (ed.), Bioinformatics - Trends and Methodologies. InTech, 2011.
[RS07] E. Roquain and S. Schbath.
Improved compound Poisson approximation for the number
of occurrences of any rare word family in a stationary
Markov chain.
Adv. in Appl. Probab., 39(1):128–140, 2007.
Part III
Sequence evolution and alignment
Outline Part 3
Comparative genomics
Definition and procedures
Usages
- functional prediction,
- phylogenetic reconstruction, ...
What is an alignment? I
Example
$A = \{T, C, A, G\}$, $X_{1:9} = \mathrm{GAATCTGAC}$, $Y_{1:6} = \mathrm{CACGTA}$, and a (global) alignment of these two sequences:
G A A T C T G A C
C A - - C G T - A
What is an alignment? II
Vocabulary
First remarks
Outline Part 3
Some textbooks
Principles
Definition
A process $X = \{X(t)\}_{t \geq 0}$ is a continuous time (homogeneous) Markov process if for any $t_1 < t_2 < \ldots < t_k < t_{k+1}$ and any $(i_1, \ldots, i_k, i_{k+1}) \in A^{k+1}$ we have
$$P(X(t_{k+1}) = i_{k+1} \mid X(t_1) = i_1, \ldots, X(t_k) = i_k) = P(X(t_{k+1}) = i_{k+1} \mid X(t_k) = i_k).$$
The future state only depends on the past through the present.
Also note that $P_{ij}(t) = (e^{Qt})_{ij}$ is not equal to $e^{Q_{ij} t}$.
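To illustrate the remark, a short sketch computing $P(t) = e^{Qt}$ numerically with SciPy; the rate matrix below anticipates the Jukes-Cantor example that follows.

```python
import numpy as np
from scipy.linalg import expm

Q = np.full((4, 4), 1.0) - 4 * np.eye(4)   # off-diagonal 1, diagonal -3 (JC)
t = 0.1
P = expm(Q * t)                 # P(t) = e^{Qt}, computed entrywise correctly
print(P[0, 1])                  # 1/4 - (1/4) e^{-4t} ~ 0.0824
print(np.exp(Q[0, 1] * t))      # e^{Q_01 t} = e^{0.1}: not a probability here
print(P.sum(axis=1))            # rows of P(t) sum to one
```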
The log-likelihood of $n$ aligned positions $(x_i, y_i)_{1 \leq i \leq n}$ is
$$\ell_n(Q, t) = \sum_{i=1}^{n} \log P_{x_i y_i}(t) = \sum_{a,b \in A} N(ab) \log P_{ab}(t),$$
where $N(ab)$ is the number of positions aligning $a$ with $b$.
Example: Jukes-Cantor model (JC69),
$$\pi = \left(\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}\right) \quad \text{and} \quad Q = \begin{pmatrix} -3 & 1 & 1 & 1 \\ 1 & -3 & 1 & 1 \\ 1 & 1 & -3 & 1 \\ 1 & 1 & 1 & -3 \end{pmatrix}.$$
Transition probabilities
It can be shown that
$$P_{ij}(t) = (e^{Qt})_{ij} = \begin{cases} \frac{1}{4} - \frac{1}{4} e^{-4t} & \text{for } i \neq j, \\ \frac{1}{4} + \frac{3}{4} e^{-4t} & \text{for } i = j. \end{cases}$$
Consequence
- Let $x$ be the number of mismatches in the alignment of $S^1_{1:n}$ and $S^2_{1:n}$. The frequency $x/n$ estimates
$$P(X(t) \neq X(0) \mid X(0)) = \frac{3}{4}\big(1 - e^{-4t}\big),$$
and thus
$$\hat{t} = -\frac{1}{4} \log\Big(1 - \frac{4x}{3n}\Big) \quad \text{and} \quad \hat{d} = 3\hat{t} = -\frac{3}{4} \log\Big(1 - \frac{4x}{3n}\Big).$$
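A sketch of this distance estimator on a toy pair of aligned, gap-free sequences:

```python
import math

def jc_distance(s1, s2):
    """Jukes-Cantor distance d = -(3/4) log(1 - 4x/(3n)) from two
    aligned sequences of equal length (gap-free sketch)."""
    assert len(s1) == len(s2)
    n = len(s1)
    x = sum(a != b for a, b in zip(s1, s2))    # number of mismatches
    return -0.75 * math.log(1 - 4 * x / (3 * n))

print(jc_distance("ACGTACGTAC", "ACGTACGTAT"))  # one mismatch out of 10
```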
Variance
Kimura model (K80)
The rate matrix is given by (remember the order $T, C, A, G$; $\alpha$ is the transition rate and $\beta$ the transversion rate):
$$Q = \begin{pmatrix} -(\alpha + 2\beta) & \alpha & \beta & \beta \\ \alpha & -(\alpha + 2\beta) & \beta & \beta \\ \beta & \beta & -(\alpha + 2\beta) & \alpha \\ \beta & \beta & \alpha & -(\alpha + 2\beta) \end{pmatrix}.$$
With $S$ and $V$ the observed proportions of transitions and transversions, the distance is estimated by
$$\hat{d} = -\frac{1}{2} \log(1 - 2S - V) - \frac{1}{4} \log(1 - 2V).$$
JC and K80 have symmetric rates $q_{ij} = q_{ji}$, so their stationary distribution is uniform. This is unrealistic.
The general time reversible (GTR) model allows an arbitrary stationary distribution $\pi$:
$$Q = \begin{pmatrix} \star & a\pi_C & b\pi_A & c\pi_G \\ a\pi_T & \star & d\pi_A & e\pi_G \\ b\pi_T & d\pi_C & \star & f\pi_G \\ c\pi_T & e\pi_C & f\pi_A & \star \end{pmatrix},$$
where the $\star$ are terms such that the rows sum to 0.
Outline Part 3
Rate heterogeneity across sites: site-specific rates $r$ are drawn from a Gamma distribution, with density
$$g(r) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, r^{\alpha - 1} e^{-\beta r}, \quad r > 0;$$
the GTR + Gamma + I model additionally allows a proportion of invariant sites.
Outline Part 3
(Figure: graphical representation of an alignment between $X = \mathrm{AATG}$ and $Y = \mathrm{CTGG}$ as a path in a grid, with $X$ on the horizontal axis and $Y$ on the vertical axis. This alignment corresponds to
A A T G -
C - T G G)
Correspondence
- Alignments between $X_{1:n}$ and $Y_{1:m}$ correspond to paths through the grid from $(X_1, Y_1)$ to $(X_n, Y_m)$; $r(n, m)$ denotes the number of possible alignments.
Outline Part 3
Principle
Remarks
Exact algorithms I
Principle
At each step in the alignment, 3 possibilities arise. The next step can either be
- a match/mismatch: align $X_i$ with $Y_j$;
- a gap in $Y$: align $X_i$ with a gap;
- a gap in $X$: align $Y_j$ with a gap.
Exact algorithms II
Dynamic programming - global alignment - linear gap penalty
Let $F(i, j)$ be the optimal (global) alignment score between $X_{1:i}$ and $Y_{1:j}$. Construct the matrix $F$ recursively:
- $F(i, j)$ is reached from one of the three cells $F(i-1, j-1)$, $F(i-1, j)$ and $F(i, j-1)$;
- with a substitution score $s$ and a linear gap penalty $d$,
$$F(i, j) = \max \begin{cases} F(i-1, j-1) + s(X_i, Y_j) \\ F(i-1, j) - d \\ F(i, j-1) - d. \end{cases}$$
Exact algorithms IV
(Source Durbin et al. [DEKM98])
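A minimal sketch of the recursion above (score computation only, without traceback); the match/mismatch/gap scores are arbitrary illustrative choices.

```python
import numpy as np

def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment score with a linear gap penalty."""
    n, m = len(x), len(y)
    F = np.zeros((n + 1, m + 1))
    F[:, 0] = gap * np.arange(n + 1)        # boundary: leading gaps
    F[0, :] = gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i, j] = max(F[i - 1, j - 1] + s,   # align x_i with y_j
                          F[i - 1, j] + gap,     # gap in y
                          F[i, j - 1] + gap)     # gap in x
    return F[n, m]

print(needleman_wunsch("GAATCTGAC", "CACGTA"))
```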
Approximate algorithms
Substitution matrices I
Substitution matrices II
Log-odds ratios
Alternative
An alternative to choosing the scoring function a priori is statistical alignment, which selects scoring functions from the data through maximum likelihood.
Hypothesis testing
Developing alternatives is highly desirable.
Outline Part 3
Context
Framework
TKF model I
Evolutionary model
Consequences (1/2)
TKF model II
Consequences (2/2)
(Figure: graphical representation of an alignment between 2 sequences $X = \mathrm{AATG}$ and $Y = \mathrm{CTGG}$, as a path in a grid. The alignment corresponds to
A A T G -
C - T G G)
At time $t$, conditional on $\{\epsilon_s, s \leq t\}$, draw independently ...
We have
$$P(\epsilon_{1:t}) = \pi(\epsilon_1) \prod_{s=1}^{t-1} p(\epsilon_s, \epsilon_{s+1}),$$
i.e. the alignment path $\{\epsilon_s\}$ is a Markov chain on the states $\{M, I_X, I_Y\}$ (match, insertion in $X$, insertion in $Y$): state $M$ emits a pair of letters $(a, b)$ with probability $h(a, b)$ and state $I_Y$ emits a letter $b$ with probability $g(b)$ (pair-HMM formulation).
Likelihood
Observe two sequences $X_{1:n}$ and $Y_{1:m}$.
Outline Part 3
Profile HMMs I
References [Edd98, KBM94]
Principle
Profile HMMs II
References [Edd98, KBM94]
Additional remarks
Part IV
Introduction to phylogeny
Outline Part 4
Trees
Gene phylogenies
Introduction to gene phylogenies
Model based phylogenies
Extensions
Species phylogenies
For many reasons, gene trees and species trees may not be identical: gene duplications, losses, lateral gene transfers, lineage sorting, estimation errors, ...
Outline Part 4
Trees
Gene phylogenies
Introduction to gene phylogenies
Model based phylogenies
Extensions
Species phylogenies
Outline Part 4
Trees
Gene phylogenies
Introduction to gene phylogenies
Model based phylogenies
Extensions
Species phylogenies
Parsimony methods
Principle
- Find the tree that explains the sequences with the smallest number of mutations;
Advantages/Drawbacks
(Figure 3.21, source [Yan06]: Long-branch attraction. When the correct tree ($T_1$) has two long branches separated by a short internal branch, parsimony tends to recover a wrong tree ($T_2$) with the long branches grouped in one clade: the site pattern $xyxy$ has higher probability than $xxyy$ when the two long branches are much longer than the three short branches. Thus, with more and more sites in the sequence, it becomes more and more certain that more sites will have pattern $xyxy$ than pattern $xxyy$.)
Distance methods I
Principle
Distances
Distance methods II
Least-squares
Distance methods III
Neighbour joining
- Method that minimizes an evolution criterion: the sum of branch lengths;
- Divisive cluster algorithm: starting with the star tree, join two nodes, choosing the pair that achieves the greatest reduction in tree length;
- Iterate the procedure.
(Quoted from [Yan06]: A criterion used for tree comparison, especially in distance methods, is the amount of evolution measured by the sum of branch lengths in the tree (Kidd and Sgaramella-Zonta 1971; Rzhetsky and Nei 1993). The tree with the smallest sum of branch lengths is known as the minimum evolution tree; see Desper and Gascuel (2005) for an excellent review of this class of methods. Neighbour joining is a cluster algorithm based on the minimum-evolution criterion proposed by Saitou and Nei (1987). Because it is computationally fast and also produces reasonable trees, it is widely used. It is a divisive cluster algorithm (i.e. a star-decomposition algorithm), with the tree length (the sum of branch lengths along the tree) used as the criterion for tree selection at each step. It starts with a star tree and then joins two nodes, choosing the pair to achieve the greatest reduction in tree length. A new node is then created to replace the two nodes joined (Fig. 3.17), reducing the dimension of the distance matrix by one. The procedure is repeated until the tree is fully resolved. The branch lengths on the tree as well as the tree length are updated.)
(Fig. 3.17, source [Yan06]: the neighbour-joining method of tree reconstruction is a divisive cluster algorithm, dividing taxa successively into finer groups.)
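In practice the "greatest reduction in tree length" rule is implemented through the standard NJ criterion $(r-2)\,d_{ij} - R_i - R_j$ of Saitou and Nei; below is a minimal sketch under that formulation (the representation and names are illustrative choices).

```python
import numpy as np

def neighbour_joining(D, names):
    """Sketch of neighbour joining [SN87]: D is a symmetric distance
    matrix (numpy array) over the taxa in `names`; returns the tree
    as a list of (child, parent, branch length) edges."""
    d = {a: {b: float(D[i, j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes, edges, new_id = list(names), [], 0
    while len(nodes) > 2:
        r = len(nodes)
        R = {i: sum(d[i][k] for k in nodes) for i in nodes}
        # pick the pair minimising (r - 2) d_ij - R_i - R_j
        i, j = min(((a, b) for a in nodes for b in nodes if a < b),
                   key=lambda ab: (r - 2) * d[ab[0]][ab[1]] - R[ab[0]] - R[ab[1]])
        u = f"node{new_id}"; new_id += 1
        li = 0.5 * d[i][j] + (R[i] - R[j]) / (2 * (r - 2))   # branch lengths
        edges += [(i, u, li), (j, u, d[i][j] - li)]
        d[u] = {k: 0.5 * (d[i][k] + d[j][k] - d[i][j])       # distances to new node
                for k in nodes if k not in (i, j)}
        d[u][u] = 0.0
        for k in list(d[u]):
            d[k][u] = d[u][k]
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges

D = np.array([[0, 5, 9, 9], [5, 0, 10, 10], [9, 10, 0, 8], [9, 10, 8, 0]], float)
print(neighbour_joining(D, ["a", "b", "c", "d"]))
```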
Distance methods IV
- Fast to compute;
Outline Part 4
Trees
Gene phylogenies
Introduction to gene phylogenies
Model based phylogenies
Extensions
Species phylogenies
Maximum likelihood
Principle: 2 main steps
(Fig. 4.1, source [Yan06]: a tree of five species used to demonstrate calculation of the likelihood function. The nucleotides observed at the tips at a site are 1: T, 2: C, 3: A, 4: C, 5: C; the branch lengths $t_1$ to $t_8$ are measured by the expected number of nucleotide substitutions per site.)
The parameters in the model include the branch lengths and the transition/transversion rate ratio, collectively denoted $\theta = \{t_1, \ldots, t_8, \kappa\}$. Because of the assumption of independent evolution among sites, the probability of the whole data set is the product of the probabilities of the data at individual sites; equivalently, the log-likelihood is a sum over sites, $\ell = \log L = \sum_h \log L_h(\theta \mid T)$.
- Computation of the likelihood at a site $i$ requires summing over all possible values at the (here 4) internal nodes,
$$L_i(\theta \mid T) = \sum_{x_0} \sum_{x_6} \sum_{x_7} \sum_{x_8} \cdots, \qquad (4.1)$$
which is prohibitive if done by brute force.
- The ML method estimates $\theta$ by maximizing the log-likelihood $\ell$, often using numerical optimization.
- Finally, the likelihood for a site is obtained as $L(\theta \mid T) = \sum_{x \in A} \pi_x L_{\mathrm{root}}(x)$, where $\pi$ is the stationary distribution of the Markov evolutionary process (estimated from extant sequences) and $L_{\mathrm{root}}$ is computed recursively over the tree.
Advantages
- The quantities $P_{xy}(t_j)$ are the same for all sites (they only depend on the branch length) and are computed once for all sites;
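The recursive computation of $L_{\mathrm{root}}$ is Felsenstein's pruning algorithm [Fel81]; here is a single-site sketch under the Jukes-Cantor matrix (the tree encoding and the toy tree are illustrative choices).

```python
import numpy as np
from scipy.linalg import expm

Q = np.full((4, 4), 1.0) - 4 * np.eye(4)      # Jukes-Cantor rate matrix
IDX = {b: i for i, b in enumerate("TCAG")}

def cond_lik(node):
    """L_node(x) = P(data below node | state x at node), x in {T,C,A,G}.
    A leaf is its observed nucleotide; an internal node is a list of
    (child, branch length) pairs."""
    if isinstance(node, str):                 # leaf: observed nucleotide
        L = np.zeros(4)
        L[IDX[node]] = 1.0
        return L
    L = np.ones(4)
    for child, t in node:                     # product over the children
        L *= expm(Q * t) @ cond_lik(child)    # sum_y P_xy(t) L_child(y)
    return L

tree = [([("T", 0.2), ("C", 0.2)], 0.1), ("A", 0.3)]   # toy 3-leaf tree
pi = np.full(4, 0.25)                         # stationary distribution
site_lik = pi @ cond_lik(tree)                # L(theta|T) = sum_x pi_x L_root(x)
print(site_lik)
```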
Reversible models and the pulley principle [Fel81]
- For reversible models, neither the direction of time nor the positioning of the root are relevant;
- This induces potential simplifications in likelihood calculations.
(Quoted from [Yan06]: Equation (4.6) also highlights the fact that the model is over-parametrized in Fig. 4.1, with one branch length too many. The likelihood is the same for any combination of $t_6$ and $t_8$ as long as $t_6 + t_8$ is the same. The data do not contain information to estimate $t_6$ and $t_8$ separately and only their sum is estimable. Thus another consequence of time reversibility is that only unrooted trees can be identified if the molecular clock (rate constancy over time) is not assumed and every branch is allowed to have its own rate. Under the clock, however, the root of the tree can indeed be identified: with a single rate throughout the tree, every tip is equidistant from the root.)
(Fig. 4.3, source [Yan06]: the ensuing unrooted tree when the root is moved from node 0 to node 6 in the tree of Fig. 4.1.)
Bayesian methods
Bayesian paradigm
Bayesian phylogenetics
Outline Part 4
Trees
Gene phylogenies
Introduction to gene phylogenies
Model based phylogenies
Extensions
Species phylogenies
Setup
Outline Part 4
Trees
Gene phylogenies
Introduction to gene phylogenies
Model based phylogenies
Extensions
Species phylogenies
(Figure (b): gene tree with two speciation events, producing the gene copies 1a, 2a, 3a, 4a and 1b, 2b, 3b, 4b in the species tree.)
References I
[FC96] J. Felsenstein and G. A. Churchill.
A hidden Markov model approach to variation among sites
in rate of evolution.
Mol. Biol. Evol., 13:93–104, 1996.
[Fel78] J. Felsenstein.
Cases in which parsimony and compatibility methods will
be positively misleading.
Syst. Zool., 27:401–410, 1978.
[Fel81] J. Felsenstein.
Evolutionary trees from DNA sequences: A maximum
likelihood approach.
J. Mol. Evol., 17:368–376, 1981.
References II
[Fit71] W. M. Fitch.
Toward defining the course of evolution: minimum change
for a specific tree topology.
Syst. Zool., 20:406–416, 1971.
[Fleissner et al.05] R. Fleissner, D. Metzler, and A. von Haeseler.
Simultaneous statistical multiple alignment and phylogeny reconstruction.
Syst. Biol., 54(4):548–561, 2005.
[Gal01] N. Galtier.
Maximum-likelihood phylogenetic analysis under a
covarion-like model.
Mol. Biol. Evol., 18:866–873, 2001.
References III
[GDL et al.10] S. Guindon, J. Dufayard, V. Lefort, M. Anisimova, W. Hordijk, and O. Gascuel.
New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.
Systematic Biology, 59(3):307–321, 2010.
[GG03] S. Guindon and O. Gascuel.
A simple, fast, and accurate algorithm to estimate large
phylogenies by maximum likelihood.
Systematic Biology, 52(5):696–704, 2003.
[Har73] J. A. Hartigan.
Minimum evolution fits to a given tree.
Biometrics, 29:53–65, 1973.
References IV
[Hue02] J. P. Huelsenbeck.
Testing a covariotide model of DNA substitution.
Mol. Biol. Evol., 19:698–707, 2002.
[Liu et al.09] K. Liu, S. Raghavan, S. Nelesen, C. Randal Linder, and T. Warnow.
Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees.
Science, 324:1561, 2009.
[Liu et al.12] K. Liu, T.J. Warnow, M.T. Holder, S.M. Nelesen, J. Yu, A.P. Stamatakis, and C. Randal Linder.
SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.
Syst. Biol., 61(1):90–106, 2012.
References V
[SN87] N. Saitou and M. Nei.
The neighbor-joining method: a new method for
reconstructing phylogenetic trees.
Mol. Biol. Evol., 4:406–425, 1987.
[SS63] R. R. Sokal and P. H. A. Sneath.
Numerical Taxonomy.
W.H. Freeman and Co., San Francisco, CA., 1963.
[Yan95] Z. Yang.
A space-time process model for the evolution of DNA
sequences.
Genetics, 139:993–1005, 1995.
[Yan06] Z. Yang.
Computational Molecular Evolution.
Oxford Series in Ecology and Evolution. Oxford University
Press, 2006.