CG6 ML
with applications to
inferring phylogenetic trees
[Figure: thumbtack outcomes Head and Tail, with a plot of the likelihood $L_D(\theta)$ for $\theta$ between 0 and 1]
Sufficient Statistics
To compute the likelihood in the thumbtack
example we only require NH and NT
(the number of heads and the number of tails)
$L_D(\theta) = \theta^{N_H} (1-\theta)^{N_T}$
Sufficient Statistics
A sufficient statistic is a function of the data that
summarizes the relevant information for the
likelihood
Formally, s(D) is a sufficient statistic if for
any two datasets D and D'
$s(D) = s(D') \Rightarrow L_D(\theta) = L_{D'}(\theta)$
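The definition can be checked on the thumbtack example with a minimal Python sketch. The function names and the two toss sequences are illustrative assumptions, not part of the slides: two datasets with the same counts give the same likelihood at every value of theta.

```python
from collections import Counter

def likelihood(tosses, theta):
    """Probability of an exact sequence of 'H'/'T' tosses when P(H) = theta."""
    p = 1.0
    for t in tosses:
        p *= theta if t == "H" else (1.0 - theta)
    return p

def sufficient_statistic(tosses):
    """s(D) = (N_H, N_T): the number of heads and the number of tails."""
    counts = Counter(tosses)
    return counts["H"], counts["T"]

# Two hypothetical datasets: different order, same counts.
d1 = "HTTHH"
d2 = "HHHTT"

same_statistic = sufficient_statistic(d1) == sufficient_statistic(d2)

# Compare the two likelihood functions at a few values of theta.
same_likelihood = all(
    abs(likelihood(d1, theta) - likelihood(d2, theta)) < 1e-12
    for theta in (0.1, 0.5, 0.9)
)
print(same_statistic, same_likelihood)
```

The order of the tosses never enters the likelihood, which is why the pair of counts is enough.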
[Figure: datasets mapped to their statistics]
Maximum Likelihood Estimation
MLE Principle:
Choose parameters that maximize the
likelihood function
Example: MLE in Binomial Data
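$l_D(\theta) = N_H \log\theta + N_T \log(1-\theta)$
Taking the derivative and equating it to 0,
we get
$\frac{N_H}{\theta} = \frac{N_T}{1-\theta} \;\Rightarrow\; \hat{\theta} = \frac{N_H}{N_H + N_T}$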
(which coincides with what one would expect)
[Figure: plot of $L(\theta)$ for $(N_H, N_T) = (3, 2)$]
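As a numerical sanity check of the closed form, the following Python sketch compares it with a brute-force search over a grid of theta values for the counts (3, 2). The grid resolution and the function names are assumptions made for illustration.

```python
import math

def log_likelihood(theta, n_heads, n_tails):
    """l_D(theta) = N_H log(theta) + N_T log(1 - theta)."""
    return n_heads * math.log(theta) + n_tails * math.log(1.0 - theta)

def mle(n_heads, n_tails):
    """Closed-form maximizer obtained by setting the derivative to 0."""
    return n_heads / (n_heads + n_tails)

n_heads, n_tails = 3, 2
theta_hat = mle(n_heads, n_tails)

# Grid over the open interval (0, 1); the endpoints are excluded because
# log(0) is undefined.
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, n_heads, n_tails))

print(theta_hat, theta_grid)
```

Both values land on 3/5, the fraction of heads in the data.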
Sufficient statistics:
$N_1, N_2, \ldots, N_K$ - the number of times each outcome
is observed
Likelihood function: $L_D(\theta) = \prod_{k=1}^{K} \theta_k^{N_k}$
MLE: $\hat{\theta}_k = \frac{N_k}{N}$, where $N = \sum_{k=1}^{K} N_k$ (proof: assignment 3)
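One standard route to this result uses a Lagrange multiplier for the constraint that the parameters sum to 1. The sketch below is an outline under that assumption and is not necessarily the argument intended in the assignment; $\lambda$ is a symbol introduced here.

```latex
\begin{align*}
\mathcal{L}(\theta, \lambda)
  &= \sum_{k=1}^{K} N_k \log \theta_k
     - \lambda \Bigl( \sum_{k=1}^{K} \theta_k - 1 \Bigr) \\
\frac{\partial \mathcal{L}}{\partial \theta_k}
  &= \frac{N_k}{\theta_k} - \lambda = 0
  \quad\Rightarrow\quad \theta_k = \frac{N_k}{\lambda} \\
\sum_{k=1}^{K} \theta_k = 1
  &\quad\Rightarrow\quad \lambda = \sum_{k=1}^{K} N_k = N
  \quad\Rightarrow\quad \hat{\theta}_k = \frac{N_k}{N}
\end{align*}
```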
Example: Multinomial
Let $x_1 x_2 \ldots x_n$ be a protein sequence
We want to learn the parameters $q_1, q_2, \ldots, q_{20}$
corresponding to the frequencies of the 20 amino
acids
$N_1, N_2, \ldots, N_{20}$ - the number of times each amino
acid is observed in the sequence
Likelihood function: $L_D(q) = \prod_{k=1}^{20} q_k^{N_k}$
MLE: $q_k = \frac{N_k}{n}$
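In code, the estimate amounts to counting residues and dividing by the sequence length. The Python sketch below assumes the sequence uses only the 20 standard one-letter amino acid codes; the toy sequence and the function name are made up for illustration.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_mle(sequence):
    """Return q_k = N_k / n for each of the 20 amino acids."""
    counts = Counter(sequence)  # N_k for every residue that occurs
    n = len(sequence)
    return {aa: counts[aa] / n for aa in AMINO_ACIDS}

# Hypothetical toy sequence of length n = 10.
seq = "MKVLAAGIVK"
q = amino_acid_mle(seq)

for aa in AMINO_ACIDS:
    if q[aa] > 0:
        print(aa, q[aa])
```

Amino acids that never occur get an estimate of 0, which is a known weakness of the plain MLE on short sequences.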
Inferring Phylogenetic Trees
Let $S_1, S_2, \ldots, S_n$ be n sequences (DNA or amino acid).
Assume for simplicity that they all have the same length, l.
We want to learn the parameters of a
phylogenetic tree that maximizes the likelihood.
A Probabilistic Model
Our models will consist of a “regular” tree, where
in addition, edges are assigned substitution
probabilities.
For simplicity, assume our “DNA” has only two
states, X and Y.
A Probabilistic Model (3)
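If edge e is assigned probability $p_e$, each site changes
state across e with probability $p_e$, independently of the
other sites. The probability of more involved patterns of
substitution across e (e.g. XXYXY to YXYXX)
is therefore determined, and easily computed: $p_e^2 (1-p_e)^3$
for this pattern (two sites change, three do not).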
Q.: What if the pattern on both sides is known, but $p_e$
is not known?
A.: It makes sense to seek the $p_e$ that maximizes the
probability of the observation.
So far, this is identical to the coin toss example.
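The single-edge case can be sketched in a few lines of Python, assuming sites change independently: count the sites that differ, then reuse the coin toss estimate, with "substitution" playing the role of "heads". The function names are illustrative.

```python
def count_substitutions(a, b):
    """Number of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def edge_likelihood(a, b, p_e):
    """p_e^k * (1 - p_e)^(l - k), where k sites out of l changed."""
    k = count_substitutions(a, b)
    return p_e ** k * (1.0 - p_e) ** (len(a) - k)

def edge_mle(a, b):
    """The p_e maximizing edge_likelihood: the fraction of changed sites."""
    return count_substitutions(a, b) / len(a)

a, b = "XXYXY", "YXYXX"
k = count_substitutions(a, b)   # 2 sites differ, 3 agree
p_hat = edge_mle(a, b)          # 2 / 5
print(k, p_hat, edge_likelihood(a, b, p_hat))
```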
A Probabilistic Model (4)
But a single edge is a fairly boring tree…
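[Figure: a tree with leaves XXYXY, YXYXX and YYYYX joined to an unknown internal node ????? by edges with substitution probabilities $p_{e_1}$, $p_{e_2}$, $p_{e_3}$]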
Two Ways to Go
[Figure: the tree from the previous slide, with leaves XXYXY, YXYXX, YYYYX and unknown internal node ?????]
In the first version (average, or sum over states of internal
nodes) we are looking for the “most likely” setting of tree edges.
This is called maximum likelihood (ML) inference of
phylogenetic trees.
In the second version (maximize over states of internal nodes)
we are looking for the “most likely” ancestral states. This is
called ancestral maximum likelihood (AML).
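The two versions can be contrasted on the three-leaf tree with a short Python sketch. It rests on several assumptions that the slides do not state: sites are independent, the internal node is X or Y with prior probability 1/2 each, the edge probabilities are fixed at made-up values, and the pairing of edges with leaves is arbitrary. Only the likelihood for given edge probabilities is evaluated; the search over edge probabilities is left out.

```python
STATES = "XY"
PRIOR = 0.5  # assumed uniform prior on the internal node's state

def site_term(leaf_states, internal_state, probs):
    """P(leaf states at one site | internal state) on a star tree."""
    term = 1.0
    for leaf, p in zip(leaf_states, probs):
        term *= p if leaf != internal_state else (1.0 - p)
    return term

def ml_likelihood(leaves, probs):
    """First version: sum over the internal state at every site."""
    total = 1.0
    for column in zip(*leaves):
        total *= sum(PRIOR * site_term(column, s, probs) for s in STATES)
    return total

def aml_likelihood(leaves, probs):
    """Second version: keep only the best internal state at every site."""
    total = 1.0
    ancestral = []
    for column in zip(*leaves):
        best = max(STATES, key=lambda s: site_term(column, s, probs))
        ancestral.append(best)
        total *= PRIOR * site_term(column, best, probs)
    return total, "".join(ancestral)

leaves = ["XXYXY", "YXYXX", "YYYYX"]
probs = [0.1, 0.2, 0.3]  # hypothetical edge probabilities, one per leaf

ml = ml_likelihood(leaves, probs)
aml, ancestral = aml_likelihood(leaves, probs)
print(ml, aml, ancestral)
```

Because a maximum over states never exceeds the sum over the same states, the AML value is at most the ML value for any fixed edge probabilities. The AML computation also returns a concrete reconstruction of the ????? sequence.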