Markov Models
• Markov Models
• Hidden Markov Models
• The Three Fundamental Questions for HMMs
• Finding the probability of an observation
• Finding the best state sequence and parameter estimation
Problem Statement
✔ Often, we want to consider a sequence (perhaps through time) of random variables that
aren’t independent, but rather the value of each variable depends on previous elements in
the sequence.
✔ For many such systems, it seems reasonable to assume that all we need to predict the
future random variables is the value of the present random variable, and we don’t need to
know the values of all the past random variables in the sequence.
Markov Model Definition
A Markov chain is a mathematical model of a random process evolving over time. It
consists of a set of states and the transitions between them. These transitions are
probabilistic: the probability of moving from one state to another depends solely
on the current state and not on any past events.
• If the random variables measure the number of books in the University library, then,
knowing how many books were in the library today might be an adequate predictor of
how many books there will be tomorrow, and we don’t really need to additionally
know how many books the library had last week, let alone last year.
• That is, future elements of the sequence are conditionally independent of past
elements, given the present element.
Markov Assumption
• Suppose X = (X1,...,XT) is a sequence of random variables taking values in
some finite set S = {s1,...,sN}, the state space. Then the Markov
Assumptions are:
• Limited Horizon: P(Xt+1 = sk | X1,...,Xt) = P(Xt+1 = sk | Xt)
• Time invariant (stationary process): P(Xt+1 = sk | Xt) = P(X2 = sk | X1)
States:
• Sunny (S)
• Rainy (R)
Observations:
• Carrying an umbrella (U)
• Not carrying an umbrella (N)
Transition Probabilities:
• P(Sunny to Sunny) = 0.7
• P(Sunny to Rainy) = 0.3
• P(Rainy to Sunny) = 0.4
• P(Rainy to Rainy) = 0.6
Emission Probabilities:
• P(Carrying an umbrella | Sunny) = 0.2
• P(Not carrying an umbrella | Sunny) = 0.8
• P(Carrying an umbrella | Rainy) = 0.6
• P(Not carrying an umbrella | Rainy) = 0.4
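The transition and emission probabilities above can be written directly as matrices whose rows are probability distributions. A minimal sketch in NumPy (the row/column ordering is a choice made here, not given on the slide):

```python
import numpy as np

# Weather HMM from the slide.
# State order: 0 = Sunny, 1 = Rainy; observation order: 0 = Umbrella, 1 = No umbrella.
A = np.array([[0.7, 0.3],    # transitions out of Sunny
              [0.4, 0.6]])   # transitions out of Rainy
B = np.array([[0.2, 0.8],    # emissions in Sunny: P(U), P(N)
              [0.6, 0.4]])   # emissions in Rainy: P(U), P(N)

# Each row is a probability distribution, so it must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```

This row-stochastic layout (rows index the current state) is the convention the later forward/backward calculations assume.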
Stochastic Model
• It is a discrete-time process indexed at times 1, 2, 3, … that takes
values called states, which are observed.
• For example, suppose the set of states S = {hot, cold}.
• A state series over time is then z ∈ S^T.
• The weather for 4 days can be a sequence such as {z1 = hot, z2 = cold,
z3 = cold, z4 = hot}.
• Suppose you have a crazy soft drink machine: it can be in two states,
cola preferring (CP) and iced tea preferring (IP), but it switches
between them randomly after each purchase.
• Now, if, when you put in your coin, the machine always puts out a
cola when it is in the cola preferring state and an iced tea when it is
in the iced tea preferring state, then we would have a visible Markov
model.
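A visible Markov model like this can be simulated directly, since every state is observed. A minimal sketch (the 0.7/0.3 and 0.5/0.5 transition probabilities are assumptions chosen to match the machine's arithmetic later in the slides):

```python
import random

# Assumed transition probabilities for the soft drink machine:
# CP = cola preferring, IP = iced tea preferring.
TRANSITIONS = {
    "CP": {"CP": 0.7, "IP": 0.3},
    "IP": {"CP": 0.5, "IP": 0.5},
}

def simulate(start, steps, seed=0):
    """Walk the chain for `steps` purchases; in a visible model the
    state (and hence the drink dispensed) is directly observed."""
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(steps):
        nxt, probs = zip(*TRANSITIONS[state].items())
        state = rng.choices(nxt, weights=probs)[0]
        path.append(state)
    return path

path = simulate("CP", 5)
```

Because the machine switches states only after a purchase, the path of states is exactly the path of drinks dispensed.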
Hidden Markov Models
• We need symbol emission probabilities for the observations:
P(Ot = k | Xt = si, Xt+1 = sj) = bijk
• For this machine, the output is actually independent of sj, and so it can
be described by the following probability matrix:

      cola  ice_t  lem
  CP  0.6   0.1    0.3
  IP  0.1   0.7    0.2

• For example, the probability of seeing the output sequence {lem, ice_t}
starting in the cola preferring state is the sum over all state paths:
0.7 x 0.3 x 0.7 x 0.1 + 0.7 x 0.3 x 0.3 x 0.1 + 0.3 x 0.3 x 0.5 x 0.7 + 0.3 x 0.3 x 0.5 x 0.7 = 0.084
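Each of the four products in the sum above corresponds to one possible state path. A brute-force sketch that enumerates the paths (the transition values and emission table are assumptions inferred from the factors in the slide's sum):

```python
from itertools import product

# Assumed parameters of the soft drink machine.
A = {"CP": {"CP": 0.7, "IP": 0.3}, "IP": {"CP": 0.5, "IP": 0.5}}
B = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
     "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

total = 0.0
# Start in CP; each symbol is emitted on the transition out of the
# current state, so sum over every choice of the next states x2, x3.
for x2, x3 in product(A, repeat=2):
    total += A["CP"][x2] * B["CP"]["lem"] * A[x2][x3] * B[x2]["ice_t"]
print(round(total, 3))  # 0.084
```

Enumerating paths like this is exponential in the sequence length, which is exactly why the forward procedure below is needed.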
Hidden Markov Models
For the general case (where one can start in any state, and move to any
other at each step), the direct calculation requires (2T + 1) · N^(T+1) multiplications.
The Three Fundamental Questions for HMMs
Finding the probability of an observation
The forward variable αi(t) is stored at node (si, t) in the trellis and expresses the total probability of
ending up in state si at time t, given that the observations o1 ··· ot−1 were seen:
αi(t) = P(o1 o2 ··· ot−1, Xt = i | μ)
The Three Fundamental Questions for HMMs
Finding the probability of an observation
Trellis - The forward procedure:
A forward variable is calculated by summing the probabilities of all incoming arcs at a
trellis node.
We calculate the forward variables in the trellis from left to right using
the following procedure:
• Initialization: αi(1) = πi, 1 ≤ i ≤ N
• Induction: αj(t + 1) = Σi αi(t) aij bijot, 1 ≤ t ≤ T, 1 ≤ j ≤ N
• Total: P(O | μ) = Σi αi(T + 1)
[Trellis diagram: states CP and IP at each of four time steps]
Forward Procedure
Forward Procedure Output Summary
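The left-to-right filling of the trellis can be sketched in a few lines. This sketch uses the state-emission convention (each symbol emitted by the state it is read in), a common variant of the slides' arc-emission form, applied to the earlier weather model; the uniform start distribution is an assumption, since the slides do not give one:

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward procedure: alpha[t, i] = P(o_1..o_t, X_t = i).
    Each trellis column sums the probabilities of all incoming arcs."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialization
    for t in range(1, T):                             # induction, left to right
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # sum over incoming arcs
    return alpha, alpha[-1].sum()                     # P(O | model)

# Weather model: Sunny = 0, Rainy = 1; Umbrella = 0, No umbrella = 1.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.2, 0.8], [0.6, 0.4]])
pi = np.array([0.5, 0.5])        # assumed uniform start (not given on the slides)
alpha, p_obs = forward(A, B, pi, [0, 0, 1])  # observe U, U, N
```

The cost is O(T · N²) multiplications, in contrast to the exponential cost of enumerating state paths.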
The Three Fundamental Questions for HMMs
Finding the probability of an observation
Trellis - The backward procedure:
It should be clear that we do not need to cache results working
forward through time like this; we could equally work backward.
The backward procedure computes backward variables βi(t), the
total probability of seeing the rest of the
observation sequence given that we were in state si at time t:
βi(t) = P(ot ··· oT | Xt = i, μ)
Then we can calculate backward variables working from
right to left through the trellis as follows:
• Initialization: βi(T + 1) = 1, 1 ≤ i ≤ N
• Induction: βi(t) = Σj aij bijot βj(t + 1), 1 ≤ t ≤ T, 1 ≤ i ≤ N
• Total: P(O | μ) = Σi πi βi(1)
[Trellis diagram: states CP and IP at each of four time steps]
Backward Procedure
Forward & Backward Procedure
Output Summary
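The right-to-left pass mirrors the forward one, and the two must agree on P(O). A sketch under the same assumptions as before (state-emission convention, weather model, assumed uniform start):

```python
import numpy as np

def backward(A, B, obs):
    """Backward procedure: beta[t, i] = P(o_{t+1}..o_T | X_t = i),
    filled right to left through the trellis."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))              # initialization: last column is all 1s
    for t in range(T - 2, -1, -1):      # induction, right to left
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

A = np.array([[0.7, 0.3], [0.4, 0.6]])  # weather model from earlier
B = np.array([[0.2, 0.8], [0.6, 0.4]])
pi = np.array([0.5, 0.5])               # assumed uniform start
obs = [0, 0, 1]                         # Umbrella, Umbrella, No umbrella
beta = backward(A, B, obs)
# Forward and backward agree on P(O); here read off at t = 0.
p_obs = (pi * B[:, obs[0]] * beta[0]).sum()
```

Checking that this p_obs matches the forward procedure's total is a standard sanity test for an HMM implementation.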
The Three Fundamental Questions for HMMs
Finding the best state sequence
The second problem is “finding the state sequence that best explains the
observations.”
One way to proceed would be to choose the states individually.
That is, for each t, 1 ≤ t ≤ T + 1, we would find Xt that maximizes P (Xt|O,µ).
This approach maximizes the expected number of states that will be guessed correctly.
However, it may yield a quite unlikely state sequence.
Therefore, this is not the method normally used; instead we use the Viterbi algorithm,
which efficiently computes the most likely state sequence as a whole.
What is the probability of seeing the output sequence {lem, ice-t, cola} if the
machine always starts off in the cola preferring state?
[Trellis diagram over the observations lem, ice-t, cola: states CP and IP at each time step]
Best State Sequence Calculation
Best sequence: CP-IP-CP-CP
Forward/Backward Procedure & Best
State Sequence Output Summary
Finding the probability of an observation: Example
What is the probability of seeing the output sequence {lem, ice-t, cola} if
the machine always starts off in the cola preferring state?
Homework!
AAB?
The Three Fundamental Questions for HMMs
Viterbi Algorithm
• For any model, such as an HMM, that contains hidden variables, the task of
determining which sequence of variables is the underlying source of some
sequence of observations is called the decoding task.
• In the ice-cream domain, given a sequence of ice-cream observations 3 1 3
and an HMM, the task of the decoder is to find the best hidden weather
sequence (H H H).
• More formally, Decoding: Given as input an HMM μ = (A, B) and a sequence
of observations O = o1, o2, ..., oT, find the most probable sequence of states
Q = q1 q2 q3 ... qT.
• We might propose to find the best sequence as follows:
• For each possible hidden state sequence (HHH, HHC, HCH, etc.), we could
run the forward algorithm and compute the likelihood of the observation
sequence given that hidden state sequence.
• Then we could choose the hidden state sequence with the maximum
observation likelihood.
• It should be clear from the previous section that we cannot do this, because
there is an exponentially large number of state sequences.
The Three Fundamental Questions for HMMs
Viterbi Algorithm
Using dynamic programming, we calculate the most probable path through the whole
trellis as follows:
• Initialization: δi(1) = πi, 1 ≤ i ≤ N
• Induction: δj(t + 1) = max_i δi(t) aij bijot, storing the backpointer
ψj(t + 1) = argmax_i δi(t) aij bijot
• Termination and path readout (backtracking): X̂T+1 = argmax_i δi(T + 1),
then X̂t = ψX̂t+1(t + 1)
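The dynamic program can be sketched compactly: each trellis node keeps only its best incoming arc, plus a backpointer for recovering the path. Same assumptions as the earlier sketches (state-emission convention, weather model, assumed uniform start):

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most probable state path via dynamic programming.
    delta[t, i] = probability of the best path ending in state i at time t."""
    T, N = len(obs), A.shape[0]
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)            # backpointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1, :, None] * A       # score of each incoming arc
        psi[t] = scores.argmax(axis=0)           # best predecessor per state
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Backtrace from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta[-1].max()

A = np.array([[0.7, 0.3], [0.4, 0.6]])           # weather model from earlier
B = np.array([[0.2, 0.8], [0.6, 0.4]])
pi = np.array([0.5, 0.5])                        # assumed uniform start
path, p_best = viterbi(A, B, pi, [0, 0, 1])      # observe U, U, N
```

For these numbers the best explanation of umbrella, umbrella, no-umbrella is Rainy, Rainy, Sunny: the structure is identical to the forward procedure, with max replacing sum.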
Finding the best state sequence: Viterbi
[Trellis diagram over the observations lem, ice-t, cola: states CP and IP at each time step]
Output Summary
The Three Fundamental Questions for HMMs
The third problem: Parameter estimation
• Given a certain observation sequence, we want to find the values of the model
parameters μ = (A, B, π) which best explain what we observed.
• Using Maximum Likelihood Estimation, that means we want to find the values
that maximize P(O | μ): argmax_μ P(O_training | μ)
• There is no known analytic method to choose μ to maximize P(O | μ).
• But we can locally maximize it by an iterative hill-climbing algorithm.
• This algorithm is the Baum-Welch or Forward-Backward algorithm, which is a
special case of the Expectation Maximization method.
• We don’t know what the model is, but we can work out the probability of the observation sequence
using some (perhaps randomly chosen) model.
• Looking at that calculation, we can see which state transitions and symbol emissions were probably
used the most.
• By increasing the probability of those, we can choose a revised model which gives a higher probability
to the observation sequence.
• This maximization process is often referred to as training the model and is performed on training data.
The Three Fundamental Questions for HMMs
The third problem: Parameter estimation
Define pt(i, j), 1 ≤ t ≤ T, 1 ≤ i, j ≤ N, as the probability of traversing
the arc from si to sj at time t, given the observation sequence O:
pt(i, j) = P(Xt = i, Xt+1 = j | O, μ) = αi(t) aij bijot βj(t + 1) / Σm αm(t) βm(t)
The Three Fundamental Questions for HMMs
Parameter Re-estimation
Now, if we sum over the time index, this gives us expectations (counts):
• expected number of transitions from state si in O: Σt γi(t), where γi(t) = Σj pt(i, j)
• expected number of transitions from si to sj in O: Σt pt(i, j)
The reestimation formulas are as follows:
• π̂i = expected frequency in state si at time t = 1 = γi(1)
• âij = Σt pt(i, j) / Σt γi(t)
• b̂ijk = Σ{t : ot = k} pt(i, j) / Σt pt(i, j)
We then repeat this process, hoping to converge on optimal values for the model
parameters μ.
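One full reestimation step can be sketched end to end: an E-step computing γ and the arc probabilities from the forward and backward variables, then an M-step applying the reestimation formulas. As with the earlier sketches, this uses the state-emission convention and the weather model with an assumed uniform start:

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Baum-Welch reestimation step.
    E-step: gamma[t, i] = P(X_t = i | O) and xi[t, i, j] = p_t(i, j)
    from the forward/backward variables. M-step: reestimate pi, A, B."""
    T, N = len(obs), A.shape[0]
    alpha, beta = np.zeros((T, N)), np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                       # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    p_obs = alpha[-1].sum()
    gamma = alpha * beta / p_obs                # state occupancy probabilities
    xi = np.array([alpha[t, :, None] * A * B[:, obs[t + 1]] * beta[t + 1] / p_obs
                   for t in range(T - 1)])      # arc traversal probabilities
    new_pi = gamma[0]                           # expected frequency at t = 1
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                 # emissions of symbol k
        mask = np.array(obs) == k
        new_B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B, p_obs

A = np.array([[0.7, 0.3], [0.4, 0.6]])          # weather model from earlier
B = np.array([[0.2, 0.8], [0.6, 0.4]])
pi = np.array([0.5, 0.5])                       # assumed uniform start
new_pi, new_A, new_B, p0 = baum_welch_step(A, B, pi, [0, 0, 1, 1, 0])
```

The EM guarantee is that repeating this step never decreases P(O | μ), so iterating it climbs to a local maximum of the likelihood.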