CS 4705 Hidden Markov Models: Slides Adapted From Dan Jurafsky and James Martin
Hidden Markov Models
Definitions
A weighted finite-state automaton (WFSA) adds probabilities to the arcs
– The probabilities on the arcs leaving any state must sum to one
A Markov chain is a special case of a WFSA
– the input sequence uniquely determines which states the automaton will go through
Markov chains can't represent inherently ambiguous problems
– They assign probabilities only to unambiguous sequences
Markov chain for weather
Markov chain for words
Markov chain = “First-order observable Markov Model”
A set of states
– Q = q1, q2…qN; the state at time t is qt
Transition probabilities:
– a set of probabilities A = a01 a02 … an1 … ann
– Each aij represents the probability of transitioning from state i to state j
– The set of these is the transition probability matrix A
– Constraint: $\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$
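As a concrete illustration of these definitions, here is a minimal sketch of a first-order Markov chain in Python. The two weather states and all probability values are hypothetical placeholders; the point is how the initial distribution and the transition matrix A are used to score a state sequence.

```python
import numpy as np

# Hypothetical two-state weather Markov chain: 0 = HOT, 1 = COLD.
states = ["HOT", "COLD"]

# Initial distribution and transition matrix A (each row sums to one).
pi = np.array([0.8, 0.2])
A = np.array([
    [0.7, 0.3],   # P(next state | current = HOT)
    [0.4, 0.6],   # P(next state | current = COLD)
])

def sequence_probability(seq):
    """P(q1, ..., qT) = pi[q1] * prod_t A[q_{t-1}, q_t]."""
    prob = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= A[prev, cur]
    return prob

# Probability of the sequence HOT, HOT, COLD: 0.8 * 0.7 * 0.3
print(sequence_probability([0, 0, 1]))
```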
Markov chain = “First-order observable Markov Model”
Another representation for start state
Instead of a start state:
– An initial probability distribution over start states: $\pi_i = P(q_1 = i), \quad 1 \le i \le N$
Constraint:
– $\sum_{j=1}^{N} \pi_j = 1$
The weather figure using pi
The weather figure: specific example
Markov chain for weather
How about?
Hidden Markov Models
Observed events (e.g., the words or ice cream counts we actually see)
Hidden events (e.g., the POS tags or weather states we want to infer)
HMM for Ice Cream
Hidden Markov Models
States Q = q1, q2…qN
Observations O = o1, o2…oN
– Each observation is a symbol drawn from a vocabulary V = {v1, v2, …, vV}
Transition probabilities
– Transition probability matrix A = {a_ij}
– $a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad 1 \le i, j \le N$
Observation likelihoods
– Output probability matrix B = {b_i(k)}
– $b_i(k) = P(X_t = o_k \mid q_t = i)$
Special initial probability vector π
– $\pi_i = P(q_1 = i), \quad 1 \le i \le N$
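To make the notation concrete, here is a minimal sketch of the parameter set for the ice-cream HMM in Python. The hidden states (HOT, COLD), the observation vocabulary (1, 2, or 3 ice creams eaten), and all probability values are hypothetical; only the shapes of π, A, and B matter.

```python
import numpy as np

# Hidden states: 0 = HOT, 1 = COLD.
# Observations: number of ice creams eaten, vocabulary V = {1, 2, 3}.
states = ["HOT", "COLD"]
vocab = [1, 2, 3]

# Initial probability vector pi (one entry per hidden state).
pi = np.array([0.8, 0.2])

# Transition probability matrix A: A[i, j] = P(q_t = j | q_{t-1} = i).
A = np.array([
    [0.7, 0.3],
    [0.4, 0.6],
])

# Output probability matrix B: B[i, k] = P(o_t = vocab[k] | q_t = i).
B = np.array([
    [0.2, 0.4, 0.4],   # HOT
    [0.5, 0.4, 0.1],   # COLD
])

# pi, each row of A, and each row of B must sum to one.
assert np.isclose(pi.sum(), 1)
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```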
Hidden Markov Models
Some constraints
– $\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$
– $\pi_i = P(q_1 = i), \quad 1 \le i \le N$
– $\sum_{k=1}^{M} b_i(k) = 1$
– $\sum_{j=1}^{N} \pi_j = 1$
Assumptions
Markov assumption:
– $P(q_i \mid q_1 \dots q_{i-1}) = P(q_i \mid q_{i-1})$
Output-independence assumption:
– $P(o_i \mid q_1, \dots, q_T, o_1, \dots, o_{i-1}, o_{i+1}, \dots, o_T) = P(o_i \mid q_i)$
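Under these two assumptions, the joint probability of a hidden state sequence and an observation sequence factorizes as $P(O, Q) = \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)$. Below is a minimal sketch of that computation; it assumes the hypothetical ice-cream parameters pi, A, B from the earlier sketch.

```python
def joint_probability(obs_idx, state_seq, pi, A, B):
    """P(O, Q) = pi[q1] * B[q1, o1] * prod_t A[q_{t-1}, q_t] * B[q_t, o_t].

    obs_idx: observation indices into the vocabulary (e.g., 2 for "3 ice creams").
    state_seq: hidden state indices (e.g., 0 = HOT, 1 = COLD).
    """
    prob = pi[state_seq[0]] * B[state_seq[0], obs_idx[0]]
    for t in range(1, len(obs_idx)):
        prob *= A[state_seq[t - 1], state_seq[t]] * B[state_seq[t], obs_idx[t]]
    return prob

# Example: observations 3, 1, 3 (indices 2, 0, 2) with hidden states HOT, HOT, COLD:
# print(joint_probability([2, 0, 2], [0, 0, 1], pi, A, B))
```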
Eisner task
Given
– Ice Cream Observation Sequence: 1,2,3,2,2,2,3…
Produce:
– Weather Sequence: H,C,H,H,H,C…
HMM for ice cream
Different types of HMM structure
– Ergodic = fully-connected
– Bakis = left-to-right
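In terms of the transition matrix, the difference is which entries of A may be non-zero: an ergodic HMM allows any state to transition to any other, while a Bakis (left-to-right) HMM forbids transitions back to earlier states, making A upper-triangular. A small sketch with hypothetical values:

```python
import numpy as np

# Ergodic (fully-connected): every entry of A may be non-zero.
A_ergodic = np.array([
    [0.4, 0.3, 0.3],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
])

# Bakis (left-to-right): no transitions back to earlier states, so A is upper-triangular.
A_bakis = np.array([
    [0.6, 0.3, 0.1],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
])
```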
Transitions between the hidden states of the HMM, showing the A probabilities
B observation likelihoods for POS HMM
Three fundamental problems for HMMs
– Likelihood: given an HMM and an observation sequence, compute P(O) (Forward algorithm)
– Decoding: given an HMM and an observation sequence, find the most probable hidden state sequence (Viterbi algorithm)
– Learning: given an observation sequence and the set of possible states, learn the A and B parameters (Forward-Backward / Baum-Welch)
Viterbi intuition: we are looking for the best ‘path’
[Figure: a trellis over positions S1–S5, each column listing candidate POS tags (JJ, DT, VB, VBD, VBN, TO, NN, NNP, RB); the best tag sequence is the highest-scoring path through this lattice]
The Viterbi Algorithm
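As a rough illustration of the algorithm (not a reproduction of the slide's own presentation), here is a minimal Viterbi decoder in Python, using the same π, A, B notation as the earlier hypothetical ice-cream sketch. It is a plain dynamic-programming version without log probabilities or other practical refinements.

```python
import numpy as np

def viterbi(obs_idx, pi, A, B):
    """Return the most probable hidden state sequence for the observations.

    obs_idx: observation indices into the vocabulary.
    pi: initial state distribution, shape (N,).
    A: transition matrix, shape (N, N).  B: emission matrix, shape (N, V).
    """
    N, T = len(pi), len(obs_idx)
    v = np.zeros((T, N))          # v[t, j] = best score of a path ending in state j at time t
    backptr = np.zeros((T, N), dtype=int)

    v[0] = pi * B[:, obs_idx[0]]
    for t in range(1, T):
        for j in range(N):
            scores = v[t - 1] * A[:, j] * B[j, obs_idx[t]]
            backptr[t, j] = np.argmax(scores)
            v[t, j] = scores[backptr[t, j]]

    # Follow back-pointers from the best final state.
    path = [int(np.argmax(v[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return list(reversed(path))

# Example with the hypothetical ice-cream parameters: observations 3, 1, 3 (indices 2, 0, 2):
# print(viterbi([2, 0, 2], pi, A, B))
```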
The A matrix for the POS HMM
The B matrix for the POS HMM
Viterbi example
Computing the likelihood of an observation
Forward algorithm
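As a companion to the Viterbi sketch, here is a minimal Forward algorithm in Python for computing the likelihood P(O) under the same hypothetical π, A, B; it sums over all state paths where Viterbi maximizes over them.

```python
import numpy as np

def forward(obs_idx, pi, A, B):
    """Return P(O), the total likelihood of the observation sequence."""
    alpha = pi * B[:, obs_idx[0]]            # alpha[j] = P(o_1, q_1 = j)
    for o in obs_idx[1:]:
        # alpha'[j] = sum_i alpha[i] * A[i, j] * B[j, o]
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# Example: likelihood of observing 3, 1, 3 ice creams (indices 2, 0, 2):
# print(forward([2, 0, 2], pi, A, B))
```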
Error Analysis: ESSENTIAL!!!
Look at a confusion matrix
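For POS tagging, a confusion matrix counts how often each gold tag is predicted as each other tag, which shows where the tagger's errors concentrate. A minimal sketch with made-up tag sequences, just to show the bookkeeping:

```python
from collections import Counter

# Hypothetical gold and predicted tag sequences for the same tokens.
gold = ["NN", "VB", "NN", "JJ", "NN"]
pred = ["NN", "NN", "NN", "JJ", "VB"]

# confusion[(gold_tag, predicted_tag)] = number of tokens with that pair.
confusion = Counter(zip(gold, pred))
for (g, p), count in sorted(confusion.items()):
    print(f"gold={g}  predicted={p}  count={count}")
```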
Learning HMMs
Learn the parameters of an HMM
– A and B matrices
Input
– An unlabeled sequence of observations (e.g., words)
– A vocabulary of potential hidden states (e.g., POS tags)
Training algorithm
– Forward-backward (Baum-Welch) algorithm
– A special case of the Expectation-Maximization (EM) algorithm
– Intuitions
Iteratively estimate the counts
Estimated probabilities are derived by computing the forward probability for an observation and dividing that probability mass among all the different paths that contributed to it (a rough sketch follows below)
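Purely to illustrate the intuition above, here is a compact sketch of Baum-Welch for a single discrete observation sequence. It combines a forward pass and a backward pass to get expected state and transition counts, then re-estimates π, A, and B from them. It uses no scaling or log-space arithmetic, so it is only suitable for short toy sequences, and the random initialization and iteration count are arbitrary choices.

```python
import numpy as np

def baum_welch(obs_idx, N, V, n_iter=20, seed=0):
    """Estimate pi, A, B for a discrete-observation HMM via Baum-Welch (EM)."""
    obs_idx = np.asarray(obs_idx)
    T = len(obs_idx)
    rng = np.random.default_rng(seed)

    # Random row-normalized initialization of A and B, uniform pi.
    A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((N, V)); B /= B.sum(axis=1, keepdims=True)
    pi = np.full(N, 1.0 / N)

    for _ in range(n_iter):
        # E-step: forward (alpha) and backward (beta) probabilities.
        alpha = np.zeros((T, N))
        beta = np.zeros((T, N))
        alpha[0] = pi * B[:, obs_idx[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs_idx[t]]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs_idx[t + 1]] * beta[t + 1])
        likelihood = alpha[-1].sum()

        # Expected state occupancies and transitions given the observations.
        gamma = alpha * beta / likelihood        # gamma[t, i] = P(q_t = i | O)
        xi = np.zeros((T - 1, N, N))             # xi[t, i, j] = P(q_t = i, q_{t+1} = j | O)
        for t in range(T - 1):
            xi[t] = (alpha[t][:, None] * A *
                     B[:, obs_idx[t + 1]][None, :] * beta[t + 1][None, :]) / likelihood

        # M-step: re-estimate pi, A, B from the expected counts.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.zeros((N, V))
        for k in range(V):
            B[:, k] = gamma[obs_idx == k].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]
    return pi, A, B

# Example with a short toy ice-cream sequence (indices into the vocabulary {1, 2, 3}):
# pi, A, B = baum_welch([2, 0, 2, 1, 1, 2], N=2, V=3)
```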
Other Classification Methods