4-Hidden Markov Models


A Tutorial on Hidden Markov Models

KAIST
2006.2.2, jyha@kangwon.ac.kr

Contents
- Introduction
- Markov model
- Hidden Markov model (HMM)
- Three algorithms of HMM: model evaluation, most probable path decoding, model training
- Pattern classification using HMMs
- HMM applications and software
- Summary
- References

Sequential Data
Examples

Speech data

Handwriting data
Characteristics of such data
- Data are generated sequentially in time or along an index
- Spatial information is laid out along the time or index axis
- Often highly variable, but with an embedded structure
- The information is contained in that structure

Advantages of HMMs for Sequential Data

Natural model structure: a doubly stochastic process
- Transition parameters model temporal variability
- Output distributions model spatial variability

An efficient and effective modeling tool for
- sequences with temporal constraints
- spatial variability along the sequence
- complex real-world processes

Efficient evaluation, decoding, and training algorithms
- Mathematically sound
- Computationally efficient

Proven technology!
- Success stories in many applications

Tools already exist
- HTK (Hidden Markov Model Toolkit)
- HMM toolbox for Matlab

Successful Application Areas of HMM


- On-line handwriting recognition
- Speech recognition and segmentation
- Gesture recognition
- Language modeling
- Motion video analysis and tracking
- Protein/gene sequence alignment
- Stock price prediction

What's an HMM?

Hidden Markov Model

Hidden

Markov Model

What is hidden?

What is a Markov model?


Markov Model
- Scenario
- Graphical representation
- Definition
- Sequence probability
- State probability

Markov Model: Scenario


Classify the weather into three states
- State 1: rain or snow
- State 2: cloudy
- State 3: sunny

By carefully examining the weather of some city for a long time, we found the following weather change pattern:

Today \ Tomorrow   Rain/Snow   Cloudy   Sunny
Rain/Snow            0.4        0.3      0.3
Cloudy               0.2        0.6      0.2
Sunny                0.1        0.1      0.8

Assumption: tomorrow's weather depends only on today's weather!

Markov Model: Graphical Representation


Visual illustration with a state diagram

[Diagram: three states (1: rain/snow, 2: cloudy, 3: sunny) connected by arcs labeled with the transition probabilities above]

- Each state corresponds to one observation
- The outgoing edge weights of each state sum to one

Markov Model: Definition


Observable states: {1, 2, ..., N}
Observed sequence: q_1, q_2, ..., q_T

1st-order Markov assumption:
P(q_t = j | q_{t-1} = i, q_{t-2} = k, ...) = P(q_t = j | q_{t-1} = i)

Bayesian network representation: q_1 → q_2 → ... → q_{t-1} → q_t

Stationarity (time-invariant transitions):
P(q_t = j | q_{t-1} = i) = P(q_{t+l} = j | q_{t+l-1} = i)

Markov Model: Definition (Cont.)


State transition matrix

    [ a_11  a_12  ...  a_1N ]
A = [ a_21  a_22  ...  a_2N ]
    [  ...   ...  ...   ... ]
    [ a_N1  a_N2  ...  a_NN ]

where a_ij = P(q_t = j | q_{t-1} = i),  1 ≤ i, j ≤ N

with the constraints a_ij ≥ 0 and Σ_{j=1}^{N} a_ij = 1

Initial state probability
π_i = P(q_1 = i),  1 ≤ i ≤ N

Markov Model: Sequence Prob.


Conditional probability:  P(A, B) = P(A | B) P(B)

Sequence probability of a Markov model:
P(q_1, q_2, ..., q_T)
  = P(q_1) P(q_2 | q_1) ... P(q_{T-1} | q_1, ..., q_{T-2}) P(q_T | q_1, ..., q_{T-1})    (chain rule)
  = P(q_1) P(q_2 | q_1) ... P(q_{T-1} | q_{T-2}) P(q_T | q_{T-1})    (1st-order Markov assumption)

Markov Model: Sequence Prob. (Cont.)


Question: What is the probability that the weather for the next 7 days will be sun-sun-rain-rain-sun-cloudy-sun when today is sunny?

S_1: rain,  S_2: cloudy,  S_3: sunny

P(O | model) = P(S_3, S_3, S_3, S_1, S_1, S_3, S_2, S_3 | model)
  = P(S_3) P(S_3|S_3) P(S_3|S_3) P(S_1|S_3) P(S_1|S_1) P(S_3|S_1) P(S_2|S_3) P(S_3|S_2)
  = π_3 · a_33 · a_33 · a_31 · a_11 · a_13 · a_32 · a_23
  = 1 · (0.8)(0.8)(0.1)(0.4)(0.3)(0.1)(0.2)
  = 1.536 × 10^-4
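A minimal Python sketch of this computation, assuming the transition table above is stored as a matrix A (rows = today, columns = tomorrow); the helper name markov_sequence_prob is just illustrative:

import numpy as np

# A[i][j] = P(tomorrow = j | today = i); states 0: rain/snow, 1: cloudy, 2: sunny.
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

def markov_sequence_prob(states, A, pi):
    """P(q_1, ..., q_T) = pi[q_1] * prod_t A[q_{t-1}, q_t]."""
    p = pi[states[0]]
    for prev, cur in zip(states[:-1], states[1:]):
        p *= A[prev, cur]
    return p

# "Today is sunny" is encoded as pi = [0, 0, 1]; the week sun-sun-rain-rain-sun-cloudy-sun
# is appended after today's sunny state.
pi = np.array([0.0, 0.0, 1.0])
week = [2, 2, 2, 0, 0, 2, 1, 2]
print(markov_sequence_prob(week, A, pi))  # 1.536e-04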

Markov Model: State Probability


State probability at time t:  P(q_t = i)

Simple but slow algorithm:
- A path that ends at state i at time t:  Q_t(i) = (q_1, q_2, ..., q_t = i)
- Its probability:  P(Q_t(i)) = π_{q_1} ∏_{k=2}^{t} P(q_k | q_{k-1})
- Sum the probabilities of all paths that end at i at time t:
  P(q_t = i) = Σ over all Q_t(i) of P(Q_t(i))
- Exponential time complexity: O(N^t)

Markov Model: State Prob. (Cont.)

State probability at time t:  P(q_t = i)

Efficient algorithm (lattice algorithm)
- Each node stores the sum of the probabilities of the partial paths reaching it
- Recursive path probability calculation (sketched in code below):
  P(q_t = i) = Σ_{j=1}^{N} P(q_{t-1} = j, q_t = i)
             = Σ_{j=1}^{N} P(q_{t-1} = j) P(q_t = i | q_{t-1} = j)
             = Σ_{j=1}^{N} P(q_{t-1} = j) a_ji
- Time complexity: O(N^2 t)
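A small Python sketch of the lattice recursion; the "cloudy today" start distribution is only for illustration:

import numpy as np

def state_probability(A, pi, t):
    """Distribution over states at time t (t = 1 is the initial step),
    using the recursion P(q_t = i) = sum_j P(q_{t-1} = j) * a_ji."""
    p = pi.copy()
    for _ in range(t - 1):
        p = p @ A          # one O(N^2) step of the lattice recursion
    return p

A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
pi = np.array([0.0, 1.0, 0.0])          # today is cloudy
print(state_probability(A, pi, t=3))    # P(q_3 = i) for i = rain/snow, cloudy, sunny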

What's an HMM?

Hidden Markov Model

Hidden

Markov Model

What is hidden?

What is a Markov model?


Hidden Markov Model


- Example
- Generation process
- Definition
- Model evaluation algorithm
- Path decoding algorithm
- Training algorithm

Time Series Example Representation


X = x_1 x_2 x_3 x_4 x_5 ... x_{T-1} x_T = s p iy iy iy ... ch ch ch ch

Analysis Methods

Probability-based analysis?
P(s p iy iy iy ... ch ch ch ch) = ?

Method I
P(s) P(p) P(iy)^3 ... P(ch)^4
- Observations are treated as independent; no time/order information
- A poor model of temporal structure
- Model size = |V| = N

Analysis Methods: Method II

P(s) P(s|s) ... P(iy|p) P(iy|iy)^2 ... P(ch|ch)^2 ...

- A simple model of an ordered sequence
- A symbol depends only on the immediately preceding one:
  P(x_t | x_1 x_2 x_3 ... x_{t-1}) = P(x_t | x_{t-1})
- A |V| × |V| matrix model: 50 × 50 is not too bad; 10^5 × 10^5 is doubly outrageous!!

The Problem: "What you see is the truth"
- Not quite a valid assumption; there are often errors or noise
  (noisy sound, sloppy handwriting, ungrammatical sentences)
- There may be some underlying "truth" process
  - an underlying hidden sequence
  - obscured by the incomplete observation

Another Analysis Method: Method III

- What you see is a clue to what lies behind, which is not known a priori
  - the source that generated the observation
  - the source evolves and generates characteristic observation sequences

Hidden state sequence: q_0 q_1 q_2 ... q_T

Σ_Q P(s, q_1) P(s, q_2 | q_1) P(p, q_3 | q_2) ... P(ch, q_T | q_{T-1}) = Σ_Q ∏_t P(x_t, q_t | q_{t-1})

The Auxiliary Variable

q_t ∈ S = {1, ..., N}
- N is also conjectured
- {q_t : t ≥ 0} is conjectured, not visible
- The sequence Q = q_1 q_2 ... q_T is Markovian:
  P(q_1 q_2 ... q_T) = P(q_1) P(q_2 | q_1) ... P(q_T | q_{T-1})    (a Markov chain)

Summary of the Concept


P(X) = Σ_Q P(X, Q)
     = Σ_Q P(Q) P(X | Q)
     = Σ_Q P(q_1 q_2 ... q_T) P(x_1 x_2 ... x_T | q_1 q_2 ... q_T)
     = Σ_Q [ ∏_{t=1}^{T} P(q_t | q_{t-1}) ] [ ∏_{t=1}^{T} p(x_t | q_t) ]

where the first product is the Markov chain process and the second is the output process.

A hidden Markov model is a doubly stochastic process
- stochastic chain process: { q(t) }
- output process: { f(x|q) }

It is also called
- a hidden Markov chain
- a probabilistic function of a Markov chain

HMM Characterization: λ = (A, B, π)

- A : state transition probabilities
  { a_ij | a_ij = p(q_{t+1} = j | q_t = i) }
- B : symbol output/observation probabilities
  { b_j(v) | b_j(v) = p(x = v | q_t = j) }
- π : initial state distribution
  { π_i | π_i = p(q_1 = i) }

P(X | λ) = Σ_Q P(Q | λ) P(X | Q, λ)
         = Σ_Q π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ... a_{q_{T-1} q_T} · b_{q_1}(x_1) b_{q_2}(x_2) ... b_{q_T}(x_T)

Graphical Example
π = [ 1.0  0  0  0 ]

[State diagram: a left-to-right model; state 1 (s) with self-loop 0.6 and 1→2 = 0.4, state 2 (p) with self-loop 0.5 and 2→3 = 0.5, state 3 (iy) with self-loop 0.7 and 3→4 = 0.3, state 4 (ch) with self-loop 1.0]

         1    2    3    4                     s    p    iy   ch
A =  1  0.6  0.4  0.0  0.0           B =  1  0.6  0.0  0.2  0.2
     2  0.0  0.5  0.5  0.0                2  0.3  0.5  0.2  0.0
     3  0.0  0.0  0.7  0.3                3  0.1  0.1  0.8  0.0
     4  0.0  0.0  0.0  1.0                4  0.2  0.2  0.0  0.6

Data interpretation
P(s s p p iy iy iy ch ch ch | λ)
  = Σ_Q P(s s p p iy iy iy ch ch ch, Q | λ)
  = Σ_Q P(Q | λ) p(s s p p iy iy iy ch ch ch | Q, λ)

Let Q = 1 1 2 2 3 3 3 4 4 4. Then

P(Q | λ) p(s s p p iy iy iy ch ch ch | Q, λ)
  = P(1 1 2 2 3 3 3 4 4 4 | λ) p(s s p p iy iy iy ch ch ch | 1 1 2 2 3 3 3 4 4 4, λ)
  = (1 × .6)(.6 × .6)(.4 × .5)(.5 × .5)(.5 × .8)(.7 × .8)^2 (.3 × .6)(1 × .6)^2
  ≈ 0.0000878

# multiplications for the full sum ≈ 2T N^T
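A Python sketch of the joint probability P(X, Q | λ) for one fixed path, using the π, A, and B reconstructed on the graphical-example slide above (the column order s, p, iy, ch in B is an assumption of that reconstruction):

import numpy as np

pi = np.array([1.0, 0.0, 0.0, 0.0])
A = np.array([[0.6, 0.4, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.7, 0.3],
              [0.0, 0.0, 0.0, 1.0]])
B = np.array([[0.6, 0.0, 0.2, 0.2],   # state 1
              [0.3, 0.5, 0.2, 0.0],   # state 2
              [0.1, 0.1, 0.8, 0.0],   # state 3
              [0.2, 0.2, 0.0, 0.6]])  # state 4
SYM = {'s': 0, 'p': 1, 'iy': 2, 'ch': 3}

def joint_prob(obs, path, pi, A, B):
    """P(X, Q | lambda) = pi_{q1} b_{q1}(x1) * prod_t a_{q_{t-1} q_t} b_{q_t}(x_t)."""
    x = [SYM[o] for o in obs]
    q = [s - 1 for s in path]           # states are 1-based on the slide
    p = pi[q[0]] * B[q[0], x[0]]
    for t in range(1, len(x)):
        p *= A[q[t - 1], q[t]] * B[q[t], x[t]]
    return p

obs = "s s p p iy iy iy ch ch ch".split()
path = [1, 1, 2, 2, 3, 3, 3, 4, 4, 4]
print(joint_prob(obs, path, pi, A, B))  # ~8.78e-05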

Issues in HMMs

Intuitive decisions
1. number of states (N)
2. topology (state interconnections)
3. number of observation symbols (V)

Difficult problems
4. efficient computation methods
5. probability parameters (λ)

The Number of States

How many states?
- Model size
- Model topology/structure

Factors
- Pattern complexity/length and variability
- The number of samples

Ex:  r r g b b g b b b r

(1) The Simplest Model: Model I

N = 1,  a_11 = 1.0,  B = [P(r), P(g), P(b)] = [1/3, 1/6, 1/2]

P(r r g b b g b b b r | λ_1)
  = (1/3)(1/3)(1/6)(1/2)(1/2)(1/6)(1/2)(1/2)(1/2)(1/3)
  ≈ 0.0000322  (< 0.0000338)

(2) Two-State Model: Model II

N = 2
A = [ 0.6  0.4        B = [ 1/2  1/3  1/6      (B rows: states 1, 2;
      0.6  0.4 ]            1/6  1/6  2/3 ]     columns: r, g, b)

P(r r g b b g b b b r | λ_2) = a sum over all 2^10 state paths, each term a product of
transition and output probabilities (e.g. .5 · (.6 · ...)(.4 · ...) + ...) = ?

(3) Three state models


N = 3:

[Two candidate three-state topologies, drawn as state diagrams with the transition probabilities above]

The Criterion

Obtain the best model (λ) that maximizes P(X | λ)

The best topology comes from insight and experience
- given the number of classes/symbols/samples

A trained HMM

[State diagram: a three-state left-to-right model with the output distribution of each state as given below, e.g. state 1 emits R .6, G .2, B .2]

π = [ 1.  0.  0. ]

         1   2   3                 R   G   B
A =  1  .5  .4  .1          B = 1  .6  .2  .2
     2  .0  .6  .4              2  .2  .5  .3
     3  .0  .0  .0              3  .0  .3  .7

(State 3 is shown with no outgoing transitions; the numerical examples below use a_33 = 0.)

Hidden Markov Model: Example


[Diagram: pots (states) connected by transition probabilities, each pot containing colored balls]

- N pots containing colored balls
- M distinct colors
- Each pot contains a different mix of colored balls

HMM: Generation Process


Sequence generating algorithm (sketched in code below)
- Step 1: Pick an initial pot according to some random process
- Step 2: Randomly pick a ball from the pot, observe its color, and replace it
- Step 3: Select another pot according to a random selection process
- Step 4: Repeat steps 2 and 3

Markov process: {q(t)}    Output process: {f(x|q)}
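A Python sketch of the pot/ball generation process; it reuses the four-state model from the graphical example above purely as sample parameters:

import numpy as np

rng = np.random.default_rng(0)

def sample_hmm(pi, A, B, T, symbols):
    """Generate T observations: pick a start state from pi, emit from B[state], move by A[state]."""
    states, obs = [], []
    q = rng.choice(len(pi), p=pi)                             # Step 1: initial pot
    for _ in range(T):
        obs.append(symbols[rng.choice(B.shape[1], p=B[q])])   # Step 2: draw a ball
        states.append(q + 1)
        q = rng.choice(len(pi), p=A[q])                       # Step 3: next pot
    return states, obs

pi = np.array([1.0, 0.0, 0.0, 0.0])
A = np.array([[.6, .4, 0, 0], [0, .5, .5, 0], [0, 0, .7, .3], [0, 0, 0, 1.0]])
B = np.array([[.6, 0, .2, .2], [.3, .5, .2, 0], [.1, .1, .8, 0], [.2, .2, 0, .6]])
print(sample_hmm(pi, A, B, T=10, symbols=['s', 'p', 'iy', 'ch']))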

HMM: Hidden Information

Now, what is hidden?
- We can only see the chosen balls
- We cannot see which pot is selected at each time
- So the pot selection (state transition) information is hidden

HMM: Formal Definition

Notation: λ = (A, B, π)

(1) N : number of states
(2) M : number of symbols observable in states
    V = { v_1, ..., v_M }
(3) A : state transition probability distribution
    A = { a_ij },  1 ≤ i, j ≤ N
(4) B : observation symbol probability distribution
    B = { b_i(v_k) },  1 ≤ i ≤ N, 1 ≤ k ≤ M
(5) π : initial state distribution
    π_i = P(q_1 = i),  1 ≤ i ≤ N

Three Problems

1. Model evaluation problem
   - What is the probability of the observation? → Forward algorithm
2. Path decoding problem
   - What is the best state sequence for the observation? → Viterbi algorithm
3. Model training problem
   - How do we estimate the model parameters? → Baum-Welch reestimation algorithm

Solution to the Model Evaluation Problem
- Forward algorithm
- Backward algorithm

Definition
- Given a model λ and an observation sequence X = x_1, x_2, ..., x_T, compute P(X | λ)
- P(X | λ) = Σ_Q P(X, Q | λ) = Σ_Q P(X | Q, λ) P(Q | λ)
  (a path or state sequence: Q = q_1, ..., q_T)

Solution

Easy but slow solution: exhaustive enumeration (a brute-force sketch follows below)
P(X | λ) = Σ_Q P(X, Q | λ) = Σ_Q P(X | Q, λ) P(Q | λ)
         = Σ_Q b_{q_1}(x_1) b_{q_2}(x_2) ... b_{q_T}(x_T) · π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ... a_{q_{T-1} q_T}

Exhaustive enumeration = combinatorial explosion!  O(N^T)

Does a smarter solution exist?
- Yes! Dynamic programming
- Lattice-structure-based computation
- Highly efficient -- linear in the frame length
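A brute-force Python sketch that literally enumerates all N^T paths, run here on the trained R/G/B model used in the numerical examples below; it is only practical for tiny T:

import itertools
import numpy as np

def prob_bruteforce(x, pi, A, B):
    """P(X | lambda) by summing P(X, Q | lambda) over all N^T state paths."""
    N, T = len(pi), len(x)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):   # N^T paths
        p = pi[Q[0]] * B[Q[0], x[0]]
        for t in range(1, T):
            p *= A[Q[t - 1], Q[t]] * B[Q[t], x[t]]
        total += p
    return total

pi = np.array([1.0, 0.0, 0.0])
A = np.array([[.5, .4, .1], [.0, .6, .4], [.0, .0, .0]])
B = np.array([[.6, .2, .2], [.2, .5, .3], [.0, .3, .7]])   # columns: R, G, B
print(prob_bruteforce([0, 0, 1, 2], pi, A, B))             # R R G B -> ~0.0284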

Forward Algorithm
Key idea
- Span a lattice of N states and T times
- Keep the sum of the probabilities of all paths coming into each state j at time t

Forward probability
α_t(j) = P(x_1 x_2 ... x_t, q_t = S_j | λ)
       = Σ_{Q_t} P(x_1 x_2 ... x_t, Q_t = q_1 ... q_t | λ)
       = [ Σ_{i=1}^{N} α_{t-1}(i) a_ij ] b_j(x_t)

Forward Algorithm
Initialization:  α_1(i) = π_i b_i(x_1),   1 ≤ i ≤ N

Induction:  α_t(j) = [ Σ_{i=1}^{N} α_{t-1}(i) a_ij ] b_j(x_t),   1 ≤ j ≤ N,  t = 2, 3, ..., T

Termination:  P(X | λ) = Σ_{i=1}^{N} α_T(i)
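A compact Python sketch of the forward algorithm; run on the trained R/G/B model, it reproduces the α lattice of the numerical example on the next slide:

import numpy as np

def forward(x, pi, A, B):
    """alpha[t, j] = P(x_1..x_{t+1}, q = j | lambda); returns (alpha, P(X | lambda))."""
    T, N = len(x), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, x[0]]                        # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]    # induction
    return alpha, alpha[-1].sum()                     # termination

pi = np.array([1.0, 0.0, 0.0])
A = np.array([[.5, .4, .1], [.0, .6, .4], [.0, .0, .0]])
B = np.array([[.6, .2, .2], [.2, .5, .3], [.0, .3, .7]])   # columns: R, G, B
alpha, p = forward([0, 0, 1, 2], pi, A, B)                 # R R G B
print(alpha)   # rows t = 1..4: [.6 0 0], [.18 .048 0], [.018 .0504 .01116], [.0018 .01123 .01537]
print(p)       # ~0.0284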

Numerical Example: P(RRGB | λ)

π = [1 0 0]^T, with A and B from the trained HMM above.

Forward lattice (α_t(i)):

            x1 = R    x2 = R    x3 = G    x4 = B
state 1     .6        .18       .018      .0018
state 2     .0        .048      .0504     .01123
state 3     .0        .0        .01116    .01537

P(RRGB | λ) = .0018 + .01123 + .01537 ≈ .0284

Backward Algorithm (1)


Key idea
- Span a lattice of N states and T times
- Keep the sum of the probabilities of all paths going out of each state i at time t

Backward probability
β_t(i) = P(x_{t+1} x_{t+2} ... x_T | q_t = S_i, λ)
       = Σ_{Q_{t+1}} P(x_{t+1} x_{t+2} ... x_T, Q_{t+1} = q_{t+1} ... q_T | q_t = S_i, λ)
       = Σ_{j=1}^{N} a_ij b_j(x_{t+1}) β_{t+1}(j)

Backward Algorithm (2)


Initialization:  β_T(i) = 1,   1 ≤ i ≤ N

Induction:  β_t(i) = Σ_{j=1}^{N} a_ij b_j(x_{t+1}) β_{t+1}(j),   1 ≤ i ≤ N,  t = T-1, T-2, ..., 1
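A matching Python sketch of the backward algorithm; combining β_1 with π and b_i(x_1) gives the same P(X | λ) as the forward pass:

import numpy as np

def backward(x, A, B):
    """beta[t, i] = P(x_{t+2}..x_T | q = i, lambda), with beta at the last frame set to 1."""
    T, N = len(x), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                    # initialization
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])  # induction
    return beta

pi = np.array([1.0, 0.0, 0.0])
A = np.array([[.5, .4, .1], [.0, .6, .4], [.0, .0, .0]])
B = np.array([[.6, .2, .2], [.2, .5, .3], [.0, .3, .7]])
x = [0, 0, 1, 2]                                      # R R G B
beta = backward(x, A, B)
print((pi * B[:, x[0]] * beta[0]).sum())              # ~0.0284, same as the forward result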

Solution to Path Decoding Problem


- State sequence
- Optimal path
- Viterbi algorithm
- Sequence segmentation

The Most Probable Path
- Given a model λ and an observation sequence X = x_1, x_2, ..., x_T
- Q* = argmax_Q P(X, Q | λ) = argmax_Q P(X | Q, λ) P(Q | λ)
  (a path or state sequence: Q = q_1, ..., q_T)

Viterbi Algorithm
Purpose
- An analysis of the internal processing result
- The best (most likely) state sequence
- Internal segmentation

Viterbi algorithm
- Alignment of observations and state transitions
- A dynamic programming technique

Viterbi Path Idea


Key idea
- Span a lattice of N states and T times
- Keep the probability and the previous node of the most probable path coming into each state j at time t

Recursive path selection
- Path probability:  δ_{t+1}(j) = max_{1≤i≤N} [ δ_t(i) a_ij ] · b_j(x_{t+1})
- Path node (backpointer):  ψ_{t+1}(j) = argmax_{1≤i≤N} δ_t(i) a_ij

Viterbi Algorithm
Initialization:   δ_1(i) = π_i b_i(x_1),   ψ_1(i) = 0,   1 ≤ i ≤ N

Recursion:   δ_{t+1}(j) = max_{1≤i≤N} [ δ_t(i) a_ij ] · b_j(x_{t+1})
             ψ_{t+1}(j) = argmax_{1≤i≤N} δ_t(i) a_ij,    1 ≤ j ≤ N,  1 ≤ t ≤ T-1

Termination:   P* = max_{1≤i≤N} δ_T(i),   q*_T = argmax_{1≤i≤N} δ_T(i)

Path backtracking:   q*_t = ψ_{t+1}(q*_{t+1}),   t = T-1, ..., 1
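A Python sketch of the Viterbi recursion with backtracking; on the trained R/G/B model it reproduces the numerical example on the next slide:

import numpy as np

def viterbi(x, pi, A, B):
    """Most probable state path for observation x; returns (best path, its probability)."""
    T, N = len(x), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, x[0]]                        # initialization
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A            # delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)                # backpointers
        delta[t] = scores.max(axis=0) * B[:, x[t]]    # recursion
    path = [int(delta[-1].argmax())]                  # termination
    for t in range(T - 1, 0, -1):                     # backtracking
        path.append(int(psi[t][path[-1]]))
    return [s + 1 for s in reversed(path)], delta[-1].max()

pi = np.array([1.0, 0.0, 0.0])
A = np.array([[.5, .4, .1], [.0, .6, .4], [.0, .0, .0]])
B = np.array([[.6, .2, .2], [.2, .5, .3], [.0, .3, .7]])
print(viterbi([0, 0, 1, 2], pi, A, B))   # ([1, 1, 2, 3], 0.01008)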

Numerical Example: P(RRGB, Q* | λ)

π = [1 0 0]^T, with A and B from the trained HMM above.

Viterbi lattice (δ_t(i)):

            x1 = R    x2 = R    x3 = G    x4 = B
state 1     .6        .18       .018      .0018
state 2     .0        .048      .036      .00648
state 3     .0        .0        .00576    .01008

P(RRGB, Q* | λ) = max_i δ_4(i) = .01008; backtracking gives Q* = 1 1 2 3.

Solution to the Model Training Problem
- HMM training algorithm
- Maximum likelihood estimation
- Baum-Welch reestimation

HMM Training Algorithm
- Given an observation sequence X = x_1, x_2, ..., x_T
- Find the model parameters λ* = (A, B, π) such that P(X | λ*) ≥ P(X | λ) for all λ
- Adapt the HMM parameters maximally to the training samples
- Likelihood of a sample:  P(X | λ) = Σ_Q P(X | Q, λ) P(Q | λ)
- The state transitions are hidden! → no analytical solution
- Baum-Welch reestimation (EM)
  - an iterative procedure that locally maximizes P(X | λ)
  - convergence proven
  - an MLE statistical estimation

[Figure: P(X | λ) plotted against λ (the HMM parameters), converging to a local maximum]

Maximum Likelihood Estimation


MLE selects the parameters that maximize the probability of the observed sample.

[Definition] Maximum likelihood estimate
- θ : a set of distribution parameters
- Given X, θ* is the maximum likelihood estimate of θ if f(X | θ*) = max_θ f(X | θ)

MLE Example
Scenario
- Known: 3 balls inside the pot (some red, some white)
- Unknown: R = # of red balls
- Observation: two balls drawn, both red

Two models
P(two reds | R = 2) = C(2,2) C(1,0) / C(3,2) = 1/3
P(two reds | R = 3) = C(3,2) / C(3,2) = 1

Which model?
L(R = 3) > L(R = 2)  →  Model(R = 3) is our choice
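A two-line check of the two likelihoods with Python's math.comb (drawing without replacement, i.e. a hypergeometric count):

from math import comb

def likelihood(R, n_total=3, n_drawn=2, n_red_drawn=2):
    """P(drawing n_red_drawn reds | R red balls among n_total)."""
    return comb(R, n_red_drawn) * comb(n_total - R, n_drawn - n_red_drawn) / comb(n_total, n_drawn)

print(likelihood(2))  # 1/3
print(likelihood(3))  # 1.0  -> the MLE picks R = 3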

MLE Example (Cont.)


Model(R=3) is the more likely choice, unless we have a priori knowledge of the system.
- However, without the observation of two red balls, there is no reason to prefer R=3 over R=2.

The ML method chooses the set of parameters that maximizes the likelihood of the given observation. It makes the parameters maximally adapted to the training data.

EM Algorithm for Training

With λ(t) = < {a_ij}, {b_ik}, π_i >, estimate the EXPECTATION of the following quantities:
- Expected number of visits to state i
- Expected number of transitions from i to j

With those quantities, obtain the MAXIMUM LIKELIHOOD estimate

λ(t+1) = < {a_ij}, {b_ik}, π_i >

Expected Number of Visits to S_i

γ_t(i) = P(q_t = S_i | X, λ)
       = P(q_t = S_i, X | λ) / P(X | λ)
       = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j)

Expected Number of Transitions

ξ_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | X, λ)
          = α_t(i) a_ij b_j(x_{t+1}) β_{t+1}(j) / Σ_i Σ_j α_t(i) a_ij b_j(x_{t+1}) β_{t+1}(j)

Parameter Reestimation

MLE parameter estimation (a one-step code sketch follows):

a_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)

b_j(v_k) = Σ_{t=1, s.t. x_t = v_k}^{T} γ_t(j) / Σ_{t=1}^{T} γ_t(j)

π_i = γ_1(i)

Iterative: P(X | λ(t+1)) ≥ P(X | λ(t)); convergence proven, arriving at a local optimum.
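A self-contained Python sketch of one Baum-Welch (EM) reestimation step built from the γ and ξ quantities above; the 2-state, 2-symbol model at the bottom is a purely hypothetical toy:

import numpy as np

def forward(x, pi, A, B):
    alpha = np.zeros((len(x), len(pi)))
    alpha[0] = pi * B[:, x[0]]
    for t in range(1, len(x)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
    return alpha

def backward(x, A, B):
    beta = np.ones((len(x), A.shape[0]))
    for t in range(len(x) - 2, -1, -1):
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(x, pi, A, B):
    """One EM step: gamma/xi expectations, then the MLE reestimation formulas."""
    T, N = len(x), len(pi)
    alpha, beta = forward(x, pi, A, B), backward(x, A, B)
    px = alpha[-1].sum()
    gamma = alpha * beta / px                                   # gamma_t(i)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, x[t + 1]] * beta[t + 1] / px
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(x) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
x = [0, 1, 1, 0, 0, 1, 0]
for _ in range(5):
    print(forward(x, pi, A, B)[-1].sum())   # P(X | lambda) never decreases
    pi, A, B = baum_welch_step(x, pi, A, B)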

Other Issues

Other methods of training
- MAP (maximum a posteriori) estimation for adaptation
- MMI (maximum mutual information) estimation
- MDI (minimum discrimination information) estimation
- Viterbi training
- Discriminant/reinforcement training

Other types of parametric structure


Continuous density HMM (CHMM)
- More accurate, but many more parameters to train

Semi-continuous HMM
- A mix of CHMM and DHMM, using parameter sharing

State-duration HMM
- More accurate temporal behavior

Other extensions
- HMM+NN, autoregressive HMM
- 2D models: MRF, hidden mesh model, pseudo-2D HMM

Graphical DHMM and CHMM


Models for 5 and 2


Pattern Classification using HMMs


- Pattern classification
- Extension of HMM structure
- Extension of HMM training method
- Practical issues of HMM
- HMM history

Pattern Classification

- Construct one HMM λ_k per class k: λ_1, ..., λ_N
- Train each HMM λ_k with its samples D_k
  - Baum-Welch reestimation algorithm
- Calculate the model likelihoods of λ_1, ..., λ_N for the observation X
  - Forward algorithm: P(X | λ_k)
- Find the model with the maximum a posteriori probability:
  k* = argmax_k P(λ_k | X) = argmax_k P(λ_k) P(X | λ_k) / P(X) = argmax_k P(λ_k) P(X | λ_k)
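A minimal Python sketch of this classification rule; the two 2-state models and the equal priors are hypothetical placeholders:

import numpy as np

def forward_prob(x, pi, A, B):
    alpha = pi * B[:, x[0]]
    for t in range(1, len(x)):
        alpha = (alpha @ A) * B[:, x[t]]
    return alpha.sum()

def classify(x, models, priors):
    """Pick the class k maximizing P(lambda_k) * P(X | lambda_k);
    models is a list of (pi, A, B) tuples, one trained HMM per class."""
    scores = [p * forward_prob(x, *m) for m, p in zip(models, priors)]
    return int(np.argmax(scores)), scores

m0 = (np.array([1., 0.]), np.array([[.8, .2], [.2, .8]]), np.array([[.9, .1], [.2, .8]]))
m1 = (np.array([.5, .5]), np.array([[.5, .5], [.5, .5]]), np.array([[.3, .7], [.6, .4]]))
print(classify([0, 0, 1, 0], [m0, m1], priors=[0.5, 0.5]))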

Extension of HMM Structure


Extension of state transition parameters
Duration modeling HMM
More accurate temporal behavior

Transition-output HMM
HMM output functions are attached to transitions rather than states

Extension of observation parameter


Segmental HMM
More accurate modeling of the trajectory at each state, but higher computational cost

Continuous density HMM (CHMM)


The output distribution is modeled with a mixture of Gaussians

Semi-continuous HMM (Tied mixture HMM)


Mix of continuous HMM and discrete HMM by sharing Gaussian components


Extension of HMM Training Method


Maximum Likelihood Estimation (MLE)*
- maximize the probability of the observed samples

Maximum Mutual Information (MMI) method
- an information-theoretic measure
- maximize the average mutual information:
  I* = max_λ Σ_{v=1}^{V} [ log P(X^v | λ_v) - log Σ_{w=1}^{V} P(X^v | λ_w) ]
- maximizes discrimination power by training the models together

Minimum Discrimination Information (MDI) method
- minimize the DI (cross entropy) between pd(signal) and pd(HMM)
- uses the generalized Baum algorithm

Practical Issues of HMM


Architectural and behavioral choices
- the unit of modeling -- a design choice
- type of model: ergodic, left-right, parallel path
- number of states
- observation symbols: discrete vs. continuous; number of mixtures

Initial estimates
- A, π : random or uniform initial values are adequate
- B : good initial estimates are essential for CHMMs

[Figure: example ergodic, left-right, and parallel-path topologies]

Practical Issues of HMM (Cont.)


Scaling
- each term of α_t(i) is of the form ∏_{s=1}^{t-1} a_{q_s q_{s+1}} · ∏_{s=1}^{t} b_{q_s}(x_s), which heads exponentially to zero
- → scaling (or using log likelihoods; see the sketch below)

Multiple observation sequences
- accumulate the expected frequencies with weight P(X^(k) | λ)

Insufficient training data
- deleted interpolation between the desired model and a smaller model
- output probability smoothing (by local perturbation of symbols)
- output probability tying between different states
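A Python sketch of the scaled forward pass that accumulates a log likelihood instead of the raw product; on the R/G/B example it matches the unscaled result:

import numpy as np

def forward_scaled(x, pi, A, B):
    """Forward algorithm with per-frame scaling; returns log P(X | lambda).
    Each alpha_t is normalized to sum to 1, and the log scale factors are accumulated."""
    log_p = 0.0
    alpha = pi * B[:, x[0]]
    for t in range(len(x)):
        if t > 0:
            alpha = (alpha @ A) * B[:, x[t]]
        c = alpha.sum()           # scale factor for frame t
        log_p += np.log(c)
        alpha = alpha / c         # keep alpha in a safe numeric range
    return log_p

pi = np.array([1.0, 0.0, 0.0])
A = np.array([[.5, .4, .1], [.0, .6, .4], [.0, .0, .0]])
B = np.array([[.6, .2, .2], [.2, .5, .3], [.0, .3, .7]])
print(np.exp(forward_scaled([0, 0, 1, 2], pi, A, B)))   # ~0.0284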

Practical Issues of HMM (Cont.)


HMM topology optimization

What to optimize
- # of states
- # of Gaussian mixtures per state
- transitions

Methods
- Heuristic methods
  - # of states from the average (or mode) length of the input frames
- Split / merge
  - # of states from iterative splitting / merging
- Model selection criteria
  - # of states and mixtures at the same time
  - ML (maximum likelihood), BIC (Bayesian information criterion), HBIC (HMM-oriented BIC), DIC (discriminative information criterion), ...

HMM applications and Software


- On-line handwriting recognition
- Speech applications
- HMM toolbox for Matlab
- HTK (Hidden Markov Model Toolkit)

HMM Applications
On-line handwriting recognition
BongNet: HMM network-based handwriting recognition system

Speech applications
- CMU Sphinx : speech recognition toolkit
- Dr. Speaking : English pronunciation correction system

BongNet
- A consortium project of CAIR (Center for Artificial Intelligence Research) at KAIST
- The name BongNet comes from its major inventor, Bong-Kee Sin
- Prominent performance for unconstrained on-line Hangul recognition
- Modeling of Hangul handwriting
  - considers ligatures between letters as well as consonants and vowels
  - (initial consonant) + (ligature) + (vowel)
  - (initial consonant) + (ligature) + (vowel) + (ligature) + (final consonant)
  - connects letter models and ligature models using the Hangul composition principle
- Further improvements
  - BongNet+ : incorporates structural information explicitly
  - Circular BongNet : successive character recognition
  - Unified BongNet : Hangul and alphanumeric recognition, dictionary look-up

Network structure

[Figure: BongNet network structure]

A Modification to BongNet
16-direction chain code → structure code generation

Structure code sequence
- carries structural information not easily acquired from a chain code sequence, including length, direction, and bending

[Table: example structure-code features per stroke segment -- distance, rotation, straightness, direction]

Dr. Speaking



[System diagram: speech → feature extraction & acoustic analysis → decoder → acoustic score → score estimation → evaluation score. Resources: target speech DBs spoken by native and non-native speakers (mis-pronunciations), acoustic models (native and non-native phoneme units), a language model (phoneme unit), and target pronunciation / mis-pronunciation dictionaries.]

Acoustic modeling
- Native HMM and non-native HMM

Language modeling
- standard pronunciation network, extended with deletion, replacement, and insertion error modeling

[Figures: example acoustic-model and error-modeling language-model networks]


Software Tools for HMM


HMM toolbox for Matlab
- Developed by Kevin Murphy
- Freely downloadable SW written in Matlab (though Matlab itself is not free!)
- Easy to use: flexible data structures and fast prototyping in Matlab
- Somewhat slow performance due to Matlab
- Download: http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html

HTK (Hidden Markov Model Toolkit)
- Developed by the Speech Vision and Robotics Group of Cambridge University
- Freely downloadable SW written in C
- Useful for speech recognition research: a comprehensive set of programs for training, recognizing, and analyzing speech signals
- Powerful and comprehensive, but somewhat complicated
- Download: http://htk.eng.cam.ac.uk/

What is HTK ?
Hidden Markov Model Toolkit
- A set of tools for training and evaluating HMMs
- Primarily used in automatic speech recognition and economic modeling
- Modular implementation, (relatively) easy to extend

HTK Software Architecture


- HShell : user input/output & interaction with the OS
- HLabel : label files
- HLM : language models
- HNet : networks and lattices
- HDict : dictionaries
- HVQ : VQ codebooks
- HModel : HMM definitions
- HMem : memory management
- HGraf : graphics
- HAdapt : adaptation
- HRec : main recognition processing functions

Generic Properties of an HTK Tool

- Designed to run with a traditional command-line style interface
- Each tool has a number of required arguments plus optional arguments
  HFoo -T 1 -f 34.3 -a -s myfile file1 file2
  - This tool has two main arguments, file1 and file2, plus four optional arguments
    (-f : real number, -T : integer, -s : string, -a : no following value)
  HFoo -C config -f 34.3 -a -s myfile file1 file2
  - HFoo will load the parameters stored in the configuration file config during its initialization
  - Configuration parameters can sometimes be used as an alternative to command-line arguments

The Toolkit
There are 4 main phases:
- data preparation, training, testing, and analysis

The Toolkit
- Data preparation tools
- Training tools
- Recognition tools
- Analysis tools

< HTK Processing Stages >

Data Preparation Tools


- A set of speech data files and their associated transcriptions are required
- They must be converted into the appropriate parametric form
- HSLab : used both to record speech and to manually annotate it with any required transcriptions
- HCopy : encoding is performed by simply copying each file
- HList : used to check the contents of any speech file
- HLEd : outputs label files to a single Master Label File (MLF), which is usually more convenient for subsequent processing
- HLStats : gathers and displays statistics on label files where required
- HQuant : used to build a VQ codebook in preparation for building a discrete-probability HMM system

Training Tools
- If speech data is available for which the locations of the sub-word boundaries have been marked, it can be used as bootstrap data
- HInit and HRest provide isolated-word style training using the fully labeled bootstrap data
- Each of the required HMMs is generated individually

Training Tools (contd)


- HInit : iteratively computes an initial set of parameter values using a segmental k-means procedure
- HRest : processes fully labeled bootstrap data using a Baum-Welch re-estimation procedure
- HCompV : all of the phone models are initialized to be identical, with state means and variances equal to the global speech mean and variance
- HERest : performs a single Baum-Welch re-estimation of the whole set of HMM phone models simultaneously
- HHEd : applies a variety of parameter tyings and increments the number of mixture components in specified distributions
- HEAdapt : adapts HMMs to better model the characteristics of particular speakers using a small amount of training or adaptation data

Recognition Tools
- HVite : uses the token-passing algorithm to perform Viterbi-based speech recognition
- HBuild : allows sub-networks to be created and used within higher-level networks
- HParse : converts an EBNF grammar into the equivalent word network
- HSGen : computes the empirical perplexity of the task
- HDMan : dictionary management tool

Analysis Tools
HResults
- Uses dynamic programming to align the two transcriptions and count substitution, deletion, and insertion errors
- Provides speaker-by-speaker breakdowns, confusion matrices, and time-aligned transcriptions
- Computes Figure of Merit scores and Receiver Operating Curve information

HTK Example
Isolated word recognition

[Figures: HTK isolated-word recognition example]

Speech Recognition Example using HTK


Recognizer for voice dialing application
Goal of our system
Provide a voice-operated interface for phone dialing

Recognizer
- digit strings, a limited set of names
- sub-word based

1> Create the grammar file "gram".
------------------------ gram -------------------------
$digit = ... | ... ;
$name = ... | ... ;
( SENT-START ( <$digit> | $name ) SENT-END )
--------------------------------------------------------
($ defines a variable, < > denotes one or more repetitions, | means "or", and SENT-START / SENT-END mark the sentence boundaries.)

2> Compile the grammar into a word network:
HParse gram wdnet

3> Create the pronunciation dictionary "dict", one word per line with its phone sequence.
-------------------- dict --------------------------
SENT-END    []  sil
SENT-START  []  sil
...             kc oxc ngc sp
...             kc uxc sp
....
...             jeoc ngc hc euic sp
....
...             phc axc lc sp
...             hc oxc chc uxc lc sp
-----------------------------------------------------
(Each pronunciation ends with the short-pause phone sp.)

4> HSGen -l -n 200 wdnet dict
- HSGen generates 200 random sentences from the word network wdnet and the dictionary dict, to be used as recording prompts.

5> Record the generated prompt sentences.
- HSLab can be used to record and label the speech.

6> Create the word-level master label file words.mlf.
--------------------- words.mlf ---------------------
#!MLF!#
"*/s0001.lab"
...
.
"*/s0002.lab"
...
.
.....
------------------------------------------------------

7> Create the edit script mkphones0.led, used to expand words.mlf to phone level.
------------- mkphones0.led ---------------
EX
IS sil sil
DE sp
--------------------------------------------
(EX expands words to phones using the dictionary, IS inserts sil at the start and end of each utterance, and DE deletes the sp labels.)

8> HLEd -d dict -i phones0.mlf mkphones0.led words.mlf
- HLEd applies mkphones0.led to words.mlf and writes the phone-level label file phones0.mlf.
------------------- phones0.mlf --------------------------
#!MLF!#
"*/s0001.lab"
sil
nc
uxc
rc
...
oxc
ngc
kc
.
"*/s0002.lab"
.....
------------------------------------------------------------

9> Create the configuration file "config" for converting the waveforms to MFC feature files.
-------------- config --------------------
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SOURCEFORMAT = NOHEAD
SOURCERATE = 1250
WINDOWSIZE = 250000.0
......
-------------------------------------------

10> Create the script file codetr.scp listing each source waveform and its target *.mfc file.
------------- codetr.scp ----------------
DB\s0001.wav DB\s0001.mfc
DB\s0002.wav DB\s0002.mfc
...
DB\s0010.wav DB\s0010.mfc
......
------------------------------------------

11> HCopy -T 1 -C config -S codetr.scp
- HCopy encodes every file listed in codetr.scp into MFC features according to config.

12> Create the prototype HMM definition "proto" and the training script train.scp.
- proto defines the topology: a 3-state left-right model (5 states including the non-emitting entry and exit states).
------------------------------ proto --------------------------
~o <VecSize> 39 <MFCC_0_D_A>
~h "proto"
<BeginHMM>
  <NumStates> 5
  <State> 2
    <Mean> 39
      0.0 0.0 ....
    <Variance> 39
      ...
  ...
  <TransP> 5
    ....
<EndHMM>
----------------------------------------------------------------
- train.scp lists the training *.mfc files.

13> Create config1, a copy of config with TARGETKIND changed from MFCC_0 to MFCC_0_D_A, so the HMMs are trained on MFCCs plus deltas and accelerations.

14> HCompV -C config1 -f 0.01 -m -S train.scp -M hmm0 proto
- HCompV writes the initialized proto and vFloors into hmm0; from these, the macros and hmmdefs files are built, with the proto definition copied once per phone in hmmdefs.
-------------------- hmmdefs ------------------
~h "axc"
<BeginHMM>
...
<EndHMM>
~h "chc"
<BeginHMM>
...
<EndHMM>
...
-------------------------------------------------

- The macros file is built from an ~o header plus the contents of hmm0/vFloors.
----------------- macros ----------------------
~o <VecSize> 39 <MFCC_0_D_A>
~v "varFloor1"
  <Variance> 39
    ...
-----------------------------------------------

15> HERest -C config1 -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp
      -H hmm0\macros -H hmm0\hmmdefs -M hmm1 monophones0
- HERest re-estimates the whole set of monophone models and writes the new macros and hmmdefs into hmm1.
- Running HERest a second time (reading hmm1, writing hmm2) gives another re-estimation; repeat for hmm3, hmm4, ...

16> HVite -H hmm7/macros -H hmm7/hmmdefs -S test.scp -l '*' -i recout.mlf -w wdnet -p 0.0 -s 5.0 dict monophones
- HVite recognizes the test data using the final models, the word network wdnet, and the dictionary, writing the results to recout.mlf.


Summary
Markov model
- 1st-order Markov assumption on state transitions
- Visible: the observation sequence determines the state transition sequence

Hidden Markov model
- 1st-order Markov assumption on state transitions
- Hidden: an observation sequence may result from many possible state transition sequences
- Fits the modeling of spatio-temporally variable signals very well
- Three algorithms: model evaluation, most probable path decoding, model training

HMM applications and software
- Handwriting and speech applications
- HMM toolbox for Matlab
- HTK


References
Hidden Markov Model
- L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, pp. 257-286, 1989.
- L.R. Bahl et al., "A Maximum Likelihood Approach to Continuous Speech Recognition," IEEE Trans. PAMI, pp. 179-190, May 1983.
- M. Ostendorf, "From HMMs to Segment Models: a Unified View of Stochastic Modeling for Speech Recognition," IEEE Trans. Speech and Audio Processing, pp. 360-378, Sep. 1996.

HMM Tutorials
- HMM Theory and Applications, tutorial, 2003.
- ILVB Tutorial, 2005.04.16.
- Sam Roweis, Hidden Markov Models (SCIA Tutorial 2003), http://www.cs.toronto.edu/~roweis/notes/scia03h.pdf
- Andrew Moore, Hidden Markov Models, http://www-2.cs.cmu.edu/~awm/tutorials/hmm.html

References (Cont.)

HMM Applications
- B.-K. Sin, J.-Y. Ha, S.-C. Oh, Jin H. Kim, "Network-Based Approach to Online Cursive Script Recognition," IEEE Trans. Systems, Man, and Cybernetics, Part B, Vol. 29, No. 2, pp. 321-328, 1999.
- J.-Y. Ha, "Structure Code for HMM Network-Based Hangul Recognition," 18th International Conference on Computer Processing of Oriental Languages, pp. 165-170, 1999.

HMM Topology Optimization
- H. Singer and M. Ostendorf, "Maximum likelihood successive state splitting," ICASSP, 1996, pp. 601-604.
- A. Stolcke and S. Omohundro, "Hidden Markov model induction by Bayesian model merging," Advances in NIPS, 1993, pp. 11-18. San Mateo, CA: Morgan Kaufmann.
- A. Biem, J.-Y. Ha, J. Subrahmonia, "A Bayesian Model Selection Criterion for HMM Topology Optimization," ICASSP, pp. I-989 - I-992, IEEE Signal Processing Society, 2002.
- A. Biem, "A Model Selection Criterion for Classification: Application to HMM Topology Optimization," ICDAR 2003, pp. 204-210, 2003.

HMM Software
- Kevin Murphy, HMM toolbox for Matlab, freely downloadable SW written in Matlab, http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html
- Speech Vision and Robotics Group, Cambridge University, HTK (Hidden Markov Model Toolkit), freely downloadable SW written in C, http://htk.eng.cam.ac.uk/
- Sphinx at CMU, http://cmusphinx.sourceforge.net/html/cmusphinx.php
