
Diane Hu Notes on HTMM April 12, 2010

1 Notation

Let there be D documents in the corpus. In standard HMM notation, for each document d:

    z^(d) = (z_1, . . . , z_T),   where z_t ∈ {1, . . . , K}                            (1)

    x^(d) = (x_1, . . . , x_T),   where x_t ∈ {0, 1}^(1×V)                              (2)

where z^(d) is the sequence of hidden states and x^(d) is the sequence of observations. Notation is as follows:

Var        Code Name        Dim        Description

D          docs.size()      1×1        Number of documents in the corpus
T_d        docs[d].size()   1×1        Number of sentences in document d
L_t        sen.size()       1×1        Number of words in sentence t
K          topics           1×1        Number of topics
V          words            1×1        Number of words in the vocabulary
α          alpha            T×2K       α_i(t) = p(x_1, . . . , x_t, z_t = i | b, a, π)
β          beta             T×2K       β_i(t) = p(x_{t+1}, . . . , x_T | z_t = i; b, a, π)
γ          p_dwzpsi         D×T×2K     γ_i(t)^(d) = p(z_t = i | x^(d); b, a, π)
b          local            T×K        b_i(t) = p(x_t | z_t = i)
π          init_probs       1×2K       π_i = p(z_1 = i)
θ          theta            D×K        θ_dz = p(topic z | document d)
φ          phi              K×V        φ_zw = p(word w | topic z)
ε          epsilon          1×1        Binomial parameter, prior over ψ
λ          alpha            1×1        Dirichlet parameter, prior over θ
η          beta             1×1        Dirichlet parameter, prior over φ
E[C_zw]    Czw              K×V        E[C_zw] = Σ_{d=1}^{D} Σ_{t=1}^{T_d} p(z_t^(d) = z, x_t = w | x^(d))
E[C_dz]    Cdz              D×K        E[C_dz] = Σ_{t=1}^{T_d} p(z_t^(d) = z, ψ_t^(d) = 1 | x^(d))

2 Model

The generative process, following Gruber et al. [1], is as follows:

1. For each topic z ∈ {1, . . . , K}: draw φ_z ∼ Dirichlet(η)

2. For each document d ∈ {1, . . . , D}:

   (a) Draw θ ∼ Dirichlet(λ)

   (b) For each sentence x_t in d, draw the topic-transition indicator:

           ψ_t = 1,                 if t = 1
           ψ_t ∼ Binomial(ε),       if t > 1

   (c) For each sentence x_t in d:

       (i) Draw the topic:

           z_t = z_{t−1},               if ψ_t = 0
           z_t ∼ Multinomial(θ),        if ψ_t = 1

       (ii) For each word w_ℓ in sentence x_t: draw w_ℓ ∼ Multinomial(φ_{z_t})
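A minimal simulation of this generative process may help make the roles of ψ and z concrete. All dimensions, hyperparameter values, and names below are illustrative assumptions, not taken from the notes:

```python
# Toy simulation of the HTMM generative process described above.
import numpy as np

rng = np.random.default_rng(0)
K, V, D = 4, 50, 3                                  # topics, vocabulary size, documents (assumed sizes)
eta, lam, eps = 0.1, 1.0, 0.3                       # assumed hyperparameter values

phi = rng.dirichlet(np.full(V, eta), size=K)        # step 1: phi_z ~ Dirichlet(eta)

corpus = []
for d in range(D):
    theta = rng.dirichlet(np.full(K, lam))          # step 2(a): theta ~ Dirichlet(lambda)
    T_d = rng.integers(3, 8)                        # number of sentences (arbitrary)
    z_prev, sentences = None, []
    for t in range(T_d):
        psi = 1 if t == 0 else rng.binomial(1, eps) # step 2(b): psi_1 = 1, else psi_t ~ Binomial(eps)
        z = rng.choice(K, p=theta) if psi == 1 else z_prev   # step 2(c)(i): fresh topic or copy z_{t-1}
        L_t = rng.integers(4, 10)                   # words per sentence (arbitrary)
        words = rng.choice(V, size=L_t, p=phi[z])   # step 2(c)(ii): w_l ~ Multinomial(phi_{z_t})
        sentences.append(words)
        z_prev = z
    corpus.append(sentences)
```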


3 Inference

3.1 Initialization

HTMM parameters ε, θ, and φ are all initialized randomly.
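The notes do not specify the initialization scheme; the sketch below is one plausible choice (symmetric Dirichlet draws and a uniform ε), with assumed toy dimensions, not necessarily what the original implementation does:

```python
# One possible random initialization of the HTMM parameters.
import numpy as np

rng = np.random.default_rng(1)
K, V, D = 4, 50, 3                                   # toy dimensions (assumed)
eps   = rng.uniform(0.1, 0.9)                        # epsilon: topic-transition probability
theta = rng.dirichlet(np.ones(K), size=D)            # theta: D x K, one topic mixture per document
phi   = rng.dirichlet(np.ones(V), size=K)            # phi:   K x V, one word distribution per topic
```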

3.2 E-step

Update HMM-related parameters (M-step within HMM):

    b_i(t) = ∏_{j=1}^{V} φ_ij^{x_t(j)}                                                  (3)

    π_i = { θ_di,   if 1 ≤ i ≤ K
          { 0,      if i > K                                                            (4)
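A small sketch of (3)-(4) for a single document, assuming each sentence x_t is stored as a 0/1 indicator row vector of length V; the log is only there to avoid numerical underflow of the product, and the function name is illustrative:

```python
# Emission probabilities b (eq. 3) and initial state distribution pi (eq. 4).
import numpy as np

def emission_and_init(x, phi, theta_d):
    """x: (T_d, V) indicators; phi: (K, V); theta_d: (K,). Returns b (T_d, K) and pi (2K,)."""
    K = phi.shape[0]
    log_b = x @ np.log(phi).T                       # (T_d, K): sum_j x[t, j] * log phi[i, j]
    b = np.exp(log_b)                               # b[t, i] = prod_j phi[i, j] ** x[t, j]
    pi = np.concatenate([theta_d, np.zeros(K)])     # pi_i = theta_{d,i} for i <= K, else 0
    return b, pi
```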

Run the HTMM version of the forward-backward algorithm (E-step within HMM). For 1 ≤ i ≤ K:


    α_i(1)     = b_i(1) π_i,            α_{i+K}(1) = b_i(1) π_{i+K}                     (5)

    α_i(t)     = ε θ_di b_i(t)                                                          (6)
    α_{i+K}(t) = (1 − ε) [α_i(t−1) + α_{i+K}(t−1)] b_i(t)

    Normalize all α_i(t) by Σ_{j=1}^{K} [α_j(t) + α_{j+K}(t)].                          (7)

    β_i(T) = β_{i+K}(T) = 1                                                             (9)

    β_i(t)     = ε Σ_{j=1}^{K} θ_dj b_j(t+1) β_j(t+1) + (1 − ε) b_i(t+1) β_i(t+1)       (10)
    β_{i+K}(t) = β_i(t)

    Normalize all β_i(t) by Σ_{j=1}^{K} [α_j(t) + α_{j+K}(t)].                          (11)

Then, for 1 ≤ i ≤ 2K,

    γ_i(t) = α_i(t) β_i(t) / Σ_{j=1}^{2K} α_j(t) β_j(t)                                 (13)
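The sketch below mirrors equations (5)-(13) for one document, using the 2K-state chain (states 1..K: topic drawn fresh, ψ = 1; states K+1..2K: topic copied, ψ = 0). Inputs b, π, θ_d, ε follow the notation above; the function and variable names are illustrative, not taken from the original code:

```python
# Forward-backward pass for the HTMM E-step (eqs. 5-13).
import numpy as np

def forward_backward(b, pi, theta_d, eps):
    T, K = b.shape
    alpha = np.zeros((T, 2 * K))
    norms = np.zeros(T)                  # per-step normalizers, reused for the log-likelihood

    # Forward pass, eqs. (5)-(7)
    alpha[0, :K] = b[0] * pi[:K]
    alpha[0, K:] = b[0] * pi[K:]
    norms[0] = alpha[0].sum()
    alpha[0] /= norms[0]
    for t in range(1, T):
        alpha[t, :K] = eps * theta_d * b[t]                                      # psi = 1 half
        alpha[t, K:] = (1 - eps) * (alpha[t - 1, :K] + alpha[t - 1, K:]) * b[t]  # psi = 0 half
        norms[t] = alpha[t].sum()
        alpha[t] /= norms[t]

    # Backward pass, eqs. (9)-(11)
    beta = np.ones((T, 2 * K))                       # eq. (9): beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        fresh = eps * np.sum(theta_d * b[t + 1] * beta[t + 1, :K])      # transition to any fresh topic
        beta[t, :K] = fresh + (1 - eps) * b[t + 1] * beta[t + 1, :K]    # or keep the same topic
        beta[t, K:] = beta[t, :K]                                       # both halves agree
        beta[t] /= norms[t]              # rescale (eq. 11); any positive scale cancels in gamma

    # State posteriors, eq. (13)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return alpha, beta, gamma, norms
```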

To compute the log-likelihood of the observed words, use the α sums from (7) before normalization:

    log p(x^(d)) = Σ_{t=1}^{T} log Σ_{i=1}^{K} [α_i(t) + α_{i+K}(t)]                    (14)
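The inner sum in (14) is exactly the per-step normalizer computed in the forward pass, so under the assumptions of the sketch above the per-document log-likelihood falls out of the values already stored:

```python
# Log-likelihood of one document from the forward-pass normalizers (eq. 14).
import numpy as np

def log_likelihood(norms):
    return float(np.sum(np.log(norms)))
```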


3.3 M-step

Compute MAP estimates for HTMM parameters ε, φ, and θ:


    ε = [ Σ_{d=1}^{D} Σ_{t=2}^{T_d} Σ_{i=1}^{K} γ_i(t)^(d) ] / [ Σ_{d=1}^{D} (T_d − 1) ]        (15)

We note that

    Σ_{i=1}^{K} γ_i(t)^(d) = p(ψ_t^(d) = 1 | x^(d))                                     (16)

    Σ_{i=K+1}^{2K} γ_i(t)^(d) = p(ψ_t^(d) = 0 | x^(d))                                  (17)
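Combining (15) with the identities (16)-(17), the ε update is just the average posterior probability that a topic transition occurred over all non-initial sentences. A minimal sketch, assuming `gammas` is a list with one (T_d × 2K) posterior matrix per document as returned above:

```python
# MAP update for epsilon (eq. 15).
def update_epsilon(gammas, K):
    num = sum(g[1:, :K].sum() for g in gammas)   # sum over t >= 2 of p(psi_t = 1 | x^(d)), eq. (16)
    den = sum(g.shape[0] - 1 for g in gammas)    # sum over d of (T_d - 1)
    return num / den
```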

Let E[C_ij] denote the expected number of times word j was drawn from topic i, according to φ_ij:

    E[C_ij] = Σ_{d=1}^{D} Σ_{t=1}^{T_d} [ γ_i(t)^(d) + γ_{i+K}(t)^(d) ] x_t(j)          (18)

Then,

    φ_ij = E[C_ij] + η − 1                                                              (19)

    Normalize each φ_ij by Σ_{m=1}^{V} φ_im.                                            (20)

Let E[C_di] denote the expected number of times topic i was drawn according to θ_d in document d:

    E[C_di] = Σ_{t=1}^{T_d} γ_i(t)^(d)                                                  (21)

Then,

    θ_di = E[C_di] + λ − 1                                                              (22)

    Normalize each θ_di by Σ_{j=1}^{K} θ_dj.                                            (23)
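A sketch of the remaining M-step updates (18)-(23), under the same assumptions as the E-step sketches: `x_docs[d]` is a (T_d × V) 0/1 indicator matrix and `gammas[d]` the (T_d × 2K) posteriors; all names are illustrative:

```python
# MAP updates for phi (eqs. 18-20) and theta (eqs. 21-23).
import numpy as np

def update_phi_theta(x_docs, gammas, K, V, eta, lam):
    D = len(x_docs)
    C_w = np.zeros((K, V))                        # E[C_ij], expected word counts per topic, eq. (18)
    C_d = np.zeros((D, K))                        # E[C_di], expected topic draws per document, eq. (21)
    for d, (x, g) in enumerate(zip(x_docs, gammas)):
        topic_post = g[:, :K] + g[:, K:]          # p(z_t = i | x^(d)), combining both psi halves
        C_w += topic_post.T @ x                   # weight word indicators by topic posteriors
        C_d[d] = g[:, :K].sum(axis=0)             # only sentences where the topic was drawn (psi = 1)
    phi = C_w + eta - 1.0                         # eq. (19); assumes eta >= 1 so entries stay nonnegative
    phi /= phi.sum(axis=1, keepdims=True)         # eq. (20)
    theta = C_d + lam - 1.0                       # eq. (22); same caveat for lambda
    theta /= theta.sum(axis=1, keepdims=True)     # eq. (23)
    return phi, theta
```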

References

[1] Gruber, A., Rosen-Zvi, M., and Weiss, Y. "Hidden Topic Markov Models," Artificial Intelligence and
Statistics (AISTATS), 2007.
