
• Suppose that $Z_j \sim \mathrm{Gamma}(a_j, b)$, $j \in [k]$. Then $\sum_{j=1}^k Z_j \sim \mathrm{Gamma}\bigl(\sum_{j=1}^k a_j, b\bigr)$.

Since there is a bijection between $(Z, S)$ and $(Z_1, \ldots, Z_k)$, where $S = \sum_{i=1}^k Z_i$ and $Z = (Z_1/S, \ldots, Z_k/S)$, the change of variables $Z_i = z_i s$ (with Jacobian $s^{k-1}$) gives
\[
p(z, s) = p(z_1 s, \ldots, z_k s)\, s^{k-1}
\propto \prod_{i=1}^k (z_i s)^{a_i - 1} e^{-b z_i s} \cdot s^{k-1}
= s^{\sum_{i=1}^k a_i - 1}\, e^{-bs} \prod_{i=1}^k z_i^{a_i - 1}.
\]
Thus $p(s) \propto s^{\sum_{i=1}^k a_i - 1} e^{-bs}$, and $p(z) \propto \prod_{i=1}^k z_i^{a_i - 1}$, which is the kernel of the Dirichlet distribution with parameters $a_1, \ldots, a_k$.

• This is a direct consequence of the previous result: $p(z, s) \propto p(z)\, p(s)$, i.e., the normalized vector and the sum are independent.


• $(p_1, \ldots, p_k) \stackrel{d}{=} \left(\dfrac{Z_1}{\sum_{i=1}^k Z_i}, \ldots, \dfrac{Z_k}{\sum_{i=1}^k Z_i}\right)$ for independent $Z_i \sim \mathrm{Gamma}(a_i, b)$ (a numerical sketch of this construction follows this list). Therefore,
\[
(P(A_1), \ldots, P(A_m)) \stackrel{d}{=} \left(\sum_{i \in A_1} p_i, \ldots, \sum_{i \in A_m} p_i\right)
= \left(\frac{\sum_{i \in A_1} Z_i}{\sum_{i=1}^k Z_i}, \ldots, \frac{\sum_{i \in A_m} Z_i}{\sum_{i=1}^k Z_i}\right)
\stackrel{d}{=} \left(\frac{Y_1}{\sum_{j=1}^m Y_j}, \ldots, \frac{Y_m}{\sum_{j=1}^m Y_j}\right)
\]
for $Y_j := \sum_{i \in A_j} Z_i$. Since $Y_j \sim \mathrm{Gamma}\bigl(\sum_{i \in A_j} a_i, b\bigr)$, we have $(P(A_1), \ldots, P(A_m)) \sim \mathrm{Dir}(a_1', \ldots, a_m')$, where $a_i' = \sum_{j \in A_i} a_j$ for $i = 1, \ldots, m$.
• Define $Z_i$ as a vector of independent Gamma random variables with parameter vector $a_i$, for $i = 1, 2$. By the result in part 1),
\[
P_1 \stackrel{d}{=} \frac{Z_1}{\|Z_1\|_1}, \qquad P_2 \stackrel{d}{=} \frac{Z_2}{\|Z_2\|_1}.
\]
By the result in part 2), $P_1$, $\|Z_1\|_1$, $P_2$, and $\|Z_2\|_1$ are mutually independent.
One can represent $Y$ as the proportion of a $\mathrm{Gamma}(a_1(\mathcal{X}))$ variable over a $\mathrm{Gamma}(a_1(\mathcal{X}) + a_2(\mathcal{X}))$ variable. We have $\|Z_i\|_1 \sim \mathrm{Gamma}(a_i(\mathcal{X}))$ for $i = 1, 2$, and they are independent, so
\[
Y \stackrel{d}{=} \frac{\|Z_1\|_1}{\|Z_1\|_1 + \|Z_2\|_1} = \frac{\|Z_1\|_1}{\|Z_1 + Z_2\|_1}.
\]
Therefore,
\[
Y P_1 + (1 - Y) P_2 \stackrel{d}{=} \frac{1}{\|Z_1 + Z_2\|_1} Z_1 + \frac{1}{\|Z_1 + Z_2\|_1} Z_2 = \frac{Z_1 + Z_2}{\|Z_1 + Z_2\|_1},
\]
which means $Y P_1 + (1 - Y) P_2 \sim \mathrm{Dir}(a_1 + a_2)$.
• Again, $(p_1, \ldots, p_k) \stackrel{d}{=} \left(\dfrac{Z_1}{\sum_{i=1}^k Z_i}, \ldots, \dfrac{Z_k}{\sum_{i=1}^k Z_i}\right)$ for independent $Z_i \sim \mathrm{Gamma}(a_i, b)$. This implies that
\[
q := \frac{\sum_{i=1}^{k-1} Z_i}{\sum_{i=1}^k Z_i}, \quad \text{and} \quad
(p_1/q, \ldots, p_{k-1}/q) \stackrel{d}{=} \left(\frac{Z_1}{\sum_{i=1}^{k-1} Z_i}, \ldots, \frac{Z_{k-1}}{\sum_{i=1}^{k-1} Z_i}\right) \sim \mathrm{Dir}(a_1, \ldots, a_{k-1}).
\]
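The Gamma-normalization construction used repeatedly above is easy to check numerically. Below is a minimal Python sketch (my addition, not part of the original solution) that draws independent Gamma variables, normalizes them by their sum, and compares the resulting first moments with direct Dirichlet draws; the parameter values and sample size are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([2.0, 3.0, 5.0])    # illustrative Dirichlet parameters a_1, ..., a_k
b = 1.7                          # common rate parameter; the result does not depend on b
n = 200_000

# Construction from the solution: p = Z / sum(Z) with independent Z_i ~ Gamma(a_i, b).
Z = rng.gamma(shape=a, scale=1.0 / b, size=(n, a.size))
P = Z / Z.sum(axis=1, keepdims=True)

# Direct Dirichlet draws for comparison.
D = rng.dirichlet(a, size=n)

print("normalized-Gamma mean:", P.mean(axis=0))
print("Dirichlet mean:       ", D.mean(axis=0))
print("theoretical mean:     ", a / a.sum())

# The normalized vector should also be (empirically) uncorrelated with the total sum.
S = Z.sum(axis=1)
print("corr(p_1, S):", np.corrcoef(P[:, 0], S)[0, 1])
```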
Problem 4
• In the M-step of the EM algorithm we maximize the expected complete-data log-likelihood given the observed data, not the complete-data log-likelihood itself. The optimal parameter updates therefore depend on expectations of functions of the complete data rather than on the complete data themselves, and those expectations are computed in the E-step.

• The E-step computes the conditional expectations that are needed in the M-step. For this particular problem, computing them exactly (summing over all configurations of the latent variables) is computationally infeasible. We therefore run a Gibbs sampler for many iterations and use Monte Carlo estimates (averages of the generated samples after burn-in) to approximate those expectations; a small averaging sketch follows this list.
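To make the Monte Carlo E-step concrete, here is a minimal sketch (my own illustration, not part of the original solution) of the averaging step: given a stack of Gibbs draws of the binary latent vectors, the quantities the M-step needs, such as $\mathbb{E}[z_i]$ and $\mathbb{E}[z_i z_i']$ (compare part (b) of the attached solution), are estimated by averaging the post-burn-in samples. The array layout and the burn-in length are assumptions.

```python
import numpy as np

def monte_carlo_e_step(z_samples: np.ndarray, burn_in: int):
    """Approximate E-step expectations from Gibbs output.

    z_samples has shape (T, n, K): T Gibbs sweeps over n data points,
    each with a K-dimensional binary latent vector z_i.
    Returns Monte Carlo estimates of E[z_i] (shape (n, K)) and
    E[z_i z_i'] (shape (n, K, K)) computed from the post-burn-in sweeps.
    """
    kept = z_samples[burn_in:]                  # discard burn-in sweeps
    Ez = kept.mean(axis=0)                      # E[z_i] for each data point i
    Ezz = np.einsum('tik,til->ikl', kept, kept) / kept.shape[0]  # E[z_i z_i']
    return Ez, Ezz
```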

Attached is Roger Fan's example solution to Problem 4.

Problem 4

Consider the model


\[
p(z_1, \ldots, z_K \mid \pi) = \prod_{i=1}^K \pi_i^{z_i} (1 - \pi_i)^{1 - z_i} \tag{5}
\]

a) The posterior of $z_i \mid z_{-i}, y, \theta$ is a Bernoulli distribution. Its probabilities satisfy
\[
\begin{aligned}
p(z_i = x \mid z_{-i}, y, \theta) &\propto p(z_{-i}, y \mid z_i = x, \theta)\, p(z_i = x \mid \theta) \\
&= p(y \mid z_{-i}, z_i = x, \theta)\, p(z_{-i} \mid z_i = x, \theta)\, p(z_i = x \mid \theta) \\
&= N\!\left(y \;\middle|\; \sum_{\ell \neq i} z_\ell \mu_\ell + x \mu_i,\; \sigma^2 I\right) \pi_i^x (1 - \pi_i)^{1 - x} \prod_{\ell \neq i} \pi_\ell^{z_\ell} (1 - \pi_\ell)^{1 - z_\ell}.
\end{aligned}
\]
Therefore, we get that
\[
p(z_i = 1 \mid z_{-i}, y, \theta)
= \frac{N\!\left(y \mid \sum_{\ell \neq i} z_\ell \mu_\ell + \mu_i, \sigma^2 I\right) \pi_i}
       {N\!\left(y \mid \sum_{\ell \neq i} z_\ell \mu_\ell + \mu_i, \sigma^2 I\right) \pi_i
        + N\!\left(y \mid \sum_{\ell \neq i} z_\ell \mu_\ell, \sigma^2 I\right) (1 - \pi_i)}.
\]
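The Bernoulli conditional derived above translates directly into one Gibbs update: compare the Gaussian likelihood of $y$ with $z_i = 1$ against $z_i = 0$, each weighted by its prior probability, and normalize. A minimal sketch follows (my own code, not from the attached solution; the function and variable names are illustrative), working in log space for numerical stability.

```python
import numpy as np

def gibbs_update_zi(i, z, y, mu, pi, sigma2, rng):
    """One Gibbs step for z_i given z_{-i}, y, and the parameters.

    y: (M,) observation; mu: (K, M) matrix whose rows are the component means;
    pi: (K,) prior probabilities; z: (K,) current binary vector; sigma2: noise variance.
    """
    base = z @ mu - z[i] * mu[i]         # sum_{l != i} z_l * mu_l
    resid0 = y - base                    # residual if z_i = 0
    resid1 = resid0 - mu[i]              # residual if z_i = 1
    # log N(y | mean, sigma2 * I) up to an additive constant common to both cases
    log_p1 = -0.5 * (resid1 @ resid1) / sigma2 + np.log(pi[i])
    log_p0 = -0.5 * (resid0 @ resid0) / sigma2 + np.log(1.0 - pi[i])
    p1 = 1.0 / (1.0 + np.exp(log_p0 - log_p1))   # normalized P(z_i = 1 | z_{-i}, y, theta)
    z[i] = int(rng.random() < p1)
    return z
```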

b) Given data points $y_1, \ldots, y_n$, where $y_i \in \mathbb{R}^M$, $z_i \in \mathbb{R}^K$, and letting $\mu \in \mathbb{R}^{K \times M}$, the complete-data log-likelihood is
\[
\begin{aligned}
\log p(y, z) &= -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu' z_i)'(y_i - \mu' z_i) - \frac{nM}{2} \log \sigma^2
+ \sum_{i=1}^n \sum_{k=1}^K \bigl(z_{ik} \log \pi_k + (1 - z_{ik}) \log(1 - \pi_k)\bigr) \\
&= -\frac{1}{2\sigma^2} \sum_{i=1}^n y_i' y_i + \frac{1}{\sigma^2} \sum_{i=1}^n y_i' \mu' z_i - \frac{1}{2\sigma^2} \sum_{i=1}^n z_i' \mu \mu' z_i + \cdots
\end{aligned}
\]
so the expected log-likelihood given the observed data is
\[
\mathbb{E} \log p(y, z) = -\frac{1}{2\sigma^2} \sum_{i=1}^n y_i' y_i + \frac{1}{\sigma^2} \sum_{i=1}^n y_i' \mu'\, \mathbb{E} z_i - \frac{1}{2\sigma^2} \sum_{i=1}^n \mathbb{E}\bigl(z_i' \mu \mu' z_i\bigr) + \cdots
\]
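One small step worth making explicit here (my addition, using a standard identity rather than anything in the attached solution): the quadratic term depends on the latent vectors only through their second moments, since
\[
\mathbb{E}\bigl(z_i' \mu \mu' z_i\bigr) = \operatorname{tr}\bigl(\mu \mu'\, \mathbb{E}[z_i z_i']\bigr),
\]
so the E-step (here, the Gibbs averages) must supply $\mathbb{E}[z_i z_i']$ in addition to $\mathbb{E}[z_i]$.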
