E-Step

Jim Inoue

TA: Leilani Battle, section AB

(1)

E-Step

In the E-step we need to find E[Zij] for j = 1,2,3.

Let A be the event that xi is drawn from the distribution f1


Let B be the event that xi is drawn from the distribution f2
Let C be the event that xi is drawn from the distribution f3
Let D be the event that xi is observed.

We want to calculate P(A|D), P(B|D), and P(C|D), as these are the expected values of $Z_{i1}$, $Z_{i2}$, and $Z_{i3}$, respectively. Using Bayes’ Theorem:

$$P(X \mid D) = \frac{P(D \mid X) \, P(X)}{P(D)}, \quad \text{where } X \in \{A, B, C\} \text{ and } P(A) = P(B) = P(C) = \tau = 1/3$$

$$P(D) = P(D \mid A) P(A) + P(D \mid B) P(B) + P(D \mid C) P(C) = \tau_1 P(D \mid A) + \tau_2 P(D \mid B) + \tau_3 P(D \mid C)$$

Calculate P(D|X) by:

$$P(D \mid X) = \frac{1}{\sqrt{2 \pi \sigma^2}} \, e^{-\frac{(x_i - \mu_j)^2}{2 \sigma^2}}, \quad \text{where } \sigma = 1$$
Plugging the corresponding values into the above equation yields the expected values of $Z_{ij}$. We will then need these expected values for the M-step.

M-Step

We then try to maximize the expected log likelihood of the data, assuming that $\tau_1 = \tau_2 = \tau_3 = \frac{1}{3}$ and $\sigma^2 = 1$. Writing the complete-data likelihood with the indicators $Z_{ij}$:

$$L(x_1 \ldots x_n \mid \theta, \tau) = \prod_{i=1}^{n} \prod_{j=1}^{3} \left[ \frac{\tau_j}{\sqrt{2 \pi \sigma^2}} \, e^{-\frac{(x_i - \mu_j)^2}{2 \sigma^2}} \right]^{Z_{ij}}$$

$$\ln L(x_1 \ldots x_n \mid \theta, \tau) = \sum_{i=1}^{n} \sum_{j=1}^{3} Z_{ij} \left[ \ln \frac{1}{3} - \frac{1}{2} \ln(2 \pi \sigma^2) - \frac{(x_i - \mu_j)^2}{2 \sigma^2} \right]$$
Then apply the expected value to the log likelihood. By linearity of expectation:

$$E[\ln L] = \sum_{i=1}^{n} \sum_{j=1}^{3} E[Z_{ij}] \left[ \ln \frac{1}{3} - \frac{1}{2} \ln(2 \pi \sigma^2) - \frac{(x_i - \mu_j)^2}{2 \sigma^2} \right]$$
2
Set $\mu_j = \mu_1 = \theta_1$ and $\sigma^2 = \theta_2$.

$$\frac{\partial}{\partial \theta_1} E[\ln L] = \sum_{i=1}^{n} \frac{E[Z_{i1}](x_i - \theta_1)}{\theta_2}$$

Set the derivative to 0 to find the maximum:


$$\sum_{i=1}^{n} E[Z_{i1}](x_i - \theta_1) = 0, \quad \theta_2 = 1$$

$$\sum_{i=1}^{n} E[Z_{i1}] \, x_i - \sum_{i=1}^{n} E[Z_{i1}] \, \theta_1 = 0$$

$$\sum_{i=1}^{n} E[Z_{i1}] \, x_i = \theta_1 \sum_{i=1}^{n} E[Z_{i1}]$$

$$\theta_1 = \frac{\sum_{i=1}^{n} E[Z_{i1}] \, x_i}{\sum_{i=1}^{n} E[Z_{i1}]}$$

For the general case of j:

$$\mu_j = \frac{\sum_{i=1}^{n} E[Z_{ij}] \, x_i}{\sum_{i=1}^{n} E[Z_{ij}]}$$

We must then calculate $\mu_j$ for each distribution (j = 1, 2, 3). Once we have calculated the new $\mu_j$, we can repeat the E-step with these new mu values. This is repeated until we hit a desired threshold.
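The M-step update derived above can be sketched as follows. The names (`MStep`, `updateMus`) are illustrative; `z[i][j]` holds the $E[Z_{ij}]$ values computed in the E-step.

```java
// Sketch of the M-step update: mu_j = sum_i E[Z_ij] x_i / sum_i E[Z_ij].
public class MStep {
    static double[] updateMus(double[] x, double[][] z, int k) {
        double[] mus = new double[k];
        for (int j = 0; j < k; j++) {
            double num = 0.0, den = 0.0;
            for (int i = 0; i < x.length; i++) {
                num += z[i][j] * x[i]; // sum_i E[Z_ij] * x_i
                den += z[i][j];        // sum_i E[Z_ij]
            }
            mus[j] = num / den;
        }
        return mus;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 3.0};
        double[][] z = {{1.0, 0.0}, {0.0, 1.0}};
        double[] mus = updateMus(x, z, 2);
        System.out.println(mus[0] + " " + mus[1]);
    }
}
```

With hard (0/1) responsibilities as in the example, each new $\mu_j$ is just the mean of the points assigned to distribution j, which matches the intuition behind the formula.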

Calculating Log Likelihood:


$$Z_{ij} = \begin{cases} 1 & \text{if } x_i \text{ is drawn from } f_j \\ 0 & \text{otherwise} \end{cases}$$

The probability of an individual $x_i$:

$$P(D_i) = \tau_1 f_1(x_i \mid \theta) + \tau_2 f_2(x_i \mid \theta) + \tau_3 f_3(x_i \mid \theta)$$

The total likelihood is the product of the $P(D_i)$:


$$L(x_1 \ldots x_n \mid \theta) = \prod_{i=1}^{n} P(D_i)$$

NOTE: For the purpose of computation in the actual code, this will be calculated by summing the logs of the $P(D_i)$ terms.

$$\ln L(x_1 \ldots x_n \mid \theta) = \ln \prod_{i=1}^{n} P(D_i) = \sum_{i=1}^{n} \ln P(D_i)$$
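The sum-of-logs computation can be sketched as below. The class and method names are illustrative, and it assumes $\tau_j = 1/3$ and $\sigma = 1$ as elsewhere in this writeup; summing logs avoids the floating-point underflow that multiplying many small $P(D_i)$ values would cause.

```java
// Sketch: log likelihood as a sum of logs rather than a log of a product.
public class LogLikelihood {
    static double density(double x, double mu) {
        return Math.exp(-(x - mu) * (x - mu) / 2.0) / Math.sqrt(2.0 * Math.PI);
    }

    // ln L = sum_i ln( tau_1 f_1(x_i) + tau_2 f_2(x_i) + tau_3 f_3(x_i) ), tau_j = 1/3
    static double logLikelihood(double[] x, double[] mus) {
        double ll = 0.0;
        for (double xi : x) {
            double p = 0.0;                              // P(D_i)
            for (double mu : mus) p += density(xi, mu) / 3.0;
            ll += Math.log(p);
        }
        return ll;
    }

    public static void main(String[] args) {
        double[] x = {0.0, 1.0, 2.0};
        double[] mus = {0.0, 1.0, 2.0};
        System.out.println(logLikelihood(x, mus));
    }
}
```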

Pseudocode:

Put the data in an ArrayList and choose three numbers from the dataset to be mu1, mu2, and mu3, respectively.

Do the following:

1. Approximate the three $E[Z_{ij}]$ with the normal distribution, using $\mu = \mu_1, \mu_2, \mu_3$ and $\sigma = 1$.
2. Recalculate the new $\mu_j$ by:

$$\mu_j = \frac{\sum_{i=1}^{n} E[Z_{ij}] \, x_i}{\sum_{i=1}^{n} E[Z_{ij}]}$$

Repeat steps 1 and 2 until the threshold is reached.

Print results.

//Threshold will be the change in log likelihood.
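The loop above can be sketched end to end as follows. This is a minimal illustration under the same assumptions ($\tau_j = 1/3$, $\sigma = 1$); the class and method names (`EMLoop`, `emFit`) are made up, not the assignment's actual code.

```java
import java.util.ArrayList;

// Sketch of the full EM loop: alternate E- and M-steps until the change in
// log likelihood between iterations is no more than epsilon.
public class EMLoop {
    static final double EPS = 0.001;

    static double density(double x, double mu) {
        return Math.exp(-(x - mu) * (x - mu) / 2.0) / Math.sqrt(2.0 * Math.PI);
    }

    static double logLikelihood(ArrayList<Double> data, double[] mus) {
        double ll = 0.0;
        for (double x : data) {
            double p = 0.0;
            for (double mu : mus) p += density(x, mu) / 3.0;
            ll += Math.log(p);
        }
        return ll;
    }

    static double[] emFit(ArrayList<Double> data, double[] mus) {
        double prev = Double.NEGATIVE_INFINITY;
        double ll = logLikelihood(data, mus);
        while (ll - prev > EPS) {
            // E-step: responsibilities E[Z_ij] for every point
            double[][] z = new double[data.size()][mus.length];
            for (int i = 0; i < data.size(); i++) {
                double total = 0.0;
                for (int j = 0; j < mus.length; j++) {
                    z[i][j] = density(data.get(i), mus[j]) / 3.0;
                    total += z[i][j];
                }
                for (int j = 0; j < mus.length; j++) z[i][j] /= total;
            }
            // M-step: mu_j = sum_i E[Z_ij] x_i / sum_i E[Z_ij]
            for (int j = 0; j < mus.length; j++) {
                double num = 0.0, den = 0.0;
                for (int i = 0; i < data.size(); i++) {
                    num += z[i][j] * data.get(i);
                    den += z[i][j];
                }
                mus[j] = num / den;
            }
            prev = ll;
            ll = logLikelihood(data, mus);
        }
        return mus;
    }

    public static void main(String[] args) {
        ArrayList<Double> data = new ArrayList<>();
        for (double v : new double[]{0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 10.0, 10.1, 9.9})
            data.add(v);
        double[] mus = emFit(data, new double[]{0.0, 5.0, 10.0});
        System.out.println(mus[0] + " " + mus[1] + " " + mus[2]);
    }
}
```

Because EM never decreases the likelihood, the change `ll - prev` is a safe termination quantity, matching the threshold comment above.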

(2)

Initialization:
I saved all of the data in an ArrayList and calculated the max. I then took the max, 1/3 of the max, and 2/3 of the max as mu1, mu2, and mu3. This ensures that the three values will be within the range of the data points and that the mus will be spread out. From experimentation, having three spread-out values as the mus gave a better estimation on average.
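This initialization can be sketched as below (names illustrative; the code returns the three values in ascending order, one possible assignment to mu1, mu2, mu3).

```java
import java.util.ArrayList;

// Sketch of the max-based initialization: use max/3, 2*max/3, and max as starting mus.
public class Init {
    static double[] initialMus(ArrayList<Double> data) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : data) max = Math.max(max, x);
        return new double[]{max / 3.0, 2.0 * max / 3.0, max};
    }

    public static void main(String[] args) {
        ArrayList<Double> data = new ArrayList<>();
        data.add(3.0); data.add(9.0); data.add(6.0);
        double[] mus = initialMus(data);
        System.out.println(mus[0] + " " + mus[1] + " " + mus[2]);
    }
}
```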

Termination:

To terminate, I found the change in the log likelihoods of each iteration and stopped when the change
was no more than ε = .001 between iterations.

(3) Arbitrarily setting the variances to 1 will not give you the most accurate representation of the data. You cannot necessarily assume that the variance is 1 for each distribution. However, it is easier to compute the mus when we can assume that the variance is 1, as we don’t have to calculate the unknown variance. Looking specifically at the data, there is no reason to think that the third distribution has a variance of 1. It is easy to tell from looking at the distribution that a variance of 1 does not accurately depict the third distribution (the curve doesn’t even cover all of the points).

(4)

Initialization: Randomly choosing three indexes from the ArrayList. This did not work well. A lot of the time I would get a mu of NaN or a likelihood of –Infinity. The issue with a random approach is that you don’t capture the underlying total distribution: you could potentially pick three random numbers that all belong to the same distribution.

Termination: I initially used the change in the mus as the threshold. While this theoretically works fine, I noticed that checking the three mus was a little messier to code. It was easier to find the change in the likelihood, because I only had to compare one number.
