THE EM ALGORITHM
Michael Turmon
29 July 1992
References:
A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Jour. Royal Stat. Soc. Ser. B, pp. 1–39, 1977.
J. A. Fessler and A. O. Hero, "Space-alternating generalized expectation-maximization algorithm," IEEE Trans. Sig. Proc., pp. 2664–2677, 1994.
I. Meilijson, "A fast improvement to the EM algorithm on its own terms," Jour. Royal Stat. Soc. Ser. B, pp. 127–138, 1989.
I. Csiszár and G. Tusnády, "Information geometry and alternating minimization procedures," Statistics and Decisions, supplement issue, pp. 205–237, 1984.
A-2 (2) December 7, 1999
PROBLEM STATEMENT
INCOMPLETE-DATA MODEL
[Figure: the complete data X is mapped by h to the incomplete data Y. The preimage X(y) = {x : h(x) = y} collects the complete-data points consistent with an observation y. The complete-data family is {f(x|θ)}θ∈Ω; the incomplete-data family is {g(y|θ)}θ∈Ω.]
THE EM ALGORITHM
What about X ?
X is chosen to simplify the E and M steps. Often the incomplete data y is augmented with data that makes estimating θ easy.
For example, if g(y|θ) is a family of mixtures of M densities, choosing X = Y × {1 . . . M} allows each observation to carry a mark indicating the density it came from.
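The marking device above can be sketched in code. The following is a hypothetical Python illustration (not from the slides) for a two-component Gaussian mixture with a known common variance: the E-step computes the posterior probability of each mark, and the M-step reduces to easy weighted maximum-likelihood fits.

```python
import math
import random

def em_mixture(ys, pis, mus, sigma=1.0, iters=100):
    """EM for a two-component Gaussian mixture with known, shared sigma.

    The complete data augments each observation y with a mark z in {1, 2}
    indicating which component it came from."""
    for _ in range(iters):
        # E-step: posterior probability of each mark given y and the
        # current parameters (the "responsibility" of each component)
        resp = []
        for y in ys:
            w = [pis[k] * math.exp(-(y - mus[k]) ** 2 / (2 * sigma ** 2))
                 for k in range(2)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: the marks decouple the fit into weighted ML estimates
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pis[k] = nk / len(ys)
            mus[k] = sum(r[k] * y for r, y in zip(resp, ys)) / nk
    return pis, mus
```

With the marks integrated out in this way, each EM sweep only requires closed-form updates, which is exactly why the augmented space X = Y × {1 . . . M} is chosen.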
NON-DECREASING LIKELIHOOD
[Figure: complete data X mapped to incomplete data Y, with the preimage X(y) and the two density families.]

g(y|θ) = ∫_{X(y)} f(x|θ) dx
A SIMPLE EXAMPLE
where of course
g(y|θ) = N(0, σ + θ)
       = (2π(σ + θ))^(−1/2) exp(−y²/(2(σ + θ)))

Find the maximizing θ:

∂/∂θ log g(y|θ) = −(1/2) · 1/(σ + θ) + (1/2) · y²/(σ + θ)² = 0

θmax = y² − σ

If y² − σ ≥ 0, set θ∗ = θmax.
Otherwise, note g(y|θ) is decreasing on [y² − σ, ∞), so put θ∗ as close to θmax as possible:

θ∗ = max(0, y² − σ)
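The constrained maximizer can be stated as a one-liner. A hypothetical Python sketch of the formula above:

```python
def mle_theta(y, sigma):
    # Unconstrained maximizer of log g(y|theta) is y**2 - sigma;
    # since theta >= 0 is required, clip at the boundary.
    return max(0.0, y ** 2 - sigma)
```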
(1) E-step:

E[log f(X|θ) | y, θ(p)] = E[log fS(S|θ) | y, θ(p)] + E[log fN(N) | y, θ(p)]

                        = E[−(1/2) log θ − S²/(2θ) | y, θ(p)]
                          + E[−(1/2) log σ − N²/(2σ) | y, θ(p)] + o. t.

                        = −(1/2) log θ − (1/(2θ)) E[S² | y, θ(p)] + o. t.

(2) M-step:

θ(p+1) = E[S² | y, θ(p)]   (as expected)
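The E and M steps above combine into a one-line iteration. A hypothetical Python sketch (not from the slides), assuming the standard Gaussian conditioning formulas E[S | y, θ] = θy/(σ + θ) and Var[S | y, θ] = σθ/(σ + θ) for the pair (S, Y = S + N):

```python
def em_step(theta, y, sigma):
    # Gaussian conditioning for (S, Y = S + N): given y, S is normal with
    # mean theta*y/(sigma+theta) and variance sigma*theta/(sigma+theta),
    # so E[S^2 | y, theta] = mean^2 + variance.
    mean = theta * y / (sigma + theta)
    var = sigma * theta / (sigma + theta)
    return mean ** 2 + var  # M-step: theta(p+1) = E[S^2 | y, theta(p)]

def em(y, sigma, theta0=1.0, iters=200):
    theta = theta0
    for _ in range(iters):
        theta = em_step(theta, y, sigma)
    return theta
```

When y² − σ ≥ 0 the iterates climb geometrically to the fixed point θ∗ = y² − σ; when y² < σ they creep toward the boundary θ = 0 only sublinearly, since the slope of the EM map approaches 1 there.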
THE EM ITERATION
[Figure: the EM map θ(p+1) as a function of θ(p), plotted against the diagonal; the iterates θ(0), θ(1), θ(2), . . . form a staircase climbing toward the fixed point θ∗ = y² − σ.]
[Figure: complete data X mapped to incomplete data Y, with the preimage X(y); repeated from the incomplete-data model.]

g(y|θ) = ∫_{X(y)} f(x|θ) dx
ALTERNATING MINIMIZATION
[Figure: alternating minimization between two convex sets A and B; the iterates a(0), b(0), a(1), b(1), a(2), b(2) alternate between the sets, each chosen to minimize the distance to the point just selected in the other set.]
EM AS ALTERNATING MINIMIZATION
For the EM algorithm, the convex sets are Ω and D(X (y)), and
the distance measure is relative entropy.
The minimization starts with θ(0) and then

k(0) = arg min_{p ∈ D(X(y))} D(p(x) ‖ f(x|θ(0)))
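The inner minimization has a closed form worth recording. A sketch in the slides' notation, using the standard decomposition of relative entropy (an added derivation, not from the slides): for any p supported on X(y),

```latex
D\bigl(p \,\|\, f(\cdot|\theta)\bigr)
  = D\bigl(p \,\|\, k_\theta\bigr) - \log g(y|\theta),
\qquad
k_\theta(x) = \frac{f(x|\theta)\,\mathbf{1}\{x \in \mathcal{X}(y)\}}{g(y|\theta)},
```

so the minimizer k(0) is the conditional density of the complete data given y under θ(0), and the attained minimum is −log g(y|θ(0)). Alternating the two minimizations therefore drives log g(y|θ) upward, recovering the non-decreasing-likelihood property.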
SUMMARY