A Short Tutorial on Gaussian Mixture Models
By: Mohand Saïd Allili
Université du Québec en Outaouais
Plan
- Introduction
-What is a Gaussian mixture model?
-The Expectation-Maximization algorithm
-Some issues
-Applications of GMM in computer vision
Introduction
A few basic equalities that are often used:
Conditional probability:
$$p(A \cap B) = p(A \mid B)\, p(B)$$
Bayes' rule:
$$p(B \mid A) = \frac{p(A \mid B)\, p(B)}{p(A)}$$
Law of total probability: if $\bigcup_{i=1}^{K} B_i = \Omega$ and $B_i \cap B_j = \emptyset$ for all $i \neq j$, then:
$$p(A) = \sum_{i=1}^{K} p(A \cap B_i)$$
What is a Gaussian?
For $d$ dimensions, the Gaussian distribution of a vector $x = (x_1, x_2, \ldots, x_d)^T$ is defined by:
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
Example (the density plotted in the original slide):
$$\mu = (0, 0)^T, \qquad \Sigma = \begin{pmatrix} 0.25 & 0.30 \\ 0.30 & 1.00 \end{pmatrix}$$
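A minimal NumPy sketch of this density (the function name and test point are mine, for illustration):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density N(x | mu, Sigma)."""
    d = mu.shape[0]
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

# The example from the slide: mu = (0, 0)^T and the 2x2 covariance above.
mu = np.array([0.0, 0.0])
Sigma = np.array([[0.25, 0.30],
                  [0.30, 1.00]])
print(gaussian_pdf(np.array([0.0, 0.0]), mu, Sigma))  # density at the mean
```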
What is a Gaussian mixture model?

A Gaussian mixture model is a weighted sum of $K$ Gaussians:
$$p(x) = \sum_{j=1}^{K} w_j \,\mathcal{N}(x \mid \mu_j, \Sigma_j), \qquad \text{with } \sum_{j=1}^{K} w_j = 1 \text{ and } 0 \leq w_j \leq 1.$$
Examples: [the original slides plot example mixture densities for d = 1 and d = 2.]
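A short sketch of such a weighted sum in the 1-D case (the weights, means, and standard deviations below are illustrative values, not taken from the slides):

```python
import numpy as np

# A 1-D mixture with K = 3 components.
weights = np.array([0.5, 0.3, 0.2])          # sums to 1
means = np.array([-2.0, 0.0, 3.0])
stds = np.array([0.5, 1.0, 0.8])

def gmm_pdf(x):
    """p(x) = sum_j w_j * N(x | mu_j, sigma_j^2) for scalar or array x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]   # (N, 1)
    comp = (np.exp(-0.5 * ((x - means) / stds) ** 2)
            / (stds * np.sqrt(2 * np.pi)))                   # (N, K)
    return comp @ weights

print(gmm_pdf(0.0))   # mixture density evaluated at x = 0
```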
The Expectation-Maximization algorithm

The parameters $\Theta = \{w_j, \mu_j, \Sigma_j\}_{j=1}^{K}$ are estimated from data $X = \{x_1, \ldots, x_N\}$ by maximizing the likelihood
$$L(\Theta) = \prod_{i=1}^{N} p(x_i \mid \Theta).$$
Solution: the Expectation-Maximization (EM) algorithm.
- At each iteration $t$:
  - E-Step: estimate the distribution of the hidden variable given the data and the current value of the parameters.
  - M-Step: maximize the joint distribution of the data and the hidden variable.
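EM iteratively increases the incomplete-data log-likelihood $\log L(\Theta) = \sum_i \log \sum_j w_j \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$. A minimal sketch of that objective for the 1-D case (the function name is mine, continuing the illustrative setup above):

```python
import numpy as np

def log_likelihood(x, weights, means, stds):
    """Incomplete-data log-likelihood of a 1-D GMM:
    sum_i log sum_j w_j N(x_i | mu_j, sigma_j^2)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]   # (N, 1)
    comp = (np.exp(-0.5 * ((x - means) / stds) ** 2)
            / (stds * np.sqrt(2 * np.pi)))                   # (N, K) component densities
    return np.sum(np.log(comp @ weights))

# EM never decreases this quantity between iterations
# (see the convergence proof below).
```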
At iteration $t$, EM works with the expected complete-data log-likelihood
$$Q(\Theta, \Theta^t) = E_Z\big[\log p(X, Z \mid \Theta)\big],$$
where the expectation is over $p(Z \mid X, \Theta^t)$. It can be shown that maximizing $Q(\Theta, \Theta^t)$,
$$\Theta^{t+1} = \arg\max_{\Theta} \big[Q(\Theta, \Theta^t)\big],$$
never decreases the likelihood of the data.
Proof of EM convergence
$$\begin{aligned}
Q(\Theta, \Theta^t) &= E_Z\big[\log p(X, Z \mid \Theta)\big] \\
&= \sum_{z} p(z \mid X, \Theta^t) \log p(X, z \mid \Theta) \\
&= \sum_{z} p(z \mid X, \Theta^t) \log\big[p(z \mid X, \Theta)\, p(X \mid \Theta)\big] \\
&= \sum_{z} p(z \mid X, \Theta^t) \log p(z \mid X, \Theta) + \log p(X \mid \Theta) \qquad (1)
\end{aligned}$$
(the last step uses $\sum_z p(z \mid X, \Theta^t) = 1$).
If $\Theta = \Theta^t$, we have:
$$Q(\Theta^t, \Theta^t) = \sum_{z} p(z \mid X, \Theta^t) \log p(z \mid X, \Theta^t) + \log p(X \mid \Theta^t) \qquad (2)$$
From (1) and (2), we have:
$$\log p(X \mid \Theta) - \log p(X \mid \Theta^t) = Q(\Theta, \Theta^t) - Q(\Theta^t, \Theta^t) + \underbrace{\sum_{z} p(z \mid X, \Theta^t) \log \frac{p(z \mid X, \Theta^t)}{p(z \mid X, \Theta)}}_{\geq\, 0}$$
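The last term is the Kullback-Leibler divergence $KL\big(p(\cdot \mid X, \Theta^t) \,\|\, p(\cdot \mid X, \Theta)\big)$, which is nonnegative. One line of justification (a standard step, by Jensen's inequality applied to the concave $\log$, with $q(z) = p(z \mid X, \Theta^t)$ and $r(z) = p(z \mid X, \Theta)$):
$$\sum_{z} q(z) \log \frac{q(z)}{r(z)} = -\sum_{z} q(z) \log \frac{r(z)}{q(z)} \;\geq\; -\log \sum_{z} q(z)\, \frac{r(z)}{q(z)} = -\log \sum_{z} r(z) = 0.$$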
Conclusion: since the last term is nonnegative, any $\Theta^{t+1}$ with $Q(\Theta^{t+1}, \Theta^t) \geq Q(\Theta^t, \Theta^t)$ satisfies $\log p(X \mid \Theta^{t+1}) \geq \log p(X \mid \Theta^t)$, so each EM iteration can only increase the likelihood of the data.
EM for Gaussian mixture models

For a mixture, each data point has density
$$p(x_i) = \sum_{j=1}^{K} w_j \,\mathcal{N}(x_i \mid \mu_j, \Sigma_j),$$
and the hidden variable indicates which Gaussian emitted each point:
$$z_{ij} = \begin{cases} 1 & \text{if Gaussian } j \text{ emitted } x_i, \\ 0 & \text{otherwise.} \end{cases}$$
The complete-data likelihood is then
$$L(X, Z, \Theta) = p(X, Z \mid \Theta) = \prod_{i=1}^{N} \prod_{j=1}^{K} \big[p(x_i, j \mid \Theta)\big]^{z_{ij}} = \prod_{i=1}^{N} \prod_{j=1}^{K} \big[p(x_i \mid j, \Theta)\big]^{z_{ij}} \big[p(j \mid \Theta)\big]^{z_{ij}}$$
Using the "log" function gives:
$$\log L(X, Z, \Theta) = \sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij} \log p(x_i \mid j, \Theta) + z_{ij} \log p(j \mid \Theta)$$
Taking the expectation of the hidden variables under $\Theta^t$:
$$Q(\Theta, \Theta^t) = E_Z\big[\log p(X, Z \mid \Theta) \mid \Theta^t\big] = \sum_{i=1}^{N} \sum_{j=1}^{K} E\big[z_{ij} \mid x_i, \Theta^t\big] \Big(\log p(x_i \mid j, \Theta) + \log p(j \mid \Theta)\Big)$$
where $E\big[z_{ij} \mid x_i, \Theta^t\big] = p(j \mid x_i, \Theta^t)$ (the posterior distribution), and $p(j \mid \Theta) = w_j$.
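The E-step therefore computes these posteriors (often called responsibilities). A minimal NumPy sketch for the general $d$-dimensional case (the function name is mine):

```python
import numpy as np

def e_step(X, weights, means, covs):
    """Responsibilities r[i, j] = p(j | x_i, Theta^t)
    = w_j N(x_i | mu_j, Sigma_j) / sum_k w_k N(x_i | mu_k, Sigma_k)."""
    N, d = X.shape
    K = len(weights)
    r = np.empty((N, K))
    for j in range(K):
        diff = X - means[j]                                  # (N, d)
        inv = np.linalg.inv(covs[j])
        norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(covs[j]))
        mahal = np.einsum('nd,de,ne->n', diff, inv, diff)    # quadratic forms
        r[:, j] = weights[j] * np.exp(-0.5 * mahal) / norm
    return r / r.sum(axis=1, keepdims=True)                  # normalize over j
```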
In the M-step, setting
$$\frac{\partial Q(\Theta, \Theta^t)}{\partial \mu_j} = 0 \qquad \text{(and similarly for } \Sigma_j\text{)}$$
gives the updates:
$$\mu_j = \frac{\sum_{i=1}^{N} p(j \mid x_i, \Theta^t)\, x_i}{\sum_{i=1}^{N} p(j \mid x_i, \Theta^t)}, \qquad \Sigma_j = \frac{\sum_{i=1}^{N} p(j \mid x_i, \Theta^t)\, (x_i - \mu_j)(x_i - \mu_j)^T}{\sum_{i=1}^{N} p(j \mid x_i, \Theta^t)}$$
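A matching M-step sketch for the means and covariances, consuming the responsibilities from the hypothetical `e_step` above:

```python
import numpy as np

def m_step_mean_cov(X, r):
    """Update mu_j and Sigma_j from the responsibilities r (N x K)."""
    Nj = r.sum(axis=0)                                        # (K,) effective counts
    means = (r.T @ X) / Nj[:, None]                           # (K, d) weighted means
    covs = []
    for j in range(r.shape[1]):
        diff = X - means[j]                                   # (N, d)
        covs.append((r[:, j, None] * diff).T @ diff / Nj[j])  # (d, d) weighted scatter
    return means, np.stack(covs)
```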
Incorporating the constraint $\sum_{j=1}^{K} w_j = 1$ using a Lagrange multiplier $\lambda$ gives:
$$J(\Theta, \lambda) = Q(\Theta, \Theta^t) - \lambda \Big(\sum_{j=1}^{K} w_j - 1\Big)$$
Then:
$$\frac{\partial J(\Theta, \lambda)}{\partial w_j} = \frac{\partial Q(\Theta, \Theta^t)}{\partial w_j} - \lambda = \sum_{i=1}^{N} \frac{p(j \mid x_i, \Theta^t)}{w_j} - \lambda = 0,$$
which gives:
$$w_j = \frac{1}{\lambda} \sum_{i=1}^{N} p(j \mid x_i, \Theta^t) \qquad (3)$$
Summing (3) over $j$ and using the constraint $\sum_{j=1}^{K} w_j = 1$ together with
$$\sum_{j=1}^{K} p(j \mid x_i, \Theta^t) = 1, \qquad (4)$$
it follows that $\lambda = \sum_{j=1}^{K} \sum_{i=1}^{N} p(j \mid x_i, \Theta^t) = N$, and finally:
$$w_j = \frac{1}{N} \sum_{i=1}^{N} p(j \mid x_i, \Theta^t)$$
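Putting the three updates together gives the full EM loop. A sketch reusing the hypothetical `e_step` and `m_step_mean_cov` from above (the initialization scheme and regularization constant are my choices, not prescribed by the slides):

```python
import numpy as np

def fit_gmm(X, K, n_iter=100, seed=0):
    """Fit a K-component GMM to X (N x d, d >= 2) with EM."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    weights = np.full(K, 1.0 / K)
    means = X[rng.choice(N, size=K, replace=False)]     # random data points
    covs = np.stack([np.cov(X.T)] * K)                  # start from the global covariance
    for _ in range(n_iter):
        r = e_step(X, weights, means, covs)             # E-step: responsibilities
        means, covs = m_step_mean_cov(X, r)             # M-step: mu_j, Sigma_j
        weights = r.mean(axis=0)                        # M-step: w_j = (1/N) sum_i r_ij
        covs += 1e-6 * np.eye(d)                        # keep covariances invertible
    return weights, means, covs
```

By the convergence argument above, each pass through this loop can only increase the log-likelihood of the data.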
Some issues
1- Initialization:
- EM is an iterative algorithm that is very sensitive to initial conditions; different starting points can converge to different local maxima of the likelihood.
2- Number of Gaussians:
- Use information-theoretic criteria to obtain the optimal K (see the sketch below).
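One common information-theoretic criterion is BIC. A possible model-selection sketch using scikit-learn's GaussianMixture (the helper name `select_K` is mine):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_K(X, K_max=10):
    """Fit GMMs with K = 1..K_max and keep the one with the lowest BIC."""
    models = [GaussianMixture(n_components=K, n_init=5).fit(X)
              for K in range(1, K_max + 1)]
    bics = [m.bic(X) for m in models]
    return models[int(np.argmin(bics))]
```

The `n_init=5` restarts also mitigate the initialization sensitivity noted above; scikit-learn exposes AIC via `aic()` as an alternative criterion.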
3- Simplification of the covariance matrices:

Case 1: Spherical covariance matrix
$$\Sigma_j = \mathrm{diag}(\sigma_j^2, \sigma_j^2, \ldots, \sigma_j^2) = \sigma_j^2 I$$
- Less precise.
- Very efficient to compute.

Case 2: Diagonal covariance matrix
$$\Sigma_j = \mathrm{diag}(\sigma_{j1}^2, \sigma_{j2}^2, \ldots, \sigma_{jd}^2)$$
- More precise.
- Efficient to compute.
Case 3: Full covariance matrix
$$\Sigma_j = \begin{pmatrix}
\sigma_{j1}^2 & \mathrm{cov}_j(x^1, x^2) & \cdots & \mathrm{cov}_j(x^1, x^d) \\
\mathrm{cov}_j(x^2, x^1) & \sigma_{j2}^2 & \cdots & \mathrm{cov}_j(x^2, x^d) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}_j(x^d, x^1) & \mathrm{cov}_j(x^d, x^2) & \cdots & \sigma_{jd}^2
\end{pmatrix}$$
- Very precise.
- Less efficient to compute.
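The trade-off is the number of free covariance parameters per component; a small sketch of the counts implied by the three cases above (the helper is mine):

```python
def cov_params(d):
    """Free covariance parameters per component, for each structure."""
    return {
        "spherical": 1,               # one shared variance sigma_j^2
        "diagonal": d,                # one variance per dimension
        "full": d * (d + 1) // 2,     # symmetric d x d matrix
    }

print(cov_params(16))  # d = 16: {'spherical': 1, 'diagonal': 16, 'full': 136}
```

scikit-learn's GaussianMixture exposes these structures through `covariance_type='spherical' | 'diag' | 'full'` (plus a shared `'tied'` option).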
Applications of GMM in computer vision

[In the original slides, this section is presented entirely through figures; no text is recoverable.]
Some references
1. Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
2. Geoffrey McLachlan and David Peel. Finite Mixture Models. Wiley Series in Probability and Statistics, 2000.
3. Samy Bengio. Statistical Machine Learning from Data: Gaussian Mixture Models. http://bengio.abracadoudou.com/lectures/gmm.pdf
4. T. Hastie et al. The Elements of Statistical Learning. Springer, 2009.
Questions?