
CRV 2010

A short tutorial on
Gaussian Mixture Models
By: Mohand Saïd Allili
Université du Québec en Outaouais

Plan
- Introduction
- What is a Gaussian mixture model?
- The Expectation-Maximization algorithm
- Some issues
- Applications of GMM in computer vision

Introduction
A few basic equalities that are often used:
Conditional probability:

$p(A \cap B) = p(A \mid B)\, p(B)$

Bayes' rule:

$p(B \mid A) = \frac{p(A \mid B)\, p(B)}{p(A)}$

Law of total probability: if $\bigcup_{i=1}^{K} B_i = \Omega$ and $B_i \cap B_j = \emptyset$ for all $i \neq j$, then:

$p(A) = \sum_{i=1}^{K} p(A \cap B_i)$
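As a quick numerical sanity check of these identities, here is a small Python sketch; the two events and their probabilities are made up for illustration only:

```python
# Two disjoint events B1, B2 covering the sample space (illustrative numbers).
p_B = [0.3, 0.7]              # p(B1), p(B2)
p_A_given_B = [0.9, 0.2]      # p(A | B1), p(A | B2)

# Law of total probability: p(A) = sum_i p(A | Bi) p(Bi)
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))

# Bayes' rule: p(B1 | A) = p(A | B1) p(B1) / p(A)
p_B1_given_A = p_A_given_B[0] * p_B[0] / p_A

print(p_A)           # 0.41
print(p_B1_given_A)  # about 0.659
```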

Introduction

Carl Friedrich Gauss introduced the normal distribution in 1809 as a way to rationalize the method of least squares.

What is a Gaussian?
For d dimensions, the Gaussian distribution of a vector $x = (x_1, x_2, \ldots, x_d)^T$ is defined by:

$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$

where $\mu$ is the mean and $\Sigma$ is the covariance matrix of the Gaussian.

Example:

$\mu = (0, 0)^T, \quad \Sigma = \begin{pmatrix} 0.25 & 0.30 \\ 0.30 & 1.00 \end{pmatrix}$
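A minimal NumPy sketch of this density, evaluated with the example mean and covariance above (the evaluation point is arbitrary):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate Gaussian density N(x | mu, sigma) for a d-dimensional point x."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

mu = np.array([0.0, 0.0])
sigma = np.array([[0.25, 0.30],
                  [0.30, 1.00]])
print(gaussian_pdf(np.array([0.0, 0.0]), mu, sigma))  # peak value, about 0.398
```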

What is a Gaussian mixture model?


The probability density given by a mixture of K Gaussians is:

$p(x) = \sum_{j=1}^{K} w_j\, \mathcal{N}(x \mid \mu_j, \Sigma_j)$

where $w_j$ is the prior probability (weight) of the j-th Gaussian, with

$\sum_{j=1}^{K} w_j = 1 \quad \text{and} \quad 0 \le w_j \le 1$

Examples (figures): mixture densities for d = 1 and d = 2.
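For illustration, a one-dimensional mixture density can be evaluated directly from this definition; the weights and component parameters below are arbitrary:

```python
import numpy as np

def gmm_pdf_1d(x, weights, means, variances):
    """Density of a 1-D mixture of Gaussians at the point(s) x."""
    x = np.atleast_1d(x)[:, None]                 # shape (n, 1)
    means = np.asarray(means)[None, :]            # shape (1, K)
    variances = np.asarray(variances)[None, :]    # shape (1, K)
    comp = np.exp(-0.5 * (x - means) ** 2 / variances) / np.sqrt(2 * np.pi * variances)
    return comp @ np.asarray(weights)             # weighted sum over the K components

# K = 2 components; the weights sum to 1
print(gmm_pdf_1d([0.0, 2.0], weights=[0.3, 0.7], means=[0.0, 3.0], variances=[1.0, 0.5]))
```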

What is a Gaussian mixture model?


Problem:

Given a set of data $X = \{x_1, x_2, \ldots, x_N\}$ drawn from an unknown distribution (probably a GMM), estimate the parameters $\theta$ of the GMM model that fits the data.

Solution:

Maximize the likelihood $p(X \mid \theta)$ of the data with respect to the model parameters:

$\theta^* = \arg\max_{\theta} p(X \mid \theta) = \arg\max_{\theta} \prod_{i=1}^{N} p(x_i \mid \theta)$
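In practice the product over data points is computed as a sum of logarithms for numerical stability. A minimal sketch, assuming a per-component density function such as the `gaussian_pdf` sketched earlier:

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, covs, component_pdf):
    """log p(X | theta) = sum_i log sum_j w_j N(x_i | mu_j, Sigma_j)."""
    total = 0.0
    for x in X:
        p_x = sum(w * component_pdf(x, mu, cov)
                  for w, mu, cov in zip(weights, means, covs))
        total += np.log(p_x)
    return total
```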

The Expectation-Maximization algorithm


One of the most popular approaches to maximizing the likelihood is the Expectation-Maximization (EM) algorithm.

Basic ideas of the EM algorithm:

- Introduce a hidden variable such that its knowledge would simplify the maximization of the likelihood.

- At each iteration:
  E-Step: estimate the distribution of the hidden variable given the data and the current value of the parameters.
  M-Step: maximize the joint distribution of the data and the hidden variable.
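Schematically, the iteration looks like the sketch below; `e_step` and `m_step` are placeholder callables that are made concrete for the GMM in the following slides:

```python
def expectation_maximization(X, theta, e_step, m_step, n_iter=100):
    """Generic EM loop: alternate between estimating the hidden-variable
    distribution (E-step) and re-estimating the parameters (M-step)."""
    for _ in range(n_iter):
        hidden = e_step(X, theta)    # distribution of the hidden variable given X, theta
        theta = m_step(X, hidden)    # parameters maximizing the expected joint likelihood
    return theta
```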

The EM for the GMM (graphical view 1)


Hidden variable: for each point, which Gaussian generated it?

The EM for the GMM (graphical view 2)


E-Step: for each point, estimate the probability that each Gaussian
generated it.


The EM for the GMM (graphical view 3)


M-Step: modify the parameters according to the hidden variable to
maximize the likelihood of the data (and the hidden variable).


General formulation of the EM


(In what follows, the hidden variable is named Z.)

Consider the following "auxiliary" function:

$Q(\theta, \theta^t) = E_Z\left[\log p(X, Z \mid \theta) \mid \theta^t\right]$

It can be shown that maximizing $Q(\theta, \theta^t)$:

$\theta^{t+1} = \arg\max_{\theta}\left[Q(\theta, \theta^t)\right]$

never decreases the likelihood of the data.

Proof of EM convergence

$Q(\theta, \theta^t) = E_Z\left[\log p(X, Z \mid \theta) \mid \theta^t\right]$

$= \sum_{z} p(z \mid X, \theta^t) \log p(X, z \mid \theta)$

$= \sum_{z} p(z \mid X, \theta^t) \log\left[p(z \mid X, \theta)\, p(X \mid \theta)\right]$

$= \sum_{z} p(z \mid X, \theta^t) \log p(z \mid X, \theta) + \log p(X \mid \theta) \quad (1)$

Proof of EM convergence
If $\theta = \theta^t$, we have:

$Q(\theta^t, \theta^t) = \sum_{z} p(z \mid X, \theta^t) \log p(z \mid X, \theta^t) + \log p(X \mid \theta^t) \quad (2)$

From (1) and (2), we have:

$\log p(X \mid \theta) - \log p(X \mid \theta^t) = Q(\theta, \theta^t) - Q(\theta^t, \theta^t) + \underbrace{\sum_{z} p(z \mid X, \theta^t) \log \frac{p(z \mid X, \theta^t)}{p(z \mid X, \theta)}}_{\ge\, 0}$

(The last term is a Kullback-Leibler divergence, hence non-negative.)

Conclusion:

If $Q(\theta, \theta^t)$ increases, then the likelihood $p(X \mid \theta)$ increases.

The maximum of $Q(\theta, \theta^t)$ coincides with the maximum of the likelihood.

EM for the GMM


Note that if Z is observed, then the parameters can be estimated Gaussian by Gaussian. Unfortunately, Z is not always known.

Let us write the mixture of Gaussians for a data point $x_i$, $i = 1, \ldots, N$:

$p(x_i) = \sum_{j=1}^{K} w_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$

We introduce the indicator variable:

$z_{ij} = \begin{cases} 1 & \text{if Gaussian } j \text{ emitted } x_i, \\ 0 & \text{otherwise.} \end{cases}$

EM for the GMM


We can now write the joint likelihood of all X and Z as follows:

$L(X, Z, \theta) = p(X, Z \mid \theta)$

$= \prod_{i=1}^{N} \prod_{j=1}^{K} \left[p(x_i, j \mid \theta)\right]^{z_{ij}}$

$= \prod_{i=1}^{N} \prod_{j=1}^{K} \left[p(x_i \mid j, \theta)\right]^{z_{ij}} \left[p(j \mid \theta)\right]^{z_{ij}}$

(Here $p(j \mid \theta) = w_j$ and $p(x_i \mid j, \theta) = \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$.)

Using the "log" function gives:

$\log p(X, Z \mid \theta) = \sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij} \log p(x_i \mid j, \theta) + z_{ij} \log p(j \mid \theta)$

EM for the GMM


The auxiliary function is given by:

$Q(\theta, \theta^t) = E_Z\left[\log p(X, Z \mid \theta) \mid \theta^t\right]$

$= E_Z\left[\sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij} \log p(x_i \mid j, \theta) + z_{ij} \log p(j \mid \theta) \,\Big|\, \theta^t\right]$

$= \sum_{i=1}^{N} \sum_{j=1}^{K} E_Z(z_{ij} \mid \theta^t) \log p(x_i \mid j, \theta) + E_Z(z_{ij} \mid \theta^t) \log p(j \mid \theta)$

We then have the E-Step, given by:

$E_Z(z_{ij} \mid \theta^t) = 1 \cdot p(z_{ij} = 1 \mid x_i, \theta^t) + 0 \cdot p(z_{ij} = 0 \mid x_i, \theta^t)$

$= p(j \mid x_i, \theta^t) = \frac{p(j \mid \theta^t)\, p(x_i \mid j, \theta^t)}{p(x_i \mid \theta^t)}$

(posterior distribution)
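A NumPy sketch of this E-Step, computing the posterior responsibility of each Gaussian for each point; `gaussian_pdf` is assumed to be the density function sketched earlier:

```python
import numpy as np

def e_step(X, weights, means, covs, gaussian_pdf):
    """Responsibilities r[i, j] = p(j | x_i, theta^t)
    = w_j N(x_i | mu_j, Sigma_j) / sum_k w_k N(x_i | mu_k, Sigma_k)."""
    N, K = len(X), len(weights)
    r = np.zeros((N, K))
    for i, x in enumerate(X):
        for j in range(K):
            r[i, j] = weights[j] * gaussian_pdf(x, means[j], covs[j])
        r[i] /= r[i].sum()     # normalization by p(x_i | theta^t)
    return r
```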

EM for the GMM


And the M-Step, given by:

$\frac{\partial Q(\theta, \theta^t)}{\partial \theta} = 0$

where $\theta = (w_j, \mu_j, \Sigma_j,\ j = 1, \ldots, K)$ and $\sum_{j=1}^{K} w_j = 1$.

After straightforward manipulations, we obtain:

$\mu_j = \frac{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)\, x_i}{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)}$

$\Sigma_j = \frac{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)\, (x_i - \mu_j)(x_i - \mu_j)^T}{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)}$
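The corresponding updates for the means and covariances, as a sketch that takes the responsibility matrix returned by the E-Step above:

```python
import numpy as np

def m_step_means_covs(X, r):
    """Weighted mean and weighted covariance updates,
    with r[i, j] = p(j | x_i, theta^t)."""
    X = np.asarray(X, dtype=float)      # shape (N, d)
    Nk = r.sum(axis=0)                  # effective number of points per Gaussian
    means = (r.T @ X) / Nk[:, None]
    covs = []
    for j in range(r.shape[1]):
        diff = X - means[j]
        covs.append((r[:, j, None] * diff).T @ diff / Nk[j])
    return means, np.array(covs)
```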

EM for the GMM


Incorporating the constraint $\sum_{j=1}^{K} w_j = 1$ using a Lagrange multiplier $\lambda$ gives:

$J(\theta, \lambda) = Q(\theta, \theta^t) - \lambda\left(\sum_{j=1}^{K} w_j - 1\right)$

Then:

$\frac{\partial J(\theta, \lambda)}{\partial w_j} = \frac{\partial Q(\theta, \theta^t)}{\partial w_j} - \lambda = \sum_{i=1}^{N} \frac{p(j \mid x_i, \theta^t)}{w_j} - \lambda = 0$

which gives:

$w_j = \frac{1}{\lambda} \sum_{i=1}^{N} p(j \mid x_i, \theta^t) \quad (3)$

EM for the GMM


Also, we have:

$\frac{\partial J(\theta, \lambda)}{\partial \lambda} = \sum_{j=1}^{K} w_j - 1 = 0 \quad (4)$

From (3) and (4), we obtain:

$\frac{1}{\lambda} \sum_{j=1}^{K} \sum_{i=1}^{N} p(j \mid x_i, \theta^t) = 1$

It follows that $\lambda = N$, and finally:

$w_j = \frac{1}{N} \sum_{i=1}^{N} p(j \mid x_i, \theta^t)$
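Putting the E-Step and the three M-Step updates together gives a complete, if minimal, EM loop for a GMM with full covariances. This is a sketch rather than production code (crude initialization, a small diagonal term for numerical stability, no log-space tricks, no convergence test):

```python
import numpy as np

def fit_gmm(X, K, n_iter=100, seed=0):
    """Minimal EM for a Gaussian mixture model with full covariance matrices."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    N, d = X.shape
    # Crude initialization (K-means gives a better start, as noted under "Some issues").
    means = X[rng.choice(N, K, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    weights = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-Step: responsibilities r[i, j] = p(j | x_i, theta^t)
        r = np.zeros((N, K))
        for j in range(K):
            diff = X - means[j]
            norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(covs[j]))
            quad = np.sum(diff @ np.linalg.inv(covs[j]) * diff, axis=1)
            r[:, j] = weights[j] * np.exp(-0.5 * quad) / norm
        r /= r.sum(axis=1, keepdims=True)

        # M-Step: the closed-form updates derived above
        Nk = r.sum(axis=0)
        weights = Nk / N                             # w_j = (1/N) sum_i p(j | x_i, theta^t)
        means = (r.T @ X) / Nk[:, None]
        for j in range(K):
            diff = X - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / Nk[j] + 1e-6 * np.eye(d)
    return weights, means, covs
```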

Some issues
1- Initialization:

- EM is an iterative algorithm which is very sensitive to the initial conditions: start from trash, end up with trash.

- Usually, K-means is used to get a good initialization.

2- Number of Gaussians:

- Use an information-theoretic criterion to obtain the optimal K, e.g. the minimum description length:

$\mathrm{MDL} = -\log L(X, Z, \theta) + \frac{1}{2}\left[(K - 1) + K\left(D + \frac{D(D+1)}{2}\right)\right]\log(N)$

- Other criteria: AIC, BIC, MML, etc.
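A common practical recipe is to fit models for several values of K and keep the one with the lowest criterion value. A sketch using scikit-learn's GaussianMixture, which exposes BIC and AIC directly; the toy data below are made up for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data: two well-separated clusters in 2-D
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(6, 1, (200, 2))])

# Fit a GMM for each candidate K and keep the K with the lowest BIC
bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
       for k in range(1, 6)}
best_K = min(bic, key=bic.get)
print(best_K)   # expected to be 2 for this toy data
```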

Some issues
3- Simplification of the covariance matrices:

Case 1: Spherical covariance matrix

$\Sigma_j = \mathrm{diag}(\sigma_j^2, \sigma_j^2, \ldots, \sigma_j^2) = \sigma_j^2 I$

- Less precise.
- Very efficient to compute.

Case 2: Diagonal covariance matrix

$\Sigma_j = \mathrm{diag}(\sigma_{j1}^2, \sigma_{j2}^2, \ldots, \sigma_{jd}^2)$

- More precise.
- Efficient to compute.

Some issues
3- Simplification of the covariance matrices (continued):

Case 3: Full covariance matrix

$\Sigma_j = \begin{pmatrix} \sigma_{j1}^2 & \mathrm{cov}_j(x^1, x^2) & \cdots & \mathrm{cov}_j(x^1, x^d) \\ \mathrm{cov}_j(x^2, x^1) & \sigma_{j2}^2 & \cdots & \mathrm{cov}_j(x^2, x^d) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}_j(x^d, x^1) & \mathrm{cov}_j(x^d, x^2) & \cdots & \sigma_{jd}^2 \end{pmatrix}$

- Very precise.
- Less efficient to compute.
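These three cases correspond, for instance, to the `covariance_type` options of scikit-learn's GaussianMixture; a brief sketch on placeholder data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 2))   # placeholder data

# 'spherical': one variance per Gaussian                  (Case 1)
# 'diag'     : one variance per dimension and Gaussian    (Case 2)
# 'full'     : a full covariance matrix per Gaussian      (Case 3)
for cov_type in ("spherical", "diag", "full"):
    gmm = GaussianMixture(n_components=3, covariance_type=cov_type, random_state=0).fit(X)
    print(cov_type, gmm.covariances_.shape)
```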

Applications of GMM in computer vision


1- Image segmentation:
Each pixel is described by its color vector $x = (R, G, B)^T$, and the pixels are clustered by fitting a GMM to these vectors.
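A hedged sketch of this idea: fit a GMM to the RGB values of all pixels, then label each pixel with its most probable Gaussian (the random `image` array stands in for a real image):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_image(image, n_segments=3):
    """Cluster the pixel colors with a GMM and return a label map."""
    H, W, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)          # one (R, G, B) vector per pixel
    gmm = GaussianMixture(n_components=n_segments, covariance_type="full",
                          random_state=0).fit(pixels)
    labels = gmm.predict(pixels)                          # most probable Gaussian per pixel
    return labels.reshape(H, W)

labels = segment_image(np.random.default_rng(0).integers(0, 256, (64, 64, 3)))
```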

Applications of GMM in computer vision


2- Object tracking:
Knowing the moving object's distribution in the first frame, we can localize the object in the next frames by tracking its distribution.
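A possible sketch of this idea, assuming the object's appearance is modeled by a color GMM fitted on its pixels from the first frame; every pixel of a new frame is then scored by its likelihood under that model:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def object_likelihood_map(object_pixels, frame):
    """Fit a color GMM to the object's pixels (first frame), then score each
    pixel of `frame` by its log-likelihood under that model."""
    gmm = GaussianMixture(n_components=3, random_state=0).fit(
        object_pixels.reshape(-1, 3).astype(float))
    H, W, _ = frame.shape
    scores = gmm.score_samples(frame.reshape(-1, 3).astype(float))
    return scores.reshape(H, W)    # high values indicate likely object pixels
```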

Some references
1. Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
2. Geoffrey McLachlan and David Peel. Finite Mixture Models. Wiley Series in Probability and Statistics, 2000.
3. Samy Bengio. Statistical Machine Learning from Data: Gaussian Mixture Models. http://bengio.abracadoudou.com/lectures/gmm.pdf
4. T. Hastie et al. The Elements of Statistical Learning. Springer, 2009.


Questions?
