
CRV 2010

A short tutorial on
Gaussian Mixture Models
By: Mohand Saïd Allili
Université du Québec en Outaouais

Plan
- Introduction
- What is a Gaussian mixture model?
- The Expectation-Maximization algorithm
- Some issues
- Applications of GMM in computer vision

Introduction
A few basic equalities that are often used:
Conditional probability:

$p(A \cap B) = p(A \mid B)\, p(B)$

Bayes' rule:

$p(B \mid A) = \frac{p(A \mid B)\, p(B)}{p(A)}$

Law of total probability: if $\bigcup_{i=1}^{K} B_i = \Omega$ and $B_i \cap B_j = \emptyset$ for all $i \neq j$, then:

$p(A) = \sum_{i=1}^{K} p(A \cap B_i)$
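As a quick numerical sanity check of these identities, here is a small Python sketch; the two events and their probabilities are made up for illustration only:

```python
# Two disjoint events B1, B2 covering the sample space (illustrative numbers).
p_B = [0.3, 0.7]              # p(B1), p(B2)
p_A_given_B = [0.9, 0.2]      # p(A | B1), p(A | B2)

# Law of total probability: p(A) = sum_i p(A | Bi) p(Bi)
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))

# Bayes' rule: p(B1 | A) = p(A | B1) p(B1) / p(A)
p_B1_given_A = p_A_given_B[0] * p_B[0] / p_A

print(p_A)           # 0.41
print(p_B1_given_A)  # about 0.659
```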

Introduction

Carl Friedrich Gauss introduced the normal distribution in 1809 as a way to rationalize the method of least squares.

What is a Gaussian?
For d dimensions, the Gaussian distribution of a vector $x = (x_1, x_2, \ldots, x_d)^T$ is defined by:

$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$

where $\mu$ is the mean and $\Sigma$ is the covariance matrix of the Gaussian.

Example:

$\mu = (0, 0)^T, \quad \Sigma = \begin{pmatrix} 0.25 & 0.30 \\ 0.30 & 1.00 \end{pmatrix}$
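A minimal NumPy sketch of this density, evaluated with the example mean and covariance above (the evaluation point is arbitrary):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate Gaussian density N(x | mu, sigma) for a d-dimensional point x."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

mu = np.array([0.0, 0.0])
sigma = np.array([[0.25, 0.30],
                  [0.30, 1.00]])
print(gaussian_pdf(np.array([0.0, 0.0]), mu, sigma))  # peak value, about 0.398
```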

What is a Gaussian mixture model?


The probability density given by a mixture of K Gaussians is:

$p(x) = \sum_{j=1}^{K} w_j\, \mathcal{N}(x \mid \mu_j, \Sigma_j)$

where $w_j$ is the prior probability (weight) of the j-th Gaussian, with

$\sum_{j=1}^{K} w_j = 1 \quad \text{and} \quad 0 \le w_j \le 1$

Examples (figures): mixture densities for d = 1 and d = 2.
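For illustration, a one-dimensional mixture density can be evaluated directly from this definition; the weights and component parameters below are arbitrary:

```python
import numpy as np

def gmm_pdf_1d(x, weights, means, variances):
    """Density of a 1-D mixture of Gaussians at the point(s) x."""
    x = np.atleast_1d(x)[:, None]                 # shape (n, 1)
    means = np.asarray(means)[None, :]            # shape (1, K)
    variances = np.asarray(variances)[None, :]    # shape (1, K)
    comp = np.exp(-0.5 * (x - means) ** 2 / variances) / np.sqrt(2 * np.pi * variances)
    return comp @ np.asarray(weights)             # weighted sum over the K components

# K = 2 components; the weights sum to 1
print(gmm_pdf_1d([0.0, 2.0], weights=[0.3, 0.7], means=[0.0, 3.0], variances=[1.0, 0.5]))
```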

What is a Gaussian mixture model?


Problem:

Given a set of data $X = \{x_1, x_2, \ldots, x_N\}$ drawn from an unknown distribution (probably a GMM), estimate the parameters $\theta$ of the GMM model that fits the data.

Solution:

Maximize the likelihood $p(X \mid \theta)$ of the data with respect to the model parameters:

$\theta^* = \arg\max_{\theta} p(X \mid \theta) = \arg\max_{\theta} \prod_{i=1}^{N} p(x_i \mid \theta)$
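In practice the product over data points is computed as a sum of logarithms for numerical stability. A minimal sketch, assuming a per-component density function such as the `gaussian_pdf` sketched earlier:

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, covs, component_pdf):
    """log p(X | theta) = sum_i log sum_j w_j N(x_i | mu_j, Sigma_j)."""
    total = 0.0
    for x in X:
        p_x = sum(w * component_pdf(x, mu, cov)
                  for w, mu, cov in zip(weights, means, covs))
        total += np.log(p_x)
    return total
```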

The Expectation-Maximization algorithm


One of the most popular approaches to maximizing the likelihood is the Expectation-Maximization (EM) algorithm.

Basic ideas of the EM algorithm:

- Introduce a hidden variable such that its knowledge would simplify the maximization of the likelihood.

- At each iteration:
  E-Step: estimate the distribution of the hidden variable given the data and the current value of the parameters.
  M-Step: maximize the joint distribution of the data and the hidden variable.
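Schematically, the iteration looks like the sketch below; `e_step` and `m_step` are placeholder callables that are made concrete for the GMM in the following slides:

```python
def expectation_maximization(X, theta, e_step, m_step, n_iter=100):
    """Generic EM loop: alternate between estimating the hidden-variable
    distribution (E-step) and re-estimating the parameters (M-step)."""
    for _ in range(n_iter):
        hidden = e_step(X, theta)    # distribution of the hidden variable given X, theta
        theta = m_step(X, hidden)    # parameters maximizing the expected joint likelihood
    return theta
```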

The EM for the GMM (graphical view 1)


Hidden variable: for each point, which Gaussian generated it?

The EM for the GMM (graphical view 2)


E-Step: for each point, estimate the probability that each Gaussian
generated it.


The EM for the GMM (graphical view 3)


M-Step: modify the parameters according to the hidden variable to
maximize the likelihood of the data (and the hidden variable).


General formulation of the EM


(In what follows, the hidden variable is named Z.)

Consider the following "auxiliary" function:

$Q(\theta, \theta^t) = E_Z\left[\log p(X, Z \mid \theta) \mid \theta^t\right]$

It can be shown that maximizing $Q(\theta, \theta^t)$:

$\theta^{t+1} = \arg\max_{\theta}\left[Q(\theta, \theta^t)\right]$

never decreases the likelihood of the data.

Proof of EM convergence

$Q(\theta, \theta^t) = E_Z\left[\log p(X, Z \mid \theta) \mid \theta^t\right]$

$= \sum_{z} p(z \mid X, \theta^t) \log p(X, z \mid \theta)$

$= \sum_{z} p(z \mid X, \theta^t) \log\left[p(z \mid X, \theta)\, p(X \mid \theta)\right]$

$= \sum_{z} p(z \mid X, \theta^t) \log p(z \mid X, \theta) + \log p(X \mid \theta) \quad (1)$

Proof of EM convergence
If $\theta = \theta^t$, we have:

$Q(\theta^t, \theta^t) = \sum_{z} p(z \mid X, \theta^t) \log p(z \mid X, \theta^t) + \log p(X \mid \theta^t) \quad (2)$

From (1) and (2), we have:

$\log p(X \mid \theta) - \log p(X \mid \theta^t) = Q(\theta, \theta^t) - Q(\theta^t, \theta^t) + \underbrace{\sum_{z} p(z \mid X, \theta^t) \log \frac{p(z \mid X, \theta^t)}{p(z \mid X, \theta)}}_{\ge\, 0}$

(The last term is a Kullback-Leibler divergence, hence non-negative.)

Conclusion:

If $Q(\theta, \theta^t)$ increases, then the likelihood $p(X \mid \theta)$ increases.

The maximum of $Q(\theta, \theta^t)$ coincides with the maximum of the likelihood.

EM for the GMM


Note that if Z is observed, then the parameters can be estimated Gaussian by Gaussian. Unfortunately, Z is not always known.

Let us write the mixture of Gaussians for a data point $x_i$, $i = 1, \ldots, N$:

$p(x_i) = \sum_{j=1}^{K} w_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$

We introduce the indicator variable:

$z_{ij} = \begin{cases} 1 & \text{if Gaussian } j \text{ emitted } x_i, \\ 0 & \text{otherwise.} \end{cases}$

EM for the GMM


We can now write the joint likelihood of all X and Z as follows:

$L(X, Z, \theta) = p(X, Z \mid \theta)$

$= \prod_{i=1}^{N} \prod_{j=1}^{K} \left[p(x_i, j \mid \theta)\right]^{z_{ij}}$

$= \prod_{i=1}^{N} \prod_{j=1}^{K} \left[p(x_i \mid j, \theta)\right]^{z_{ij}} \left[p(j \mid \theta)\right]^{z_{ij}}$

(Here $p(j \mid \theta) = w_j$ and $p(x_i \mid j, \theta) = \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$.)

Using the "log" function gives:

$\log p(X, Z \mid \theta) = \sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij} \log p(x_i \mid j, \theta) + z_{ij} \log p(j \mid \theta)$

EM for the GMM


The auxiliary function is given by:

$Q(\theta, \theta^t) = E_Z\left[\log p(X, Z \mid \theta) \mid \theta^t\right]$

$= E_Z\left[\sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij} \log p(x_i \mid j, \theta) + z_{ij} \log p(j \mid \theta) \,\Big|\, \theta^t\right]$

$= \sum_{i=1}^{N} \sum_{j=1}^{K} E_Z(z_{ij} \mid \theta^t) \log p(x_i \mid j, \theta) + E_Z(z_{ij} \mid \theta^t) \log p(j \mid \theta)$

We then have the E-Step, given by:

$E_Z(z_{ij} \mid \theta^t) = 1 \cdot p(z_{ij} = 1 \mid x_i, \theta^t) + 0 \cdot p(z_{ij} = 0 \mid x_i, \theta^t)$

$= p(j \mid x_i, \theta^t) = \frac{p(j \mid \theta^t)\, p(x_i \mid j, \theta^t)}{p(x_i \mid \theta^t)}$

(posterior distribution)
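A NumPy sketch of this E-Step, computing the posterior responsibility of each Gaussian for each point; `gaussian_pdf` is assumed to be the density function sketched earlier:

```python
import numpy as np

def e_step(X, weights, means, covs, gaussian_pdf):
    """Responsibilities r[i, j] = p(j | x_i, theta^t)
    = w_j N(x_i | mu_j, Sigma_j) / sum_k w_k N(x_i | mu_k, Sigma_k)."""
    N, K = len(X), len(weights)
    r = np.zeros((N, K))
    for i, x in enumerate(X):
        for j in range(K):
            r[i, j] = weights[j] * gaussian_pdf(x, means[j], covs[j])
        r[i] /= r[i].sum()     # normalization by p(x_i | theta^t)
    return r
```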

EM for the GMM


And the M-Step, given by:

$\frac{\partial Q(\theta, \theta^t)}{\partial \theta} = 0$

where $\theta = (w_j, \mu_j, \Sigma_j,\ j = 1, \ldots, K)$ and $\sum_{j=1}^{K} w_j = 1$.

After straightforward manipulations, we obtain:

$\mu_j = \frac{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)\, x_i}{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)}$

$\Sigma_j = \frac{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)\, (x_i - \mu_j)(x_i - \mu_j)^T}{\sum_{i=1}^{N} p(j \mid x_i, \theta^t)}$
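The corresponding updates for the means and covariances, as a sketch that takes the responsibility matrix returned by the E-Step above:

```python
import numpy as np

def m_step_means_covs(X, r):
    """Weighted mean and weighted covariance updates,
    with r[i, j] = p(j | x_i, theta^t)."""
    X = np.asarray(X, dtype=float)      # shape (N, d)
    Nk = r.sum(axis=0)                  # effective number of points per Gaussian
    means = (r.T @ X) / Nk[:, None]
    covs = []
    for j in range(r.shape[1]):
        diff = X - means[j]
        covs.append((r[:, j, None] * diff).T @ diff / Nk[j])
    return means, np.array(covs)
```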

EM for the GMM


Incorporating the constraint $\sum_{j=1}^{K} w_j = 1$ using a Lagrange multiplier $\lambda$ gives:

$J(\theta, \lambda) = Q(\theta, \theta^t) - \lambda\left(\sum_{j=1}^{K} w_j - 1\right)$

Then:

$\frac{\partial J(\theta, \lambda)}{\partial w_j} = \frac{\partial Q(\theta, \theta^t)}{\partial w_j} - \lambda = \sum_{i=1}^{N} \frac{p(j \mid x_i, \theta^t)}{w_j} - \lambda = 0$

which gives:

$w_j = \frac{1}{\lambda} \sum_{i=1}^{N} p(j \mid x_i, \theta^t) \quad (3)$

EM for the GMM


Also, we have:

$\frac{\partial J(\theta, \lambda)}{\partial \lambda} = \sum_{j=1}^{K} w_j - 1 = 0 \quad (4)$

From (3) and (4), we obtain:

$\frac{1}{\lambda} \sum_{j=1}^{K} \sum_{i=1}^{N} p(j \mid x_i, \theta^t) = 1$

It follows that $\lambda = N$, and finally:

$w_j = \frac{1}{N} \sum_{i=1}^{N} p(j \mid x_i, \theta^t)$
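Putting the E-Step and the three M-Step updates together gives a complete, if minimal, EM loop for a GMM with full covariances. This is a sketch rather than production code (crude initialization, a small diagonal term for numerical stability, no log-space tricks, no convergence test):

```python
import numpy as np

def fit_gmm(X, K, n_iter=100, seed=0):
    """Minimal EM for a Gaussian mixture model with full covariance matrices."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    N, d = X.shape
    # Crude initialization (K-means gives a better start, as noted under "Some issues").
    means = X[rng.choice(N, K, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    weights = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-Step: responsibilities r[i, j] = p(j | x_i, theta^t)
        r = np.zeros((N, K))
        for j in range(K):
            diff = X - means[j]
            norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(covs[j]))
            quad = np.sum(diff @ np.linalg.inv(covs[j]) * diff, axis=1)
            r[:, j] = weights[j] * np.exp(-0.5 * quad) / norm
        r /= r.sum(axis=1, keepdims=True)

        # M-Step: the closed-form updates derived above
        Nk = r.sum(axis=0)
        weights = Nk / N                             # w_j = (1/N) sum_i p(j | x_i, theta^t)
        means = (r.T @ X) / Nk[:, None]
        for j in range(K):
            diff = X - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / Nk[j] + 1e-6 * np.eye(d)
    return weights, means, covs
```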

Some issues
1- Initialization:

- EM is an iterative algorithm which is very sensitive to the initial conditions: start from trash, end up with trash.

- Usually, K-means is used to get a good initialization.

2- Number of Gaussians:

- Use an information-theoretic criterion to obtain the optimal K, e.g. the minimum description length:

$\mathrm{MDL} = -\log L(X, Z, \theta) + \frac{1}{2}\left[(K - 1) + K\left(D + \frac{D(D+1)}{2}\right)\right]\log(N)$

- Other criteria: AIC, BIC, MML, etc.
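A common practical recipe is to fit models for several values of K and keep the one with the lowest criterion value. A sketch using scikit-learn's GaussianMixture, which exposes BIC and AIC directly; the toy data below are made up for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data: two well-separated clusters in 2-D
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(6, 1, (200, 2))])

# Fit a GMM for each candidate K and keep the K with the lowest BIC
bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
       for k in range(1, 6)}
best_K = min(bic, key=bic.get)
print(best_K)   # expected to be 2 for this toy data
```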

Some issues
3- Simplification of the covariance matrices:

Case 1: Spherical covariance matrix

$\Sigma_j = \mathrm{diag}(\sigma_j^2, \sigma_j^2, \ldots, \sigma_j^2) = \sigma_j^2 I$

- Less precise.
- Very efficient to compute.

Case 2: Diagonal covariance matrix

$\Sigma_j = \mathrm{diag}(\sigma_{j1}^2, \sigma_{j2}^2, \ldots, \sigma_{jd}^2)$

- More precise.
- Efficient to compute.

Some issues
3- Simplification of the covariance matrices (continued):

Case 3: Full covariance matrix

$\Sigma_j = \begin{pmatrix} \sigma_{j1}^2 & \mathrm{cov}_j(x^1, x^2) & \cdots & \mathrm{cov}_j(x^1, x^d) \\ \mathrm{cov}_j(x^2, x^1) & \sigma_{j2}^2 & \cdots & \mathrm{cov}_j(x^2, x^d) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}_j(x^d, x^1) & \mathrm{cov}_j(x^d, x^2) & \cdots & \sigma_{jd}^2 \end{pmatrix}$

- Very precise.
- Less efficient to compute.
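These three cases correspond, for instance, to the `covariance_type` options of scikit-learn's GaussianMixture; a brief sketch on placeholder data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 2))   # placeholder data

# 'spherical': one variance per Gaussian                  (Case 1)
# 'diag'     : one variance per dimension and Gaussian    (Case 2)
# 'full'     : a full covariance matrix per Gaussian      (Case 3)
for cov_type in ("spherical", "diag", "full"):
    gmm = GaussianMixture(n_components=3, covariance_type=cov_type, random_state=0).fit(X)
    print(cov_type, gmm.covariances_.shape)
```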

Applications of GMM in computer vision


1- Image segmentation:
Each pixel is described by its color vector $x = (R, G, B)^T$, and the pixels are clustered by fitting a GMM to these vectors.
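A hedged sketch of this idea: fit a GMM to the RGB values of all pixels, then label each pixel with its most probable Gaussian (the random `image` array stands in for a real image):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_image(image, n_segments=3):
    """Cluster the pixel colors with a GMM and return a label map."""
    H, W, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)          # one (R, G, B) vector per pixel
    gmm = GaussianMixture(n_components=n_segments, covariance_type="full",
                          random_state=0).fit(pixels)
    labels = gmm.predict(pixels)                          # most probable Gaussian per pixel
    return labels.reshape(H, W)

labels = segment_image(np.random.default_rng(0).integers(0, 256, (64, 64, 3)))
```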

Applications of GMM in computer vision


2- Object tracking:
Knowing the moving object's distribution in the first frame, we can localize the object in the next frames by tracking its distribution.
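A possible sketch of this idea, assuming the object's appearance is modeled by a color GMM fitted on its pixels from the first frame; every pixel of a new frame is then scored by its likelihood under that model:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def object_likelihood_map(object_pixels, frame):
    """Fit a color GMM to the object's pixels (first frame), then score each
    pixel of `frame` by its log-likelihood under that model."""
    gmm = GaussianMixture(n_components=3, random_state=0).fit(
        object_pixels.reshape(-1, 3).astype(float))
    H, W, _ = frame.shape
    scores = gmm.score_samples(frame.reshape(-1, 3).astype(float))
    return scores.reshape(H, W)    # high values indicate likely object pixels
```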

Some references
1. Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
2. Geoffrey McLachlan and David Peel. Finite Mixture Models. Wiley Series in Probability and Statistics, 2000.
3. Samy Bengio. Statistical Machine Learning from Data: Gaussian Mixture Models. http://bengio.abracadoudou.com/lectures/gmm.pdf
4. T. Hastie et al. The Elements of Statistical Learning. Springer, 2009.


Questions?
