Dsci303-19 GM - em


Gaussian Mixture Models
&
Expectation-Maximization
Clustering Methods
• Clustering
• Hard clustering: clusters do not overlap
• Assign each example to one cluster (e.g. K-means)
• Soft clustering: clusters do overlap
• Assign data to cluster with some probability (EM algorithm)
• Strength of association between clusters and instances

• Mixture Models
• Probability-based soft clustering
• Each cluster is a generative model (Gaussian or multinomial)
• Estimate the parameters (mean and covariance are unknown)
Gaussian Mixture Model
• Gaussian Mixture Models (GMMs) assume that there are a certain number of Gaussian distributions, and each of these distributions represents a cluster.

• A Gaussian Mixture Model tends to group together the data points belonging to a single distribution.
Introduction to GMM

• Gaussian: "Gaussian is a characteristic symmetric 'bell curve' shape that quickly falls off towards 0 (practically)."

• Mixture Model: "A mixture model is a probabilistic model which assumes the underlying data belong to a mixture distribution."
Normal (Gaussian) Distribution

• µ is the mean
• σ² is the variance
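For reference, the univariate Gaussian density is:

$$ \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$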
Mixture Models
• Formally, a mixture model is the weighted sum of a number of pdfs, where the weights are determined by a distribution, π.
Gaussian Mixture Models
• GMM: the weighted sum of a number of Gaussians, where the weights are determined by a distribution, π.
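Written out, a GMM with K components has the standard form:

$$ p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2), \qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0 $$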
[Figure: two Gaussian components (Component 1, Component 2) and the resulting mixture model, plotted as p(x) for x in [-5, 10].]
[Figure: a second example of two Gaussian components and the resulting mixture model, p(x) for x in [-5, 10].]
[Figure: component models (taller, weighted component densities) and the resulting mixture model, p(x) for x in [-5, 10].]
GMM example
Quiz:
The formula for the Gaussian density is:

Which of the following is the formula for the density of this figure?

Respond at www.pollev.com/akanesano925
Gaussian Mixture Models
• Rather than identifying clusters by "nearest" centroids
• Fit a set of k Gaussians to the data
• Maximum likelihood over a mixture model

1-D data: the points come from 2 Gaussian models.
Which source does each data point come from?

X1 X2 X3 … X10
EM algorithm
Find the best-fit models to maximize the likelihood of the data.

1) Start with two random Gaussians: N(mean(a), sigma(a)), N(mean(b), sigma(b))

2) Compute P(b | xi): does this point look like it comes from b? (E-step)

$$ P(x_i \mid b) = \frac{1}{\sqrt{2\pi\sigma_b^2}} \exp\!\left(-\frac{(x_i - \mu_b)^2}{2\sigma_b^2}\right) $$

3) Adjust N(mean(a), sigma(a)) and N(mean(b), sigma(b)) to fit the points (M-step): find the mean, sigma, and pi parameters that maximize the likelihood (set the derivative of the likelihood with respect to mean, sigma, and pi to 0).

4) Iterate until it converges (the change in likelihood is small).

[Figure: data points X1, X2, X3, …, X10 on a line, each to be attributed to Gaussian a or Gaussian b.]
EM: 1-d example
We want to discover two Gaussians, a and b.

Step 1: Define random Gaussians and compute responsibilities:

$$ p(x_i \mid b) = \frac{1}{\sqrt{2\pi\sigma_b^2}} \exp\!\left(-\frac{(x_i - \mu_b)^2}{2\sigma_b^2}\right) $$

$$ b_i = P(b \mid x_i) = \frac{P(x_i \mid b)\,P(b)}{P(x_i \mid b)\,P(b) + P(x_i \mid a)\,P(a)}, \qquad a_i = P(a \mid x_i) = 1 - b_i $$

Step 2: Update parameters:

$$ \mu_b = \frac{b_1 x_1 + b_2 x_2 + \dots + b_n x_n}{b_1 + b_2 + \dots + b_n}, \qquad \mu_a = \frac{a_1 x_1 + a_2 x_2 + \dots + a_n x_n}{a_1 + a_2 + \dots + a_n} $$

Step 3: Iterate the process.
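A minimal sketch of this two-Gaussian, 1-D EM loop in Python (illustrative only: the synthetic data and variable names are not from the slides, and the variances and mixing weights are updated with the standard responsibility-weighted formulas):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data from two Gaussians (illustrative, not the slides' data)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.5, 300)])

def gauss_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# Step 1: random initialization
mu_a, mu_b = rng.choice(x, 2)
sigma_a = sigma_b = x.std()
pi_a = pi_b = 0.5

prev_ll = -np.inf
for _ in range(200):
    # E-step: responsibilities b_i = P(b | x_i), a_i = 1 - b_i
    pa = pi_a * gauss_pdf(x, mu_a, sigma_a)
    pb = pi_b * gauss_pdf(x, mu_b, sigma_b)
    ll = np.log(pa + pb).sum()          # log-likelihood under current parameters
    b = pb / (pa + pb)
    a = 1.0 - b

    # M-step: responsibility-weighted means, standard deviations, and weights
    mu_a, mu_b = (a * x).sum() / a.sum(), (b * x).sum() / b.sum()
    sigma_a = np.sqrt((a * (x - mu_a) ** 2).sum() / a.sum())
    sigma_b = np.sqrt((b * (x - mu_b) ** 2).sum() / b.sum())
    pi_a, pi_b = a.mean(), b.mean()

    # Stop when the change in log-likelihood is small
    if ll - prev_ll < 1e-6:
        break
    prev_ll = ll

print(f"a: mu={mu_a:.2f}, sigma={sigma_a:.2f}, pi={pi_a:.2f}")
print(f"b: mu={mu_b:.2f}, sigma={sigma_b:.2f}, pi={pi_b:.2f}")
```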
Normal (Gaussian) Distribution (d-dim)
• Multivariate Gaussian
• x is now a vector
• µ is the mean vector
• Σ is the covariance matrix (d × d)
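For reference, the d-dimensional Gaussian density is:

$$ \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}\,(\mathbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right) $$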
Quiz: Consider the following multivariate Gaussian.

Respond at www.pollev.com/akanesano925
Gaussian mixture models: d > 1
• Data with d attributes from k sources
• Each source c is Gaussian
• Iteratively estimate parameters to maximize likelihood
  • Prior: what % of instances come from source c?
  • Mean: expected value of attribute j from c
  • Covariance: how correlated are attributes j and k in source c?
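Written out, these are the standard EM updates for a GMM; the responsibility notation r_ic (the responsibility of source c for instance i) is introduced here for illustration and is not from the slides.

E-step (responsibilities):

$$ r_{ic} = \frac{\pi_c \, \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_c, \Sigma_c)}{\sum_{c'} \pi_{c'} \, \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_{c'}, \Sigma_{c'})} $$

M-step (prior, mean, and covariance of each source c):

$$ \pi_c = \frac{1}{n} \sum_{i=1}^{n} r_{ic}, \qquad \boldsymbol{\mu}_c = \frac{\sum_i r_{ic}\,\mathbf{x}_i}{\sum_i r_{ic}}, \qquad \Sigma_c = \frac{\sum_i r_{ic}\,(\mathbf{x}_i - \boldsymbol{\mu}_c)(\mathbf{x}_i - \boldsymbol{\mu}_c)^{\top}}{\sum_i r_{ic}} $$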
Expectation Maximization
• The training of GMMs can be accomplished
using Expectation Maximization
– Step 1: Expectation (E-step)
• Evaluate the “responsibilities” of each cluster with the
current parameters => compute likelihood
– Step 2: Maximization (M-step)
• Re-estimate parameters using the existing
“responsibilities” => maximize likelihood
by optimizing mean, variance and weights
• Iterate step 1 & 2 until likelihood converges
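In practice this loop is available in standard libraries. A brief sketch using scikit-learn's GaussianMixture; the synthetic 2-D data and the parameter choices (n_components=2, n_init=5) are illustrative rather than taken from the lecture:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Illustrative 2-D data drawn from two clusters
X = np.vstack([rng.normal([0, 0], 1.0, size=(200, 2)),
               rng.normal([4, 4], 0.5, size=(200, 2))])

gmm = GaussianMixture(n_components=2, covariance_type="full",
                      n_init=5, random_state=0)
gmm.fit(X)                       # runs the E/M iterations until convergence

resp = gmm.predict_proba(X)      # soft assignments ("responsibilities")
labels = gmm.predict(X)          # hard assignment to the most likely component
print(gmm.weights_)              # mixing weights (priors)
print(gmm.means_)                # component means
print(gmm.covariances_.shape)    # (2, 2, 2): one covariance matrix per component
```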
Quiz: Choose a reasonable criterion for stopping the EM algorithm

A) Parameters stabilized
B) Log-likelihood reached the predefined constant value
C) The prior probability weights in GMM should be non negative and sum up to one
D) Log-likelihood stabilized

Respond at www.pollev.com/akanesano925
Quiz: EM algorithm
Which of the following is (are) TRUE for the EM algorithm?

1) EM is prone to converging to a local optimum, so running the EM algorithm multiple times with different random initializations is good

2) In contrast to k-means clustering, the EM algorithm always converges to a global optimum, so we don't need to run the algorithm multiple times

3) Log-likelihood is monotonically increasing with # of iterations

4) Log-likelihood is monotonically decreasing with # of iterations

Respond at www.pollev.com/akanesano925
How to pick K?
• Probabilistic model
• Try to fit the model (maximize the likelihood)

• Select the best K that maximizes likelihood?
  • When K = n (each point is its own model), L is maximal

• Split the data into training and validation sets (T and V)
  • For each K, fit parameters on T and measure the likelihood of V

• Pick the simplest model that fits ("Occam's razor")
  • BIC (Bayesian Information Criterion)
  • AIC (Akaike Information Criterion)
  • L: likelihood (how well our model fits the data)
  • p: number of parameters (how simple the model is)
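Both criteria trade off fit against complexity; with n data points, the standard definitions (lower is better) are:

$$ \mathrm{AIC} = -2 \ln L + 2p, \qquad \mathrm{BIC} = -2 \ln L + p \ln n $$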
Quiz: What happens if we use too few k or too many k?

A) Too small k: underfitting, too large k: overfitting

B) Too small k: overfitting, too large k: underfitting

C) K does not affect underfitting/overfitting, but it does affect computational cost

Respond at www.pollev.com/akanesano925

Maximum Likelihood over a GMM
• As usual: identify a likelihood function

• And set the partial derivatives to zero to optimize the three parameters
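For reference, the likelihood function in question is the standard GMM log-likelihood:

$$ \ln p(X \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \Sigma) = \sum_{i=1}^{n} \ln\!\left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_k, \Sigma_k) \right) $$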
MLE of a GMM
How can we compute these? (optional)

• To find the optimal µ: preparation, then set the derivative with respect to µ to zero.
• To get the optimal σ: preparation, then set the derivative with respect to Σ to zero.
• To optimize π: we have a constraint on π (the weights must sum to one); therefore, using (*4), solve for the optimal weights.

MLE of a GMM
Visual example of EM
Potential Problems
• Incorrect number of Mixture Components

• Singularities
Incorrect Number of Gaussians
Incorrect Number of Gaussians
Singularities
• A minority of the data can have a
disproportionate effect on the model
likelihood.
• For example…
GMM example
Singularities
• When a mixture component collapses on a
given point, the mean becomes the point, and
the variance goes to zero.
• Consider the likelihood function as the
covariance goes to zero.

• The likelihood approaches infinity.
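Concretely (a standard illustration of the collapse, not from the slide): if a component with spherical covariance σ²I has its mean sitting exactly on a data point x_n, that point's density under the component is

$$ \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu} = \mathbf{x}_n, \sigma^2 I) = \frac{1}{(2\pi)^{d/2}\,\sigma^{d}} \;\longrightarrow\; \infty \quad \text{as } \sigma \to 0 $$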


Relationship to K-means
• K-means makes hard decisions.
– Each data point gets assigned to a single cluster.
• GMM/EM makes soft decisions.
– Each data point can yield a posterior p(z|x)
• Soft K-means is a special case of EM.
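One standard way to make this precise (not spelled out on the slide): fix all components to a shared spherical covariance σ²I and equal mixing weights; the responsibilities then reduce to

$$ r_{ic} = \frac{\exp\!\left(-\lVert \mathbf{x}_i - \boldsymbol{\mu}_c \rVert^2 / 2\sigma^2\right)}{\sum_{c'} \exp\!\left(-\lVert \mathbf{x}_i - \boldsymbol{\mu}_{c'} \rVert^2 / 2\sigma^2\right)} $$

and as σ² → 0 each point's responsibility concentrates on its nearest centroid, recovering the hard assignments of K-means.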
[Figure: anemia patients and controls; Red Blood Cell Volume (x axis, 3.3 to 4.0) vs. Red Blood Cell Hemoglobin Concentration (y axis, 3.7 to 4.4).]
[Figures: the GMM fit to the anemia data after EM iterations 1, 3, 5, 10, 15, and 25 (same axes).]
[Figure: log-likelihood as a function of EM iterations, increasing from roughly 400 to 490 over 25 iterations.]
[Figure: anemia data with true labels, separating the Control Group from the Anemia Group.]
Real-world stress, mood and health prediction

[Figure: app screenshot showing predicted wellbeing for tomorrow and a history view.]

Wellbeing / psychiatric condition prediction using multimodal data: location, physiology, behavioral surveys, smartphone logs, and weather.

SNAPSHOT study: ~200 people, 30 days each
Mobility

• Phone also logs GPS coordinates


• Downsample to 1 location per 5 minutes
• Interpolate up to 3 missing segments

• Features:
• Distance traveled
• Radius of minimum circle enclosing all locations in a day [3]
• Time spent indoors vs. outdoors (based on Wifi or Cellular)
• Time spent on campus (using coordinates of campus)
• Regularity Index
Mobility

• Used Gaussian Mixture Models (GMMs) to learn a probability distribution over each participant's typical locations, using up to K Gaussians
• Used the Bayesian Information Criterion (BIC) to select the best model

[Figure: the GMM for one participant; points are locations, contours are probability.]
Mobility

• GMMs were used to compute:
  o Log-likelihood of each day: how typical is this day?
  o Akaike Information Criterion (AIC) of a day
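As a sketch of how such per-day scores could be computed with scikit-learn (the helper names, the (lat, lon) layout, and the random coordinates are illustrative assumptions, not the study's actual pipeline):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_location_model(all_locations, max_k=10):
    """Fit GMMs with 1..max_k components and keep the one with the lowest BIC."""
    candidates = [GaussianMixture(n_components=k, random_state=0).fit(all_locations)
                  for k in range(1, max_k + 1)]
    return min(candidates, key=lambda m: m.bic(all_locations))

def day_typicality(gmm, day_locations):
    """Total log-likelihood of one day's (lat, lon) samples under the fitted GMM."""
    return gmm.score_samples(day_locations).sum()

# Illustrative usage with random coordinates (not real study data)
rng = np.random.default_rng(0)
all_locations = rng.normal([42.36, -71.09], [0.01, 0.01], size=(500, 2))
gmm = fit_location_model(all_locations)
print(day_typicality(gmm, all_locations[:50]))
```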
Quiz: Of the clustering algorithms covered in class, Gaussian Mixture Models used for clustering always outperform k-means clustering

A. True
B. False

Respond at www.pollev.com/akanesano925
GM/EM: Summary
• Maximize the likelihood of the data using the EM algorithm

• Similar to k-means
  • Sensitive to starting points; converges to a local optimum
  • Convergence: when the change in P(x1, x2, …) is sufficiently small
  • Cannot discover K (the likelihood increases as K increases)

• Different from k-means
  • Soft clustering: an instance can come from multiple clusters
Announcements
1) Final Project Progress meeting

• Sign up at https://dsci303-finalproject-update.youcanbook.me/
• Sign-up will close on Nov 5, this Thursday, at noon.
• All members of the team must attend the meeting.

2) Homework 4

• Individual assignment
• Not allowed to work with anyone else
• Allowed to ask questions on Piazza

• HW4_5: oral problem
  • Nov 20, Dec 3, or Dec 4
  • ~5 mins
  • Sign up at https://akanesano.youcanbook.me/
  • You may see Nov 13 listed, but it is not an option (there are some issues with youcanbook.me)
  • Sign-up will close on Nov 13.