Dsci303-19 GM & Expectation-Maximization
Clustering Methods
• Clustering
• Hard clustering: clusters do not overlap
• Assign each example to exactly one cluster (e.g., k-means)
• Soft clustering: clusters may overlap
• Assign data to clusters with some probability (EM algorithm)
• Strength of association between clusters and instances
• Mixture Models
• Probability-based soft clustering
• Each cluster -> a generative model (Gaussian or multinomial)
• Estimate parameters (mean and covariance are unknown)
Gaussian Mixture Model
• Gaussian Mixture Models (GMMs) assume that there are a certain number of Gaussian distributions, and each of these distributions represents a cluster.
Normal (Gaussian) Distribution
p(x) = (1 / (σ√(2π))) · exp(−(x − µ)² / (2σ²))
• µ is the mean
• σ² is the variance
Mixture Models
• Formally, a mixture model is the weighted sum of a number of pdfs, where the weights are determined by a distribution:
p(x) = Σ_k w_k · p_k(x), with Σ_k w_k = 1
Gaussian Mixture Models
• A GMM is the weighted sum of a number of Gaussians, where the weights are determined by a distribution:
p(x) = Σ_k w_k · N(x; µ_k, σ_k²)
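As a concrete illustration, the weighted sum can be evaluated directly; the component weights, means, and variances below are made up for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """Weighted sum of Gaussian pdfs; the weights should sum to 1."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

# Two-component mixture: 60% of the mass around 0, 40% around 5.
density_at_zero = mixture_pdf(0.0, [0.6, 0.4], [0.0, 5.0], [1.0, 2.0])
```

Because each component integrates to 1 and the weights sum to 1, the mixture is itself a valid density.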
[Figures: component Gaussian densities and the resulting mixture model density p(x), plotted for x from −5 to 10]
GMM example
Quiz:
The formula for the Gaussian density is:
Which of the following is the formula for the density of this figure?
Respond at www.pollev.com/akanesano925
Gaussian Mixture Models
• Rather than identifying clusters by "nearest" centroids
• Fit a set of k Gaussians to the data
• Maximum likelihood over a mixture model
1-D data: drawn from 2 Gaussian models
Which source does each data point come from?
[Figure: data points x1, x2, x3, …, x10 on a number line]
EM algorithm
Find the best-fit models to maximize the likelihood of the data:
1) Start with two random Gaussians: N(µ_a, σ_a²) and N(µ_b, σ_b²)
2) Compute P(b | x_i): does x_i look like it comes from b? (E-step)
P(x_i | b) = (1 / √(2πσ_b²)) · exp(−(x_i − µ_b)² / (2σ_b²))
[Figure: data points x1, x2, x3, …, x10, each assigned to Gaussian a or Gaussian b]
EM: 1-D example
We want to discover two Gaussians, a and b.
Step 1: Define random Gaussians.
p(x_i | b) = (1 / √(2πσ_b²)) · exp(−(x_i − µ_b)² / (2σ_b²))
b_i = P(b | x_i) = P(x_i | b) P(b) / (P(x_i | b) P(b) + P(x_i | a) P(a))
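The full loop, including the M-step updates not spelled out on the slides (re-estimating each Gaussian's mean, variance, and prior from the responsibilities), can be sketched as follows; the deterministic min/max initialization is a simplification of "start with two random Gaussians":

```python
import math

def pdf(x, mu, sigma):
    """1-D Gaussian density N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(xs, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture to xs with EM.

    Returns (p_a, mu_a, sigma_a, mu_b, sigma_b)."""
    # Initialization (a simplification of "two random Gaussians").
    mu_a, mu_b = min(xs), max(xs)
    sigma_a = sigma_b = (max(xs) - min(xs)) / 4 or 1.0
    p_a = 0.5  # prior P(a); P(b) = 1 - p_a
    for _ in range(n_iter):
        # E-step: responsibility P(a | x_i) via Bayes' rule, as in the formula above.
        resp = []
        for x in xs:
            wa = p_a * pdf(x, mu_a, sigma_a)
            wb = (1 - p_a) * pdf(x, mu_b, sigma_b)
            resp.append(wa / (wa + wb))
        # M-step: responsibility-weighted mean, variance, and prior updates.
        na = sum(resp)
        nb = len(xs) - na
        mu_a = sum(r * x for r, x in zip(resp, xs)) / na
        mu_b = sum((1 - r) * x for r, x in zip(resp, xs)) / nb
        sigma_a = math.sqrt(sum(r * (x - mu_a) ** 2 for r, x in zip(resp, xs)) / na) or 1e-6
        sigma_b = math.sqrt(sum((1 - r) * (x - mu_b) ** 2 for r, x in zip(resp, xs)) / nb) or 1e-6
        p_a = na / len(xs)
    return p_a, mu_a, sigma_a, mu_b, sigma_b
```

On well-separated data the loop recovers the two source means; with unlucky starting points it can settle in a local optimum, which is why multiple restarts are common in practice.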
Multivariate Gaussian
• x is now a vector
• µ is the mean vector
• Σ is the covariance matrix (d × d)
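A direct transcription of the d-dimensional Gaussian density, as a sketch (NumPy is assumed to be available):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of a d-dimensional Gaussian N(mu, Sigma) at x."""
    d = len(mu)
    diff = np.asarray(x) - np.asarray(mu)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

# With d = 2 and identity covariance, the value at the mean is 1 / (2*pi).
val = mvn_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2))
```

With a diagonal covariance the density factorizes into a product of 1-D Gaussians, one per attribute.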
Quiz: Consider the following multivariate Gaussian.
Respond at www.pollev.com/akanesano925
Gaussian mixture models: d > 1
• Data with d attributes from k sources
• Each source c is Gaussian
• Iteratively estimate parameters to maximize the likelihood
• Prior: what % of instances come from source c?
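The prior for each source is re-estimated in the M-step as the average responsibility over the data; a minimal sketch (the responsibility matrix below is illustrative):

```python
def update_priors(resp):
    """Re-estimate P(c) as the average responsibility P(c | x_i) over the data.

    resp[i][c] = P(c | x_i) from the E-step; each row sums to 1."""
    n = len(resp)
    k = len(resp[0])
    return [sum(resp[i][c] for i in range(n)) / n for c in range(k)]

# Three instances, two sources; the priors inherit "sum to 1" from the rows.
priors = update_priors([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
```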
Quiz: Choose a reasonable criterion for stopping the EM algorithm
A) Parameters stabilized
B) Log-likelihood reached the predefined constant value
C) The prior probability weights in a GMM should be non-negative and sum up to one
D) Log-likelihood stabilized
Respond at www.pollev.com/akanesano925
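In practice, "stabilized" is checked against a tolerance; a minimal sketch of a log-likelihood-based stopping rule (the tolerance value is illustrative):

```python
def converged(loglik_history, tol=1e-6):
    """Stop EM when the log-likelihood change between iterations falls below tol."""
    if len(loglik_history) < 2:
        return False
    return abs(loglik_history[-1] - loglik_history[-2]) < tol
```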
Quiz: EM algorithm
Which of the following is(are) TRUE for EM algorithm?
Respond at www.pollev.com/akanesano925
Quiz: What happens if we use too few or too many k?
• Singularities
[Figures: fits with an incorrect number of Gaussians]
Singularities
• A minority of the data can have a disproportionate effect on the model likelihood.
• For example…
GMM example
Singularities
• When a mixture component collapses on a given point, the mean becomes the point and the variance goes to zero.
• Consider the likelihood function as the covariance goes to zero.
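The collapse is easy to see numerically: center one component exactly on a data point and shrink its variance (the sigma values below are illustrative):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1-D Gaussian density N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# A component whose mean sits exactly on a data point contributes
# 1/(sigma*sqrt(2*pi)) to the likelihood there, which diverges as sigma -> 0.
densities = [gaussian_pdf(0.0, 0.0, s) for s in (1.0, 0.1, 0.001)]
```

This is why the overall mixture likelihood can be driven arbitrarily high by a single collapsing component, independent of how well the model fits the rest of the data.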
[Figures: anemia data, Red Blood Cell Hemoglobin Concentration vs. Red Blood Cell Volume, showing the fitted mixture at EM iterations 1, 3, 5, 10, 15, and 25]
[Figure: log-likelihood as a function of EM iterations, rising from about 400 to about 490 over 25 iterations]
[Figure: anemia data with labels, including the Control Group]
Real-world stress, mood and health prediction
[Figure: app screenshot showing tomorrow's predictions and history]
Wellbeing/psychiatric condition prediction using multimodal data: location, physiology, weather
SNAPSHOT: ~200 people, 30 days each
Mobility
• Features:
• Distance traveled
• Radius of the minimum circle enclosing all locations in a day [3]
• Time spent indoors vs. outdoors (based on WiFi or cellular)
• Time spent on campus (using the coordinates of campus)
• Regularity index
Quiz: Of the clustering algorithms covered in class, Gaussian Mixture Models used for clustering always outperform k-means clustering.
A. True
B. False
Respond at www.pollev.com/akanesano925
GM/EM: Summary
• Maximize the likelihood of the data using the EM algorithm
• Similar to k-means
• Sensitive to starting points; converges to a local optimum of the likelihood
• Converged when the change in P(x1, x2, …) is sufficiently small
• Cannot discover K (the likelihood increases as K increases)
1) Final project meeting
• Sign up on https://dsci303-finalproject-update.youcanbook.me/
• Sign-up closes this Thursday, Nov 5, at noon.
• All members of the team must attend the meeting.
2) Homework 4
• Individual assignment
• Not allowed to work with anyone else
• Allowed to ask questions on Piazza