Introduction To Mixture Models

Prerequisites
Overview
Example 1
Example 2
Definition
Introduction to Mixture Models

Matt Bonakdarpour
2016-01-22
 work!owr 
Prerequisites
This document assumes basic familiarity with probability theory.
Overview
We often make simplifying modeling assumptions when analyzing a data set such as assuming each observation
comes from one specific distribution (say, a Gaussian distribution). Then we proceed to estimate parameters of
this distribution, like the mean and variance, using maximum likelihood estimation.
However, in many cases, assuming each sample comes from the same unimodal distribution is too restrictive and
may not make intuitive sense. Often the data we are trying to model are more complex. For example, they might
be multimodal – containing multiple regions with high probability mass. In this note, we describe mixture
models which provide a principled approach to modeling such complex data.
Example 1
Suppose we are interested in simulating the price of a randomly chosen book. Since paperback books are typically
cheaper than hardbacks, it might make sense to model the price of paperback books separately from hardback
books. In this example, we will model the price of a book as a mixture model. We will have two mixture
components in our model – one for paperback books, and one for hardbacks.
Let’s say that if we choose a book at random, there is a 50% chance of choosing a paperback and 50% of choosing
hardback. These proportions are called mixture proportions. Assume the price of a paperback book is
normally distributed with mean $9 and standard deviation $1 and the price of a hardback is normally distributed
with a mean $20 and a standard deviation of $2. We could simulate book prices Pi in the following way:
1. Sample Zi ∼ Bernoulli(0.5)
2. If Zi = 0 draw Pi from the paperback distribution N(9, 1). If Zi = 1, draw Pi from the hardback
distribution N(20, 2).
We implement this simulation in the code below:

NUM.SAMPLES <- 5000
prices <- numeric(NUM.SAMPLES)
for(i in seq_len(NUM.SAMPLES)) {
z.i <- rbinom(1,1,0.5)
if(z.i == 0) prices[i] <- rnorm(1, mean = 9, sd = 1)
else prices[i] <- rnorm(1, mean = 20, sd = 1)
}
hist(prices)
Past versions of unnamed-chunk-1-1.png
We see that our histogram is bimodal. Indeed, even though the mixture components are each normal
distributions, the distribution of a randomly chosen book is not. We illustrate the true densities below:
We see that the resulting probability density for all books is bimodal, and is therefore not normally distributed.
In this example, we modeled the price of a book as a mixture of two components where each component was
modeled as a Gaussian distribution. This is called a Gaussian mixture model (GMM).
Example 2
Now assume our data are the heights of students at the University of Chicago. Assume the height of a randomly
chosen male is normally distributed with a mean equal to 5′ 9 and a standard deviation of 2.5 inches and the
height of a randomly chosen female is N(5′ 4, 2.5). However, instead of 50/50 mixture proportions, assume that
75% of the population is female, and 25% is male. We simulate heights in a similar fashion as above, with the
corresponding changes to the parameters:
NUM.SAMPLES <- 5000

heights <- numeric(NUM.SAMPLES)
for(i in seq_len(NUM.SAMPLES)) {
z.i <- rbinom(1,1,0.75)
if(z.i == 0) heights[i] <- rnorm(1, mean = 69, sd = 2.5)
else heights[i] <- rnorm(1, mean = 64, sd = 2.5)
}
hist(heights)
Now we see that histogram is unimodal. Are heights normally distributed under this model? We plot the
corresponding densities below:
Here we see that the Gaussian mixture model is unimodal because there is so much overlap between the two
densities. In this example, you can see that the population density is not symmetric, and therefore not normally
distributed.
These two illustrative examples above give rise to the general notion of a mixture model which assumes each
observation is generated from one of K mixture components. We formalize this notion in the next section.
Before moving on, we make one small pedagogical note that sometimes confuses students new to mixture
models. You might recall that if X and Y are independent normal random variables, then Z = X + Y is also a
normally distributed random variable. From this, you might wonder why the mixture models above aren’t
normal. The reason is that X + Y is not a bivariate mixture of normals. It is a linear combination of normals. A
random variable sampled from a simple Gaussian mixture model can be thought of as a two stage process. First,
we randomly sample a component (e.g. male or female), then we sample our observation from the normal
distribution corresponding to that component. This is clearly different than sampling X and Y from different
normal distributions, then adding them together.
De!nition
Assume we observe X1 , … , Xn and that each Xi is sampled from one of K mixture components. In the
second example above, the mixture components were {male,female}. Associated with each random variable Xi
is a label Zi ∈ {1, … , K} which indicates which component Xi came from. In our height example, Zi would be
either 1 or 2 depending on whether Xi was a male or female height. Often times we don’t observe Zi (e.g. we
might just obtain a list of heights with no gender information), so the Zi ’s are sometimes called latent
variables.
From the law of total probability, we know that the marginal probability of Xi is:
K K
P (Xi = x) = ∑ P (Xi = x|Zi = k)P (Zi = k) = ∑ P (Xi = x|Zi = k)πk
k=1  k=1
πk
Here, the πk are called mixture proportions or mixture weights and they represent the probability that Xi
belongs to the k-th mixture component. The mixture proportions are nonnegative and they sum to one,
∑K k=1 πk = 1. We call P (Xi |Zi = k) the mixture component, and it represents the distribution of Xi
assuming it came from component k. The mixture components in our examples above were normal distributions.
For discrete random variables these mixture components can be any probability mass function p(. ∣ Zk ) and for
continuous random variables they can be any probability density function f(. ∣ Zk ). The corresponding pmf and
pdf for the mixture model is therefore:
K
p(x) = ∑ πk p(x ∣ Zk )
k=1
K
fx (x) = ∑ πk fx∣Zk (x ∣ Zk )
k=1
If we observe independent samples X1 , … , Xn from this mixture, with mixture proportion vector
π = (π1 , π2 , … , πK ), then the likelihood function is:
n n K
L(π) = ∏ P (Xi |π) = ∏ ∑ P (Xi |Zi = k)πk
i=1 i=1 k=1
Now assume we are in the Gaussian mixture model setting where the k-th component is N(µk , σk ) and the
mixture proportions are πk . A natural next question to ask is how to estimate the parameters {µk , σk , πk } from
our observations X1 , … , Xn . We illustrate one approach in the introduction to EM (intro_to_em.html)
vignette.
Acknowledgement: The “Examples” section above was taken from lecture notes written by Ramesh Sridharan.
 Session information
This site was created with R Markdown (http://rmarkdown.rstudio.com)

Introduction To Mixture Models

Uploaded by

Copyright:

Available Formats

You might also like

Introduction To Mixture Models

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Introduction To Mixture Models

Uploaded by

Copyright:

Available Formats

Prerequisites

Introduction to Mixture Models

We implement this simulation in the code below:

Past versions of unnamed-chunk-1-1.png

NUM.SAMPLES <- 5000

Past versions of unnamed-chunk-3-1.png

This site was created with R Markdown (http://rmarkdown.rstudio.com)

You might also like