DS 9
Since we do not know the values of the latent variables, we force them into the expression using the law of total probability. We did a similar thing (introducing models) in predictive posterior calculations.

Indeed! Notice that even here, instead of choosing just one value of the latent variables at each time step, we can instead use a distribution over their support.

Step 2 (M Step): Maximize the new objective function to get new models.
Repeat!
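The soft-assignment idea above (keep a distribution over the latent variable instead of committing to one value) can be sketched in a few lines. This is a toy illustration, not from the slides: two 1-D Gaussian components with assumed known means, variances, and equal priors.

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) at x
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def soft_assignment(x, params):
    # Law of total probability: P(x) = sum_z P(z) * P(x | z)
    likes = [prior * gaussian_pdf(x, mu, sigma) for (prior, mu, sigma) in params]
    total = sum(likes)
    # Return a distribution over the support of z, not a single hard choice
    return [l / total for l in likes]

# Two assumed components: N(0, 1) and N(4, 1), with equal priors
params = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
weights = soft_assignment(1.0, params)
print(weights)  # the point x = 1 is nearer the first mean, so z = 0 gets most of the weight
```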
Derivation

Let Θ denote the models (to avoid clutter) and let Θ̂ denote our current estimate of the models. We just need to see the derivation for a single point, say the i-th point:

log P(x_i | Θ)
  = log Σ_z P(x_i, z | Θ)                       ... law of total probability
  = log Σ_z Q(z) · P(x_i, z | Θ) / Q(z)         ... simply multiply and divide by the same term, for any distribution Q over z (e.g. Q(z) = P(z | x_i, Θ̂), using our current estimate)
  ≥ Σ_z Q(z) · log( P(x_i, z | Θ) / Q(z) )      ... Jensen's inequality
  = (the new objective function)                ... just renaming

The M-step maximizes this lower bound over Θ.

Jensen's inequality tells us that E[f(X)] ≥ f(E[X]) for any convex function f. We used the fact that log is a concave function, and so the inequality reverses, since every concave function is the negative of a convex function.
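A quick numeric sanity check of the Jensen step above. The numbers below are illustrative assumptions, not from the slides: Q is an arbitrary distribution over z, and `ratio` stands in for the term P(x_i, z | Θ) / Q(z).

```python
import math

Q = [0.3, 0.7]       # any distribution over the latent variable z
ratio = [2.0, 0.5]   # stands in for P(x_i, z | Theta) / Q(z)

# Jensen for the concave log: log of the expectation >= expectation of the log
lhs = math.log(sum(q * r for q, r in zip(Q, ratio)))   # log E[ratio]
rhs = sum(q * math.log(r) for q, r in zip(Q, ratio))   # E[log ratio] (the lower bound)
print(lhs >= rhs)  # True
```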
Mixed Regression

An example of latent variables in a supervised learning task.

We have regression training data. Example: x denotes age and y denotes time spent on a website. There are two subpopulations in the data (gender) which behave differently even if the age is the same, which is an indication that our features may be incomplete/latent.

Sure, we could try clustering this data first and then apply regression models separately on both clusters. However, using latent variables may be beneficial since 1) clustering the data (e.g. with k-means) may not necessarily work well, since the points here are really not close to two centroids (instead, they lie close to two lines, which k-means is really not meant to handle), and 2) using latent variables, we can elegantly cluster and learn regression models jointly!
Latent Variables to the Rescue

(We could have had separate noise and mixing parameters for the two components as well, which we could also learn. However, this would make things more tedious, so for now let us assume they are shared across the components.)
As before, if we believe that our data is best explained using two linear regression models instead of one, we should work with a mixed model (aka mixture of experts). We will fit two regression models to the data and use a latent variable to keep track of which data point belongs to which model. Let us use Gaussian likelihoods since we are comfortable with them, i.e. perform least squares on the data points assigned to each component. We may incorporate a prior as well to add a regularizer (ridge regression).
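The per-component fitting step described above, least squares with an optional ridge regularizer, has a closed form: w = (XᵀX + λI)⁻¹ Xᵀy. A minimal sketch, with an illustrative toy dataset (names and numbers are assumptions, not from the slides):

```python
import numpy as np

def fit_component(X, y, lam=0.0):
    # Closed-form (ridge) least squares: solve (X^T X + lam I) w = X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Points lying exactly on the line y = 2x + 1; second feature column is the bias
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y = np.array([3.0, 5.0, 7.0])

w = fit_component(X, y)                  # lam = 0: ordinary least squares
print(w)                                  # [2. 1.]
w_ridge = fit_component(X, y, lam=0.1)   # lam > 0 adds the ridge regularizer
```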
EM for Mixed Regression

1. Initialize the models (one per component).
2. Step 1 (E Step): consists of two sub-steps.
   Step 1.1: Assume our current model estimates are given. Use the current models to ascertain how likely the different values of the latent variable are for the i-th data point, i.e. compute its likelihood under both components, then normalize these to get weights.
   Step 1.2: Use the weights to set up a new objective function. As before, assume the noise variance is constant for the sake of simplicity.
3. Step 2 (M Step): Maximize the new objective function to get the new models (apply first-order optimality).
4. Repeat until convergence.
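The loop above can be sketched end to end. This is an illustrative sketch, not the course's reference code: the synthetic data, the two assumed underlying lines, and the fixed shared noise scale sigma (per the simplifying assumption) are all choices made here, with equal mixing weights left implicit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from two assumed lines, y = 2x and y = -x + 3, plus small noise
n = 200
x = rng.uniform(-3, 3, size=n)
z_true = rng.integers(0, 2, size=n)
y = np.where(z_true == 0, 2 * x, -x + 3) + 0.1 * rng.normal(size=n)
X = np.column_stack([x, np.ones(n)])  # bias column

def em_mixed_regression(X, y, n_iter=100, sigma=0.5):
    d = X.shape[1]
    W = rng.normal(size=(2, d))  # initialize the two regression models
    for _ in range(2, n_iter + 2):
        # E step: Gaussian log-likelihood of each point under each model,
        # then normalize across models to get soft responsibilities
        resid = y[None, :] - W @ X.T                 # shape (2, n)
        logp = -0.5 * (resid / sigma) ** 2
        logp -= logp.max(axis=0, keepdims=True)      # numerical stability
        R = np.exp(logp)
        R /= R.sum(axis=0, keepdims=True)
        # M step: weighted least squares per component (first-order optimality)
        for k in range(2):
            Xw = X * R[k][:, None]
            W[k] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(d), Xw.T @ y)
    return W

W = em_mixed_regression(X, y)
print(W)  # each row should approximately fit one of the two underlying lines
```

Note the M step is just the weighted version of the least-squares closed form: each point contributes to each model in proportion to its responsibility.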
Quiz 2
Oct 19 (Wednesday), 6PM, L18,L19
Only for registered students (no auditors)
Open notes (handwritten only)
No mobile phones, tablets, etc.
Bring your institute ID card
Syllabus: all content uploaded to YouTube/GitHub by 16th Oct, i.e. the DS content (slides, code) for DS 1-9
See previous year’s GitHub for practice