CM Latent - Models 2022


Latent Variable Models

M2 IASD - Y. Chevaleyre
Outline

1. Introduction

2. Gaussian mixtures and the EM algorithm

3. Variational Analysis of the EM algorithm

4. Variational Auto-Encoders

5. Exercise: probabilistic PCA



Introduction: what are latent variable models?
• A latent variable zi is missing information about an example xi
• Examples:
• in the clustering setting, the cluster assignment of xi is missing
• in the dimensionality reduction setting, the low-dimensional coordinates of xi are missing
• in the case of missing data, the unobserved entries of xi are the latent variables

• Usually, the best learning algorithm for these models is Expectation-Maximisation (EM)
Probabilistic Machine Learning
• The most appropriate framework for handling latent variables is Probabilistic ML.
• Probabilistic machine learning assumes that your data (xi, yi, or both) was generated
according to some generative model with unknown parameters

• Learning a model is done by MLE or MAP

• e.g. Linear Discriminant Analysis, Linear Regression, …
• Example (linear regression as a generative model): assume xi ∈ ℝ^d and
  yi = θᵀxi + εi   where εi ∼ 𝒩(0, 1)
  so that p(yi ∣ xi; θ) = 𝒩(yi ∣ θᵀxi, 1)
• MLE: find
  θ̂ = argmax_θ log p(y1…yN ∣ x1…xN; θ) = argmax_θ ∑i log 𝒩(yi ∣ θᵀxi, 1)
  (a short numerical sketch follows below)
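Below is a minimal numerical sketch of this MLE, assuming the linear-Gaussian model above; the data-generating values and variable names are illustrative, not part of the lecture. With unit noise variance, maximising ∑i log 𝒩(yi ∣ θᵀxi, 1) reduces to ordinary least squares:

import numpy as np

rng = np.random.default_rng(0)

# Generate data from the assumed model y_i = theta^T x_i + eps_i, eps_i ~ N(0, 1)
theta_true = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ theta_true + rng.normal(size=200)

# MLE under unit Gaussian noise = least squares: argmax_theta sum_i log N(y_i | theta^T x_i, 1)
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Log-likelihood at the estimate: sum_i log N(y_i | theta_hat^T x_i, 1)
resid = y - X @ theta_hat
log_lik = -0.5 * np.sum(resid**2) - 0.5 * len(y) * np.log(2 * np.pi)
print(theta_hat, log_lik)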
Probabilistic Machine Learning
• This generative model may include unobserved variables. We call these Latent Variables.

Latent Diffusion Processes (e.g. DALLE-2)

Topic Modeling (LDA)

Gaussian Mixture Models (GMM)


Variational AutoEncoders (VAE)
Outline

1. Introduction

2. Gaussian mixtures and the EM algorithm

3. Variational Analysis of the EM algorithm

4. Variational Auto-Encoders

5. Exercise: probabilistic PCA


Reminder: The multivariate Normal Distribution
• Univariate Gaussian:
  𝒩(x ∣ μ, σ²) = (1 / (σ√(2π))) exp( −(x − μ)² / (2σ²) )
• Multivariate Gaussian (x, μ ∈ ℝ^d, Σ a d×d covariance matrix):
  𝒩(x ∣ μ, Σ) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )

• Notation:
• 𝒩(μ, σ²) and 𝒩(μ, Σ) represent the normal distribution.
• We write X ∼ 𝒩(μ, σ²) or X ∼ 𝒩(μ, Σ)
• 𝒩(x ∣ μ, Σ) represents the p.d.f. of 𝒩(μ, Σ) at point x
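As a quick sketch of this notation (assuming NumPy and SciPy are available; the parameter values are arbitrary), the pdf 𝒩(x ∣ μ, Σ) can be evaluated with scipy.stats.multivariate_normal and checked against the formula above:

import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([0.5, 0.5])

# N(x | mu, Sigma) via scipy
pdf_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)

# Same value from the formula (d = 2 here)
d = len(mu)
diff = x - mu
pdf_formula = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / \
              np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
print(pdf_scipy, pdf_formula)  # the two values should match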
Gaussian Mixture Models
• One of the simplest latent variable models is the Gaussian Mixture Model (GMM)
• It is the same as LDA without class information. Recall LDA with 2 classes first:

• LDA (supervised learning setting): for each i = 1…N,
  draw the class yi ∼ Bernoulli(π), then draw xi ∼ 𝒩(μ1, Σ1) if yi = 1 and xi ∼ 𝒩(μ2, Σ2) if yi = 2.
  Both xi and yi are observed by the learner.
• GMM (unsupervised setting): for each i = 1…N,
  draw zi ∼ Multinomial(π1…πK), then, if zi = k, draw xi ∼ 𝒩(μk, Σk).
  Here zi is not seen by the learner.
• Worked example: K = 2, π1 = 0.8, π2 = 0.2, two univariate Gaussians with unit variance.
  1. Draw informally p(x, z = 1) = p(x ∣ z = 1) π1 on a graph
  2. Draw p(x, z = 2) = p(x ∣ z = 2) π2 on another graph
  3. Draw p(z = 1 ∣ x) on a graph
  4. Plot points drawn from the GMM process
• Joint distribution:
  p(X = x, Z = k) = p(x ∣ Z = k) p(Z = k) = 𝒩(x ∣ μk, Σk) πk
• Marginal, by the law of total probability p(A) = ∑B p(A ∣ B) p(B):
  p(X = x) = ∑k p(x ∣ Z = k) πk = ∑k 𝒩(x ∣ μk, Σk) πk
• Posterior:
  p(Z = k ∣ X = x) = 𝒩(x ∣ μk, Σk) πk / ∑j 𝒩(x ∣ μj, Σj) πj
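A minimal sketch of the GMM generative process and of the posterior p(Z = k ∣ X = x) above, for a univariate K = 2 mixture; the parameter values are illustrative assumptions:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative parameters (assumed values): K = 2 components
pis = np.array([0.8, 0.2])      # mixing weights pi_1, pi_2
mus = np.array([0.0, 3.0])      # component means
sigmas = np.array([1.0, 1.0])   # component standard deviations

# Generative process: draw z_i, then x_i | z_i = k ~ N(mu_k, sigma_k^2)
N = 1000
z = rng.choice(2, size=N, p=pis)
x = rng.normal(mus[z], sigmas[z])

# Posterior responsibilities: p(z = k | x) = N(x | mu_k, sigma_k) pi_k / sum_j N(x | mu_j, sigma_j) pi_j
def posterior(x):
    joint = pis * norm.pdf(x[:, None], loc=mus, scale=sigmas)  # p(x, z = k), shape (N, K)
    return joint / joint.sum(axis=1, keepdims=True)

print(posterior(x[:5]))  # one row per point, columns sum to 1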
Gaussian Mixture Models: The log-likelihood

• Let θk = (μk, Σk, πk) for k = 1…K, and θ = (θ1, …, θK)
• By the law of total probability,
  p(X = xi ∣ θ) = ∑k p(X = xi ∣ zi = k; θ) πk = ∑k 𝒩(xi ∣ μk, Σk) πk
• The log-likelihood is
  LL(θ) = log p(x1…xN ∣ θ) = ∑i log p(xi ∣ θ) = ∑i log ∑k 𝒩(xi ∣ μk, Σk) πk
• Max likelihood principle: θ̂ = argmax_θ LL(θ)

• Exercise: assume K = 1 and that π1 is known and equal to 1. Find μ by maximum likelihood.
  LL(μ) = ∑i log 𝒩(xi ∣ μ, Σ), which is maximised in closed form by μ̂ = (1/N) ∑i xi
• So the GMM with K = 1 case is easy. For K > 1, the sum inside the log makes LL(θ)
  non-concave and there is no closed-form solution.
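A short sketch of the log-likelihood computation above (univariate case, illustrative parameters), including the K = 1 sanity check that the sample mean maximises LL(μ):

import numpy as np
from scipy.stats import norm

def gmm_log_likelihood(x, pis, mus, sigmas):
    # LL(theta) = sum_i log sum_k pi_k * N(x_i | mu_k, sigma_k)
    per_component = pis * norm.pdf(x[:, None], loc=mus, scale=sigmas)  # shape (N, K)
    return np.sum(np.log(per_component.sum(axis=1)))

x = np.array([-0.2, 0.1, 2.9, 3.3])
print(gmm_log_likelihood(x, np.array([0.8, 0.2]), np.array([0.0, 3.0]), np.array([1.0, 1.0])))

# K = 1 sanity check: with a single component, the MLE of mu is the sample mean
mu_hat = x.mean()
print(gmm_log_likelihood(x, np.array([1.0]), np.array([mu_hat]), np.array([1.0])))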
GMM with K>1. The Complete Log Likelihood
• For K > 1, maximising the likelihood is hard
• Idea: optimize a surrogate instead: the Complete Log Likelihood (CLL)
• The CLL assumes we know z1…zN so we still can’t compute it either
  CLL(θ, z1…zN) = log p(x1…xN, z1…zN ∣ θ)
                = ∑i log( 𝒩(xi ∣ μ_zi, Σ_zi) π_zi )
                = ∑i ∑k 1[zi = k] log( 𝒩(xi ∣ μk, Σk) πk )
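If the zi were observed, the CLL above would be easy to evaluate; a sketch with illustrative parameters and assumed (hypothetical) latent assignments:

import numpy as np
from scipy.stats import norm

def complete_log_likelihood(x, z, pis, mus, sigmas):
    # CLL(theta, z_1..z_N) = sum_i log( N(x_i | mu_{z_i}, sigma_{z_i}) * pi_{z_i} )
    return np.sum(np.log(pis[z]) + norm.logpdf(x, loc=mus[z], scale=sigmas[z]))

x = np.array([-0.2, 0.1, 2.9])
z = np.array([0, 0, 1])  # assumed known latent assignments (for illustration only)
print(complete_log_likelihood(x, z, np.array([0.8, 0.2]), np.array([0.0, 3.0]), np.array([1.0, 1.0])))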
GMM with K>1. The Expected Complete Log Likelihood
• Idea: optimize a surrogate instead: the Complete Log Likelihood (CLL)
• The CLL assumes we know z1…zN so we still can’t compute it either
• But if for each i we can get an estimate of the distribution p(zi = k ∣ xi; θ̂),
then we can compute the Expected CLL. Let qi(k) be this estimate:
  qi(k) ≈ p(zi = k ∣ xi; θ̂)
• Expected CLL for one point:
  ℒ(q, θ, i) = 𝔼zi∼qi[ ∑k 1[zi = k] log( 𝒩(xi ∣ μk, Σk) πk ) ] = ∑k qi(k) log( 𝒩(xi ∣ μk, Σk) πk )
• Expected CLL for the whole dataset:
  ℒ(q, θ) = ∑i ℒ(q, θ, i) = 𝔼z1…zN∼q[ CLL(θ, z1…zN) ]
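Replacing the unknown indicators 1[zi = k] by the weights qi(k) gives the expected CLL ℒ(q, θ); a minimal sketch with illustrative values:

import numpy as np
from scipy.stats import norm

def expected_cll(x, q, pis, mus, sigmas):
    # L(q, theta) = sum_i sum_k q_i(k) * log( N(x_i | mu_k, sigma_k) * pi_k )
    log_joint = np.log(pis) + norm.logpdf(x[:, None], loc=mus, scale=sigmas)  # shape (N, K)
    return np.sum(q * log_joint)

x = np.array([-0.2, 0.1, 2.9])
q = np.array([[0.9, 0.1],   # q_i(k): one row per example, each row sums to 1
              [0.8, 0.2],
              [0.1, 0.9]])
print(expected_cll(x, q, np.array([0.8, 0.2]), np.array([0.0, 3.0]), np.array([1.0, 1.0])))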
The EM Algorithm
• EM Algorithm:
  1. Initialize θ̂ arbitrarily
  2. E-Step: for each i ∈ {1…N}, k ∈ {1…K},
     compute qi(k) = p(zi = k ∣ xi; θ̂) = 𝒩(xi ∣ μk, Σk) πk / ∑j 𝒩(xi ∣ μj, Σj) πj
     (for K = 2 this posterior is a sigmoid)
  3. M-Step: compute θ̂ ← argmax_θ ℒ(q, θ), then go back to step 2 until convergence

• Details:
  argmax_θ ℒ(q, θ) = argmax_θ ∑i ∑k qi(k) log( 𝒩(xi ∣ μk, Σk) πk )
  For fixed q, this maximisation has a closed-form solution, e.g.
  μk = ∑i qi(k) xi / ∑i qi(k)
  (see the sketch below)
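Putting the E-step and the M-step together, here is a minimal sketch of EM for a univariate GMM (an illustration under the standard closed-form M-step updates, not the lecture's reference implementation; the data and iteration count are arbitrary):

import numpy as np
from scipy.stats import norm

def em_gmm(x, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N = len(x)
    # 1. Initialize theta_hat arbitrarily
    pis = np.full(K, 1.0 / K)
    mus = rng.choice(x, size=K, replace=False)
    sigmas = np.full(K, x.std())
    for _ in range(n_iter):
        # 2. E-step: q_i(k) = p(z_i = k | x_i; theta_hat)
        joint = pis * norm.pdf(x[:, None], loc=mus, scale=sigmas)
        q = joint / joint.sum(axis=1, keepdims=True)
        # 3. M-step: closed-form argmax of L(q, theta)
        Nk = q.sum(axis=0)
        pis = Nk / N
        mus = (q * x[:, None]).sum(axis=0) / Nk
        sigmas = np.sqrt((q * (x[:, None] - mus) ** 2).sum(axis=0) / Nk)
    return pis, mus, sigmas

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 800), rng.normal(3.0, 1.0, 200)])
print(em_gmm(x, K=2))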
Outline

1. Introduction

2. Gaussian mixtures and the EM algorithm

3. Variational Analysis of the EM algorithm

4. Variational Auto-Encoders

5. Exercise: probabilistic PCA


Variational Analysis of EM: The ELBO (1)
• We really want to optimise the likelihood p(x1…xN ∣ θ), but instead we optimise a
surrogate: the expected CLL ℒ(q, θ). What is the link?
• Recall that ℒ(q, θ) = ∑i ℒ(q, θ, i) where ℒ(q, θ, i) = 𝔼zi∼qi[ log p(X = xi, zi ∣ θ) ]

• Let us pick a single point x and compute its likelihood:
  ln p(x ∣ θ) = ln ∑k p(x, z = k ∣ θ)
              = ln ∑k q(k) · p(x, z = k ∣ θ) / q(k)
              = ln 𝔼z∼q[ p(x, z ∣ θ) / q(z) ]
              ≥ 𝔼z∼q[ ln( p(x, z ∣ θ) / q(z) ) ]        (Jensen's inequality, since ln is concave)
• The right-hand side is called the ELBO: ELBO(x, θ, q)
Variational Analysis of EM: The ELBO (2)
• We really want to optimise the likelihood p(x1…xN ∣ θ), but instead we optimise a
surrogate: the expected CLL ℒ(q, θ). What is the link?
• Recall that ℒ(q, θ) = ∑i ℒ(q, θ, i) where ℒ(q, θ, i) = 𝔼zi∼qi[ log p(X = xi, zi ∣ θ) ]

• How close is the ELBO to the log-likelihood?
  ln p(x ∣ θ) − ELBO(x, θ, q) = ln p(x ∣ θ) − 𝔼z∼q[ ln( p(x, z ∣ θ) / q(z) ) ]
                              = 𝔼z∼q[ ln p(x ∣ θ) − ln p(x, z ∣ θ) + ln q(z) ]
                              = 𝔼z∼q[ ln( q(z) / p(z ∣ x, θ) ) ]        (using p(x, z ∣ θ) = p(z ∣ x, θ) p(x ∣ θ))
                              = KL( q(z) ‖ p(z ∣ x, θ) )

• Recall that for any two discrete distributions p and p′ over {1…K},
  KL(p ‖ p′) = ∑z p(z) log( p(z) / p′(z) )
  Two properties: KL(p ‖ p′) ≥ 0, and KL(p ‖ p′) = 0 iff p = p′
• So if q(z) = p(z ∣ x, θ), then KL = 0 and ln p(x ∣ θ) = ELBO(x, θ, q)

• Property of the ELBO (verified numerically below):
  ln p(x ∣ θ) = ELBO(x, θ, q) + KL( q ‖ p(z ∣ x, θ) ) = ℒ(q, θ, x) + Entropy(q) + KL( q ‖ p(z ∣ x, θ) )
  If q = p(z ∣ x, θ), then ln p(x ∣ θ) = ℒ(q, θ, x) + Entropy(q)
  and, since Entropy(q) does not depend on θ,
  argmax_θ ℒ(q, θ, x) = argmax_θ ELBO(x, θ, q)
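A numeric sketch of the decomposition ln p(x ∣ θ) = ELBO(x, θ, q) + KL(q ‖ p(z ∣ x, θ)) for a single point of a K = 2 univariate GMM; the parameters and the choice of q are illustrative:

import numpy as np
from scipy.stats import norm

# Illustrative K = 2 GMM parameters
pis = np.array([0.8, 0.2])
mus = np.array([0.0, 3.0])
sigmas = np.array([1.0, 1.0])
x = 1.5

joint = pis * norm.pdf(x, loc=mus, scale=sigmas)   # p(x, z = k | theta)
log_px = np.log(joint.sum())                       # ln p(x | theta)
posterior = joint / joint.sum()                    # p(z = k | x, theta)

q = np.array([0.6, 0.4])                           # an arbitrary distribution over z
elbo = np.sum(q * np.log(joint / q))               # E_{z~q}[ ln( p(x, z | theta) / q(z) ) ]
kl = np.sum(q * np.log(q / posterior))             # KL( q || p(z | x, theta) )

print(log_px, elbo + kl)   # identical: ln p(x | theta) = ELBO + KL
print(elbo <= log_px)      # True: the ELBO lower-bounds the log-likelihood

# Choosing q = p(z | x, theta) makes KL = 0 and closes the gap
elbo_opt = np.sum(posterior * np.log(joint / posterior))
print(np.isclose(elbo_opt, log_px))  # True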
Variational Analysis of EM: reinterpreting the algorithm

• EM Algorithm:
  1. Initialize θ̂ arbitrarily
  2. E-Step: for each i ∈ {1…N}, k ∈ {1…K},
     compute qi(k) = p(zi = k ∣ xi; θ̂)
     This is equivalent to q1…qN = argmax_{q1…qN} ∑i ELBO(xi, θ̂, qi)
     (the posterior is the q that makes the KL term vanish)
  3. M-Step: compute θ̂ ← argmax_θ ℒ(q, θ)
     This is equivalent to θ̂ = argmax_θ ∑i ELBO(xi, θ, qi)
     (the Entropy(qi) terms do not depend on θ)

• So EM is a coordinate-ascent procedure on ∑i ELBO(xi, θ, qi), which lower-bounds
  the log-likelihood ln p(x1…xN ∣ θ): each E-step and each M-step can only increase the ELBO.
Outline

1. Introduction

2. Gaussian mixtures and the EM algorithm

3. Variational Analysis of the EM algorithm

4. Variational Auto-Encoders

5. Exercise: probabilistic PCA


A word about auto-encoders
• Standard auto-encoders learn to reduce the dimensionality of examples in a
dataset

• Each example xi is encoded into a low-dimensional code zi = Enc(xi) and then decoded
  into a reconstruction x̂i = Dec(zi)
• Training minimises the reconstruction error:
  argmin_{Enc, Dec} ∑i ‖xi − x̂i‖²   where x̂i = Dec(Enc(xi))
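A minimal sketch of this reconstruction objective with a hypothetical PyTorch encoder/decoder pair (the layer sizes, batch, and optimiser settings are arbitrary assumptions):

import torch
import torch.nn as nn

# Hypothetical sizes: inputs in R^784, codes in R^2
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 2))
decoder = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 784))
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(64, 784)          # a stand-in batch of examples x_i

for _ in range(100):
    x_hat = decoder(encoder(x))   # x_i -> z_i -> x_hat_i
    loss = ((x - x_hat) ** 2).sum(dim=1).mean()   # reconstruction error ||x_i - x_hat_i||^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()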
A word about variational auto-encoders
• Variational auto-encoders (VAE) learn to reduce the dimensionality of examples in
a dataset and produce, for each example xi, a distribution of latent vectors zi such
that overall, the latent vectors z1…zN are normally distributed.
• The underlying generative model is (sketched below)
  p(x ∣ θ) = ∫ p(x ∣ z; θ) p(z) dz   with prior p(z) = 𝒩(0, I)
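A minimal sketch of the corresponding VAE training loss, i.e. the negative ELBO with a Gaussian encoder q(z ∣ x) and prior p(z) = 𝒩(0, I); the architecture and dimensions are illustrative assumptions, not the lecture's implementation:

import torch
import torch.nn as nn

d_in, d_z = 784, 2                 # assumed input and latent dimensions
enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, 2 * d_z))  # outputs mu and log sigma^2
dec = nn.Sequential(nn.Linear(d_z, 128), nn.ReLU(), nn.Linear(128, d_in))

def vae_loss(x):
    mu, log_var = enc(x).chunk(2, dim=1)                         # parameters of q(z | x)
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)     # reparameterised sample z ~ q(z | x)
    x_hat = dec(z)
    recon = ((x - x_hat) ** 2).sum(dim=1)                        # -E_q[log p(x | z)] up to constants (Gaussian decoder)
    kl = 0.5 * (mu ** 2 + log_var.exp() - log_var - 1).sum(dim=1)  # KL( q(z | x) || N(0, I) )
    return (recon + kl).mean()                                   # average negative ELBO over the batch

x = torch.randn(64, d_in)                                        # stand-in batch
print(vae_loss(x))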


Outline

1. Introduction

2. Gaussian mixtures and the EM algorithm

3. Variational Analysis of the EM algorithm

4. Variational Auto-Encoders

5. Exercise: probabilistic PCA


Probabilistic PCA
