CM Latent - Models 2022
M2 IASD - Y. Chevaleyre
Outline
1. Introduction
4. Variational Auto-Encoders
• [Figure] Both the model and the low-dimensional coordinates of the data are missing.
Probabilistic Machine Learning
• The most appropriate framework for handling latent variables is Probabilistic ML.
• Probabilistic Machine Learning assumes that your data (xi, yi, or both) was generated by some generative model with unknown parameters.
• Example: yi = f(xi; θ) + εi where εi ∼ 𝒩(0, 1), i.e. p(y = yi ∣ xi; θ) = 𝒩(yi ∣ f(xi; θ), 1)
• Maximum likelihood: find θ̂ = arg maxθ ∑i log p(y = yi ∣ xi; θ)
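A minimal Python sketch of this maximum-likelihood step, assuming for illustration a linear model f(x; θ) = θ⊤x and unit noise variance (the specific f, data and optimizer are assumptions, not from the slides):

```python
# Illustrative MLE sketch: assume f(x; theta) = theta . x (linear model) and unit noise.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
theta_true = np.array([2.0, -1.0])                 # unknown parameters of the generative model
X = rng.normal(size=(100, 2))
y = X @ theta_true + rng.normal(size=100)          # y_i = f(x_i; theta) + eps_i, eps_i ~ N(0, 1)

def neg_log_likelihood(theta):
    # -sum_i log N(y_i | f(x_i; theta), 1)
    return -norm.logpdf(y, loc=X @ theta, scale=1.0).sum()

theta_hat = minimize(neg_log_likelihood, x0=np.zeros(2)).x   # arg max of the log-likelihood
print(theta_hat)                                   # close to theta_true
```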
Probabilistic Machine Learning
• This generative model may include unobserved variables. We call these Latent Variables.
• Notation:
• 𝒩(μ, σ²) and 𝒩(μ, Σ) represent the normal distribution.
• We write X ∼ 𝒩(μ, σ²) or X ∼ 𝒩(μ, Σ)
• 𝒩(x ∣ μ, Σ) represents the p.d.f. of 𝒩(μ, Σ) at point x
Reminder: The multivariate Normal Distribution
• Multivariate Gaussian: 𝒩(x ∣ μ, Σ) = (2π)^(−d/2) |Σ|^(−1/2) exp(−½ (x − μ)⊤ Σ⁻¹ (x − μ))
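A quick sketch of how to evaluate this density and sample from it with SciPy (the values of μ and Σ are arbitrary illustrations):

```python
# Sketch: evaluating N(x | mu, Sigma) and sampling X ~ N(mu, Sigma) with SciPy.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

x = np.array([0.5, 0.5])
print(multivariate_normal.pdf(x, mean=mu, cov=Sigma))          # the p.d.f. N(x | mu, Sigma)

samples = multivariate_normal.rvs(mean=mu, cov=Sigma, size=5)  # X ~ N(mu, Sigma)
```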
Gaussian Mixture Models
• One of the simplest latent variable models is the Gaussian Mixture Model (GMM)
• It is the same as LDA but without the class information. Recall LDA with 2 classes first:
• Supervised setting (LDA): two Gaussian distributions 𝒩(μ₁, Σ) and 𝒩(μ₂, Σ), and a Bernoulli distribution Be(π); for each i = 1…N, draw yi ∼ Be(π), then if yi = 1 draw xi ∼ 𝒩(μ₁, Σ), else draw xi ∼ 𝒩(μ₂, Σ). The labels yi are observed.
• Unsupervised setting (GMM): K Gaussian distributions 𝒩(μ₁, Σ₁)…𝒩(μK, ΣK) and mixing weights π₁…πK; for each i = 1…N, draw zi ∼ Multinomial(π₁…πK), then if zi = k draw xi ∼ 𝒩(μk, Σk). The zi are latent: they are not seen by the learner.
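The unsupervised generative process can be sketched in a few lines of NumPy; the mixing weights, means and covariances below are arbitrary illustrative choices:

```python
# Sketch of the GMM generative process (parameter values are arbitrary illustrations).
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])                          # mixing weights pi_1..pi_K
mus = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]])   # means mu_1..mu_K
Sigmas = np.array([np.eye(2)] * 3)                      # covariances Sigma_1..Sigma_K

N = 500
z = rng.choice(len(pi), size=N, p=pi)                   # z_i ~ Multinomial(pi_1..pi_K), latent
x = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])  # x_i ~ N(mu_{z_i}, Sigma_{z_i})
# The learner only observes x; z is hidden.
```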
Exercise (with K = 2, π₁ = 0.8, π₂ = 0.2, μ₁ = 1, μ₂ = 0, σ₁ = σ₂ = 1):
1. Draw informally p(x, z = 1) = p(x ∣ z = 1) π₁ on a graph
2. Draw p(x, z = 2) = p(x ∣ z = 2) π₂ on another graph
3. Draw p(z = 1 ∣ x) on a graph
4. Build points drawn from the GMM process
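A sketch of how one might check the exercise numerically, using the reconstructed parameter values above (treat those numbers as assumptions):

```python
# Sketch for the exercise: plot p(x, z=1), p(x, z=2) and p(z=1 | x) for a 1-D, K=2 GMM.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

pi = np.array([0.8, 0.2])
mu = np.array([1.0, 0.0])
sigma = np.array([1.0, 1.0])

xs = np.linspace(-4.0, 5.0, 400)
joint1 = pi[0] * norm.pdf(xs, mu[0], sigma[0])   # p(x, z=1) = p(x | z=1) pi_1
joint2 = pi[1] * norm.pdf(xs, mu[1], sigma[1])   # p(x, z=2) = p(x | z=2) pi_2
post1 = joint1 / (joint1 + joint2)               # p(z=1 | x), by Bayes' rule

plt.plot(xs, joint1, label="p(x, z=1)")
plt.plot(xs, joint2, label="p(x, z=2)")
plt.plot(xs, post1, label="p(z=1 | x)")
plt.legend()
plt.show()
```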
Gaussian Mixture Models: The log-likelihood
• Let θk = (πk, μk, Σk) and θ = (θ₁…θK). By the law of total probability:
p(X = xi; θ) = ∑k p(X = xi ∣ z = k; θ) πk = ∑k 𝒩(xi ∣ μk, Σk) πk
• The log-likelihood: LL(θ) = ∑i log p(X = xi; θ) = ∑i log ∑k 𝒩(xi ∣ μk, Σk) πk
• The sum inside the log prevents any closed-form maximization of LL(θ).
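A possible NumPy/SciPy implementation of this log-likelihood, using log-sum-exp for numerical stability (an implementation choice, not from the slides):

```python
# Sketch: LL(theta) = sum_i log sum_k pi_k N(x_i | mu_k, Sigma_k), computed with log-sum-exp.
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, pi, mus, Sigmas):
    # log_joint[i, k] = log pi_k + log N(x_i | mu_k, Sigma_k)
    log_joint = np.stack([np.log(pi[k]) + multivariate_normal.logpdf(X, mus[k], Sigmas[k])
                          for k in range(len(pi))], axis=1)
    return logsumexp(log_joint, axis=1).sum()      # sum_i log sum_k exp(log_joint[i, k])
```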
GMM with K>1. The Expected Complete Log Likelihood
• Idea: optimize a surrogate instead: the Complete Log-Likelihood (CLL), CLL(θ, z1…zN) = ∑i log p(X = xi, zi ∣ θ)
• The CLL assumes we know z1…zN, so we still can’t compute it either
• But if, for each i, we have an estimate of the distribution p(zi = k ∣ xi; θ̂), then we can compute the Expected CLL. Let qi(k) be this estimate.
qi(k) ≈ p(zi = k ∣ xi; θ̂)
ℒ(q, θ, i) = 𝔼zi∼qi[log p(X = xi, zi ∣ θ)] = ∑k qi(k) log p(X = xi, zi = k ∣ θ) = ∑k qi(k) log(𝒩(xi ∣ μk, Σk) πk)
ℒ(q, θ) = ∑i ℒ(q, θ, i) = 𝔼z1∼q1…zN∼qN[CLL(θ, z1…zN)]
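A short sketch of the expected CLL ℒ(q, θ) as a function of the data, the estimates qi(k) and the parameters (a hypothetical helper, for illustration only):

```python
# Sketch: expected CLL  L(q, theta) = sum_i sum_k q_i(k) log( pi_k N(x_i | mu_k, Sigma_k) ).
import numpy as np
from scipy.stats import multivariate_normal

def expected_cll(X, Q, pi, mus, Sigmas):
    # Q[i, k] holds q_i(k), the estimate of p(z_i = k | x_i; theta_hat)
    log_joint = np.stack([np.log(pi[k]) + multivariate_normal.logpdf(X, mus[k], Sigmas[k])
                          for k in range(len(pi))], axis=1)
    return (Q * log_joint).sum()
```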
The EM Algorithm
• EM Algorithm:
1. Initialize θ̂ arbitrarily
2. E-Step: for each i ∈ {1…N}, k ∈ {1…K}, compute qi(k) = p(zi = k ∣ xi; θ̂)
3. M-Step: for fixed q1…qN, compute θ̂ = arg maxθ ℒ(q, θ)
4. Repeat steps 2 and 3 until convergence
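A minimal EM-for-GMM sketch in NumPy following these steps; the initialization scheme and the small regularization term are illustrative choices, not the course's prescription:

```python
# Minimal EM for a GMM (a sketch under illustrative choices, not the course's code).
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # 1. Initialize theta_hat arbitrarily (here: uniform weights, random data points as means)
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, K, replace=False)]
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # 2. E-Step: q_i(k) = p(z_i = k | x_i; theta_hat)
        joint = np.stack([pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                          for k in range(K)], axis=1)
        Q = joint / joint.sum(axis=1, keepdims=True)
        # 3. M-Step: theta_hat = arg max_theta L(q, theta)  (closed form for GMMs)
        Nk = Q.sum(axis=0)
        pi = Nk / N
        mus = (Q.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (Q[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return pi, mus, Sigmas
```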
Variational Analysis of EM: The ELBO (2)
• We really want to optimise the likelihood p(x1…xN ∣ θ), but instead we optimise a surrogate: the expected CLL ℒ(q, θ). What is the link?
• Recall that ℒ(q, θ) = ∑i=1…N ℒ(q, θ, i), where ℒ(q, θ, i) = 𝔼zi∼qi[log p(X = xi, zi ∣ θ)]
• How close is the ELBO to the log-likelihood?
ln p(x ∣ θ) − ELBO(x, θ, q) = ln p(x ∣ θ) − 𝔼z∼q[ln p(x, z ∣ θ) − ln q(z)]
= 𝔼z∼q[ln p(x ∣ θ) − ln p(x, z ∣ θ) + ln q(z)]
= 𝔼z∼q[ln q(z) − ln p(z ∣ x, θ)]
= KL(q(z) ∥ p(z ∣ x, θ)) ≥ 0
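The identity ln p(x ∣ θ) = ELBO(x, θ, q) + KL(q ∥ p(z ∣ x, θ)) can be checked numerically on a 1-D mixture with discrete z; all numbers below are arbitrary illustrations:

```python
# Sketch verifying ln p(x | theta) - ELBO(x, theta, q) = KL(q(z) || p(z | x, theta)).
import numpy as np
from scipy.stats import norm

pi = np.array([0.8, 0.2]); mu = np.array([1.0, 0.0]); sigma = np.array([1.0, 1.0])
x = 0.3
q = np.array([0.6, 0.4])                       # any distribution q over z

joint = pi * norm.pdf(x, mu, sigma)            # p(x, z=k | theta)
log_px = np.log(joint.sum())                   # ln p(x | theta)
elbo = (q * (np.log(joint) - np.log(q))).sum() # E_q[ln p(x, z | theta) - ln q(z)]
posterior = joint / joint.sum()                # p(z | x, theta)
kl = (q * np.log(q / posterior)).sum()         # KL(q || p(z | x, theta))

print(np.isclose(log_px, elbo + kl))           # True
```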
• EM Algorithm (rewritten with the ELBO):
1. Initialize θ̂ arbitrarily
2. E-Step: for each i ∈ {1…N}, k ∈ {1…K}, compute qi(k) = p(zi = k ∣ xi; θ̂), i.e. q1…qN = arg maxq1…qN ∑i ELBO(xi, θ̂, qi)
3. M-Step: θ̂ = arg maxθ ∑i ELBO(xi, θ, qi)
A word about variational auto-encoders
• Variational auto-encoders (VAE) learn to reduce the dimensionality of the examples in a dataset, producing for each example xi a distribution over latent vectors zi such that, overall, the latent vectors z1…zN are normally distributed.
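A minimal VAE sketch in PyTorch, with the encoder producing the mean and log-variance of q(zi ∣ xi) and the loss equal to the negative ELBO; the architecture and dimensions are illustrative assumptions, not the course's model:

```python
# Minimal VAE sketch in PyTorch (illustrative architecture; x assumed in [0, 1]).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=200, z_dim=20):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)       # encoder outputs the mean of q(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)   # ... and the log-variance of q(z|x)
        self.dec1 = nn.Linear(z_dim, h_dim)
        self.dec2 = nn.Linear(h_dim, x_dim)

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        x_hat = torch.sigmoid(self.dec2(F.relu(self.dec1(z))))
        return x_hat, mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # negative ELBO = reconstruction term + KL(q(z|x) || N(0, I))
    rec = F.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```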