CM Latent - Models 2022


Latent Variable Models

M2 IASD - Y. Chevaleyre
Outline

1. Introduction

2. Gaussian mixtures and the EM algorithm

3. Variational Analysis of the EM algorithm

4. Variational Auto-Encoders

5. Exercise: probabilistic PCA



Introduction: what are latent variable models?
• A latent variable zi is missing information about an example xi
• Examples:
• in the clustering setting, the cluster assignment of xi is missing
• in the dimensionality reduction setting, the low-dimensional coordinates of xi are missing
• in the case of missing data, the unobserved entries of xi are the latent variables

• Usually, the best learning algorithm for these models is Expectation-Maximisation (EM)
Probabilistic Machine Learning
• The most appropriate framework for handling latent variables is Probabilistic ML.
• Probabilistic machine learning assumes that your data (xi, yi, or both) was generated
according to some generative model with unknown parameters

• Learning a model is done by MLE or MAP

• e.g. Linear Discriminant Analysis, Linear Regression, …
• Example (linear regression as a generative model): assume xi ∈ ℝ^d and
  yi = θᵀxi + εi   where εi ∼ 𝒩(0, 1)
  so that p(yi ∣ xi; θ) = 𝒩(yi ∣ θᵀxi, 1)
• MLE: find
  θ̂ = argmax_θ log p(y1…yN ∣ x1…xN; θ) = argmax_θ ∑i log 𝒩(yi ∣ θᵀxi, 1)
  (a short numerical sketch follows below)
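Below is a minimal numerical sketch of this MLE, assuming the linear-Gaussian model above; the data-generating values and variable names are illustrative, not part of the lecture. With unit noise variance, maximising ∑i log 𝒩(yi ∣ θᵀxi, 1) reduces to ordinary least squares:

import numpy as np

rng = np.random.default_rng(0)

# Generate data from the assumed model y_i = theta^T x_i + eps_i, eps_i ~ N(0, 1)
theta_true = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ theta_true + rng.normal(size=200)

# MLE under unit Gaussian noise = least squares: argmax_theta sum_i log N(y_i | theta^T x_i, 1)
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Log-likelihood at the estimate: sum_i log N(y_i | theta_hat^T x_i, 1)
resid = y - X @ theta_hat
log_lik = -0.5 * np.sum(resid**2) - 0.5 * len(y) * np.log(2 * np.pi)
print(theta_hat, log_lik)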
Probabilistic Machine Learning
• This generative model may include unobserved variables. We call these Latent Variables.

Latent Diffusion Processes (e.g. DALLE-2)

Topic Modeling (LDA)

Gaussian Mixture Models (GMM)


Variational AutoEncoders (VAE)
Outline

1. Introduction

2. Gaussian mixtures and the EM algorithm

3. Variational Analysis of the EM algorithm

4. Variational Auto-Encoders

5. Exercise: probabilistic PCA


Reminder: The multivariate Normal Distribution
• Univariate Gaussian:
  𝒩(x ∣ μ, σ²) = (1 / (σ√(2π))) exp( −(x − μ)² / (2σ²) )
• Multivariate Gaussian (x, μ ∈ ℝ^d, Σ a d×d covariance matrix):
  𝒩(x ∣ μ, Σ) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )

• Notation:
• 𝒩(μ, σ²) and 𝒩(μ, Σ) represent the normal distribution.
• We write X ∼ 𝒩(μ, σ²) or X ∼ 𝒩(μ, Σ)
• 𝒩(x ∣ μ, Σ) represents the p.d.f. of 𝒩(μ, Σ) at point x
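As a quick sketch of this notation (assuming NumPy and SciPy are available; the parameter values are arbitrary), the pdf 𝒩(x ∣ μ, Σ) can be evaluated with scipy.stats.multivariate_normal and checked against the formula above:

import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([0.5, 0.5])

# N(x | mu, Sigma) via scipy
pdf_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)

# Same value from the formula (d = 2 here)
d = len(mu)
diff = x - mu
pdf_formula = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / \
              np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
print(pdf_scipy, pdf_formula)  # the two values should match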
Gaussian Mixture Models
• One of the simplest latent variable models is the Gaussian Mixture Model (GMM)
• It is the same as LDA without class information. Recall LDA with 2 classes first:

• LDA (supervised learning setting): for each i = 1…N,
  draw the class yi ∼ Bernoulli(π), then draw xi ∼ 𝒩(μ1, Σ1) if yi = 1 and xi ∼ 𝒩(μ2, Σ2) if yi = 2.
  Both xi and yi are observed by the learner.
• GMM (unsupervised setting): for each i = 1…N,
  draw zi ∼ Multinomial(π1…πK), then, if zi = k, draw xi ∼ 𝒩(μk, Σk).
  Here zi is not seen by the learner.
• Worked example: K = 2, π1 = 0.8, π2 = 0.2, two univariate Gaussians with unit variance.
  1. Draw informally p(x, z = 1) = p(x ∣ z = 1) π1 on a graph
  2. Draw p(x, z = 2) = p(x ∣ z = 2) π2 on another graph
  3. Draw p(z = 1 ∣ x) on a graph
  4. Plot points drawn from the GMM process
• Joint distribution:
  p(X = x, Z = k) = p(x ∣ Z = k) p(Z = k) = 𝒩(x ∣ μk, Σk) πk
• Marginal, by the law of total probability p(A) = ∑B p(A ∣ B) p(B):
  p(X = x) = ∑k p(x ∣ Z = k) πk = ∑k 𝒩(x ∣ μk, Σk) πk
• Posterior:
  p(Z = k ∣ X = x) = 𝒩(x ∣ μk, Σk) πk / ∑j 𝒩(x ∣ μj, Σj) πj
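A minimal sketch of the GMM generative process and of the posterior p(Z = k ∣ X = x) above, for a univariate K = 2 mixture; the parameter values are illustrative assumptions:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative parameters (assumed values): K = 2 components
pis = np.array([0.8, 0.2])      # mixing weights pi_1, pi_2
mus = np.array([0.0, 3.0])      # component means
sigmas = np.array([1.0, 1.0])   # component standard deviations

# Generative process: draw z_i, then x_i | z_i = k ~ N(mu_k, sigma_k^2)
N = 1000
z = rng.choice(2, size=N, p=pis)
x = rng.normal(mus[z], sigmas[z])

# Posterior responsibilities: p(z = k | x) = N(x | mu_k, sigma_k) pi_k / sum_j N(x | mu_j, sigma_j) pi_j
def posterior(x):
    joint = pis * norm.pdf(x[:, None], loc=mus, scale=sigmas)  # p(x, z = k), shape (N, K)
    return joint / joint.sum(axis=1, keepdims=True)

print(posterior(x[:5]))  # one row per point, columns sum to 1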
Gaussian Mixture Models: The log-likelihood

• Let θk = (μk, Σk, πk) for k = 1…K, and θ = (θ1, …, θK)
• By the law of total probability,
  p(X = xi ∣ θ) = ∑k p(X = xi ∣ zi = k; θ) πk = ∑k 𝒩(xi ∣ μk, Σk) πk
• The log-likelihood is
  LL(θ) = log p(x1…xN ∣ θ) = ∑i log p(xi ∣ θ) = ∑i log ∑k 𝒩(xi ∣ μk, Σk) πk
• Max likelihood principle: θ̂ = argmax_θ LL(θ)

• Exercise: assume K = 1 and that π1 is known and equal to 1. Find μ by maximum likelihood.
  LL(μ) = ∑i log 𝒩(xi ∣ μ, Σ), which is maximised in closed form by μ̂ = (1/N) ∑i xi
• So the GMM with K = 1 case is easy. For K > 1, the sum inside the log makes LL(θ)
  non-concave and there is no closed-form solution.
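A short sketch of the log-likelihood computation above (univariate case, illustrative parameters), including the K = 1 sanity check that the sample mean maximises LL(μ):

import numpy as np
from scipy.stats import norm

def gmm_log_likelihood(x, pis, mus, sigmas):
    # LL(theta) = sum_i log sum_k pi_k * N(x_i | mu_k, sigma_k)
    per_component = pis * norm.pdf(x[:, None], loc=mus, scale=sigmas)  # shape (N, K)
    return np.sum(np.log(per_component.sum(axis=1)))

x = np.array([-0.2, 0.1, 2.9, 3.3])
print(gmm_log_likelihood(x, np.array([0.8, 0.2]), np.array([0.0, 3.0]), np.array([1.0, 1.0])))

# K = 1 sanity check: with a single component, the MLE of mu is the sample mean
mu_hat = x.mean()
print(gmm_log_likelihood(x, np.array([1.0]), np.array([mu_hat]), np.array([1.0])))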
GMM with K>1. The Complete Log Likelihood
• For K > 1, maximising the likelihood is hard
• Idea: optimize a surrogate instead: the Complete Log Likelihood (CLL)
• The CLL assumes we know z1…zN so we still can’t compute it either
  CLL(θ, z1…zN) = log p(x1…xN, z1…zN ∣ θ)
                = ∑i log( 𝒩(xi ∣ μ_zi, Σ_zi) π_zi )
                = ∑i ∑k 1[zi = k] log( 𝒩(xi ∣ μk, Σk) πk )
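If the zi were observed, the CLL above would be easy to evaluate; a sketch with illustrative parameters and assumed (hypothetical) latent assignments:

import numpy as np
from scipy.stats import norm

def complete_log_likelihood(x, z, pis, mus, sigmas):
    # CLL(theta, z_1..z_N) = sum_i log( N(x_i | mu_{z_i}, sigma_{z_i}) * pi_{z_i} )
    return np.sum(np.log(pis[z]) + norm.logpdf(x, loc=mus[z], scale=sigmas[z]))

x = np.array([-0.2, 0.1, 2.9])
z = np.array([0, 0, 1])  # assumed known latent assignments (for illustration only)
print(complete_log_likelihood(x, z, np.array([0.8, 0.2]), np.array([0.0, 3.0]), np.array([1.0, 1.0])))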
GMM with K>1. The Expected Complete Log Likelihood
• Idea: optimize a surrogate instead: the Complete Log Likelihood (CLL)
• The CLL assumes we know z1…zN so we still can’t compute it either
• But if for each i we can get an estimate of the distribution p(zi = k ∣ xi; θ̂),
then we can compute the Expected CLL. Let qi(k) be this estimate:
  qi(k) ≈ p(zi = k ∣ xi; θ̂)
• Expected CLL for one point:
  ℒ(q, θ, i) = 𝔼zi∼qi[ ∑k 1[zi = k] log( 𝒩(xi ∣ μk, Σk) πk ) ] = ∑k qi(k) log( 𝒩(xi ∣ μk, Σk) πk )
• Expected CLL for the whole dataset:
  ℒ(q, θ) = ∑i ℒ(q, θ, i) = 𝔼z1…zN∼q[ CLL(θ, z1…zN) ]
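Replacing the unknown indicators 1[zi = k] by the weights qi(k) gives the expected CLL ℒ(q, θ); a minimal sketch with illustrative values:

import numpy as np
from scipy.stats import norm

def expected_cll(x, q, pis, mus, sigmas):
    # L(q, theta) = sum_i sum_k q_i(k) * log( N(x_i | mu_k, sigma_k) * pi_k )
    log_joint = np.log(pis) + norm.logpdf(x[:, None], loc=mus, scale=sigmas)  # shape (N, K)
    return np.sum(q * log_joint)

x = np.array([-0.2, 0.1, 2.9])
q = np.array([[0.9, 0.1],   # q_i(k): one row per example, each row sums to 1
              [0.8, 0.2],
              [0.1, 0.9]])
print(expected_cll(x, q, np.array([0.8, 0.2]), np.array([0.0, 3.0]), np.array([1.0, 1.0])))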
The EM Algorithm
• EM Algorithm:
  1. Initialize θ̂ arbitrarily
  2. E-Step: for each i ∈ {1…N}, k ∈ {1…K},
     compute qi(k) = p(zi = k ∣ xi; θ̂) = 𝒩(xi ∣ μk, Σk) πk / ∑j 𝒩(xi ∣ μj, Σj) πj
     (for K = 2 this posterior is a sigmoid)
  3. M-Step: compute θ̂ ← argmax_θ ℒ(q, θ), then go back to step 2 until convergence

• Details:
  argmax_θ ℒ(q, θ) = argmax_θ ∑i ∑k qi(k) log( 𝒩(xi ∣ μk, Σk) πk )
  For fixed q, this maximisation has a closed-form solution, e.g.
  μk = ∑i qi(k) xi / ∑i qi(k)
  (see the sketch below)
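Putting the E-step and the M-step together, here is a minimal sketch of EM for a univariate GMM (an illustration under the standard closed-form M-step updates, not the lecture's reference implementation; the data and iteration count are arbitrary):

import numpy as np
from scipy.stats import norm

def em_gmm(x, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N = len(x)
    # 1. Initialize theta_hat arbitrarily
    pis = np.full(K, 1.0 / K)
    mus = rng.choice(x, size=K, replace=False)
    sigmas = np.full(K, x.std())
    for _ in range(n_iter):
        # 2. E-step: q_i(k) = p(z_i = k | x_i; theta_hat)
        joint = pis * norm.pdf(x[:, None], loc=mus, scale=sigmas)
        q = joint / joint.sum(axis=1, keepdims=True)
        # 3. M-step: closed-form argmax of L(q, theta)
        Nk = q.sum(axis=0)
        pis = Nk / N
        mus = (q * x[:, None]).sum(axis=0) / Nk
        sigmas = np.sqrt((q * (x[:, None] - mus) ** 2).sum(axis=0) / Nk)
    return pis, mus, sigmas

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 800), rng.normal(3.0, 1.0, 200)])
print(em_gmm(x, K=2))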
Outline

1. Introduction

2. Gaussian mixtures and the EM algorithm

3. Variational Analysis of the EM algorithm

4. Variational Auto-Encoders

5. Exercise: probabilistic PCA


Variational Analysis of EM: The ELBO (1)
• We really want to optimise the likelihood p(x1…xN ∣ θ), but instead we optimise a
surrogate: the expected CLL ℒ(q, θ). What is the link?
• Recall that ℒ(q, θ) = ∑i ℒ(q, θ, i) where ℒ(q, θ, i) = 𝔼zi∼qi[ log p(X = xi, zi ∣ θ) ]

• Let us pick a single point x and compute its likelihood:
  ln p(x ∣ θ) = ln ∑k p(x, z = k ∣ θ)
              = ln ∑k q(k) · p(x, z = k ∣ θ) / q(k)
              = ln 𝔼z∼q[ p(x, z ∣ θ) / q(z) ]
              ≥ 𝔼z∼q[ ln( p(x, z ∣ θ) / q(z) ) ]        (Jensen's inequality, since ln is concave)
• The right-hand side is called the ELBO: ELBO(x, θ, q)
Variational Analysis of EM: The ELBO (2)
• We really want to optimise the likelihood p(x1…xN ∣ θ), but instead we optimise a
surrogate: the expected CLL ℒ(q, θ). What is the link?
• Recall that ℒ(q, θ) = ∑i ℒ(q, θ, i) where ℒ(q, θ, i) = 𝔼zi∼qi[ log p(X = xi, zi ∣ θ) ]

• How close is the ELBO to the log-likelihood?
  ln p(x ∣ θ) − ELBO(x, θ, q) = ln p(x ∣ θ) − 𝔼z∼q[ ln( p(x, z ∣ θ) / q(z) ) ]
                              = 𝔼z∼q[ ln p(x ∣ θ) − ln p(x, z ∣ θ) + ln q(z) ]
                              = 𝔼z∼q[ ln( q(z) / p(z ∣ x, θ) ) ]        (using p(x, z ∣ θ) = p(z ∣ x, θ) p(x ∣ θ))
                              = KL( q(z) ‖ p(z ∣ x, θ) )

• Recall that for any two discrete distributions p and p′ over {1…K},
  KL(p ‖ p′) = ∑z p(z) log( p(z) / p′(z) )
  Two properties: KL(p ‖ p′) ≥ 0, and KL(p ‖ p′) = 0 iff p = p′
• So if q(z) = p(z ∣ x, θ), then KL = 0 and ln p(x ∣ θ) = ELBO(x, θ, q)

• Property of the ELBO (verified numerically below):
  ln p(x ∣ θ) = ELBO(x, θ, q) + KL( q ‖ p(z ∣ x, θ) ) = ℒ(q, θ, x) + Entropy(q) + KL( q ‖ p(z ∣ x, θ) )
  If q = p(z ∣ x, θ), then ln p(x ∣ θ) = ℒ(q, θ, x) + Entropy(q)
  and, since Entropy(q) does not depend on θ,
  argmax_θ ℒ(q, θ, x) = argmax_θ ELBO(x, θ, q)
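A numeric sketch of the decomposition ln p(x ∣ θ) = ELBO(x, θ, q) + KL(q ‖ p(z ∣ x, θ)) for a single point of a K = 2 univariate GMM; the parameters and the choice of q are illustrative:

import numpy as np
from scipy.stats import norm

# Illustrative K = 2 GMM parameters
pis = np.array([0.8, 0.2])
mus = np.array([0.0, 3.0])
sigmas = np.array([1.0, 1.0])
x = 1.5

joint = pis * norm.pdf(x, loc=mus, scale=sigmas)   # p(x, z = k | theta)
log_px = np.log(joint.sum())                       # ln p(x | theta)
posterior = joint / joint.sum()                    # p(z = k | x, theta)

q = np.array([0.6, 0.4])                           # an arbitrary distribution over z
elbo = np.sum(q * np.log(joint / q))               # E_{z~q}[ ln( p(x, z | theta) / q(z) ) ]
kl = np.sum(q * np.log(q / posterior))             # KL( q || p(z | x, theta) )

print(log_px, elbo + kl)   # identical: ln p(x | theta) = ELBO + KL
print(elbo <= log_px)      # True: the ELBO lower-bounds the log-likelihood

# Choosing q = p(z | x, theta) makes KL = 0 and closes the gap
elbo_opt = np.sum(posterior * np.log(joint / posterior))
print(np.isclose(elbo_opt, log_px))  # True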
Variational Analysis of EM: reinterpreting the algorithm

• EM Algorithm:
  1. Initialize θ̂ arbitrarily
  2. E-Step: for each i ∈ {1…N}, k ∈ {1…K},
     compute qi(k) = p(zi = k ∣ xi; θ̂)
     This is equivalent to q1…qN = argmax_{q1…qN} ∑i ELBO(xi, θ̂, qi)
     (the posterior is the q that makes the KL term vanish)
  3. M-Step: compute θ̂ ← argmax_θ ℒ(q, θ)
     This is equivalent to θ̂ = argmax_θ ∑i ELBO(xi, θ, qi)
     (the Entropy(qi) terms do not depend on θ)

• So EM is a coordinate-ascent procedure on ∑i ELBO(xi, θ, qi), which lower-bounds
  the log-likelihood ln p(x1…xN ∣ θ): each E-step and each M-step can only increase the ELBO.
Outline

1. Introduction

2. Gaussian mixtures and the EM algorithm

3. Variational Analysis of the EM algorithm

4. Variational Auto-Encoders

5. Exercise: probabilistic PCA


A word about auto-encoders
• Standard auto-encoders learn to reduce the dimensionality of examples in a
dataset

• Each example xi is encoded into a low-dimensional code zi = Enc(xi) and then decoded
  into a reconstruction x̂i = Dec(zi)
• Training minimises the reconstruction error:
  argmin_{Enc, Dec} ∑i ‖xi − x̂i‖²   where x̂i = Dec(Enc(xi))
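A minimal sketch of this reconstruction objective with a hypothetical PyTorch encoder/decoder pair (the layer sizes, batch, and optimiser settings are arbitrary assumptions):

import torch
import torch.nn as nn

# Hypothetical sizes: inputs in R^784, codes in R^2
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 2))
decoder = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 784))
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(64, 784)          # a stand-in batch of examples x_i

for _ in range(100):
    x_hat = decoder(encoder(x))   # x_i -> z_i -> x_hat_i
    loss = ((x - x_hat) ** 2).sum(dim=1).mean()   # reconstruction error ||x_i - x_hat_i||^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()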
A word about variational auto-encoders
• Variational auto-encoders (VAE) learn to reduce the dimensionality of examples in
a dataset and produce, for each example xi, a distribution of latent vectors zi such
that overall, the latent vectors z1…zN are normally distributed.
• The underlying generative model is (sketched below)
  p(x ∣ θ) = ∫ p(x ∣ z; θ) p(z) dz   with prior p(z) = 𝒩(0, I)
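A minimal sketch of the corresponding VAE training loss, i.e. the negative ELBO with a Gaussian encoder q(z ∣ x) and prior p(z) = 𝒩(0, I); the architecture and dimensions are illustrative assumptions, not the lecture's implementation:

import torch
import torch.nn as nn

d_in, d_z = 784, 2                 # assumed input and latent dimensions
enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, 2 * d_z))  # outputs mu and log sigma^2
dec = nn.Sequential(nn.Linear(d_z, 128), nn.ReLU(), nn.Linear(128, d_in))

def vae_loss(x):
    mu, log_var = enc(x).chunk(2, dim=1)                         # parameters of q(z | x)
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)     # reparameterised sample z ~ q(z | x)
    x_hat = dec(z)
    recon = ((x - x_hat) ** 2).sum(dim=1)                        # -E_q[log p(x | z)] up to constants (Gaussian decoder)
    kl = 0.5 * (mu ** 2 + log_var.exp() - log_var - 1).sum(dim=1)  # KL( q(z | x) || N(0, I) )
    return (recon + kl).mean()                                   # average negative ELBO over the batch

x = torch.randn(64, d_in)                                        # stand-in batch
print(vae_loss(x))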


Outline

1. Introduction

2. Gaussian mixtures and the EM algorithm

3. Variational Analysis of the EM algorithm

4. Variational Auto-Encoders

5. Exercise: probabilistic PCA


Probabilistic PCA
