DLVU Lecture 07
Implicit models
Jakub M. Tomczak
Deep Learning
GENERATIVE MODELS
Generative model                          Training    Likelihood     Sampling    Compression
Autoregressive models (e.g., PixelCNN)    Stable      Exact          Slow        No
Flow-based models (e.g., RealNVP)         Stable      Exact          Fast/Slow   No
Implicit models (e.g., GANs)              Unstable    No             Fast        No
Prescribed models (e.g., VAEs)            Stable      Approximate    Fast        Yes
DENSITY NETWORKS
GENERATIVE MODELING WITH LATENT VARIABLES
Generative process:
1. z ∼ pλ(z)
2. x ∼ pθ(x ∣ z)

The log-likelihood function:
log p(x) = log ∫ pθ(x ∣ z) pλ(z) dz
         ≈ log (1/S) ∑_{s=1}^{S} exp(log pθ(x ∣ z_s))

It could be estimated by MC samples.
If we take a standard Gaussian prior, we need to model pθ(x ∣ z) only!
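To make this concrete, here is a minimal sketch of the MC estimator in PyTorch; decoder_log_prob is a hypothetical callable standing in for log pθ(x ∣ z), and M is the latent dimensionality (both are illustrative, not from the slides).

import math
import torch

def log_px_mc(x, decoder_log_prob, M, S=1000):
    # Draw S latents from the standard Gaussian prior p(z) = N(0, I).
    z = torch.randn(S, M)
    # log p_theta(x | z_s) for every sample; expected shape: (S,)
    log_px_given_z = decoder_log_prob(x, z)
    # log (1/S) sum_s exp(log p_theta(x | z_s)), computed stably via logsumexp.
    return torch.logsumexp(log_px_given_z, dim=0) - math.log(S)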
DENSITY NETWORK
Let's consider a function f that transforms z to x.
MacKay, D. J., & Gibbs, M. N. (1999). Density networks. Statistics and neural networks: advances at the interface. Oxford
log p(x) = log ∫ pθ(x ∣ z) pλ(z) dz
         ≈ log (1/S) ∑_{s=1}^{S} exp(log pθ(x ∣ z_s))

Training procedure:
1. Sample multiple z's from the prior (e.g., standard Gaussian).
2. Apply the log-sum-exp trick, and backpropagate (see the sketch below).
Drawback: It scales badly in high-dimensional cases…
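A minimal sketch of this training procedure, assuming a Gaussian observation model pθ(x ∣ z) = N(f(z), I); the dimensionalities M, D and the sample count S below are illustrative, not from the slides.

import math
import torch
import torch.nn as nn

M, D, S = 2, 784, 64  # latent dim, data dim, number of MC samples (illustrative)
f = nn.Sequential(nn.Linear(M, 256), nn.ReLU(), nn.Linear(256, D))
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

def neg_log_likelihood(x):
    z = torch.randn(S, M)  # 1. sample z's from the standard Gaussian prior
    mu = f(z)              # means of p(x | z), shape (S, D)
    # log N(x; mu_s, I) for each sample
    log_px_z = -0.5 * ((x - mu) ** 2).sum(dim=1) - 0.5 * D * math.log(2 * math.pi)
    # 2. log-sum-exp trick; the result is differentiable, so we can backpropagate
    return -(torch.logsumexp(log_px_z, dim=0) - math.log(S))

x = torch.randn(D)  # a stand-in datapoint
opt.zero_grad(); neg_log_likelihood(x).backward(); opt.step()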
DENSITY NETWORKS
Advantages:
✓ Non-linear transformations.
✓ Allows generation.

Disadvantages:
- No analytical solutions.
- No exact likelihood.
- It requires a lot of samples from the prior.
- Fails in high dimensions.
- It requires an explicit distribution (e.g., Gaussian).
Can we do better?
IMPLICIT DISTRIBUTIONS
THE IDEA BEHIND IMPLICIT DISTRIBUTIONS
[Figure: a prior density p(z) plotted over z.]

But now, we don't specify the distribution (e.g., MoG), but use a flexible transformation directly to return an image. This is now implicit.
p(x ∣ z) = δ(x − f(z))

However, now we cannot use the likelihood-based approach, because ln δ(x − f(z)) is ill-defined.
We need to use a different approach.
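A sketch of what such an implicit model looks like in code; the network and dimensionalities below are illustrative, not from the slides.

import torch
import torch.nn as nn

M, D = 100, 784  # illustrative latent and data dimensionalities
f = nn.Sequential(nn.Linear(M, 256), nn.ReLU(), nn.Linear(256, D), nn.Tanh())

z = torch.randn(16, M)  # z ~ p(z) = N(0, I)
x = f(z)                # 16 samples from the implicit distribution over x
# Note: we can sample x = f(z), but there is no density p(x) to evaluate.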
BEFORE WE GO INTO MATH…

A fraud and an art expert. The expert assesses a painting and gives her opinion:
"Hmmm… A FAKE!"
…until, eventually: "Hmmm… PABLO!"
HOW CAN WE FORMULATE IT?

Generator /fraud/
Discriminator /expert/

1. Sample z.
2. Generate G(z).
3. Discriminate whether a given image is real or fake (sketched below).
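These three steps as a forward pass; the tiny G and D below are illustrative stand-ins for the generator and discriminator.

import torch
import torch.nn as nn

# Illustrative stand-ins for the generator G and discriminator D
G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())

z = torch.randn(1, 100)  # 1. sample z
x_gen = G(z)             # 2. generate G(z)
p_real = D(x_gen)        # 3. discriminate: probability that the image is real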
ADVERSARIAL LOSS

min_G max_D 𝔼_{x∼p_data(x)}[log D(x)] + 𝔼_{z∼p(z)}[log(1 − D(G(z)))]

Learning:
1. Generate fake images, and minimize wrt. G.
2. Take real & fake images, and maximize wrt. D.
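One common way to implement these two steps is through binary cross-entropy, treating D's output as the probability that an image is real. A sketch, assuming D ends with a sigmoid:

import torch
import torch.nn.functional as F

def discriminator_loss(D, x_real, x_fake):
    # Maximize log D(x) + log(1 - D(G(z))): BCE with labels 1 (real) and 0 (fake).
    p_real, p_fake = D(x_real), D(x_fake.detach())  # detach: no gradients into G
    return (F.binary_cross_entropy(p_real, torch.ones_like(p_real)) +
            F.binary_cross_entropy(p_fake, torch.zeros_like(p_fake)))

def generator_loss(D, x_fake):
    # The non-saturating variant: maximize log D(G(z)).
    p_fake = D(x_fake)
    return F.binary_cross_entropy(p_fake, torch.ones_like(p_fake))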
GANS
import torch.nn as nn

class GAN(nn.Module):
    def __init__(self, D, M):
        super(GAN, self).__init__()
        self.D = D  # data dimensionality (assumed meaning of D)
        self.M = M  # latent dimensionality (assumed meaning of M)
        # Minimal generator and discriminator networks (an illustrative choice).
        self.gen = nn.Sequential(nn.Linear(M, 256), nn.ReLU(),
                                 nn.Linear(256, D), nn.Tanh())
        self.dis = nn.Sequential(nn.Linear(D, 256), nn.ReLU(),
                                 nn.Linear(256, 1), nn.Sigmoid())
We can use two optimizers: one for the discriminator, which sees d_real & d_gen, and one for the generator, which sees only d_gen.
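A sketch of such a loop, reusing the GAN class and the loss sketches above; data_loader is a hypothetical iterable over batches of real images.

import torch

gan = GAN(D=784, M=100)
opt_dis = torch.optim.Adam(gan.dis.parameters(), lr=2e-4)
opt_gen = torch.optim.Adam(gan.gen.parameters(), lr=2e-4)

for x_real in data_loader:
    z = torch.randn(x_real.shape[0], gan.M)
    x_fake = gan.gen(z)
    # Discriminator step: uses real & generated images (d_real & d_gen).
    opt_dis.zero_grad()
    discriminator_loss(gan.dis, x_real, x_fake).backward()
    opt_dis.step()
    # Generator step: uses the discriminator's output on generated images (d_gen).
    opt_gen.zero_grad()
    generator_loss(gan.dis, gan.gen(z)).backward()
    opt_gen.step()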
GENERATIONS
Salimans, T., et al. (2016). Improved techniques for training GANs. NeurIPS
GANS
Advantages:
✓ Non-linear transformations.
✓ Allows generation.
✓ Learnable loss.
✓ Allows implicit models.
✓ Works in high dimensions.

Disadvantages:
- No exact likelihood.
- Unstable training.
- Missing mode problem (i.e., it doesn't cover the whole space).
- No clear way for quantitative assessment.
WASSERSTEIN GAN

Before, the motivation for the adversarial loss was the Bernoulli distribution. But we can use other ideas.
For instance, we can use the earth-mover distance:

W(p_data, p_G) = inf_{γ ∈ Π(p_data, p_G)} 𝔼_{(x,y)∼γ}[‖x − y‖]
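In the WGAN of Arjovsky et al. (2017), this distance is optimized in its dual (Kantorovich–Rubinstein) form, max_{‖D‖_L ≤ 1} 𝔼[D(x)] − 𝔼[D(G(z))], with a critic constrained to be 1-Lipschitz. A sketch using the original paper's weight clipping; names are illustrative.

import torch

def critic_loss(critic, x_real, x_fake):
    # Minimize -(E[critic(x_real)] - E[critic(x_fake)]); no sigmoid on the critic.
    return -(critic(x_real).mean() - critic(x_fake.detach()).mean())

def clip_weights(critic, c=0.01):
    # Crude way to keep the critic roughly 1-Lipschitz (original WGAN recipe).
    for p in critic.parameters():
        p.data.clamp_(-c, c)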
Thank you!