
Tutorial on Deep Generative Models
ADITYA GROVER AND STEFANO ERMON
STANFORD UNIVERSITY
IJCAI-ECAI 2018
URL: goo.gl/H1prjP
Introduction

Computational Speech
Computer Vision
Natural Language Processing
Robotics

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 2


Introduction

Richard Feynman: "What I cannot create, I do not understand."

Assumption for today: "What I understand, I can create."

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 3


Generative Modeling: Computer Graphics
Cube (color=blue, position=..., size=..., ...), Cylinder (color=red, position=..., size=..., ...)
Computer Graphics (description → image) vs. Computer Vision (image → description)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 4


Statistical Generative Models

Data (e.g., images of bedrooms) + Prior Knowledge (e.g., physics, materials, ...) → Learning → Generative model

Spectrum: this tutorial focuses on the data-driven end; computer graphics sits at the prior-knowledge end.

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 5


Statistical Generative Models
Data + Prior Knowledge (model family, loss function, optimization algorithm, etc.) → Learning → a probability distribution p(x)

Each image x is assigned a probability p(x).
Sampling from p(x) generates new images:

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 6


Discriminative vs. generative
Discriminative: learn a decision boundary for labels (Y = Bedroom, Y = Dining Room, ...)
P(Y = Bedroom | X = [image]) = 0.01
Example: logistic regression, convolutional net, etc.

Generative: model of the joint distribution
P(Y = Bedroom, X = [image])
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 7
Discriminative vs. generative
Joint and conditional are related by Bayes' rule:
P(Y = Bedroom | X = [image]) = P(Y = Bedroom, X = [image]) / P(X = [image])

Discriminative: X is always given as input, so there is no need to model P(X = [image])
◦ But it cannot handle missing data, e.g., P(Y = Bedroom | X = [partially observed image])


IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 8
Conditional Generative Model
Class-conditional models:

P(X = [image] | Y = Bedroom), P(X = [image] | Y = Dining Room)

P(X = [image] | Caption = "A black table with 6 chairs")

A discriminative model is a very simple conditional generative model of Y:

P(Y = Bedroom | X = [image])
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 9
Progressive Growing of GANs

Karras et al., 2018

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 10


WaveNet
Generative model of raw speech signals
◦ Text-to-Speech (compared against parametric and concatenative baselines)
◦ Unconditional generation
◦ Music generation

van den Oord et al, 2016c

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 11


Image Super Resolution
Conditional generative model P(high-res image | low-res image)

Ledig et al., 2017

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 12


Audio Super Resolution
Conditional generative model P(high-res signal | low-res audio signal)

(Figure: low-resolution input signal and reconstructed high-resolution audio signal)

Kuleshov et al., 2017

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 13


Machine Translation
Conditional generative model P(English text | Chinese text)

Figure from Google AI research blog.

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 14


Image Translation
Conditional generative model P(zebra image | horse image)

Zhu et al., 2017

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 15


Imitation Learning

Li et al., 2017

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 16


Roadmap
Introduction
Background and Taxonomy

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 17


Statistical Generative Models
Data + Prior knowledge (model family, loss function, optimization algorithm, etc.) → Learn → Generative model p(x)

Desirable properties:
1. Sampling
2. Evaluating p(x)
3. Extracting features
as efficiently as possible
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 18
Representation - compactness
Suppose x1, x2, x3 are binary variables. P(x1,x2,x3) can be specified with (2^3 − 1) = 7
parameters
◦ P(0,0,0), P(0,0,1), …, P(1,1,0), and P(1,1,1) = 1 − (P(0,0,0) + ... + P(1,1,0))

Main challenge: distributions over high-dimensional objects

P(X = [image 1]), P(X = [image 2]), ... Too many possibilities!

N black/white pixels → 2^N − 1 parameters

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 19
Representation - factorization
Suppose x1, x2, x3 are binary variables. P(x1,x2,x3) can be specified with (2^3 − 1) = 7
parameters
◦ P(0,0,0), P(0,0,1), …, P(1,1,0), P(1,1,1)

Definition of conditional probability: P(x1,x2) = P(x1) P(x2|x1)


Chain rule: P(x1,x2,x3) = P(x1) P(x2|x1) P(x3|x1,x2)
Chain rule: P(x1,x2,x3,x4) = P(x1) P(x2|x1) P(x3|x1,x2) P(x4|x1,x2,x3)

Main idea: write as a product of simpler terms

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 20


Representation - factorization
Suppose x1, x2, x3 are binary variables. P(x1,x2,x3) can be specified with (2^3 − 1) = 7
parameters
◦ P(0,0,0), P(0,0,1), …, P(1,1,0), P(1,1,1)

Definition of conditional probability: P(x1,x2) = P(x1) P(x2|x1)

Chain rule: P(x1,x2,x3) = P(x1) P(x2|x1) P(x3|x1,x2)
Chain rule: P(x1,x2,x3,x4) = P(x1) P(x2|x1) P(x3|x1,x2) P(x4|x1,x2,x3)

P(x1) is simple :) but P(x4|x1,x2,x3) is still complex :( — it needs a distribution for all
possible values of x1, x2, x3
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 21
Representation – Bayes Nets
Chain rule: P(x1,x2,x3,x4) = P(x1) P(x2|x1) P(x3|x1,x2) P(x4|x1,x2,x3)
Solution #1: simplify the conditionals via conditional independence assumptions (Bayes Nets)
E.g., P(x1,x2,x3,x4) = P(x1) P(x2|x1) P(x3|x1,x2) P(x4|x1,x2,x3) ≈ P(x1) P(x2|x1) P(x3|x2) P(x4|x3)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 22


Representation – Deep Generative Models
Chain rule: P(x1,x2,x3,x4) = P(x1) P(x2|x1) P(x3|x1,x2) P(x4|x1,x2,x3)
Solution #2a: use a simple functional form for the conditionals
◦ P(x4|x1,x2,x3) ≈ sigmoid(a*x1 + b*x2 + c*x3)
◦ Only requires storing 3 parameters
◦ Relationship between x4 and (x1,x2,x3) could be too simple

Solution #2b: use a more complex functional form, e.g., a neural network mapping (x1,x2,x3) to the parameters of P(x4|x1,x2,x3)
◦ More flexible
◦ More parameters
(a small sketch of both options follows below)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 23
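To make the two options concrete, here is a minimal Python/NumPy sketch (not from the slides): a 3-parameter logistic conditional versus a small neural-network conditional for P(x4 = 1 | x1, x2, x3). The weights and the architecture are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Solution #2a: logistic conditional with 3 parameters (a, b, c would be learned in practice).
a, b, c = 0.5, -1.0, 2.0
def p_x4_simple(x1, x2, x3):
    return sigmoid(a * x1 + b * x2 + c * x3)      # P(x4 = 1 | x1, x2, x3)

# Solution #2b: a small neural network produces the same conditional, with more parameters.
W1 = rng.normal(size=(8, 3)); b1 = np.zeros(8)    # hidden layer
W2 = rng.normal(size=(8,));   b2 = 0.0            # output layer
def p_x4_nn(x1, x2, x3):
    h = np.tanh(W1 @ np.array([x1, x2, x3]) + b1)
    return sigmoid(W2 @ h + b2)                   # P(x4 = 1 | x1, x2, x3)

print(p_x4_simple(1, 0, 1), p_x4_nn(1, 0, 1))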


Properties of factorized models
Chain rule: P(x1,x2,x3,x4) = P(x1) P(x2|x1) P(x3|x1,x2) P(x4|x1,x2,x3)
Easy to sample from :)
◦ Sample x1 ~ P(x1)
◦ Sample x2 ~ P(x2|x1)
◦ Sample x3 ~ P(x3|x1,x2)
◦ …
(see the ancestral-sampling sketch below)

Easy to evaluate P(x1,x2,x3,x4) :)
◦ Just multiply together the terms

Not obvious how to "learn features"/representations :(

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 24
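A minimal sketch of both properties for four binary variables with tabular conditionals; the probability tables below are made up purely for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative tabular conditionals for four binary variables.
p_x1 = 0.6                                                     # P(x1 = 1)
p_x2 = {0: 0.3, 1: 0.8}                                        # P(x2 = 1 | x1)
p_x3 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}    # P(x3 = 1 | x1, x2)
p_x4 = {k: 0.5 for k in [(i, j, l) for i in (0, 1) for j in (0, 1) for l in (0, 1)]}

def sample():
    # Ancestral sampling: draw each variable given the ones sampled before it.
    x1 = int(rng.random() < p_x1)
    x2 = int(rng.random() < p_x2[x1])
    x3 = int(rng.random() < p_x3[(x1, x2)])
    x4 = int(rng.random() < p_x4[(x1, x2, x3)])
    return x1, x2, x3, x4

def log_prob(x1, x2, x3, x4):
    # Evaluating the joint is just a product (a sum of logs) of the conditionals.
    def bern(p, x): return np.log(p if x == 1 else 1 - p)
    return (bern(p_x1, x1) + bern(p_x2[x1], x2) +
            bern(p_x3[(x1, x2)], x3) + bern(p_x4[(x1, x2, x3)], x4))

x = sample()
print(x, log_prob(*x))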


Latent variable models
Latent factors of variation z = (z1, ..., zk), e.g., gender, eye color, hair color, pose, ethnicity

p_θ(x, z) = p(z) p_θ(x|z)

Image x
Hou et al., 2016

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 25


Deep Latent Variable Models
z ~ p(z): simple prior (Gaussian)
p_θ(x|z): parameterized as a neural net
Joint: p_θ(x, z) = p(z) p_θ(x|z)
Marginal likelihood of an image x: log p_θ(x) = log ∫ p_θ(x, z) dz

1) Easy to sample from :)
2) p(x) intractable :(
3) Enables feature learning :)
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 26
Learning in Generative Models
Given: Samples from a data distribution and model family
Goal: Approximate the data distribution as closely as possible, i.e., make d(p_data, p_θ) small

Samples x_i ~ p_data, i = 1, 2, ..., n
Model family: p_θ with θ ∈ M
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 27
Learning in Generative Models
Given: Samples from a data distribution
Goal: Approximate the data distribution as closely as possible

min_{θ ∈ M} d(p_data, p_θ)

Challenge: How should we evaluate and optimize closeness
between the data distribution and the model distribution?

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 28


Maximum likelihood estimation
Given: Samples from a data distribution
Goal: Approximate a data distribution as closely as possible
Solution 1: d = KL divergence

min_{θ ∈ M} E_{x ~ p_data}[−log p_θ(x)]

• Statistically efficient
• Requires the ability to tractably evaluate or optimize likelihoods

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 29


Maximum likelihood estimation
Tractable likelihoods: Directed models such as autoregressive models
Bayes Net: p_θ(x1,x2,x3,x4) = p_θ(x1) p_θ(x2|x1) p_θ(x3|x1,x2) p_θ(x4|x3)

min_{θ ∈ M} E_{x ~ p_data}[−log p_θ(x)]

Empirical factorization: p_data(x1) p_data(x2|x1) p_data(x3|x1,x2) p_data(x4|x3)
Given samples x_i ~ p_data, i = 1, ..., n, there is an analytic solution: pick θ so that the model's conditional probabilities
match the empirical ones, e.g., p_θ(x1) = p_data(x1)
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 30
Maximum likelihood estimation
Tractable likelihoods: Directed models such as autoregressive models
Neural net parameterization: p_θ(x1,x2,x3,x4) = p_θ(x1) p_θ(x2|x1) p_θ(x3|x1,x2) p_θ(x4|x3)
where, e.g., p_θ(x3|x1,x2) ≈ Neural-Net(x1, x2; θ)

min_{θ ∈ M} E_{x ~ p_data}[−log p_θ(x)]

Given samples x_i ~ p_data, i = 1, ..., n, approximately solve for θ using gradient descent:
• Gradients computed by backpropagation
• Stochastic approximations (minibatches)
• Non-convex, but works well in practice
(a minimal training-loop sketch follows below)
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 31
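A minimal sketch, not from the slides, of maximum-likelihood training by minibatch gradient descent for a fully visible sigmoid-style autoregressive model on toy binary data; the data, learning rate, and model size are all illustrative, and the gradients are written out by hand instead of using an autodiff library.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Toy binary data: n samples of D "pixels" (random here; real images in practice).
n, D = 1000, 8
data = (rng.random((n, D)) < 0.5).astype(float)

# Autoregressive model: p(x_j = 1 | x_<j) = sigmoid(W[j, :j] @ x_<j + b[j]).
W = np.zeros((D, D))
b = np.zeros(D)

lr = 0.1
for step in range(200):
    batch = data[rng.integers(0, n, size=64)]            # stochastic minibatch
    for j in range(D):
        prefix = batch[:, :j]                            # x_<j
        p = sigmoid(prefix @ W[j, :j] + b[j])
        err = p - batch[:, j]                            # d(-log lik)/d(logit) for a Bernoulli
        W[j, :j] -= lr * (prefix.T @ err) / len(batch)   # gradient step, backprop by hand
        b[j] -= lr * err.mean()

# Average negative log-likelihood per example under the learned model.
nll = 0.0
for j in range(D):
    p = sigmoid(data[:, :j] @ W[j, :j] + b[j])
    nll -= np.mean(data[:, j] * np.log(p + 1e-9) + (1 - data[:, j]) * np.log(1 - p + 1e-9))
print("average NLL per example:", nll)
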
Maximum likelihood estimation
Tractable likelihoods: Directed models such as autoregressive models
Intractable likelihoods: Undirected models such as restricted Boltzmann
machines (RBM), directed models such as variational autoencoders (VAE)

Alternatives for intractable likelihoods:


- Approximate inference using MCMC or variational inference
- Likelihood-free inference using adversarial training

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 32


Adversarial Training
Given: Samples from a data distribution
Goal: Approximate a data distribution as closely as possible
Solution 2: Two-sample test
d = integral probability metric or f-divergence lower bound
E.g., if the f-divergence is the Jensen-Shannon divergence:

min_{θ ∈ M} max_{φ ∈ F} E_{x ~ p_data}[log D_φ(x)] + E_{x ~ p_θ}[log(1 − D_φ(x))]

Likelihood-free! Only requires samples from p_θ


IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 33
Generative Adversarial Networks

Figure from
Goodfellow et al., 2014

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 34


Roadmap
Introduction
Background and Taxonomy
Likelihood-based Generative Models
◦ Autoregressive Models

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 35


Likelihood-based Generative Models
• Provide an analytical expression for the log-likelihood, i.e., log p_θ(x)
• Learning involves (approximate) evaluation of the gradients of the
model log-likelihood with respect to the parameters θ
• Key design choices
§ Directed vs. undirected
§ Fully observed vs. latent variable

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 36


Autoregressive Generative Models
• Directed, fully-observed graphical models

• Key idea: Decompose the joint distribution as a product of
tractable conditionals, as written below

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 37
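Written out, the autoregressive factorization and the resulting log-likelihood objective are:

\[
p_\theta(x) \;=\; \prod_{i=1}^{n} p_\theta(x_i \mid x_{<i}),
\qquad
\log p_\theta(x) \;=\; \sum_{i=1}^{n} \log p_\theta(x_i \mid x_{<i})
\]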


Learning and Inference
• Learning maximizes the model log-likelihood over the dataset D
• Tractable conditionals allow for exact likelihood evaluation
§ Conditionals are evaluated in parallel during training
• Directed model permits ancestral sampling, one variable at a time

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 38


NN-based parameterizations
• Fully-visible sigmoid belief networks (FVSBN)
• Neural autoregressive density estimator (NADE): 1 hidden layer NN

!" #$ #%$ = '() #%$ , +(#%$ ))


Neal, 1992, Larochelle & Murray, 2011

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 39


MLP-based parameterizations
• Masked autoencoder distribution estimation (MADE): multilayer NN

Germain et al., 2015

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 40


RNN-based parameterizations
Effective architecture for learning long-range sequential dependencies

• Hidden layer h_t is a summary of the inputs seen up to time t − 1
• Output layer o_t specifies the parameters of the conditional p_θ(x_t | x_{<t})
(a tiny sampling sketch follows below)
Sutskever et al., 2011

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 41
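A tiny NumPy sketch, not from the slides, of ancestral sampling from an RNN-parameterized model of binary sequences; the weights are random (untrained) and the sizes are illustrative.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

H, T = 16, 20                       # hidden size, sequence length
Wxh = rng.normal(0, 0.1, (H, 1))    # input-to-hidden weights
Whh = rng.normal(0, 0.1, (H, H))    # hidden-to-hidden weights
Who = rng.normal(0, 0.1, (1, H))    # hidden-to-output weights

def sample_sequence():
    h = np.zeros((H, 1))
    x = np.zeros((1, 1))            # start token
    xs, logp = [], 0.0
    for t in range(T):
        h = np.tanh(Wxh @ x + Whh @ h)      # h_t summarizes x_1 .. x_{t-1}
        p = sigmoid(Who @ h)[0, 0]          # parameter of p(x_t | x_<t)
        x_t = float(rng.random() < p)
        logp += np.log(p if x_t == 1 else 1 - p)
        xs.append(int(x_t))
        x = np.array([[x_t]])
    return xs, logp

print(sample_sequence())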


RNN-based parameterizations

Pixel-RNN
Char-RNN
Sutskever et al., 2011, Karpathy, 2015, Theis & Bethge, 2015, van den Oord et al., 2016a

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 42


CNN-based parameterizations
Convolutions are amenable for parallelization on modern hardware

LeCun et al., 1998

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 43


CNN-based parameterizations
Masked convolutions preserve the raster-scan order; dilated convolutions increase the receptive field
(see the masking sketch below)

PixelCNN WaveNet
van den Oord et al., 2016a, 2016b, 2016c

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 44
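A small NumPy sketch (illustrative, not the PixelCNN implementation) of how a raster-scan mask can be built and applied to a convolution kernel so that each output depends only on already-generated pixels; raster_mask is a hypothetical helper name.

import numpy as np

def raster_mask(k, include_center=False):
    # Zero out kernel weights at positions at or after the center in raster-scan
    # order, so the output at a pixel never sees "future" pixels.
    mask = np.ones((k, k))
    c = k // 2
    mask[c, c + (1 if include_center else 0):] = 0.0   # rest of the center row
    mask[c + 1:, :] = 0.0                               # all rows below the center
    return mask

print(raster_mask(3, include_center=False))
# [[1. 1. 1.]
#  [1. 0. 0.]
#  [0. 0. 0.]]

# The convolution weights are multiplied elementwise by this mask before use.
kernel = np.random.default_rng(0).normal(size=(3, 3))
masked_kernel = kernel * raster_mask(3)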


Roadmap
Introduction
Background and Taxonomy
Likelihood-based Generative Models
◦ Autoregressive Models
◦ Variational Autoencoders

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 45


Variational Autoencoders
Directed, latent-variable models, with an inference network

Goal: Maximize the marginal log-likelihood of the dataset

max_θ Σ_{x ∈ D} log p_θ(x)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 46


Variational Autoencoders
Challenge: Marginal log-likelihood of any data point is intractable

Key ideas
§ Approximate the posterior p_θ(z|x) with a simpler, tractable distribution q_φ(z|x)
§ Cast inference as optimization over the parameters of the model
and the approximate posterior

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 47


Inference as Optimization
Goal: Maximize the marginal log-likelihood of the data — but it is intractable!

log ∫_z p_θ(x, z) dz = log ∫_z q_φ(z|x) [p_θ(x, z) / q_φ(z|x)] dz

(Graphical model: latent z → observed x, with generator p_θ(x|z) and inference network q_φ(z|x))
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 48
Inference as Optimization
New Goal: Maximize a lower bound on the marginal log-likelihood of the data — this is tractable!

log ∫_z p_θ(x, z) dz = log ∫_z q_φ(z|x) [p_θ(x, z) / q_φ(z|x)] dz
                     ≥ ∫_z q_φ(z|x) log [p_θ(x, z) / q_φ(z|x)] dz   (by Jensen's inequality)

Tightness condition: q_φ(z|x) = p_θ(z|x)
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 49
Variational Bayes
• Evidence lower bound (ELBO) for the marginal log-likelihood of x (see the identity below)

• The KL gap depends on the quality of the variational approximation

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 50
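The standard identity behind these two bullets, written out (it holds exactly for any choice of q_φ):

\[
\log p_\theta(x)
  = \underbrace{\mathbb{E}_{q_\phi(z|x)}\!\left[\log \frac{p_\theta(x,z)}{q_\phi(z|x)}\right]}_{\text{ELBO}}
  \;+\; \underbrace{\mathrm{KL}\!\left(q_\phi(z|x)\,\|\,p_\theta(z|x)\right)}_{\text{KL gap}\;\ge\;0}
\]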


The Autoencoder Perspective

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 51


Learning
• Learning maximizes the ELBO for the observed data D
• Gradient estimates with respect to:
1. model parameters θ: easy, use direct Monte Carlo
2. variational parameters φ: score function estimates

Glynn, 1990, Williams, 1992, Fu, 2006

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 52


Inference
• ELBO and annealed importance sampling give lower bounds on
true, intractable log-likelihood
• Directed model permits ancestral sampling:
z ∼ p(z), x ∼ p_θ(x|z)
• Latent representations for any x can be inferred via q_φ(z|x)

Neal, 1998

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 53


Research avenues in a nutshell

(Figure: the KL gap between the variational approximation and the true posterior, comparing the optimal
parameters (θ*, φ*) with those reached in practice)
Improving variational learning via:
1. Better optimization techniques
2. More expressive approximating families
3. Alternate loss functions
Figure inspired by David Blei's keynote at AISTATS 2018

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 54


Optimizing variational objectives
• Stochastic Variational Inference (Hoffman et al., 2013)
§ subsample data, fit variational parameters, repeat

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 55


Optimizing variational objectives
• Gradient estimates with respect to variational parameters φ
1. Score function with control variates (Wingate & Weber, 2013, Ranganath et al.,
2014, Mnih & Gregor, 2014, Gu et al., 2016, Mnih & Rezende, 2016)
E_{q_φ}[∇_φ log q_φ(z) f(z)] = E_{q_φ}[∇_φ log q_φ(z) (f(z) − B)]
2. Reparameterization (Kingma & Welling, 2014, Rezende et al., 2014, Titsias & Lazaro-
Gredilla, 2014)
z ∼ N(μ, σ²) ⇔ ε ∼ N(0, 1), z = μ + σε
• Score function estimators can be applied to most continuous and discrete
distributions
• Reparameterized estimators are applicable to certain continuous distributions (see the
sketch after this list)
§ Continuous relaxations of discrete distributions using Gumbel perturbations (Jang et
al., 2017, Maddison et al., 2017)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 56
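A minimal sketch, not from the slides, of the reparameterized (pathwise) gradient estimator for a Gaussian variational distribution, using the toy objective f(z) = z² in place of the ELBO integrand; the numbers are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Objective: E_{z ~ N(mu, sigma^2)}[f(z)] with f(z) = z**2 standing in for the ELBO integrand.
f = lambda z: z ** 2
df = lambda z: 2 * z

mu, log_sigma = 2.0, 0.0
lr = 0.05
for step in range(500):
    sigma = np.exp(log_sigma)
    eps = rng.normal(size=64)            # noise independent of the parameters
    z = mu + sigma * eps                 # reparameterization: z ~ N(mu, sigma^2)
    # Pathwise gradients: differentiate through z = mu + sigma * eps.
    grad_mu = np.mean(df(z) * 1.0)                   # dz/dmu = 1
    grad_log_sigma = np.mean(df(z) * eps * sigma)    # dz/dlog_sigma = eps * sigma
    mu -= lr * grad_mu
    log_sigma -= lr * grad_log_sigma

print(mu, np.exp(log_sigma))   # both shrink toward 0, the minimizer of E[z^2]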


Model families - Encoder
•Amortization (Gershman & Goodman, 2015, Shu et al., 2018)
§ Generalization: Variational posterior shares statistical strength across data points x
§ Scalability: Efficient test-time inference on massive datasets
•Augmenting variational posteriors
• Monte Carlo methods: Importance Sampling (Burda et al., 2015), MCMC (Salimans
et al., 2015, Hoffman, 2017, Levy et al., 2018), Sequential Monte Carlo (Maddison
et al., 2017, Le et al., 2018, Naesseth et al., 2018), Rejection Sampling (Grover et
al., 2018)
• Normalizing flows (Rezende & Mohammed, 2015, Kingma et al., 2016)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 57


Model families - Decoder
•Powerful decoders such as DRAW (Gregor et al., 2015), PixelCNN
(Gulrajani et al., 2016)
• Parameterized, learned priors (Nalisnick et al., 2016, Tomczak &
Welling, 2018, Graves et al., 2018)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 58


Rethinking variational objectives
Tighter ELBO does not imply:
• Better samples: Samples and likelihoods are uncorrelated (Theis et al., 2016)
• Informative latent codes: Powerful decoders can ignore latent codes due to
tradeoff in minimizing reconstruction error vs. KL prior penalty (Bowman et
al., 2015, Chen et al., 2016, Zhao et al., 2017, Alemi et al., 2018)
• Higher signal-to-noise ratio in gradient estimates (Rainforth et al., 2018)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 59


Rethinking variational objectives
Alternatives to the reverse-KL variational objective:
• Rényi's alpha-divergences (Li & Turner, 2016)
• f-divergences (Makhzani et al., 2015; Mescheder et al., 2017)
• integral probability metrics (Dziugaite et al., 2015; Tolstikhin et al., 2018)
• objectives derived from information theoretic principles (Alemi et al.,
2018, Zhao et al., 2018)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 60


Roadmap
Introduction
Background and Taxonomy
Likelihood-based Generative Models
◦ Autoregressive Models
◦ Variational Autoencoders
◦ Normalizing Flow Models

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 61


Normalizing flow models
Directed, latent-variable invertible models

Key idea: The mapping between z and x, given by f_θ : ℝ^n → ℝ^n, is
deterministic and invertible such that x = f_θ(z) and z = f_θ^{-1}(x).

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 62


Normalizing flow models
• Use the change-of-variables formula to relate densities in the spaces defined by
z and x (written out below)
• Transformations shape densities via volume expansions and contractions
• The determinant of the Jacobian quantifies the per-unit change in volume

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 63
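The change-of-variables formula referred to above (exact for invertible, differentiable f_θ):

\[
p_X(x) \;=\; p_Z\!\left(f_\theta^{-1}(x)\right)\,
\left|\det \frac{\partial f_\theta^{-1}(x)}{\partial x}\right|
\;=\; p_Z(z)\,\left|\det \frac{\partial f_\theta(z)}{\partial z}\right|^{-1}
\quad\text{with } z = f_\theta^{-1}(x)
\]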


A flow of transformations
Invertible transformations can be composed with each other

(Figure: density after K = 0, 1, 2, and 10 composed transformations)

Planar flows:
x = z + u h(wᵀz + b)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 64


Rezende & Mohamed, 2016
Learning and inference
• Learning maximizes the model likelihood over the dataset D
• Exact likelihood evaluation via inverse transformations
• Ancestral sampling via forward transformations
• Latent representations inferred via inverse transformations

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 65


Desiderata for flow models
• Simple prior that allows for sampling and tractable likelihood
evaluation, e.g., an isotropic Gaussian
• Invertible transformations with tractable evaluation:
§ Likelihood evaluation requires efficient evaluation of the inverse (x → z) mapping
§ Sampling requires efficient evaluation of the forward (z → x) mapping
• Tractable evaluation of Jacobian determinants for large n
§ Computing the determinant of an n × n matrix is O(n³): prohibitive!
§ Key idea: the determinant of a triangular matrix is the product of its diagonal
entries, i.e., an O(n) operation

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 66


Triangular Jacobians

x = f(z)

Lower triangular Jacobian: x_i depends only on z_{≤i}
Upper triangular Jacobian: x_i depends only on z_{≥i}

Triangular Jacobian → autoregressive structure


IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 67
Flow transformations in practice
• Nonlinear Independent Components Estimation (NICE)
Additive coupling at intermediate layers and scaling at the final layer

• Real-valued Non-Volume Preserving (Real NVP)
Location-scale (affine) coupling at all layers

• NICE and Real NVP have the same complexity for evaluating
forward and inverse transformations (a coupling-layer sketch follows below)
Dinh et al., 2015, 2017

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 68
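A minimal NumPy sketch, not the NICE implementation, of a single additive coupling layer: the forward map, its exact inverse, and its (zero) log-determinant; the shift network m and its weights are illustrative.

import numpy as np

rng = np.random.default_rng(0)

D = 8
W = rng.normal(0, 0.1, (D // 2, D // 2))
m = lambda x1: np.tanh(x1 @ W)          # shift network; it need not be invertible

def forward(z):
    # Split, keep the first half, shift the second half by m(first half).
    z1, z2 = z[:D // 2], z[D // 2:]
    return np.concatenate([z1, z2 + m(z1)])

def inverse(x):
    x1, x2 = x[:D // 2], x[D // 2:]
    return np.concatenate([x1, x2 - m(x1)])   # inversion only evaluates m forward

def log_det_jacobian(z):
    return 0.0                                # additive coupling is volume preserving

z = rng.normal(size=D)
assert np.allclose(inverse(forward(z)), z)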


Flow transformations in practice
• Masked and inverse autoregressive flows (MAF, IAF) trade-off
computational complexity of forward and inverse transformations
§ MAF → fast inverse transformation :) → suited to likelihood-based learning, e.g., as a VAE decoder
§ IAF → fast forward transformation :) → suited to likelihood-free learning, e.g., as a VAE encoder
• Parallel WaveNet
§ Teacher: MAF-like model learned using maximum likelihood estimation
§ Student: a distilled IAF model trained by minimizing KL divergence with the teacher
model → three orders of magnitude faster generation than the teacher!

Papamakarios et al., 2017, Kingma et al., 2016, van den Oord et al., 2018

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 69


Glow

Kingma and Dhariwal, 2018

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 70


Roadmap
Introduction
Background and Taxonomy
Likelihood-based Generative Models
◦ Autoregressive Models
◦ Variational Autoencoders
◦ Normalizing Flow Models
Likelihood-free Generative Models
◦ Generative Adversarial Networks

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 71


Likelihood-free generative models
• Optimal generative model: best samples and highest log-likelihoods
• For imperfect models, log-likelihoods and samples are uncorrelated
• Great held-out log-likelihoods, poor samples. E.g., for the noise mixture model
p_θ(x) = 0.01 p_data(x) + 0.99 p_noise(x), we have log p_θ(x) ≥ log 0.01 p_data(x) = log p_data(x) − log 100
• Great samples, poor held-out log-likelihoods. E.g., memorizing the training set
• Likelihood-free learning considers objectives that do not depend
directly on the likelihood function, e.g., two-sample test objectives

Theis et al., 2016, Mohamed & Lakshminarayanan, 2016

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 72


Generative Adversarial Networks
• Directed, latent variable generative models with a discriminator

Generative Model
• The mapping between z and x, given by the generator G_θ, is deterministic

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 73


Generative Adversarial Networks
Key idea: A two player generator-discriminator game
• Discriminator distinguishes between real dataset samples and fake
samples from the generator
• Generator generates samples that can fool the discriminator

Training procedure
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 74
Distributional perspective - Discriminator

• For a fixed generator, the discriminator maximizes the negative cross-entropy
• The optimal discriminator is given by the expression below

Goodfellow et al., 2014

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 75
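The optimal discriminator referred to above (Goodfellow et al., 2014):

\[
D^{*}(x) \;=\; \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_\theta(x)}
\]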


Distributional perspective - Generator

• For the optimal discriminator, the generator minimizes a (shifted and
scaled) Jensen-Shannon divergence (JSD) between p_θ and p_data
• The optimal generator is given by p_θ = p_data
Goodfellow et al., 2014

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 76


A minimax learning objective
During learning, the generator and discriminator are updated alternately (a minimal 1-D sketch of these updates follows below)

(Figure: data distribution, discriminator, and generator densities at the current state, after a discriminator update,
after a generator update, and at convergence)


Goodfellow et al., 2014
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 77
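A minimal 1-D sketch, not from the slides, of the alternating updates: the generator is a single affine map G(z) = a·z + b and the discriminator is logistic regression, so the gradients can be written by hand; the data distribution and hyperparameters are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Toy setup: data ~ N(3, 1); generator G(z) = a*z + b; discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters (theta)
w, c = 0.0, 0.0          # discriminator parameters (phi)
lr, batch = 0.05, 128

for step in range(2000):
    # Discriminator update: ascend E_data[log D(x)] + E_z[log(1 - D(G(z)))].
    x_real = rng.normal(3.0, 1.0, batch)
    x_fake = a * rng.normal(size=batch) + b
    xs = np.concatenate([x_real, x_fake])
    ys = np.concatenate([np.ones(batch), np.zeros(batch)])   # 1 = real, 0 = fake
    d = sigmoid(w * xs + c)
    w += lr * np.mean((ys - d) * xs)       # logistic-regression gradient ascent
    c += lr * np.mean(ys - d)

    # Generator update: ascend the non-saturating objective E_z[log D(G(z))].
    z = rng.normal(size=batch)
    d = sigmoid(w * (a * z + b) + c)
    a += lr * np.mean((1 - d) * w * z)     # d log D(G(z)) / da
    b += lr * np.mean((1 - d) * w)         # d log D(G(z)) / db

# b drifts toward the data mean 3; a linear discriminator cannot pin down the scale a.
print("generator shift b:", b, "scale a:", a)
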
Evaluation
• Likelihoods may not be defined or tractable

• Directed model permits ancestral sampling


• For labelled datasets, metrics such as the Inception Score and the Fréchet Inception
Distance quantify sample diversity and quality using pretrained classifiers

Wu et al., 2017, Grover et al., 2018, Salimans et al., 2016, Heusel et al., 2018

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 78


Training dynamics
If the generator updates are made in function space and discriminator is optimal
at every step, then the generator is guaranteed to converge to the data
distribution ← unrealistic assumptions!
In practice, GANs suffer from mode collapse

Arjovsky et al., 2017

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 79


Stabilizing training
• Theory of GAN convergence: Arjovsky & Bottou, 2017, Arora et al., 2017,
Mescheder et al., 2017, Roth et al., 2017, Nagarajan & Kolter, 2018, Li et al., 2018
• JSD can be replaced by other f-divergences and integral probability metrics (Nowozin
et al., 2016, Li et al., 2017, Arjovsky et al., 2017)
• Practical architectures, regularizers, optimization algorithms: Radford et al.,
2015, Poole et al., 2016, Salimans et al., 2016, Gulrajani et al., 2017, Metz et al.,
2017, Miyato et al., 2018, Karras et al., 2018 and many more!
https://github.com/soumith/ganhacks
◦ How to Train a GAN? Tips and tricks to make GANs work by Soumith Chintala

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 80


Inferring latent representations
Prefinal layer of discriminator (Salimans et al., 2016)
Controlled generation: InfoGAN (Chen et al., 2016)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 81


Inferring latent representations
Learn explicit encoders along with generators: Adversarially Learned
Inference (Dumoulin et al., 2017), BiGAN (Donahue et al., 2017)
Conditional GANs for image translation: Pix2Pix (Isola et al., 2017),
CycleGAN (Zhu et al., 2017)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 82


The GAN Zoo
https://github.com/hindupuravinash/the-gan-zoo
◦ A list of all named GANs by Avinash Hindupur

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 83


Roadmap
Introduction
Background and Taxonomy
Likelihood-based Generative Models
◦ Autoregressive Models
◦ Variational Autoencoders
◦ Normalizing Flow Models
Likelihood-free Generative Models
◦ Generative Adversarial Networks
Applications: Semi-supervised learning, imitation learning, adversarial examples,
compressed sensing
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 84
Semi Supervised Learning
Labeled data: (x_1, y_1), ..., (x_N, y_N)
Unlabeled data: (x_{N+1}, ?), ..., (x_{N+M}, ?)

How can we leverage the unlabeled data?

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 85


Latent-feature discriminative model (M1)
Step 1: learn a latent-variable generative model p_θ(x|z) on both labeled and unlabeled data
Step 2: train a classifier (e.g., SVM or TSVM) using the inferred latents z = (z1, ..., zK) as features, only on the
labeled part
Kingma et al, 2014

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 86


Generative model (M2)
Latents: z and the class label y ∈ {0, ..., 9}; generator p_θ(x|y, z)
Inference network q_φ(y|x) is a classifier!

Class-conditional VAE: a fully probabilistic model where the label y
is another latent variable
Kingma et al, 2014

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 87


Generative model (M2)
Labeled data (x_i, y_i): log p_θ(x_i, y_i) = log Σ_z p_θ(x_i, y_i, z) ≥ ELBO

Unlabeled data x_j: log p_θ(x_j) = log Σ_y p_θ(x_j, y)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 88


Experimental Results on MNIST

Semi-supervised Learning with Deep Generative Models, 2014

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 89


Generative vs. discriminative

Generative approach: maximize log p_θ(x, y)

Discriminative approach: maximize log p_θ(y|x)

Multi-conditional (McCallum): maximize a·log p_θ(y|x) + b·log p_θ(x)
◦ a=1, b=0 → discriminative
◦ a=1=b → generative
The two terms need to share parameters: they must come from the same model!

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 90


Attempt 1: Discriminative + Generative
A generative model over x with latents z1, ..., zK, plus a separate discriminative model for the label

Issue: The two components are independent of each other!
They can be optimized separately.
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 91
Semi Supervised Learning
Idea 1: couple any discriminative model and generative model through a shared latent space

Idea 2: train using a generalized multi-conditional objective for divergences (adversarial)
Kuleshov et al, 2017

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 92


Results on Semi Supervised Learning

MNIST (100 labels) SVHN (1000 labels)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 93


Imitation Learning
Input: demonstrations provided by an expert
{(s_0^i, a_0^i, s_1^i, a_1^i, ...)}_{i=1}^n ~ π_E

Goal: find a policy that generates similar behavior


IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 94
Imitation Learning
Several existing approaches:
◦ Behavioral cloning
◦ Inverse reinforcement learning
◦ Apprenticeship learning

Our approach: a generative latent variable model
(latent variable z → policy π → simulator/environment → states and actions)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 95


Generative Adversarial Imitation Learning
(Figure: a differentiable discriminator D tries to output 0 on samples from the expert and 1 on samples from the
model; model samples are produced by the generator G, a differentiable policy, acting in a black-box simulator
(environment))

Ho and Ermon, 2016

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 96


How to optimize the objective
Previous Apprenticeship learning work:
◦ Full dynamics model
◦ Small environment
◦ Repeated RL
We propose: gradient descent over policy parameters (and
discriminator)
Trust region policy optimization
Ho et al., 2016.

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 97


Results
Input: driving demonstrations (TORCS)

Output policy:
From raw visual inputs

Li et al. 2017

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 98


Results

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 99


Experimental results

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 100


Latent structure in demonstrations
(Figure: human demonstrator → latent variables z → policy model → environment → observed behavior)

Semantically meaningful latent structure?

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 101


InfoGAIL Example
Does the latent code select between passing left (z=0) and passing right (z=1) in the observed behavior?
(Figure: rollouts in the environment for pass left (z=0) and pass right (z=1))

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 102


InfoGAIL Example
Does the latent code select between turning inside (z=0) and turning outside (z=1)?
(Figure: rollouts in the environment for turn inside (z=0) and turn outside (z=1))

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 103


Anomaly detection - Adversarial examples
• State-of-the-art classifiers can be fooled by adding imperceptible
noise

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 104


Detecting adversarial examples

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 105


Detecting adversarial examples

Figure: (Left) Likelihoods of different adversarial examples. (Right) ROC curves for detecting various
attacks.
Song et al., 2018

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 106


PixelDefend
• Intuition: The harm of adversarial examples might be reduced if
they can be modified to have higher likelihood.

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 107


PixelDefend Experiments

Figure: Adversarial images (left) and purified images after PixelDefend (right).

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 108


Compressed Sensing
Goal: Efficient acquisition and recovery of high dimensional signals

High-dimensional signal (unobserved) → low-dimensional measurements (observed) → high-dimensional signal (recovered)

Acquisition: low-dimensional random projections
Recovery: inverse optimization with structural assumptions
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 109
Setup
Solve for an n-dimensional signal x using m linear measurements y:
y = Ax + ε
Compressed sensing: m << n

An underdetermined system of equations has infinitely many solutions
… unless we impose structure on x

Candes & Tao, 2005, Donoho, 2006, Candes et al., 2006

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 110


Structural Assumptions

Bora et al., 2017, Dhar et al., 2018

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 111
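In the generative-model approach of Bora et al. (2017), the structural assumption is that the signal lies near the range of a pretrained generator G_θ, and recovery solves (optionally with a regularizer on z):

\[
\hat{z} \;=\; \arg\min_{z}\; \| y - A\,G_\theta(z) \|_2^2,
\qquad
\hat{x} \;=\; G_\theta(\hat{z})
\]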


Results
(Figure: ℓ2 reconstruction error per pixel vs. number of measurements (50–750) on MNIST and Omniglot)
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 112
Transfer Compressive Sensing
Transferring from a data-rich source domain to a data-scarce target domain

Grover & Ermon, 2018

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 113


Beyond images, audio, text
Videos: Extremely high-dimensional spatiotemporal reasoning
(Ranzato et al., 2014, Finn et al., 2018)
Graphs: Scalable, permutation-invariant architectures (Kipf et al.,
2016, Hu et al., 2017, Bojchevski et al., 2018, Grover et al., 2018, Li
et al., 2018, Wang et al., 2018, You et al., 2018)
Reinforcement learning: Online learning with respect to drifting
distributions (Pfau & Vinayls, 2018, Kansky et al., 2017, Ostrovski et
al., 2017)

IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 114


Outlook
1. What is the killer application of generative models?
• Model-based RL?
2. What is the right evaluation metric?
• Fundamentally, it is unsupervised learning. Ill-defined.
3. Are there fundamental tradeoffs in inference?
• Sampling
• Evaluation
• Latent features
4. Provable guarantees?
IJCAI-ECAI 2018 TUTORIAL: DEEP GENERATIVE MODELS 115
