3 GANs


Generative Modeling

Generative adversarial networks

Denis Derkach, Artem Ryzhikov, Sergei Popov

Laboratory for methods of big data analysis

Spring 2023
In this Lecture

▶ Generative Adversarial Networks


– algorithm statement;
– ideal case;
– shortcomings of vanilla algorithm;
– proposed fixes.



Idea


Reminder: 𝑓-divergence

▶ For convex $f(\cdot)$ and distributions $P$ and $Q$, we define the $f$-divergence:

$$D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx .$$





Reminder: 𝑓-divergence Convergence

▶ For convex $f(\cdot)$ and distributions $P$ and $Q$, we define the $f$-divergence:

$$D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx .$$

▶ To optimize the reverse KL divergence (rKL) properly, we need access to the true PDF $p(x)$.
Reminder: Variational Lower Bound

▶ For convex $f(\cdot)$ and distributions $P$ and $Q$, we define the $f$-divergence:

$$D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx .$$

▶ A lower bound can be written:

$$D_f(P \,\|\, Q) \;\ge\; \max_{T}\; \mathbb{E}_{x\sim P}\big[T(x)\big] - \mathbb{E}_{x\sim Q}\big[f^*(T(x))\big],$$

where $T(x)$ is an arbitrary (test) function.

▶ The bound is tight; the optimal $T^*(x)$ can be written down for each $f$-divergence.

▶ For the JS-divergence: $T^*(x) = \log\dfrac{2p(x)}{p(x)+q(x)}$, $\quad f^*(t) = -\log(2 - \exp(t))$.
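As a sanity check, the sketch below (my own illustration, not from the slides) plugs this $T^*$ into the bound for two 1-D Gaussians and compares the Monte-Carlo estimate with the same quantity obtained by numerical integration; names such as `T_star` and `f_conj` are made up for this example.

```python
# Illustrative sketch: verify that plugging T*(x) into the variational bound
# recovers E_p[log 2p/(p+q)] + E_q[log 2q/(p+q)] for two 1-D Gaussians.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p, q = norm(loc=0.0, scale=1.0), norm(loc=1.0, scale=1.5)   # P and Q

def T_star(x):
    """Optimal test function for the JS-divergence: log(2 p / (p + q))."""
    return np.log(2 * p.pdf(x) / (p.pdf(x) + q.pdf(x)))

def f_conj(t):
    """Convex conjugate for the JS-divergence: f*(t) = -log(2 - exp(t))."""
    return -np.log(2 - np.exp(t))

# Monte-Carlo estimate of the bound: E_{x~P}[T*(x)] - E_{x~Q}[f*(T*(x))].
xs_p = p.rvs(200_000, random_state=rng)
xs_q = q.rvs(200_000, random_state=rng)
bound = T_star(xs_p).mean() - f_conj(T_star(xs_q)).mean()

# Direct numerical integration of the same quantity for comparison.
grid = np.linspace(-10, 10, 20_001)
integrand = (p.pdf(grid) * np.log(2 * p.pdf(grid) / (p.pdf(grid) + q.pdf(grid)))
             + q.pdf(grid) * np.log(2 * q.pdf(grid) / (p.pdf(grid) + q.pdf(grid))))
exact = np.trapz(integrand, grid)

print(f"variational estimate: {bound:.4f}, numerical integration: {exact:.4f}")
```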



Lower Bound for JS

(Figure: example densities $P(x)$ and $Q(x)$ plotted against $x$.)

$$\mathrm{JS}(P \,\|\, Q) \;\ge\; \mathbb{E}_{x\sim P}\!\left[\log\frac{2p(x)}{p(x)+q(x)}\right] + \mathbb{E}_{x\sim Q}\!\left[\log\frac{2q(x)}{p(x)+q(x)}\right]$$

It would be interesting to construct something close.


Adversarial optimization


Rationale
▶ We need to optimize the model $q_\theta$ without direct access to $p(x)$.

▶ Instead of minimizing some analytically defined divergence, one could minimize a "learned" divergence with parameters $\phi$.



Generator
▶ $G_\theta$ is a generator. It should produce samples from random noise:
$$z_i \sim N(0, I); \qquad x_i = G_\theta(z_i).$$
▶ We aim to implement $G_\theta$ as a neural network.
▶ We thus have samples:
$$x_i \sim q_\theta(x).$$
▶ $G_\theta$ can be defined in many ways, for example as a physics generator.

Borisyak M et al. PeerJ Computer Science 6:e274 (2020)
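A minimal sketch (my own illustration, assuming PyTorch) of a generator network $G_\theta$ that maps standard-normal noise to samples; the architecture and dimensions are arbitrary placeholders, not the lecture's model.

```python
# Illustrative sketch, not the lecture's reference implementation.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """G_theta: maps latent noise z ~ N(0, I) to samples x = G_theta(z)."""
    def __init__(self, latent_dim: int = 16, data_dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, data_dim),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

# Draw a batch of generated samples x_i ~ q_theta(x).
G = Generator()
z = torch.randn(64, 16)        # z_i ~ N(0, I)
x_fake = G(z)                  # x_i = G_theta(z_i)
```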



Discriminator
▶ Add a classifying neural network, the discriminator $D_\phi$, to distinguish between the real and generated samples.
▶ Optimize:

$$\max_{\phi}\;\Big(\mathbb{E}_{x\sim p(x)}\big[\log D_\phi(x)\big] + \mathbb{E}_{x\sim q_\theta(x)}\big[\log\big(1 - D_\phi(x)\big)\big]\Big)$$

(the first expectation runs over real samples, the second over generated samples).
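In practice this maximization is usually implemented as minimizing a binary cross-entropy loss with target 1 for real and 0 for generated samples; a sketch assuming PyTorch (the discriminator architecture is a placeholder of my own):

```python
# Illustrative sketch: the discriminator objective as a binary cross-entropy loss.
import torch
import torch.nn as nn

D = nn.Sequential(                      # D_phi: x -> probability of "real"
    nn.Linear(2, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)
bce = nn.BCELoss()

def discriminator_loss(x_real: torch.Tensor, x_fake: torch.Tensor) -> torch.Tensor:
    # Maximizing E[log D(x_real)] + E[log(1 - D(x_fake))]
    # is the same as minimizing BCE with targets 1 (real) and 0 (fake).
    loss_real = bce(D(x_real), torch.ones(x_real.size(0), 1))
    loss_fake = bce(D(x_fake), torch.zeros(x_fake.size(0), 1))
    return loss_real + loss_fake
```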



G+D recap

$$\max_{\phi}\;\Big(\mathbb{E}_{x\sim p(x)}\big[\log D_\phi(x)\big] + \mathbb{E}_{x\sim q_\theta(x)}\big[\log\big(1 - D_\phi(x)\big)\big]\Big)$$

$$\min_{\theta}\; \mathbb{E}_{z\sim N(0, I)}\big[\log\big(1 - D_\phi(G_\theta(z))\big)\big]$$



Training at a Glance

For D and G defined as neural networks, we can use backpropagation.
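A compact end-to-end sketch of the alternating training loop (my own toy example, assuming PyTorch and a 1-D Gaussian as the "real" data; architectures and hyperparameters are placeholders). Note that the generator step below uses the common non-saturating variant of the generator loss, discussed later in the lecture.

```python
# Illustrative sketch of alternating GAN training on toy 1-D data.
import torch
import torch.nn as nn

latent_dim = 8
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))
D = nn.Sequential(nn.Linear(1, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(2000):
    x_real = 2.0 + 0.5 * torch.randn(128, 1)          # "real" data: N(2, 0.5^2)
    z = torch.randn(128, latent_dim)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z))).
    opt_D.zero_grad()
    x_fake = G(z).detach()                             # do not backprop into G here
    loss_D = (bce(D(x_real), torch.ones(128, 1))
              + bce(D(x_fake), torch.zeros(128, 1)))
    loss_D.backward()
    opt_D.step()

    # Generator step (non-saturating form): maximize log D(G(z)).
    opt_G.zero_grad()
    loss_G = bce(D(G(z)), torch.ones(128, 1))
    loss_G.backward()
    opt_G.step()
```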



Optimal Solution

$$C(G) = \max_{D} V(G, D) = \mathbb{E}_{x\sim p(x)}\big[\log D^*_\phi(x)\big] + \mathbb{E}_{x\sim q_\theta(x)}\big[\log\big(1 - D^*_\phi(x)\big)\big]$$

$$= \mathbb{E}_{x\sim p}\!\left[\log\frac{p(x)}{p(x)+q(x)}\right] + \mathbb{E}_{x\sim q_\theta}\!\left[\log\frac{q(x)}{p(x)+q(x)}\right]$$
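The second equality uses the optimal discriminator, obtained by maximizing the integrand of $V(G, D)$ pointwise in $D(x)$:

$$\frac{\partial}{\partial D(x)}\Big[p(x)\log D(x) + q(x)\log\big(1 - D(x)\big)\Big] = 0
\;\;\Longrightarrow\;\;
D^*_\phi(x) = \frac{p(x)}{p(x) + q(x)} .$$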



Lower Bound Reminder
▶ In the case of an ideal discriminator:

$$C(G) = \mathbb{E}_{x\sim P}\!\left[\log\frac{p(x)}{p(x)+q(x)}\right] + \mathbb{E}_{x\sim Q}\!\left[\log\frac{q(x)}{p(x)+q(x)}\right]$$

▶ This can be compared to the variational bound:

$$\mathrm{JS}(P \,\|\, Q) \;\ge\; \mathbb{E}_{x\sim p(x)}\!\left[\log\frac{2p(x)}{p(x)+q(x)}\right] + \mathbb{E}_{x\sim q(x)}\!\left[\log\frac{2q(x)}{p(x)+q(x)}\right]$$

▶ The two expressions differ only by the constant $\log 4$.
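Explicitly, pulling the factor of 2 out of each logarithm in the bound gives

$$\mathbb{E}_{x\sim p}\!\left[\log\frac{2p(x)}{p(x)+q(x)}\right] + \mathbb{E}_{x\sim q}\!\left[\log\frac{2q(x)}{p(x)+q(x)}\right]
= C(G) + \log 4 .$$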



Optimal Solution



GAN algorithm



GAN results

I. Goodfellow et al., Generative Adversarial Networks, NIPS 2014


GAN First Paper Disclaimer

While we make no claim that these samples are better than


samples generated by existing methods, we believe that these
samples are at least competitive with the better generative
models in the literature and highlight the potential of the
adversarial framework.

I. Goodfellow, Generative Adversarial Networks, NIPS 2014


Enhancing GAN


Unsupervised Feature Learning

E Denton et al. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
LAPGAN: results



Convolutional Layers are Here to Help

▶ Pooling layers → convolution layers.


▶ Use batchnorm.
▶ No fully connected hidden layers.
▶ ReLU activation in generator.
▶ LeakyReLU activation in the discriminator for all layers.
A. Radford et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016
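A minimal sketch of a DCGAN-style generator following these guidelines (my own illustration assuming PyTorch; the filter counts and the 64×64 output size are placeholders, not taken from the slides):

```python
# Illustrative DCGAN-style generator: transposed convolutions, batchnorm,
# ReLU activations, and no fully connected hidden layers.
import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, latent_dim: int = 100, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            # latent vector treated as a 1x1 "image" with latent_dim channels
            nn.ConvTranspose2d(latent_dim, 512, 4, 1, 0, bias=False),
            nn.BatchNorm2d(512), nn.ReLU(True),            # 4x4
            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(True),            # 8x8
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128), nn.ReLU(True),            # 16x16
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(True),             # 32x32
            nn.ConvTranspose2d(64, channels, 4, 2, 1, bias=False),
            nn.Tanh(),                                     # 64x64 output in [-1, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z.view(z.size(0), -1, 1, 1))

G = DCGANGenerator()
imgs = G(torch.randn(16, 100))     # 16 generated 3x64x64 images
```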
DCGAN: results



Walking in the Latent Space

(Figure: interpolation between points in the latent space; along the walk a TV gradually turns into a window.)



Arithmetic in the Latent Space



GAN problems


Game Approach Problems

Martin Arjovsky, Towards Principled Methods for Training Generative Adversarial Networks, ICLR 2017
Mode Collapse

• GANs choose to generate a small number of modes due to a defect in the training procedure, rather than due to the divergence they aim to minimize.
(I. Goodfellow, NIPS 2016 Tutorial: Generative Adversarial Networks)

Luke Metz et al Unrolled Generative Adversarial Networks ICLR 2017


Mode Collapse
▶ For fixed $D$:
– $G$ tends to converge to a point $x^*$ that fools $D$ the most.
– In extreme cases, $G$ becomes independent of $z$.
– The gradient with respect to $z$ diminishes.

▶ When $D$ restarts (is retrained):
– It easily finds this $x^*$.
– It pushes $G$ to the next point $x^{**}$.

T. Che Mode Regularized Generative Adversarial Networks ICLR 2017


Vanishing Gradients

▶ Suppose the supports of the real and generated data are disjoint.

▶ An ideal discriminator can then perfectly tell the real and generated data apart:

$$D(G(z)) \approx 0$$



Vanishing Gradients

▶ $L_G = -\log D(G(z))$
▶ $\dfrac{\partial D(x)}{\partial x} \approx 0$ for generated $x$
▶ $\dfrac{\partial L_G(x)}{\partial x} \approx 0$ for generated $x$
▶ Generator can't train!
▶ Need to start closer (how?)
▶ The problem is further enhanced by the noisy estimate from data.
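A tiny autograd sketch (my own illustration, assuming PyTorch; the steep sigmoid stands in for an "ideal" discriminator on disjoint supports) showing that both $D(x)$ and $\partial D(x)/\partial x$ are essentially zero at generated points, so no useful gradient flows back to the generator:

```python
# Illustrative sketch: a discriminator that perfectly separates disjoint supports
# is flat around the generated samples, so (almost) no gradient reaches G.
import torch

def D(x):
    # Hypothetical "ideal" discriminator for real data at x > 0 and
    # generated data at x < 0: a very steep sigmoid decision boundary.
    return torch.sigmoid(50.0 * x)

x_fake = torch.linspace(-4.0, -2.0, 5, requires_grad=True)   # generated samples

out = D(x_fake)
grad_D, = torch.autograd.grad(out.sum(), x_fake)

print("D(x_fake)      :", out.detach())     # ~0: fakes are confidently rejected
print("dD/dx at fakes :", grad_D)           # ~0: D is flat there, so by the chain
                                            # rule any generator loss that sees x
                                            # only through D(x) also gets ~0 gradient
```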



Summary so Far
▶ Pros:
– Can utilize power of back-prop.
– No explicit intractable integral.
– No MCMC needed.
▶ Cons:
– Unclear stopping criteria.
– No explicit representation of the model density $q_\theta(x)$.
– Hard to train.
– No evaluation metric, so it is hard to compare with other models.
– Easy to get trapped in local optima that memorize training data.
– Hard to invert the generative model to get the latent $z$ back from a generated $x$.
Fixing GANs


Diminishing Gradients
▶ We have already seen that signal data lies on a (low-dimensional) manifold.
▶ The GAN case is in fact more complicated, as we need a discriminator that distinguishes two supports.
▶ This is way too easy if the supports are disjoint.



Diminishing Gradients: Noisy Supports
▶ Let's make the problem harder: introduce random noise $\varepsilon \sim N(0, \sigma^2 I)$:

$$\mathbb{P}_{x+\varepsilon}(x) = \mathbb{E}_{y\sim \mathbb{P}}\big[\mathbb{P}_\varepsilon(x - y)\big].$$

▶ This produces noisy (overlapping) supports, which makes the task harder for the discriminator.

Martin Arjovsky, Towards Principled Methods for Training Generative Adversarial Networks, ICLR 2017
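A sketch of this idea, often called instance noise (my own illustration assuming PyTorch; `D` is a placeholder discriminator and `sigma` a tunable noise scale): the same Gaussian noise is added to both real and generated samples before they are scored.

```python
# Illustrative sketch: add Gaussian "instance noise" to both real and generated
# samples so their supports overlap and the discriminator task becomes harder.
import torch
import torch.nn as nn

D = nn.Sequential(nn.Linear(2, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1), nn.Sigmoid())
bce = nn.BCELoss()
sigma = 0.1   # noise scale; typically annealed towards 0 during training

def discriminator_loss_with_noise(x_real: torch.Tensor, x_fake: torch.Tensor) -> torch.Tensor:
    x_real = x_real + sigma * torch.randn_like(x_real)   # x + eps, eps ~ N(0, sigma^2 I)
    x_fake = x_fake + sigma * torch.randn_like(x_fake)
    return (bce(D(x_real), torch.ones(x_real.size(0), 1))
            + bce(D(x_fake), torch.zeros(x_fake.size(0), 1)))
```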
Feature Matching

Danger of overtraining to match known tests!
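Feature matching (Salimans et al.) trains the generator to match statistics of an intermediate discriminator layer instead of directly fooling the final output; a sketch (my own illustration, assuming PyTorch; `feature_extractor` denotes a hypothetical intermediate part of the discriminator):

```python
# Illustrative sketch of feature matching: the generator minimizes the distance
# between mean intermediate-layer features of real and generated batches.
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(          # hypothetical intermediate layers of D
    nn.Linear(2, 64), nn.LeakyReLU(0.2),
    nn.Linear(64, 64), nn.LeakyReLU(0.2),
)

def feature_matching_loss(x_real: torch.Tensor, x_fake: torch.Tensor) -> torch.Tensor:
    f_real = feature_extractor(x_real).mean(dim=0).detach()   # E_x f(x), fixed target
    f_fake = feature_extractor(x_fake).mean(dim=0)            # E_z f(G(z))
    return torch.sum((f_real - f_fake) ** 2)                  # || E f(x) - E f(G(z)) ||^2
```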



Historical averaging

Salimans et al. Improved Techniques for Training GANs, NIPS16
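Historical averaging (Salimans et al.) adds a penalty that keeps each player's parameters close to their running historical average; a sketch of the penalty term (my own illustration, assuming PyTorch, with the average kept as a simple running mean):

```python
# Illustrative sketch of the historical-averaging penalty
# || theta - (1/t) * sum_i theta_i ||^2 added to a player's loss.
import torch
import torch.nn as nn

model = nn.Linear(2, 1)                                      # stands in for G or D
history = [p.detach().clone() for p in model.parameters()]   # running average of past params
t = 1

def historical_average_penalty() -> torch.Tensor:
    # Penalize the distance of current parameters from their historical average.
    return sum(((p - avg) ** 2).sum() for p, avg in zip(model.parameters(), history))

def update_history():
    # Update the running mean of parameters after each optimization step.
    global t
    t += 1
    for avg, p in zip(history, model.parameters()):
        avg += (p.detach() - avg) / t
```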


Look into the Future: unrolled GANs



Unrolled GAN: results

Luke Metz et al Unrolled Generative Adversarial Networks ICLR 2017


𝑓-GANs


Reminder: Variational Lower Bound

▶ For convex $f(\cdot)$ and distributions $P$ and $Q$, we define the $f$-divergence:

$$D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx .$$

▶ This is bounded:

$$D_f(P \,\|\, Q) \;\ge\; \max_{T}\; \mathbb{E}_{x\sim P}\big[T(x)\big] - \mathbb{E}_{x\sim Q}\big[f^*(T(x))\big],$$

where $T(x)$ is an arbitrary (test) function.

▶ The bound is tight; the optimal $T^*(x)$ can be written down for each $f$-divergence.

▶ For the JS-divergence: $T^*(x) = \log\dfrac{2p(x)}{p(x)+q(x)}$, $\quad f^*(t) = -\log(2 - \exp(t))$.



Variational Divergence Minimization
$$D_f(P \,\|\, Q) \;\ge\; \max_{T}\; \mathbb{E}_{x\sim P}\big[T(x)\big] - \mathbb{E}_{x\sim Q}\big[f^*(T(x))\big]$$

▶ Work in the GAN paradigm:
– generator $x \sim Q$: $x = G_\theta(z)$;
– test function $T_\omega(x)$.

$$\min_{\theta}\max_{\omega}\; F(\omega, \theta) = \mathbb{E}_{x\sim P}\big[T_\omega(x)\big] - \mathbb{E}_{x\sim Q_\theta}\big[f^*(T_\omega(x))\big]$$

▶ To have a wider range of functions:
$$T_\omega(x) = g_f(V_\omega(x)),$$
where $g_f: \mathbb{R} \to \mathrm{dom}_{f^*}$ is the output activation function for the $f$-divergence used.
S. Nowozin et al. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization
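A sketch of this variational objective for the GAN-like pair from the next slide, $g_f(v) = -\log(1+e^{-v})$ and $f^*(t) = -\log(1-e^{t})$ (my own illustration assuming PyTorch; `V` is a placeholder critic network $V_\omega$):

```python
# Illustrative sketch of the f-GAN / VDM objective
# F(omega, theta) = E_{x~P}[ g_f(V(x)) ] - E_{x~Q_theta}[ f*( g_f(V(x)) ) ].
import torch
import torch.nn as nn
import torch.nn.functional as F_nn

V = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))   # critic V_omega

def g_f(v):
    # Output activation for the GAN divergence: -log(1 + exp(-v)).
    return -F_nn.softplus(-v)

def f_conj(t):
    # Convex conjugate for the GAN divergence: -log(1 - exp(t)).
    return -torch.log(1.0 - torch.exp(t))

def vdm_objective(x_real: torch.Tensor, x_fake: torch.Tensor) -> torch.Tensor:
    # Maximized over omega (the critic), minimized over theta (the generator).
    return g_f(V(x_real)).mean() - f_conj(g_f(V(x_fake))).mean()
```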



Output activation function
$$F(\omega, \theta) = \mathbb{E}_{x\sim P}\big[g_f(V_\omega(x))\big] - \mathbb{E}_{x\sim Q_\theta}\big[f^*\big(g_f(V_\omega(x))\big)\big]$$

▶ The choice of the output activation function is somewhat arbitrary.



Example: GAN objective
$$F(\omega, \theta) = \mathbb{E}_{x\sim P}\big[g_f(V_\omega(x))\big] - \mathbb{E}_{x\sim Q_\theta}\big[f^*\big(g_f(V_\omega(x))\big)\big]$$

$$g_{\mathrm{GAN}}(v) = -\log(1 + \exp(-v))$$

$$f^*(t) = -\log(1 - \exp(t))$$

$$F(\omega, \theta) = \mathbb{E}_{x\sim P}\big[\log D_\omega(x)\big] + \mathbb{E}_{x\sim Q_\theta}\big[\log\big(1 - D_\omega(x)\big)\big],$$

for the last nonlinearity in the discriminator taken as the sigmoid
$$D_\omega(x) = 1/\big(1 + e^{-V_\omega(x)}\big).$$



Variational Divergence Minimization
$$\min_{\theta}\max_{\omega}\; F(\omega, \theta) = \mathbb{E}_{x\sim P}\big[g_f(V_\omega(x))\big] - \mathbb{E}_{x\sim Q_\theta}\big[f^*\big(g_f(V_\omega(x))\big)\big]$$



𝑓-GAN results



𝑓-GAN Discussion
▶ Using the $f$-GAN approach, one can estimate any $f$-divergence.
▶ The construction has some freedom in the choice of functions.
▶ Using different $f$-divergences leads to very different learning dynamics.
▶ It does not solve the mode collapse problem.
▶ We need a better way to train GANs.



Conclusions: GANs
▶ use a generator-discriminator game to estimate the distance from the generated distribution to the true one.
▶ produce sharp images.
▶ reconstruct an implicit model of the target PDF.

