Lecture 5 - Adversarial Networks and Variants


Background Reading: GANs, f-GAN, conditional mutual information, Lipschitz functions, primal-dual optimization.

Based on the basic principles of adversarial learning, multiple variants have been proposed in the literature. We shall study a few of them, namely InfoGAN, BiGAN, CycleGAN, StyleGAN and WGAN.


InfoGAN

Objective: To learn a GAN with a latent space that is semantically disentangled.

Proposal: The input noise vector to the generator is decomposed into two parts:
z - incompressible noise
c - structured latent code

c is denoted by L latent variables c_1, c_2, ..., c_L, with a factored distribution given by

P(c_1, c_2, ..., c_L) = \prod_{i=1}^{L} P(c_i)

Since the generator network takes both z and c as inputs, it is denoted by G(z, c). In a standard GAN, the latent code c may be ignored and there is no way to enforce that the generated distribution should utilize both z and c. Thus, InfoGAN proposes an information-theoretic regularization over the standard GAN objective as follows:

L_{InfoGAN} = \min_\theta \max_w F(\theta, w) - \lambda I(c; G_\theta(z, c))

Here I(c; G_\theta) is the mutual information between the latent codes and the generator distribution.
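As an illustration, here is a minimal PyTorch sketch (not from the lecture; the dimensions, the single 10-way categorical code, and the tiny generator are assumptions) of sampling the incompressible noise z and a factored code c and feeding their concatenation to the generator:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed sizes (not from the lecture): 62-dim noise z, one 10-way categorical code c.
Z_DIM, C_DIM, X_DIM, BATCH = 62, 10, 784, 32

# Placeholder generator G(z, c): any network taking the concatenated (z, c) vector.
generator = nn.Sequential(nn.Linear(Z_DIM + C_DIM, 128), nn.ReLU(), nn.Linear(128, X_DIM))

def sample_latents(batch=BATCH):
    z = torch.randn(batch, Z_DIM)             # incompressible noise z
    idx = torch.randint(0, C_DIM, (batch,))   # each code c_i sampled independently (factored prior)
    c = F.one_hot(idx, C_DIM).float()         # structured latent code c
    return z, c, idx

z, c, idx = sample_latents()
x_fake = generator(torch.cat([z, c], dim=1))  # G(z, c)
```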


Variational Mutual Information Maximization

In practice, the mutual information term I(c; G_\theta) cannot be directly optimized, as it requires access to the latent posterior p(c|x). Thus, a variational lower bound is optimized instead, as follows. Let q(c|x) be the variational approximation to p(c|x).

I(c; G) = H(c) - H(c | G)
        = E_{x \sim G(z,c)} [ E_{c' \sim p(c|x)} [ \log p(c'|x) ] ] + H(c)
        = E_{x \sim G(z,c)} [ E_{c' \sim p(c|x)} [ \log q(c'|x) ] + D_{KL}( p(\cdot|x) \,\|\, q(\cdot|x) ) ] + H(c)
        \ge E_{x \sim G(z,c)} [ E_{c' \sim p(c|x)} [ \log q(c'|x) ] ] + H(c)        (since D_{KL} \ge 0)
The above term, however, needs samples from p(c|x) to compute the inner expectation, which is avoided using the following trick (Lemma 5.1 in the InfoGAN paper):

E_{x \sim G(z,c)} [ E_{c' \sim p(c|x)} [ \log q(c'|x) ] ] = E_{c \sim p(c), x \sim G(z,c)} [ \log q(c|x) ]        (1)

The expectation in the RHS of Eq. 1 can be computed using Monte Carlo estimates.

In practice, the distribution q(c|x) is approximated using another neural network, in addition to the generator and discriminator networks, and L_{InfoGAN} is optimized.

Post training, it is shown that variation in a single component of c corresponds to variation in a single semantic factor in the generated data space.
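A minimal sketch of this auxiliary network, reusing the assumed names (X_DIM, C_DIM, generator, x_fake, idx) from the previous sketch: Q(c|x) is a small classifier head, and the Monte Carlo estimate of the lower bound (up to the constant H(c)) is the negative cross-entropy between the sampled code indices and Q's prediction on the generated batch.

```python
import torch.nn as nn

# Auxiliary network Q(c|x), approximating the posterior over the code given a generated sample.
q_network = nn.Sequential(nn.Linear(X_DIM, 128), nn.ReLU(), nn.Linear(128, C_DIM))

# Monte Carlo estimate of E_{c ~ p(c), x ~ G(z,c)}[log q(c|x)]:
# for a categorical code this is the negative cross-entropy on the generated batch.
logits = q_network(x_fake)
mi_lower_bound = -nn.functional.cross_entropy(logits, idx)

lambda_mi = 1.0   # regularization weight lambda (assumed value)
# generator loss = standard GAN generator loss - lambda_mi * mi_lower_bound
```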
Bidirectional GANs (BiGAN)

Objective: Standard GANs do not have a means to learn the inverse mapping from the data space to the latent space. The objective of a BiGAN is to learn both the mappings, from the latent space to the data space and vice versa, simultaneously.

Proposal: In addition to the standard generator, an encoder network E: X → Z is trained. Let p_E(z|x) denote the density induced by the encoder network. The standard discriminator is also modified to predict p(Y | x, z), where Y = 1 if x \sim p_X and Y = 0 if x \sim G(z). With this, the BiGAN optimizes the following objective:

L_{BiGAN} = \min_{\theta, \phi} \max_w F(\theta, \phi, w)

F(\theta, \phi, w) = E_{x \sim p_X} [ E_{z \sim p_E(\cdot|x)} [ \log D_w(x, z) ] ] + E_{z \sim p_Z} [ E_{x \sim p_G(\cdot|z)} [ \log (1 - D_w(x, z)) ] ]
It is shown that the optimal point is reached when P_{EX} = P_{GZ}, where

P_{EX}(x, z) = p_X(x) \, p_E(z|x),        P_{GZ}(x, z) = p_Z(z) \, p_G(x|z)

The BiGAN thus optimizes the JS divergence between the joint distributions over the data and the latent spaces.
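A minimal PyTorch sketch of this setup, with assumed toy dimensions and networks: the discriminator scores joint pairs, receiving (x, E(x)) for real data and (G(z), z) for generated data.

```python
import torch
import torch.nn as nn

X_DIM, Z_DIM = 784, 64                      # assumed toy dimensions

G = nn.Sequential(nn.Linear(Z_DIM, 256), nn.ReLU(), nn.Linear(256, X_DIM))      # generator G: Z -> X
E = nn.Sequential(nn.Linear(X_DIM, 256), nn.ReLU(), nn.Linear(256, Z_DIM))      # encoder   E: X -> Z
D = nn.Sequential(nn.Linear(X_DIM + Z_DIM, 256), nn.ReLU(), nn.Linear(256, 1))  # joint discriminator D(x, z)

x_real = torch.randn(32, X_DIM)             # stand-in for a data batch
z_fake = torch.randn(32, Z_DIM)             # z ~ p_Z

real_pair = torch.cat([x_real, E(x_real)], dim=1)   # (x, E(x)),  x ~ p_X
fake_pair = torch.cat([G(z_fake), z_fake], dim=1)   # (G(z), z),  z ~ p_Z

bce = nn.BCEWithLogitsLoss()
d_loss = bce(D(real_pair), torch.ones(32, 1)) + bce(D(fake_pair), torch.zeros(32, 1))
# G and E are trained to fool D, i.e. with the labels swapped.
```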
CycleGAN

Objective: To learn to translate samples between the distributions of a pair of domains.

Proposal: Use adversarial learning in a conditional setting, wherein a sample from the source distribution is used as input to the GAN instead of the noise variable. In addition, incorporate a two-way cycle-consistency loss that enforces transitivity.
Given a pair of domains X and Y with p_X and p_Y as densities, CycleGAN has two mapping functions

G_X: X → Y,        G_Y: Y → X

which are learned simultaneously, along with the corresponding discriminator functions D_Y and D_X respectively. In addition to the usual GAN loss, a cycle-consistency loss is introduced for enforcing transitivity.

L_{cyc} = E_{x \sim p_X} [ \| G_Y(G_X(x)) - x \|_1 ] + E_{y \sim p_Y} [ \| G_X(G_Y(y)) - y \|_1 ]

Therefore, the final objective function is as follows:

L_{CycleGAN} = E_{x \sim p_X} [ \log (1 - D_Y(G_X(x))) ] + E_{y \sim p_Y} [ \log D_Y(y) ] + E_{y \sim p_Y} [ \log (1 - D_X(G_Y(y))) ] + E_{x \sim p_X} [ \log D_X(x) ] + L_{cyc}
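A minimal sketch of the cycle-consistency term with assumed placeholder mapping networks on toy tensors (the adversarial terms via D_X and D_Y are omitted):

```python
import torch
import torch.nn as nn

DIM = 128                                   # assumed feature dimension for both domains

G_X = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(), nn.Linear(DIM, DIM))   # G_X: X -> Y
G_Y = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(), nn.Linear(DIM, DIM))   # G_Y: Y -> X

x = torch.randn(16, DIM)                    # batch from domain X
y = torch.randn(16, DIM)                    # batch from domain Y

# L_cyc = E_x ||G_Y(G_X(x)) - x||_1 + E_y ||G_X(G_Y(y)) - y||_1
l1 = nn.L1Loss()
cycle_loss = l1(G_Y(G_X(x)), x) + l1(G_X(G_Y(y)), y)
# total generator loss = adversarial terms (via D_X, D_Y) + cycle_loss
```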
StyleGAN

Objective: To learn unsupervised attribute separation or disentanglement in the generated space of a GAN.

Proposal: Learn multiple latent vectors corresponding to different possible styles in the feature space of the generator.
Given a latent code z, first a latent transformation is learned via a mapping network f: Z → W, giving w ∈ W. Subsequently, the w vector is fed separately to each of the feature maps in the generator via an adaptive instance normalization (AdaIN) layer as follows:

AdaIN(x_i, y) = y_{s,i} \frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i}

where x_i is the i-th feature map of the generator, with \mu(x_i) and \sigma(x_i) being the corresponding statistics. y = (y_s, y_b) is the output of an affine transformation layer with w as the input.

Note that the StyleGAN modification is in the generator architecture and does not involve any loss/metric modification.
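A minimal sketch of an AdaIN layer in this spirit (the dimensions and the exact placement inside the generator are assumptions): the affine layer maps w to per-channel (y_s, y_b), which rescale and shift the instance-normalized feature map.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: scale/shift normalized feature maps by a style vector."""
    def __init__(self, w_dim, channels):
        super().__init__()
        self.affine = nn.Linear(w_dim, 2 * channels)   # maps w to (y_s, y_b)

    def forward(self, x, w):                           # x: (N, C, H, W), w: (N, w_dim)
        y_s, y_b = self.affine(w).chunk(2, dim=1)      # per-channel scale and bias
        mu = x.mean(dim=(2, 3), keepdim=True)          # per-channel statistics mu(x_i)
        sigma = x.std(dim=(2, 3), keepdim=True) + 1e-8 # sigma(x_i)
        x_norm = (x - mu) / sigma
        return y_s[:, :, None, None] * x_norm + y_b[:, :, None, None]

# Assumed sizes: 512-dim w, 64-channel feature map.
ada = AdaIN(w_dim=512, channels=64)
out = ada(torch.randn(4, 64, 16, 16), torch.randn(4, 512))
```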
Wasserstein GAN

GANs in the naive formulation are known to be very unstable to train. This is ascribed to the non-alignment of the manifolds that form the supports of the distributions between which the divergence is calculated. It is shown that, for a pair of distributions whose supports do not have full dimension and do not perfectly align, the usual f-divergences such as JSD and forward/reverse KLD will be maxed out, with the existence of a perfect discriminator. This calls for a 'softer' divergence metric between distributions, one that does not max out when the manifolds of the supports do not perfectly align.

Earth Mover's (EM) or Wasserstein distance:

Let P and Q denote two distributions over a space X. The Wasserstein distance between P and Q is defined as

W(P, Q) = \inf_{\gamma \in \Pi(P, Q)} E_{(x, y) \sim \gamma} [ \| x - y \| ]

Here, the infimum is over the set \Pi(P, Q) of all joint distributions \gamma whose marginals are respectively P and Q.

\gamma(x, y) is the amount of "mass" that is transported from x to y in order to transform P to Q, which, when multiplied with \| x - y \|, specifies the amount of 'work' done in the said transportation. Thus, the EMD or WD is the cost incurred by the optimal transport plan. Note that every \gamma can be thought of as a transport plan, out of which the optimal one is sought in the EMD via the infimum over \Pi.
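As a small numerical illustration (assuming SciPy is available; this example is not from the notes), the 1-D Wasserstein distance between two discrete distributions can be computed directly: moving half the mass from 0 to 1 costs 0.5, which matches the optimal transport plan.

```python
from scipy.stats import wasserstein_distance

# P puts all its mass at 0; Q splits it equally between 0 and 1.
# The optimal plan moves mass 0.5 a distance of 1, so W(P, Q) = 0.5.
w = wasserstein_distance(u_values=[0.0], v_values=[0.0, 1.0],
                         u_weights=[1.0], v_weights=[0.5, 0.5])
print(w)  # 0.5
```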

The EMD is shown to possess some 'good' properties. For instance, suppose p_r is a density over X and z is a random variable over Z. Let g_\theta: Z \times R^d → X be a parametric function (a neural network) with an induced distribution p_\theta. Then it can be shown that if g_\theta is continuous in \theta, so is W(p_r, p_\theta), unlike JSD(p_r, p_\theta) and KL(p_r, p_\theta).
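A standard example illustrating this gap (the "parallel lines" example from the WGAN paper; the values below are a sketch of that argument, not derived in these notes):

```latex
% Let z ~ U[0,1] and g_\theta(z) = (\theta, z) \in \mathbb{R}^2, so that P_0 is uniform on the
% vertical segment \{0\} \times [0,1] and P_\theta is uniform on \{\theta\} \times [0,1].
\begin{align*}
W(P_0, P_\theta)   &= |\theta|, \\
JSD(P_0, P_\theta) &= \log 2 \quad \text{for } \theta \neq 0 \ (\text{and } 0 \text{ at } \theta = 0), \\
KL(P_\theta \,\|\, P_0) &= +\infty \quad \text{for } \theta \neq 0 .
\end{align*}
% Only the Wasserstein distance varies continuously with \theta,
% and hence provides a usable training signal for the generator.
```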

Therefore, it is desirable to use the Wasserstein distance, rather than any of the other divergence metrics, to learn generative samplers. However, the infimum in the definition of the WD is intractable, albeit a dual definition may be used to optimize the WD in practice.
WGAN:

The Kantorovich-Rubinstein duality provides a dual definition for the WD as follows:

W(p_X, p_\theta) = \sup_{\| f \|_L \le 1} E_{x \sim p_X} [ f(x) ] - E_{x \sim p_\theta} [ f(x) ]

The supremum is over all 1-Lipschitz functions f: X → R. Typically, the function f is approximated by a neural network called the critic network, and the supremum is replaced by the maximum over the parameters of the neural network.

The distribution p_\theta is approximated using a sampler or generator neural network g_\theta(z), z \sim N(0, I). With these, the objective for a WGAN to optimize will be as follows:
L_{WGAN} = \min_\theta \; \max_{w : \| f_w \|_L \le 1} \; E_{x \sim p_X} [ f_w(x) ] - E_{z \sim p_Z} [ f_w(g_\theta(z)) ]

In practice, f_w is a neural network which is made Lipschitz by weight clipping after every gradient update, or by weight regularization.
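A minimal sketch of the WGAN critic update with weight clipping, using assumed toy networks, RMSprop, and the clipping threshold 0.01 used in the WGAN paper:

```python
import torch
import torch.nn as nn

X_DIM, Z_DIM, CLIP = 784, 64, 0.01          # assumed toy dimensions and clipping threshold

critic = nn.Sequential(nn.Linear(X_DIM, 256), nn.ReLU(), nn.Linear(256, 1))   # f_w
g = nn.Sequential(nn.Linear(Z_DIM, 256), nn.ReLU(), nn.Linear(256, X_DIM))    # g_theta
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

x_real = torch.randn(32, X_DIM)             # stand-in for a data batch
z = torch.randn(32, Z_DIM)                  # z ~ p_Z

# Critic ascends E_x[f_w(x)] - E_z[f_w(g_theta(z))], so we minimize its negative.
critic_loss = -(critic(x_real).mean() - critic(g(z).detach()).mean())
opt_c.zero_grad()
critic_loss.backward()
opt_c.step()

# Enforce the Lipschitz constraint crudely by clipping weights after the update.
with torch.no_grad():
    for p in critic.parameters():
        p.clamp_(-CLIP, CLIP)

# The generator step (with its own optimizer) would then minimize -E_z[f_w(g_theta(z))].
```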

References

1. https://arxiv.org/abs/1606.03657

2. https://arxiv.org/abs/1605.09782

3. https://arxiv.org/abs/1703.10593

4. https://arxiv.org/abs/1701.07875

5. https://arxiv.org/abs/1812.04948

6. https://arxiv.org/abs/1701.04862

7. https://vincentherrmann.github.io/blog/wasserstein
