Lecture 5 - Adversarial Networks and Variants
Background Reading: GANs, f-GAN, conditional mutual information, dual optimization.

Based on the basic principles of adversarial learning, several GAN variants have been proposed; we shall study a few of them, namely InfoGAN, BiGAN, CycleGAN, StyleGAN, and WGAN.
InfoGAN

InfoGAN decomposes the generator's input into incompressible noise z and a latent code c with a factored prior p(c) = \prod_i p(c_i). In a standard GAN, the latent code c may be ignored, since there is no way to enforce that the generated distribution utilizes both z & c. Thus, InfoGAN proposes an information-theoretic regularization over the standard GAN objective, as follows:
\mathcal{L}_{\text{InfoGAN}} = \min_{\theta} \max_{w} \; F(\theta, w) - \lambda \, I(c; G_{\theta}(z, c))
Here, I(c; G_\theta(z, c)) is the mutual information between the latent code c and the generated sample G_\theta(z, c).
In practice, the mutual information term I(c; G) cannot be optimized directly, as it requires access to the posterior p(c|x). A variational lower bound is optimized instead, as follows:
Let q(c|x) be a variational approximation to the posterior p(c|x). Then

I(c; G) = H(c) - H(c \mid G) = \mathbb{E}_{x \sim G} \big[ \mathbb{E}_{c' \sim p(c|x)} [\log p(c'|x)] \big] + H(c) \quad (1)

Since \mathbb{E}_x[\mathrm{KL}(p(\cdot|x) \,\|\, q(\cdot|x))] \ge 0, replacing p(c'|x) with q(c'|x) inside the logarithm yields a lower bound on I(c; G).
The above term, however, needs samples from the posterior p(c|x). This is avoided via the following trick (Lemma 5.1 in the InfoGAN paper): for random variables X, Y and a function f(x, y), under suitable regularity conditions,

\mathbb{E}_{x \sim X, \, y \sim Y|x} [f(x, y)] = \mathbb{E}_{x \sim X, \, y \sim Y|x, \, x' \sim X|y} [f(x', y)]

so the expectation on the RHS of Eq. (1) can be computed without sampling from the posterior.
In practice, the variational distribution q(c|x) is parameterized by another network Q, and \mathcal{L}_{\text{InfoGAN}} is optimized.
Post training, it is shown that variation in a single component of c corresponds to variation in a single semantic factor in the generated data space.
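As a concrete illustration (not from the notes), here is a minimal PyTorch sketch of the variational regularizer for a categorical code; the network name q_head and the use of cross-entropy are assumptions of this sketch.

```python
# Sketch of the InfoGAN mutual-information term for a categorical code c:
# a Q head predicts c from a generated sample, and minimizing the
# cross-entropy below maximizes the variational lower bound on I(c; G)
# (H(c) is constant w.r.t. the parameters and can be dropped).
import torch.nn.functional as F

def infogan_mi_loss(q_head, fake_images, code_idx):
    # q_head: hypothetical network mapping images -> logits over categories
    # fake_images: x = G(z, c); code_idx: categories used to generate them
    logits = q_head(fake_images)
    return F.cross_entropy(logits, code_idx)  # estimate of -E[log q(c|x)]
```

This loss, scaled by \lambda, is added to the generator objective and used to train G and Q jointly.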
Bidirectional GANs (BiGAN)

BiGAN augments the GAN with an encoder E; let p_E(z|x) denote the density induced by the encoder. The objective is:

\mathcal{L}_{\text{BiGAN}} = \min_{\theta, \phi} \max_{w} F(\theta, \phi, w)

F(\theta, \phi, w) = \mathbb{E}_{x \sim p_X} \big[ \mathbb{E}_{z \sim p_E(\cdot|x)} [\log D_w(x, z)] \big] + \mathbb{E}_{z \sim p_Z} \big[ \mathbb{E}_{x \sim p_G(\cdot|z)} [\log (1 - D_w(x, z))] \big]

where \theta parameterizes the generator G, \phi the encoder E, and w the discriminator D, which now scores joint pairs (x, z).
It is shown that the optimal point is reached when p_{EX} = p_{GZ}, where

p_{EX}(x, z) = p_X(x) \, p_E(z \mid x), \qquad p_{GZ}(x, z) = p_Z(z) \, p_G(x \mid z)

i.e., the joint distributions induced by the encoder and the generator match.
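The following is a minimal sketch (not from the notes) of the BiGAN value function in PyTorch; D, E, and G are hypothetical modules, with D taking an (x, z) pair and outputting a probability.

```python
# Sketch of F(theta, phi, w): the discriminator scores joint pairs (x, z)
# from the encoder factorization vs. the generator factorization.
import torch

def bigan_value(D, E, G, real_x, z_prior, eps=1e-8):
    z_enc = E(real_x)   # encoder pair: (x ~ p_X, z = E(x))
    x_gen = G(z_prior)  # generator pair: (x = G(z), z ~ p_Z)
    d_real = D(real_x, z_enc)
    d_fake = D(x_gen, z_prior)
    # E[log D(x, E(x))] + E[log(1 - D(G(z), z))]
    return (torch.log(d_real + eps) + torch.log(1.0 - d_fake + eps)).mean()
```

The discriminator ascends this value while the generator and encoder descend it.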
CycleGAN

CycleGAN considers the unpaired setting, wherein samples from the source and target distributions are available without any correspondence between them. In addition to the adversarial losses, it incorporates a two-way loss that enforces transitivity (cycle consistency).
Given a pair of domains X & Y with P_X & P_Y as densities, CycleGAN has two mapping functions, G_X : X \to Y and G_Y : Y \to X, which are trained simultaneously along with corresponding discriminator functions D_Y & D_X, respectively.
In addition to the usual GAN losses, a cycle consistency loss is introduced for enforcing transitivity:

\mathcal{L}_{\text{cyc}} = \mathbb{E}_{x \sim P_X} \big[ \lVert G_Y(G_X(x)) - x \rVert_1 \big] + \mathbb{E}_{y \sim P_Y} \big[ \lVert G_X(G_Y(y)) - y \rVert_1 \big]
Therefore, the final objective function is as follows:

\mathcal{L}_{\text{CycleGAN}} = \mathbb{E}_{y \sim P_Y} [\log D_Y(y)] + \mathbb{E}_{x \sim P_X} [\log (1 - D_Y(G_X(x)))] + \mathbb{E}_{x \sim P_X} [\log D_X(x)] + \mathbb{E}_{y \sim P_Y} [\log (1 - D_X(G_Y(y)))] + \lambda \, \mathcal{L}_{\text{cyc}}
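Below is a minimal sketch (not from the notes) of the cycle consistency term in PyTorch; G_X and G_Y are hypothetical generator modules.

```python
# Translate each image to the other domain and back, then penalize the
# L1 deviation from the original, in both directions.
import torch

def cycle_consistency_loss(G_X, G_Y, real_x, real_y):
    forward_cycle = torch.mean(torch.abs(G_Y(G_X(real_x)) - real_x))   # X -> Y -> X
    backward_cycle = torch.mean(torch.abs(G_X(G_Y(real_y)) - real_y))  # Y -> X -> Y
    return forward_cycle + backward_cycle
```

This term is weighted by \lambda and added to the two adversarial losses above.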
Style-GAN

Goal: disentanglement in the generated space of a GAN.

Proposal: Learn multiple latent vectors corresponding to different possible styles in the feature space of the Generator.
Given a latent code z, first a latent transformation is learned via a network f : Z \to W, producing w \in W. Subsequently, the w vector is fed separately to each of the feature maps in the generator via AdaIN, as follows:
\text{AdaIN}(x_i, y) = y_{s,i} \, \frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i}

where x_i is the i-th feature map of the generator, and y = (y_s, y_b) is the output of an affine transformation layer with w as the input.
Note that in StyleGAN the modification is to the generator alone; the discriminator and the loss are unchanged.
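A minimal sketch of the AdaIN operation above (PyTorch); the per-channel scale y_s and bias y_b would come from the learned affine map of w, but here they are passed in directly. Assumed shapes: x is (N, C, H, W), y_s and y_b are (N, C).

```python
import torch

def adain(x, y_s, y_b, eps=1e-5):
    # Normalize each feature map to zero mean / unit variance per sample,
    # then apply the style-dependent scale and bias.
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True) + eps
    x_norm = (x - mu) / sigma
    return y_s[:, :, None, None] * x_norm + y_b[:, :, None, None]
```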
Wasserstein GAN (WGAN)

GANs in the naive formulation are known to be very unstable to train. This is ascribed to the non-alignment of the supports of the manifolds formed by the distributions between which the divergence is calculated.
It is shown that, for a pair of distributions whose supports do not have full dimension and do not perfectly align, the JS divergence saturates at a constant (and the KL divergence blows up), yielding no useful gradients; a softer distance is therefore preferable, for instance the Earth Mover's (Wasserstein) distance.
The Wasserstein distance between P & Q is defined as

W(P, Q) = \inf_{\pi \in \Pi(P, Q)} \mathbb{E}_{(x, y) \sim \pi} \big[ \lVert x - y \rVert \big]
Here, the infimum is over the set \Pi(P, Q) of all joint distributions whose marginals are P & Q, respectively. Each \pi \in \Pi(P, Q) can be thought of as a transport plan specifying how much 'work' is done in moving mass from P to Q, out of which the optimal plan is sought via the infimum in the EMD.
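As a quick numerical illustration (not from the notes), SciPy's scipy.stats.wasserstein_distance computes the 1-D EMD between empirical samples:

```python
# For two equal-variance Gaussians, the 1-D Wasserstein distance equals
# the difference of the means (here, 2).
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
p_samples = rng.normal(loc=0.0, scale=1.0, size=10_000)  # samples from P
q_samples = rng.normal(loc=2.0, scale=1.0, size=10_000)  # samples from Q
print(wasserstein_distance(p_samples, q_samples))        # close to 2.0
```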
The EMD is shown to possess some good properties. For instance, suppose P_r is a density over X and z is a random variable over Z. Let g : Z \times \mathbb{R}^d \to X be a parametric function (a neural network) g_\theta, inducing a distribution P_\theta. Then it can be shown that if g_\theta is continuous in \theta, so is W(P_r, P_\theta), unlike JSD(P_r, P_\theta) and KL(P_r, P_\theta).
However, the above definition of the WD is intractable, albeit a dual definition may be used to optimize the WD in practice.
WGAN uses the Kantorovich-Rubinstein duality for the WD, as follows:
W(P_X, P_\theta) = \sup_{\lVert f \rVert_L \le 1} \; \mathbb{E}_{x \sim P_X} [f(x)] - \mathbb{E}_{x \sim P_\theta} [f(x)]

The supremum is over all the 1-Lipschitz functions f : X \to \mathbb{R}.
Typically, the function f is approximated by a neural network f_w, and the distribution P_\theta is approximated using samples, yielding the objective

\min_\theta \max_w \; \mathbb{E}_{x \sim P_X} [f_w(x)] - \mathbb{E}_{z \sim p_Z} [f_w(g_\theta(z))]

In practice, f_w is a neural network which is made (approximately) Lipschitz via weight regularization, e.g., clipping the weights to a small range as in the original WGAN.
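Here is a minimal sketch (not from the notes) of one WGAN critic update in PyTorch; critic (f_w), gen (g_\theta), and the optimizer are hypothetical, and the weight clipping follows the original paper's recipe.

```python
# One critic step: ascend E[f_w(x)] - E[f_w(g(z))], then clip weights
# to keep f_w approximately Lipschitz.
import torch

def critic_step(critic, gen, real_x, z, opt, clip=0.01):
    loss = -(critic(real_x).mean() - critic(gen(z).detach()).mean())
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-clip, clip)  # weight clipping, as in the WGAN paper
    return -loss.item()  # current estimate of the Wasserstein term
```

Several critic steps are typically taken per generator step, after which the generator minimizes -E[f_w(g_\theta(z))].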
References
1. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. https://arxiv.org/abs/1606.03657
2. Adversarial Feature Learning (BiGAN). https://arxiv.org/abs/1605.09782
3. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (CycleGAN). https://arxiv.org/abs/1703.10593
4. Wasserstein GAN. https://arxiv.org/abs/1701.07875
5. A Style-Based Generator Architecture for Generative Adversarial Networks (StyleGAN). https://arxiv.org/abs/1812.04948
6. Towards Principled Methods for Training Generative Adversarial Networks. https://arxiv.org/abs/1701.04862
7. Wasserstein GAN and the Kantorovich-Rubinstein Duality (blog). https://vincentherrmann.github.io/blog/wasserstein