
DEEP LEARNING – GAN

林伯慎 Prof. Bor-Shen Lin
bslin@cs.ntust.edu.tw
VECTOR SPACE
DECOMPOSITION AND SYNTHESIS

[Diagram: decomposition $\mathbf{x} \xrightarrow{\Phi^t} \mathbf{c}$, synthesis $\mathbf{c} \xrightarrow{\Phi} \tilde{\mathbf{x}}$]

 Assume $\Phi = \{\boldsymbol{\phi}_i\}_{i=1}^{n}$ is an orthonormal set and $\mathbf{x}$ is a vector.
 Decomposition: $c_i = \langle \mathbf{x}, \boldsymbol{\phi}_i \rangle$ for $i = 1, 2, \ldots, n$.
   $c_i$ is the amount of projection of $\mathbf{x}$ in the direction of $\boldsymbol{\phi}_i$.
   $\mathbf{c} = \Phi^t \mathbf{x}$ is the decomposition of the vector $\mathbf{x}$.
 Synthesis: $\tilde{\mathbf{x}} = \sum_{i=1}^{n} c_i \boldsymbol{\phi}_i = \Phi \Phi^t \mathbf{x}$.
   $\tilde{\mathbf{x}}$ is the reconstruction of $\mathbf{x}$, with reconstruction loss $L_2(\mathbf{x}, \tilde{\mathbf{x}})$.
   If $\Phi$ is a basis, $L_2(\mathbf{x}, \tilde{\mathbf{x}}) = 0$.
ANALYSIS

 Let $\Phi = \{\boldsymbol{\phi}_i\}_{i=1}^{n}$ be a set of orthonormal vectors in a vector space, and let $\mathbf{x}$ be a vector in that space.
 $c_i = \langle \mathbf{x}, \boldsymbol{\phi}_i \rangle$ for $i = 1, 2, \ldots, n$.
   $c_i$ is the projection of the vector $\mathbf{x}$ on the direction of $\boldsymbol{\phi}_i$.
 Decomposition of the vector $\mathbf{x}$ in the subspace:
   $\mathbf{c} = \Phi^t \mathbf{x} = [\boldsymbol{\phi}_1\ \boldsymbol{\phi}_2\ \cdots\ \boldsymbol{\phi}_n]^t \mathbf{x}$, i.e. $c_i = \boldsymbol{\phi}_i^t \mathbf{x} = \langle \mathbf{x}, \boldsymbol{\phi}_i \rangle$.
 $\Phi^t$ can be viewed as an analysis network, where $\boldsymbol{\phi}_i$ holds the connection weights of neuron $i$.
SYNTHESIS

 $\tilde{\mathbf{x}} = \sum_{i=1}^{n} c_i \boldsymbol{\phi}_i = [\boldsymbol{\phi}_1\ \boldsymbol{\phi}_2\ \cdots\ \boldsymbol{\phi}_n]\,\mathbf{c} = \Phi \mathbf{c} = \Phi \Phi^t \mathbf{x}$.
 Reconstruction of the vector $\mathbf{x}$ in the linear subspace spanned by $\Phi$.
 Reconstruction error: $L_2(\mathbf{x}, \tilde{\mathbf{x}})$.
 When $\Phi$ is a basis of the vector space, $L_2(\mathbf{x}, \tilde{\mathbf{x}}) = 0$.
 $\mathbf{c}$ is a representation of $\mathbf{x}$.

[Diagram: analysis $\mathbf{x} \xrightarrow{\Phi^t} \mathbf{c}$, synthesis $\mathbf{c} \xrightarrow{\Phi} \tilde{\mathbf{x}}$; a numerical sketch follows below]
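Below is a minimal numpy sketch of these two steps, assuming a 3-dimensional space and a two-vector orthonormal set Φ (both made up for illustration): the reconstruction loss is nonzero because Φ spans only a subspace, and it drops to zero once Φ is completed to a full basis.

```python
import numpy as np

# Orthonormal set Phi (columns): spans a 2-D subspace of R^3 (illustrative values)
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.0, 0.0]])
x = np.array([3.0, 4.0, 2.0])

c = Phi.T @ x                         # analysis / decomposition: c_i = <x, phi_i>
x_tilde = Phi @ c                     # synthesis / reconstruction: x~ = Phi Phi^t x
loss = np.sum((x - x_tilde) ** 2)     # L2 reconstruction loss
print(c, x_tilde, loss)               # loss > 0: Phi is not a basis of R^3

# Completing Phi to a full orthonormal basis makes the loss vanish
Phi_full = np.eye(3)
x_tilde_full = Phi_full @ (Phi_full.T @ x)
print(np.sum((x - x_tilde_full) ** 2))    # 0.0
```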
EXAMPLE: DFT / IDFT

 Discrete Fourier transform
   Decomposition of a discrete-time signal $x[n]$ of length $N$ on a subspace with basis $\Phi = \{e^{j\omega n}\}$.
   FT: $X(\omega) = \langle x[n], e^{j\omega n} \rangle$ (continuous spectrum)
   DFT: $X[k] = \langle x[n], e^{j 2\pi k n / N} \rangle$ (discrete spectrum)
   The ingredients of $x[n]$ at different frequencies ($\omega$)
 Inverse discrete Fourier transform
   Reconstruction of the signal from the features and the basis $\Phi$.
   IFT: $\tilde{x}[n] = \frac{1}{2\pi} \int X(\omega) e^{j\omega n}\, d\omega$
   IDFT: $\tilde{x}[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] e^{j 2\pi k n / N}$
 (A numerical check follows below.)
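A short numpy check of the DFT/IDFT pair, using numpy's FFT routines (which implement the same analysis/synthesis formulas, with the 1/N factor placed in the inverse): the signal is recovered from its spectrum because the complex exponentials form a basis.

```python
import numpy as np

N = 8
n = np.arange(N)
x = np.cos(2 * np.pi * n / N) + 0.5 * np.sin(4 * np.pi * n / N)   # toy signal

X = np.fft.fft(x)           # DFT (analysis): spectrum X[k]
x_tilde = np.fft.ifft(X)    # IDFT (synthesis): x~[n] = (1/N) sum_k X[k] e^{j 2 pi k n / N}

print(np.allclose(x, x_tilde.real))   # True: perfect reconstruction, Phi is a basis
```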
DECOMPOSITION

 Car
   A car → 1 steering wheel, 4 wheels, …
 Hamburger
   A hamburger → water, starch, minerals, …
 A 3D vector $\mathbf{x}$ projected onto a 2D plane $\Phi$
   The error vector $\mathbf{e}$ is perpendicular to the plane
   The projection $\tilde{\mathbf{x}}$ is the reconstruction
 Fourier analysis
   Decomposing a signal with a set of cosine functions
   "Fourier transform": decomposition of the signal
   "Inverse Fourier transform": reconstruction of the signal
AUTO-ENCODER

 Analysis/synthesis pipeline: $x \to$ Encoder $\to z \to$ Decoder $\to \tilde{x}$, trained to minimize $L_2(x - \tilde{x})$
 The encoder plays the role of a discriminator (D) and the decoder that of a generator (G): $x \to$ D $\to z \to$ G $\to \tilde{x}$
 Self-estimation of a vector to minimize $L_2(x - \tilde{x})$
 D/G could be an FNN, CNN/DCNN, RNN or others
 Representation learning (unsupervised)
   $z$ is the feature of $x$
 (A minimal sketch follows below.)
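A minimal PyTorch sketch of such an auto-encoder, assuming 784-dimensional inputs (e.g. flattened 28x28 images) and a 32-dimensional feature z; the hidden sizes are illustrative assumptions, not taken from the slides.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, dim_x=784, dim_z=32):
        super().__init__()
        # Encoder (the "D" analysis part): x -> z
        self.encoder = nn.Sequential(nn.Linear(dim_x, 256), nn.ReLU(),
                                     nn.Linear(256, dim_z))
        # Decoder (the "G" synthesis part): z -> x~
        self.decoder = nn.Sequential(nn.Linear(dim_z, 256), nn.ReLU(),
                                     nn.Linear(256, dim_x))

    def forward(self, x):
        z = self.encoder(x)              # z is the learned feature (representation) of x
        return self.decoder(z), z

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                  # dummy batch standing in for real images
x_tilde, z = model(x)
loss = ((x - x_tilde) ** 2).mean()       # L2 reconstruction loss
loss.backward()
opt.step()
```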
GAN
(GENERATIVE ADVERSARIAL NETWORK)

[Diagram: noise $z \to$ G (generator, synthesis) $\to X_{fake}$; real photos give $X_{real}$; both are fed to D (discriminator, analysis), which outputs $D(X)$]
DISCRIMINATOR

 A binary classifier
   Tells whether an object is of a specific type or not
   Trained on positive/negative samples
   e.g. a CNN
 Example: face detection
   Positives: any face photos
   Negatives: any non-face photos
 [Diagram: $X \to$ D $\to D(X) \in (0,1)$, trained with cross entropy against the 0/1 label]
FNN GENERATOR

 Noise $z$ → fully connected layers → $Y$ (a 784-dimensional vector)
 $Y$ is mapped to a 28 x 28 image $X$: X[i][j] = Y[i*28 + j]
 (A minimal sketch follows below.)
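A minimal PyTorch sketch of this FNN generator; the 100-dimensional noise and the hidden layer are assumptions, only the 784 → 28x28 mapping comes from the slide.

```python
import torch
import torch.nn as nn

class FNNGenerator(nn.Module):
    def __init__(self, dim_z=100):
        super().__init__()
        # fully connected mapping: noise z -> 784-dim vector Y
        self.fc = nn.Sequential(nn.Linear(dim_z, 256), nn.ReLU(),
                                nn.Linear(256, 784), nn.Sigmoid())

    def forward(self, z):
        y = self.fc(z)                   # Y: (batch, 784)
        return y.view(-1, 28, 28)        # X[i][j] = Y[i*28 + j]

x_fake = FNNGenerator()(torch.randn(16, 100))   # (16, 28, 28) fake images
```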
DCNN GENERATOR

 $z \to$ G (DCNN with up pooling) $\to X_{fake}$
 The 16,384 fully connected outputs are reshaped to 8x8x256 feature maps.

 Layer   Operation                        Input        Output
 1       Fully connected (16,384 x 100)   100          16,384
 2       Up pooling + Conv 128@3x3x256    8x8x256      16x16x128
 3       Up pooling + Conv 64@3x3x128     16x16x128    32x32x64
 4       Up pooling + Conv 3@3x3x64       32x32x64     64x64x3

 (A layer-by-layer sketch follows below.)
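A PyTorch sketch of this DCNN generator following the table above, assuming nearest-neighbour upsampling for the "up pooling" step and ReLU/sigmoid activations (both assumptions); the layer shapes match the table.

```python
import torch
import torch.nn as nn

def up_block(cin, cout):
    """'Up pooling' (2x upsampling) followed by a 3x3 convolution."""
    return nn.Sequential(nn.Upsample(scale_factor=2),
                         nn.Conv2d(cin, cout, kernel_size=3, padding=1))

class DCNNGenerator(nn.Module):
    def __init__(self, dim_z=100):
        super().__init__()
        self.fc = nn.Linear(dim_z, 16384)        # 100 -> 16,384 (= 8*8*256)
        self.net = nn.Sequential(
            up_block(256, 128), nn.ReLU(),       # 8x8x256   -> 16x16x128
            up_block(128, 64),  nn.ReLU(),       # 16x16x128 -> 32x32x64
            up_block(64, 3),    nn.Sigmoid(),    # 32x32x64  -> 64x64x3
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 256, 8, 8)       # reshape the FC output to 8x8x256 maps
        return self.net(h)                        # (batch, 3, 64, 64) fake images

x_fake = DCNNGenerator()(torch.randn(4, 100))     # Xfake: 4 generated 64x64 RGB images
```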
TRAINING OF GAN

[Diagram: noise $z \to$ G $\to X_{fake}$; real photos give $X_{real}$; D outputs $D(X)$]

GAN learning
1. $X_{real}$: its goal is to be accepted by D when learning D (gold label 1):
   $\max_D \log(D(X_{real}))$.
2. $X_{fake}$: its goal is to be rejected by D when learning D (gold label 0):
   $\max_D \log(1 - D(X_{fake}))$.
3. $X_{fake}$: its goal is to pretend to be real and be accepted by D, so the gold label is set to 1 to generate the gradient for learning G (D is NOT updated):
   $\max_G D(G(z))$.
(A minimal training-step sketch follows below.)
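A minimal PyTorch sketch of one training step following the three objectives above, written in the binary cross-entropy form; the tiny G and D used here are placeholders, not the networks from the slides.

```python
import torch
import torch.nn as nn

dim_z = 100
# Placeholder networks; any G producing 784-dim samples and any D scoring them would do.
G = nn.Sequential(nn.Linear(dim_z, 784), nn.Sigmoid())
D = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()

x_real = torch.rand(64, 784)                 # dummy batch of real samples
z = torch.randn(64, dim_z)

# Steps 1-2: update D, gold = 1 for real, gold = 0 for fake (G is held fixed)
x_fake = G(z).detach()
loss_d = bce(D(x_real), torch.ones(64, 1)) + bce(D(x_fake), torch.zeros(64, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Step 3: update G, gold = 1 so that D(G(z)) is pushed towards "real" (D is NOT updated)
loss_g = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```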
HOW DOES GAN WORK?

[Figure: real samples $X_{real}$ (+) and generated samples $X_{fake}$ (−) in feature space, separated by the decision border; the fakes act as attackers probing the border.]
DISCUSSIONS

 The discriminator is a binary classifier with positive samples ONLY; the negative samples are produced by the generator.
 If the generator is not good enough,
   the generated $X_{fake}$ are too far away from $X_{real}$, which makes the decision boundary lousy.
   You cannot train a troop with weak imaginary enemies.
 When the generator becomes tough,
   the generated samples come closer to the positive samples, and the decision boundary shrinks back towards the positive samples.
   It is like training Olympic athletes in real games.

[Figure: defenders ($X_{real}$, +) and attackers ($X_{fake}$, −) on either side of the decision border, which tightens as the fakes approach the reals.]
GOALS OF GAN

 The aim may be to train the discriminator (D) or the generator (G).
 When the goal is to train the discriminator
   It is possible to train a discriminator with GAN when only positive samples are available.
   Make use of the generator to produce more negative samples so as to better train the discriminator.
 When the goal is to train the generator
   It is possible to generate something similar to the positive samples (reals) but with variation (through the noise $z$).
   It is not expected to generate exactly the same things.
 Mode collapse
   → when changing $z$, there is no difference in the output (the loss allows a many-to-one mapping)
   → the characteristics of the generated output cannot be controlled
CONDITIONAL GAN (C-GAN)

[Diagram: noise $z$ and condition $c$ (e.g. "7") $\to$ G $\to X_{fake}$; real photos with their conditions give $X_{real}$; D outputs a score in (0,1)]

 Training inputs: image + condition, implemented with fully connected (FC) layers
 Use $c$ to control the condition and $z$ to produce variation
 Conditions: label, image, or text
 Cited from C-GAN by M. Mirza
 (A conditional-generator sketch follows below.)
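A minimal PyTorch sketch of a conditional generator in the spirit of this slide, assuming a 10-class label condition turned into a one-hot vector and concatenated with the noise; the sizes and the concatenation scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalGenerator(nn.Module):
    def __init__(self, dim_z=100, n_classes=10, dim_x=784):
        super().__init__()
        self.n_classes = n_classes
        # z controls the variation, c controls the condition; both feed the FC stack
        self.fc = nn.Sequential(nn.Linear(dim_z + n_classes, 256), nn.ReLU(),
                                nn.Linear(256, dim_x), nn.Sigmoid())

    def forward(self, z, c):
        c_onehot = F.one_hot(c, self.n_classes).float()   # condition as a one-hot vector
        return self.fc(torch.cat([z, c_onehot], dim=1))   # Xfake conditioned on c

G = ConditionalGenerator()
z = torch.randn(16, 100)
c = torch.full((16,), 7)          # condition: digit "7"
x_fake = G(z, c)                  # changing z varies the output, c fixes the class
```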
C-GAN EXAMPLE – MNIST

[Figure: MNIST digits from the conditional generator; changing $z$ varies the generated samples, changing the condition $c$ changes the digit class (e.g. $c$ = 2).]

 The label is used as the condition
 Cited from C-GAN by M. Mirza
C-GAN EXAMPLE – AUTO TAGGING

[Diagram: $z$ and condition $c$ (an image) $\to$ G $\to X_{fake}$ (generated tags); real images with real tags, e.g. "raspberry", give $X_{real}$ for D.]
[Example: conditioned on an image, the generator produces the tags "creek, lake, waters, river".]

 Cited from C-GAN by M. Mirza
C-GAN FOR IMAGE-TO-IMAGE TRANSLATION

[Figure: the input image serves as the condition, paired with the translated output.]

 Cited from Image-to-Image Translation with Conditional Adversarial Networks
 D uses PatchGAN: every NxN patch is judged as real/fake
   This reduces the $X_{real}$ space and provides more positive samples
 (A PatchGAN-style discriminator sketch follows below.)
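A minimal PyTorch sketch of a PatchGAN-style discriminator: a small fully convolutional network whose output is a grid of scores, one per NxN receptive-field patch, instead of a single scalar. The channel counts and layer depths are illustrative assumptions, not the configuration of the cited paper.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=6):          # condition image + candidate image, concatenated
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),   # 1 real/fake score per patch
        )

    def forward(self, condition, image):
        x = torch.cat([condition, image], dim=1)
        return self.net(x)            # (batch, 1, H', W'): a score for every NxN patch

D = PatchDiscriminator()
scores = D(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
print(scores.shape)                   # torch.Size([2, 1, 15, 15])
```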
DOMAIN TRANSFORMATION

 Auto-Encoder
 Variational Auto-Encoder (VAE)
 GAN / cGAN transformer
 Cycle-Consistent GAN
 StarGAN
AUTO-ENCODER, AE (TRANSFORMATION)

[Diagram: transformation $X \to$ D $\to z \to$ G $\to \tilde{Y}$, trained against the target $Y$ with loss $-L_p(\tilde{Y} - Y)$]

 Encoder-decoder (U-net / ResNet)
 Learns a transformation
 Needs paired data $(X_i, Y_i)$
 Objective: $\min L_1(\tilde{Y} - Y)$
 Example: gray-to-color
VARIATIONAL AUTO-ENCODER

[Diagram: $X \to$ D $\to$ mean $\boldsymbol{\mu}$ and stddev $\boldsymbol{\sigma}$; noise $n_i \sim N(0,1)$; $\mathbf{z} = \boldsymbol{\mu} + \mathbf{n} \odot \boldsymbol{\sigma} \to$ G $\to \tilde{Y}$, trained with loss $-L_p(\tilde{Y} - Y)$]

 Encoder output: mean $\boldsymbol{\mu}$ and stddev $\boldsymbol{\sigma}$
 $z_i = \mu_i + n_i \sigma_i$, with $n_i \sim N(0,1)$
   record $n_i$, then update $\mu_i$ and $\sigma_i$
 Adds uncertainty to G, due to $n_i$
 (A sampling-step sketch follows below.)
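A minimal PyTorch sketch of the sampling step above (the reparameterization trick): the noise n_i is drawn in the forward pass and kept fixed, so gradients can update μ_i and σ_i. The encoder and decoder here are placeholders; only the z = μ + nσ step reflects the slide.

```python
import torch
import torch.nn as nn

dim_x, dim_z = 784, 32
encoder = nn.Linear(dim_x, 2 * dim_z)      # placeholder D: outputs [mu, log sigma]
decoder = nn.Linear(dim_z, dim_x)          # placeholder G

x = torch.rand(16, dim_x)
mu, log_sigma = encoder(x).chunk(2, dim=1)
sigma = log_sigma.exp()

n = torch.randn_like(mu)                   # n_i ~ N(0,1), recorded for this forward pass
z = mu + n * sigma                         # z_i = mu_i + n_i * sigma_i (differentiable w.r.t. mu, sigma)
y_tilde = decoder(z)                       # G output carries variation/uncertainty through n_i
```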
GAN / CGAN

 GAN
[Diagram: $X \to$ D $\to$ G $\to \tilde{Y}$; $\tilde{Y}$ and real photos $Y$ are judged by a GAN discriminator]
   Does not need paired data: $X = \{X_i\}$, $Y = \{Y_j\}$
   Not easy to converge well
   An L1 loss can be added if paired data are available
 cGAN (conditional)
[Diagram: as above, but the conditional discriminator also sees $X$]
   Needs paired data: $T = \{(X_i, Y_i)\}$
   Could add an L1 loss
CYCLE GAN

[Diagram: X-domain sample $X \to$ F $\to Y_{fake} = F(X) \to$ G $\to \tilde{X}$, with cycle loss $L_{CYC}(X - \tilde{X})$; $Y_{real}$ from the Y-domain and $F(X)$ are judged by $D_Y$ (0/1), giving the GAN loss $L_{GAN}(D_Y, F)$]

 The X-domain and the Y-domain are not required to be paired
 F maps X → Y, G maps Y → X
 2 cycle losses: $L_{CYC}(X, \tilde{X})$ and $L_{CYC}(Y, \tilde{Y})$
   The transformed samples act as fake data, the originals as real data
 2 GAN losses: $L_{GAN}(D_X, G)$ and $L_{GAN}(D_Y, F)$
 Optimization over multiple networks (F, G, $D_X$, $D_Y$) with multiple objectives
 (A loss-composition sketch follows below.)
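A minimal PyTorch sketch of how the two cycle losses and two GAN losses can be composed for the generator update, assuming F, G, D_X and D_Y are placeholder convolutional networks; the adversarial terms use a BCE form for brevity, and the separate discriminator updates are omitted.

```python
import torch
import torch.nn as nn

def conv_net(out_channels):
    """Placeholder image network: 3-channel input, sigmoid output."""
    return nn.Sequential(nn.Conv2d(3, out_channels, 3, padding=1), nn.Sigmoid())

F_net, G_net = conv_net(3), conv_net(3)   # F: X -> Y, G: Y -> X
D_X, D_Y = conv_net(1), conv_net(1)       # per-pixel real/fake scores for each domain
bce, l1 = nn.BCELoss(), nn.L1Loss()

x = torch.rand(2, 3, 64, 64)              # unpaired X-domain samples
y = torch.rand(2, 3, 64, 64)              # unpaired Y-domain samples

y_fake, x_fake = F_net(x), G_net(y)       # transformed (fake) samples
x_cyc, y_cyc = G_net(y_fake), F_net(x_fake)   # go once around each cycle

loss_cyc = l1(x_cyc, x) + l1(y_cyc, y)    # the 2 cycle losses L_CYC
score_y, score_x = D_Y(y_fake), D_X(x_fake)
loss_gan_F = bce(score_y, torch.ones_like(score_y))   # L_GAN(D_Y, F): F tries to fool D_Y
loss_gan_G = bce(score_x, torch.ones_like(score_x))   # L_GAN(D_X, G): G tries to fool D_X
loss_FG = loss_cyc + loss_gan_F + loss_gan_G          # joint objective for F and G
loss_FG.backward()                        # D_X and D_Y get their own real/fake updates (omitted)
```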
CYCLE GAN - EXAMPLE

 Cited from Learning to Discover Cross-Domain Relations with Generative Adversarial Networks
CYCLE GAN - EXAMPLE

 Cited from Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
EXAMPLES

[Figure panels: grayscale input | OpenCV | Photoshop]

 Cited from Daiva's master thesis
DISCUSSIONS ON CYCLE-GAN

 Trains a transformer instead of a generator
 Domain transformation
   e.g. black hair to blond hair, horse to zebra
   Does not require paired data
   Compare with a plain transformer (which requires paired data)
 Complicated and time consuming
   Joint optimization of multiple networks with multiple objectives
 The reconstruction loss may help to improve the quality (a peek at the ground truth)
 A U-net or residual net can be used to accelerate convergence
 Inconvenient for transforming among multiple attributes
STARGAN

 If CycleGAN were used for multiple domains:
   multiple transformers would be needed
   a lot of computation
   not flexible
STARGAN

[Diagram: $X$ and target condition $c$ (e.g. "black hair") $\to$ G $\to Y_{fake}$; $Y_{real}$ from the Y-domain and $Y_{fake}$ are fed to D, which outputs real/fake plus a domain classifier (e.g. "is black hair or not"); $Y_{fake}$ with the original condition $c$ (e.g. "blond hair") is passed through G again to reconstruct $\tilde{X}$, with reconstruction loss $L_{rec}(X - \tilde{X})$.]
(A conditioning sketch follows below.)
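A minimal PyTorch sketch of the StarGAN-style wiring above: one generator that takes an image plus a target-domain label, and one discriminator with two heads (real/fake and domain classification). The network bodies and the number of domains are placeholders; only the conditioning scheme and the two outputs reflect the diagram.

```python
import torch
import torch.nn as nn

n_domains = 5          # e.g. hair colors; the count is an illustrative assumption

class StarGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # the target-domain label c is broadcast and concatenated as extra input channels
        self.net = nn.Sequential(nn.Conv2d(3 + n_domains, 64, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, x, c):
        c_map = c.view(-1, n_domains, 1, 1).expand(-1, -1, x.size(2), x.size(3))
        return self.net(torch.cat([x, c_map], dim=1))

class StarDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2))
        self.head_adv = nn.Conv2d(64, 1, 3, padding=1)      # real/fake score
        self.head_cls = nn.Conv2d(64, n_domains, 32)        # domain classifier (e.g. black hair or not)

    def forward(self, y):
        h = self.body(y)
        return self.head_adv(h), self.head_cls(h).view(y.size(0), n_domains)

G, D = StarGenerator(), StarDiscriminator()
x = torch.rand(2, 3, 64, 64)
c_tgt = torch.eye(n_domains)[torch.tensor([1, 3])]          # target domains, one-hot
c_src = torch.eye(n_domains)[torch.tensor([0, 0])]          # original domains, one-hot
y_fake = G(x, c_tgt)                                        # translate X to the target domain
x_rec = G(y_fake, c_src)                                    # translate back with the source label
loss_rec = (x - x_rec).abs().mean()                         # L_rec(X - X~)
adv, cls = D(y_fake)                                        # real/fake score + domain prediction
```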
STARGAN EXAMPLE

 Cited from StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
