
Sketch-Guided Text-to-Image Generation
Final Report - Jul 27, by Elliott Wu
Mentor: Hyungjoo Cho
Advisors: Yongyi Lu, Yu-Wing Tai, Chi-Keung Tang
sText2Image
male, long face, smile with mouth closed, double
eyelids, five o'clock shadow…
sText2Image

[Figure: three panels labeled TEXT, SKETCH, and IMAGE. The TEXT panel reads "male, long face, smile with mouth closed, double eyelids, five o'clock shadow…".]


Text2Image
● Generative Adversarial Text-to-Image Synthesis (Reed et al., ICML 2016)
● Stack-GAN: Text to Photo-realistic Image Synthesis (Zhang et al., arXiv)
● …

* retrieved from Stack-GAN


Sketch?

* Jun-Yan Zhu, Generative Visual Manipulation on the Natural Image Manifold, ECCV 2016
Sketch?

* collected from volunteers


Sketch?

? ? ?
Joint Representation
[Figure: TEXT, SKETCH, and IMAGE panels arranged around a shared Joint Space. The TEXT panel reads "male, long face, smile with mouth closed, double eyelids, five o'clock shadow…".]
Network Architecture - Training
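[Figure: training architecture.]
Generator: inputs z (100) and t (18); labeled operations: linear, replicate; feature-map sizes 4x8, 8x16, 16x32, 32x64; channel widths 512, 256, 128, 64; output G(z, t).
Discriminator: image inputs labeled real and fake/wrong; feature-map sizes 32x64, 16x32, 8x16, 4x8; channel widths 64, 128, 256, 512; t (18) joined via replicate; output y.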
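The "replicate" label in the diagram presumably means tiling the 18-dimensional text vector t over the spatial grid of a feature map and concatenating it along the channel axis. The following is a minimal sketch of that operation under that assumption. The function name `replicate_and_concat`, the nested-list layout, and the reading of 4x8 as height by width are illustrative choices, not taken from the slides.

```python
def replicate_and_concat(features, t):
    """Tile each entry of t over the spatial grid and append it as a channel.

    features: nested list indexed as [channel][row][column]
    t: flat list of conditioning values (hypothetical layout)
    Returns a new nested list with len(features) + len(t) channels.
    """
    height = len(features[0])
    width = len(features[0][0])
    # One constant-valued channel per entry of t.
    tiled = [[[value] * width for _ in range(height)] for value in t]
    return features + tiled


# Assumed shapes: 512 channels on a 4x8 grid, and an 18-entry text vector.
features = [[[0.0] * 8 for _ in range(4)] for _ in range(512)]
t = [1.0, 0.0] * 9
joined = replicate_and_concat(features, t)
print(len(joined), len(joined[0]), len(joined[0][0]))
```

Tiling keeps the spatial layout of the feature map intact, so later convolution layers can see the conditioning values at every location.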
Network Architecture - Testing
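[Figure: test-time pipeline.]
Input: sketch, text
Generator: takes z and text, produces G(z, t)
Discriminator: takes text
Losses: Lcontextual, Lperceptual, with an arrow labeled backprop
Output: generated image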
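The slide names two losses and a backprop step, but not their formulas. One plausible reading is that the trained networks stay fixed at test time while the latent z is updated by gradient descent. The toy sketch below illustrates only that idea. Everything in it is an assumption: the linear `generator`, the sigmoid `discriminator`, the squared-error contextual term, the log(1 - D) perceptual term, the weight `lam`, and the finite-difference gradient standing in for real backpropagation.

```python
import math


def generator(z):
    # Hypothetical stand-in for G(z, t): a fixed linear map to four values.
    # The first two play the role of the sketch part, the rest the image part.
    return [z[0] + z[1], z[0] - z[1], 2.0 * z[0], 0.5 * z[1]]


def discriminator(x):
    # Hypothetical stand-in for D: sigmoid of a fixed linear score.
    score = 0.3 * x[2] - 0.2 * x[3]
    return 1.0 / (1.0 + math.exp(-score))


def total_loss(z, sketch, lam=0.01):
    out = generator(z)
    # Assumed contextual term: squared error against the input sketch.
    contextual = sum((o - s) ** 2 for o, s in zip(out[:2], sketch))
    # Assumed perceptual term: push the discriminator toward "real".
    perceptual = math.log(1.0 - discriminator(out))
    return contextual + lam * perceptual


def numerical_grad(f, z, eps=1e-5):
    # Central finite differences, used here in place of backpropagation.
    grad = []
    for i in range(len(z)):
        up, down = list(z), list(z)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def optimize_latent(sketch, steps=200, lr=0.1):
    z = [0.0, 0.0]
    for _ in range(steps):
        grad = numerical_grad(lambda v: total_loss(v, sketch), z)
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


sketch = [1.0, 0.2]
z_star = optimize_latent(sketch)
print(total_loss([0.0, 0.0], sketch), total_loss(z_star, sketch))
```

In a real setup the gradient would come from automatic differentiation through the trained networks, and the contextual term would compare only the sketch portion of the generated output.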
Data Preparation - Image
Face (CelebA): 202k images
Bird (CUB): 11k images
Flower (Oxford): 8k images
Data Preparation - Image
Face (CelebA): 40 attributes
1 : "5_o_Clock_Shadow"
2 : "Big_Lips"
3 : "Big_Nose"
4 : "Chubby"
5 : "Double_Chin"
6 : "Eyeglasses"
7 : "Goatee"
8 : "Heavy_Makeup"
9 : "High_Cheekbones"
10 : "Male"
11 : "Mouth_Slightly_Open"
12 : "Mustache"
...
Bird (CUB) and Flower (Oxford): for both datasets, 10 captions per image provided by char-CNN-RNN (Reed et al., CVPR 2016)
Text input: attribute vector OR text embedding
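The attribute-vector option can be illustrated with a small encoder. This is a minimal sketch that uses only the twelve attribute names visible on the slide; the remaining names are elided there and are not filled in here. The 0/1 encoding and the function name `attribute_vector` are assumptions made for illustration.

```python
# Attribute names as listed on the slide; the list there continues with "...".
ATTRIBUTES = [
    "5_o_Clock_Shadow", "Big_Lips", "Big_Nose", "Chubby",
    "Double_Chin", "Eyeglasses", "Goatee", "Heavy_Makeup",
    "High_Cheekbones", "Male", "Mouth_Slightly_Open", "Mustache",
]


def attribute_vector(active, names=ATTRIBUTES):
    """Return a 0/1 list with one entry per attribute name (assumed encoding)."""
    unknown = set(active) - set(names)
    if unknown:
        raise ValueError("unknown attributes: " + ", ".join(sorted(unknown)))
    return [1.0 if name in active else 0.0 for name in names]


vector = attribute_vector({"Male", "Eyeglasses"})
print(vector)
```

A fixed name order matters here, because the generator and discriminator would see only positions in the vector, not the names.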
Data Preparation - Synthesized Sketch

Edge detection:
● XDoG (Winnemöller et al., Computers & Graphics 2012)
● Photoshop photocopy effect
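For readers unfamiliar with XDoG, the sketch below shows its core idea: a sharpened difference of two Gaussian blurs followed by a soft threshold. It is a plain-Python illustration on a tiny grayscale grid with values in [0, 1]. The parameter defaults (`sigma`, `k`, `p`, `epsilon`, `phi`) and the edge-clamping blur are assumptions chosen for the example, not the settings used in this project.

```python
import math


def gaussian_kernel(sigma):
    # Normalized 1D Gaussian kernel truncated at about three sigma.
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(image, sigma):
    """Separable Gaussian blur with edge clamping; image is a list of rows."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    offsets = range(-radius, radius + 1)
    height, width = len(image), len(image[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # Horizontal pass, then vertical pass.
    rows = [[sum(w * row[clamp(x + d, width)] for d, w in zip(offsets, kernel))
             for x in range(width)] for row in image]
    return [[sum(w * rows[clamp(y + d, height)][x] for d, w in zip(offsets, kernel))
             for x in range(width)] for y in range(height)]


def xdog(image, sigma=1.0, k=1.6, p=20.0, epsilon=0.5, phi=10.0):
    fine = blur(image, sigma)
    coarse = blur(image, k * sigma)
    result = []
    for fine_row, coarse_row in zip(fine, coarse):
        out_row = []
        for f, c in zip(fine_row, coarse_row):
            sharpened = (1.0 + p) * f - p * c  # sharpened difference of Gaussians
            if sharpened >= epsilon:
                out_row.append(1.0)  # white
            else:
                out_row.append(1.0 + math.tanh(phi * (sharpened - epsilon)))
        result.append(out_row)
    return result


# Toy input: dark left half, bright right half.
step = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
edges = xdog(step)
flat = xdog([[0.8] * 8 for _ in range(8)])
```

On real photographs a library blur would replace the hand-written one, and the parameters would need tuning per dataset.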

Simplification (synthesized sketches):


● Sketch simplification (Simo-Serra & Iizuka, SIGGRAPH 2016)

[Figure: panels labeled Image, Edge, Simplified]


Data Preparation - Freehand Sketch

* collected from volunteers


Experiments - Face

[Figure: three panels labeled ATTRIBUTES, SKETCH, and IMAGE. The ATTRIBUTES panel reads "male, long face, smile with mouth closed, double eyelids, five o'clock shadow…".]


Experiments - Failures
Experiments - Finally…
Experiments - Face

1 Attributes Match Sketch

2 Attributes Mismatch Sketch

3 Freehand Sketch
Experiments - Match (Mustache)
Experiments - Match (Eyeglasses)
Experiments - Match (Lipstick)
Experiments - Mismatch

Female, Heavy_Makeup, Wearing_Lipstick

Female, Heavy_Makeup, Smiling, Wearing_Lipstick
Experiments - Mismatch

Male, Chubby, Double_Chin, High_Cheekbones, Mouth_Open

Male
Experiments - Mismatch
Female, High_Cheekbones, Smiling, Wearing_Lipstick, No_Eyeglasses

Female, Heavy_Makeup, High_Cheekbones, Pointy_Nose, Smiling, Wearing_Lipstick, No_Eyeglasses
Experiments - Freehand
Experiments - Failure Cases (Eyeglasses)
Timeline
● Before Mar: Ideation
● Mar: Submitted to ICCV on Sketch-to-Image
● Jul: Extension on Sketch-Guided Text-to-Image
● Aug: Run experiments on bird and flower datasets
● Sept - Oct: Refine results and paper write-up
● Nov: Submit to CVPR
THANK YOU!
Shangzhe (Elliott) Wu

Email: swuai@ust.hk

GitHub: elliottwu
