
Penelitian Ilmu Komputer, Sistem Embedded and Logic

p-ISSN: 2303-3304, e-ISSN: 2620-3553


Vol. 10 (1): 31 – 40 (March 2022)
https://doi.org/10.33558/piksel.v10i1.4158

Enhanced Face Image Super-Resolution Using Generative Adversarial Network

Bagus Hardiansyah 1,*, Elvianto Dwi Hartono 1

* Corresponding author, e-mail: bagushardiansyah@untag-sby.ac.id

1 Informatics Engineering; Universitas 17 Agustus 1945 Surabaya; Jl. Semolowaru No. 45 Surabaya; (031) 5931800; e-mail: bagushardiansyah@untag-sby.ac.id, elvianto.evh@untag-sby.ac.id

Submitted : 17/01/2022
Revised : 27/01/2022
Accepted : 25/02/2022
Published : 26/03/2022

Abstract

We propose an Enhanced Face Image Generative Adversarial Network (EFGAN). Single image super-resolution (SISR) with convolutional networks often struggles to recover refined texture at larger upscaling factors. Our approach is evaluated with the mean square error (MSE), the peak signal-to-noise ratio (PSNR), and the Structural Similarity Index (SSIM). However, a high PSNR value does not necessarily correspond to fine detail. A generative adversarial network (GAN) loss function is therefore used to optimize the super-resolution (SR) model, and the generator network is built with a skip-connection architecture to improve the distribution of features through the network.

Keywords: single image super-resolution, Generative Adversarial Network.

1. Introduction

Single image super-resolution (SISR) is an important computer vision task that generates a high-resolution (HR) image from a low-resolution (LR) image. The aim is to recover the HR image from the corresponding LR input. Super-resolution (SR) is typically divided into generic SR and class-specific SR.

Face image SR (FSR) is a domain-specific SR problem. It refers to recovering HR face images from low-quality LR face images, increasing their resolution while restoring detail. In many real-world scenarios, face images are of low quality because of the limits of the physical imaging system and the imaging conditions. With its wide range of applications and notable advantages, FSR has been an active topic in image processing and computer vision since it was first introduced.

Recently, with the rapid development of deep learning technology, various research areas and applications, such as computer vision, robotics, big data analysis, and driverless automobiles, have achieved major advancements. Another field, face image generation and synthesis, has also undergone significant developments. In particular, the emergence of the generative adversarial network (GAN), a type of neural network architecture for generative modeling first proposed by (Goodfellow, 2014), brought about a major breakthrough in face image generation. A GAN consists of two networks: the generator, which creates data that is as realistic as possible, and the discriminator, which attempts to distinguish fake samples from real ones. The two networks compete with each other during training, resulting in a generator that can produce realistic data.

PIKSEL status is accredited by the Directorate General of Research Strengthening and Development No. 225/E/KPT/2022 with Indonesian Scientific Index (SINTA) journal-level of S3, starting from Volume 10 (1) 2022 to Volume 14 (2) 2026.

Previous work on face image SR (Jiang et al., 2016) is an important reference for SR models. Its main difference from generic SR is the use of face-specific prior information, e.g., from face detection. Recovering realistic detail is also a fundamental task in intelligent surveillance (Zhu et al., 2016), where face recognition relies on HR face images recovered from the corresponding LR images. SR has thus become an active research area in computer vision in the past few years (Junyu et al., 2016).

A common problem in face image SR is therefore recovering better-quality detail from 16-pixel inputs at an upscaling factor of 4×. Recent research in this area relies mainly on learning algorithms. Adapting a GAN to this task treats the HR image as generated conditionally on a single LR image. We call our model EFGAN, and we run further experiments that validate it with the peak signal-to-noise ratio (PSNR) and the Structural Similarity Index (SSIM) as measures of visual quality.

2. Research Method
Single image SR methods can be divided into three categories: interpolation, reconstruction, and example learning. Example learning methods such as (Hardiansyah, 2021) have developed rapidly, so we focus on example-based algorithms for better performance.
GAN-based methods have recently become one of the most common approaches to SR (Christian et al., 2016). Driven by the discriminative network, GAN methods generate HR images with sharper details than other models (Emily et al., 2015). Furthermore, to reconstruct detailed images with refined texture, (Christian et al., 2016) presented a deep residual model. Its perceptual loss combines an adversarial loss with a loss on the feature maps of the VGG network (Karen et al., 2015).


SISR aims to predict the HR output $I^{HR}$ from the LR input image $I^{LR}$. In the general SR setting, $I^{LR}$ is a downsampled version of the corresponding $I^{HR}$. (Philip et al., 2016) applied conditional generative adversarial networks (Mehdi et al., 2014) to various pixel-to-pixel tasks. Mapping $I^{LR}$ to $I^{HR}$ is treated as such a conditional task, in which $I^{HR}$ is generated from the corresponding $I^{LR}$. EFGAN is then proposed to optimize the networks in our model.
2.1. Network Architecture
The generator $G : \mathbb{R}^{N_x} \to \mathbb{R}^{N_y}$ is fully convolutional and generates an HR image corresponding to the LR input. Here $N_x = H \times W \times C$ is the dimensionality of $x$, where $H$, $W$, and $C$ describe the pixel matrices of the image. Features in different layers are connected through convolution layers with a 4 × 4 kernel, which reduce the dimensionality of the feature maps. The generator network G, shown in Figure 1, includes downsampling and upsampling convolutional layers with a factor of 4. Its input and output sizes are 178×218×3 (input) → 178×218×3 (output).
The discriminator is $D : \mathbb{R}^{N_{ry}} \to \mathbb{R}^{N_{ry}}$, where $N_{ry} = H \times W \times 2C$ is the dimensionality of the SR image generated by G combined with the corresponding original image. The architecture of D is similar to that of G in Figure 1. The two essential differences between G and D are the network dimensionality and the downscaling and upscaling layers.
2.2. Learning Loss Function
In GAN training, the generator G learns from the input x to generate fake data G(x), and the discriminator D indicates whether data comes from the real or the fake distribution. We match the distributions directly at the level of the matrix elements. In our framework, we use the $L_1$ norm to measure the loss between the generated image G(z) and the corresponding target x. Motivated by previous work (David et al., 2017), the $L_1$-norm GAN loss function is employed to optimize the generator and discriminator models, as in equation 1.

$$\mathcal{L}(I) = \left| I^{HR} - G\left(I^{LR}\right) \right| \qquad (1)$$
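As a minimal sketch of equation 1, the snippet below computes the absolute pixel difference between a target HR image and a generated image. The function name `l1_pixel_loss`, the nested-list grayscale layout, and the choice to average over pixels (rather than sum) are assumptions made for illustration; the paper does not specify them.

```python
def l1_pixel_loss(hr, sr):
    """Mean absolute difference between two equally sized grayscale images.

    `hr` is the ground-truth HR image and `sr` stands in for G(I_LR).
    Images are nested lists of rows. Averaging the absolute differences
    is an assumption of this sketch.
    """
    if len(hr) != len(sr) or any(len(a) != len(b) for a, b in zip(hr, sr)):
        raise ValueError("images must have the same shape")
    total = 0.0
    count = 0
    for row_hr, row_sr in zip(hr, sr):
        for p, q in zip(row_hr, row_sr):
            total += abs(p - q)
            count += 1
    return total / count


hr_image = [[10, 20], [30, 40]]
sr_image = [[12, 18], [30, 44]]
print(l1_pixel_loss(hr_image, sr_image))  # (2 + 2 + 0 + 4) / 4 = 2.0
```

For color images the same computation would simply run over all three channels.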

The pixel-wise loss approximates the distribution of images, conditioned on the significant pixel matrices. Furthermore, in equation 3, x denotes the HR face image and z denotes the LR face image given to G. Let y be the fake HR image that G generates from z, and let $\mathcal{L}_D$ be defined over the element space of D. $\mathcal{L}_{D_r}$ is the discriminator loss on real images, and $\mathcal{L}_{D_f}$ is the discriminator loss on images generated by G. The discriminator and generator parameters $\theta_D$ and $\theta_G$ are updated to minimize $\mathcal{L}_D$ and $\mathcal{L}_G$, respectively.

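$$y = G(z;\, \theta_G) \qquad (2)$$

$$\begin{aligned} \mathcal{L}_D &= \mathcal{L}_{D_r} - \mathcal{L}_{D_f}, && \text{for } \theta_D \\ \mathcal{L}_G &= \left| G(z) - x \right|, && \text{for } \theta_G \end{aligned} \qquad (3)$$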

To balance the optimization of the generator G and the discriminator D, we employ the equilibrium algorithm from (David et al., 2017), as shown in equation 4. The parameters of the generative model are optimized for high-frequency detail against the discriminator. The algorithm is essential for maintaining this balance throughout the entire training process. We set $\gamma = 0.5$ and $\lambda_k = 0.001$ in our experiments.

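$$\begin{aligned} \mathcal{L}_D &= \mathcal{L}_{D_r} - k_t\, \mathcal{L}_{D_f} \\ k_{t+1} &= k_t + \lambda_k \left( \gamma \mathcal{L}_{D_r} - \mathcal{L}_G \right) \end{aligned} \qquad (4)$$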
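The update in equation 4 can be sketched in a few lines. The function names and the loss values fed into the loop are hypothetical, and clamping $k_t$ to [0, 1] follows common practice for this equilibrium scheme rather than anything stated in the text.

```python
def discriminator_loss(loss_d_real, loss_d_fake, k_t):
    """L_D = L_Dr - k_t * L_Df (first line of equation 4)."""
    return loss_d_real - k_t * loss_d_fake


def update_k(k_t, loss_d_real, loss_g, gamma=0.5, lambda_k=0.001):
    """k_{t+1} = k_t + lambda_k * (gamma * L_Dr - L_G) (second line of equation 4).

    The result is clamped to [0, 1]; the clamp is an assumption of this sketch.
    """
    k_next = k_t + lambda_k * (gamma * loss_d_real - loss_g)
    return min(max(k_next, 0.0), 1.0)


# Hypothetical per-step losses: (L_Dr, L_Df, L_G).
steps = [(0.8, 0.3, 0.3), (0.7, 0.3, 0.3), (0.6, 0.3, 0.3)]
k = 0.0
for loss_d_real, loss_d_fake, loss_g in steps:
    loss_d = discriminator_loss(loss_d_real, loss_d_fake, k)
    k = update_k(k, loss_d_real, loss_g)
    print(f"L_D = {loss_d:.5f}, next k = {k:.5f}")
```

With $\gamma = 0.5$, $k_t$ grows while $\gamma \mathcal{L}_{D_r}$ exceeds $\mathcal{L}_G$ and stops changing once the two are equal, which is the balance the equilibrium term is meant to hold.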

We also employ the convergence measure from (David et al., 2017), shown in equation 5, for our model.

$$\mathcal{M}_c = \mathcal{L}_{D_r} + \left| \gamma \mathcal{L}_{D_r} - \mathcal{L}_G \right| \qquad (5)$$
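A short sketch of how the measure in equation 5 could be tracked during training is given below. The function name and the logged loss values are hypothetical; only the formula comes from the text.

```python
def convergence_measure(loss_d_real, loss_g, gamma=0.5):
    """M_c = L_Dr + |gamma * L_Dr - L_G| (equation 5)."""
    return loss_d_real + abs(gamma * loss_d_real - loss_g)


# Hypothetical (L_Dr, L_G) pairs logged at three checkpoints.
history = [(0.8, 0.3), (0.5, 0.26), (0.3, 0.15)]
for step, (loss_d_real, loss_g) in enumerate(history):
    m_c = convergence_measure(loss_d_real, loss_g)
    print(f"checkpoint {step}: M_c = {m_c:.4f}")
```

The measure shrinks both when the discriminator loss on real images falls and when the generator loss approaches $\gamma \mathcal{L}_{D_r}$, so a decreasing value is usually read as training moving toward convergence.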

These equations differ from the original formulation in two important ways: (1) the generator input is an LR face image, not a random vector sample, because we assume that the input needed to generate an HR face must contain the face image itself; (2) we use the $L_1$ matrix norm as the pixel loss function of the generator, as shown in equation 3.
2.3. Generative Adversarial Network (GAN)
GAN was introduced by (Goodfellow, 2014) as a machine learning algorithm. The method constructs images that look original to human vision, with realistic and natural texture. The basic concept of GAN is to train two separate networks: the generator network produces a face image, and the discriminator network attempts to distinguish whether an image is an original image or a fake image produced by the generator. In our setting, the generator network is additionally required to produce an HR image. The architecture of the enhanced GAN generator and discriminator networks is shown in Figure 1.

Source: Research Result


Figure 1. Framework Generator and Discriminator Network Architecture of GAN
2.4. Generator Network
The generator network follows the Residual Network (ResNet) architecture of (C. Ledig et al., 2017). Each residual block consists of convolution layers with a 3 × 3 kernel, two batch normalization layers, and ReLU activation functions, and a skip connection is used in each residual block. The LR features are upsampled with the sub-pixel layer (Shi et al., 2016). The generator network thus reconstructs an HR image from the LR image.
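The sub-pixel upsampling step mentioned above can be illustrated with a small rearrangement routine. This is a sketch only: the function name `pixel_shuffle`, the nested-list layout, and the channel ordering (the convention used by common deep learning libraries) are assumptions, and the convolution that would normally precede the rearrangement is left out.

```python
def pixel_shuffle(channels, r):
    """Rearrange C*r*r feature maps of size H x W into C maps of size rH x rW.

    `channels` is a list of 2-D lists, one per channel. The ordering used is
    out[c][h*r + i][w*r + j] = in[c*r*r + i*r + j][h][w], which is an
    assumption of this sketch.
    """
    if len(channels) % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height, width = len(channels[0]), len(channels[0][0])
    out_channels = len(channels) // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(out_channels)]
    for c in range(out_channels):
        for i in range(r):
            for j in range(r):
                src = channels[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 1x2 feature maps become one 2x4 map for r = 2.
features = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
print(pixel_shuffle(features, 2))  # [[[1, 3, 2, 4], [5, 7, 6, 8]]]
```

Because the layer only moves values between the channel and spatial axes, the upscaling itself has no learnable parameters.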
2.5. Discriminator Network
The discriminator network distinguishes SR images from original images. We employ the Deep Convolutional GAN discriminator architecture introduced by (Radford, 2016) with leaky ReLU activation (α = 0.2). The discriminator contains convolution layers with a 3 × 3 kernel; the resulting 512 feature maps are fed into two dense blocks and a sigmoid that produces the classification prediction.
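The two activation functions named in this section can be written out directly. The sketch below assumes that α = 0.2 is the negative slope of a leaky ReLU; `classify`, its weights, and its bias are hypothetical stand-ins for the final dense stage and are not taken from the paper.

```python
import math


def leaky_relu(x, alpha=0.2):
    """Pass positive values through and scale negative values by alpha."""
    return x if x >= 0 else alpha * x


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def classify(features, weights, bias=0.0):
    """Hypothetical final unit: activation, weighted sum, then sigmoid."""
    activated = [leaky_relu(f) for f in features]
    logit = sum(w * a for w, a in zip(weights, activated)) + bias
    return sigmoid(logit)


print(leaky_relu(-1.0))  # -0.2
print(sigmoid(0.0))      # 0.5
print(classify([1.5, -2.0, 0.5], [0.4, 0.3, -0.2]))
```

An output near 1 would be read as "original image" and an output near 0 as "generated image", under the usual labeling convention.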
3. Results and Analysis
3.1. Experiment
We trained the model with a learning rate of 0.001 on the CelebA face dataset (Ziwei et al., 2015). We ran the experiments on an NVIDIA T4 GPU and evaluated the results qualitatively and quantitatively.

Source: Research Result

Figure 2. Qualitative with CelebA Dataset (4x upscaling factor)


The CelebA dataset contains more than 200,000 images with 40 attributes, covering large pose variations and backgrounds. We train our proposed model on CelebA images at their original size of 178 × 218.

To set up the LR dataset, we downsample the HR images (178 × 218) to LR images with a resolution of 44 × 54. We then construct the input-output pairs with the HR images. The input and output images of our proposed model have the same size of 178 × 218 with three color channels.
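The LR resolution quoted above follows from integer division of the HR size by the scale factor. The sketch below checks that arithmetic and shows one simple way to downsample; box averaging is an assumption chosen for brevity, since the text does not say which downsampling kernel was used, and the function names are hypothetical.

```python
def lr_size(width, height, scale=4):
    """LR size obtained by integer division of the HR size by the scale."""
    return width // scale, height // scale


def downsample_box(image, scale=4):
    """Average each scale x scale block of a grayscale image (list of rows).

    Trailing rows and columns that do not fill a whole block are dropped.
    Box averaging is an assumption of this sketch.
    """
    out_h = len(image) // scale
    out_w = len(image[0]) // scale
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            block = [image[y * scale + dy][x * scale + dx]
                     for dy in range(scale) for dx in range(scale)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


print(lr_size(178, 218))  # (44, 54)

# Synthetic 178-wide, 218-high image standing in for one HR sample.
hr = [[float(x + y) for x in range(178)] for y in range(218)]
lr = downsample_box(hr)
print(len(lr[0]), len(lr))  # 44 54
```

Since 178 and 218 are not multiples of 4, two columns and two rows at the border do not contribute to the LR image in this scheme.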

3.2. Result

In this section, the model is trained and tested on the CelebA dataset to generate SR images. After the qualitative and quantitative validation, we report the PSNR and SSIM of the generated HR images in detail. For the samples in Table 1, our proposed EFGAN method reaches PSNR values from 20.92 to 27.89 and SSIM values from 0.6137 to 0.8595. EFGAN generates 4× face images regardless of facial expression, pose, and other factors.
Table 1. Quantitative Comparisons on the CelebA Dataset
Dataset Name   Size      Scale   Training Set   PSNR    SSIM
000001         178x218   x4      CelebA         20.92   0.6137
000038         178x218   x4      CelebA         24.40   0.7760
025315         178x218   x4      CelebA         25.64   0.7901
133459         178x218   x4      CelebA         27.47   0.8038
188371         178x218   x4      CelebA         27.89   0.8595
Source: Research Result
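For reference, PSNR values such as those in Table 1 are derived from the mean square error between the original and the reconstructed image. The sketch below uses the standard definition with a peak value of 255 for 8-bit images; the function names and the tiny test images are illustrative, and the paper's exact evaluation settings (for example color space or cropping) are not known.

```python
import math


def mse(a, b):
    """Mean square error between two equally sized grayscale images."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for p, q in zip(row_a, row_b):
            total += (p - q) ** 2
            count += 1
    return total / count


def psnr(a, b, max_value=255.0):
    """PSNR in dB: 10 * log10(max_value^2 / MSE); infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


original = [[0, 255], [255, 0]]
restored = [[0, 255], [255, 10]]
print(mse(original, restored))             # 25.0
print(round(psnr(original, restored), 2))  # roughly 34.15 dB
```

Because PSNR depends only on the per-pixel error, it is usually reported together with SSIM, which compares local structure.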

4. Conclusion

This paper proposed an SR method that generates an HR image from an LR image at an upscaling factor of 4×. The generator is conditioned on the LR image instead of random noise, and the end-to-end EFGAN architecture is introduced. The generated results are robust to variations in facial expression and pose. Skip-layer connections in the generator and discriminator networks were used to speed up convergence during the training phase. Our approach therefore offers benefits for training SR models.

We fed the input dataset to the EFGAN network to generate HR images of size 178 × 218, developing an evolved network that generates an HR image directly (e.g., 32 × 32). The results show good performance on the face image SR task, and the proposed architecture produces sharp detail consistent with the PSNR/SSIM validation.

Author Contributions
Bagus Hardiansyah proposed the topic; Bagus Hardiansyah and Elvianto Dwi
Hartono conceived models and designed the experiments; Bagus Hardiansyah and
Elvianto Dwi Hartono conceived the optimization algorithms. Bagus Hardiansyah and
Elvianto Dwi Hartono analyzed the result.

Conflicts of Interest
The authors declare no conflict of interest.

References
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A.
Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural
information processing systems, 2014, pp. 2672–2680.

J. Jiang, J. Ma, C. Chen, X. Jiang, and Z. Wang. Noise robust face image super-
resolution through smooth sparse representation. IEEE Transactions on
Cybernetics, PP(99):1–12, 2016.

Junyu Wu, Shengyong Ding, Wei Xu, and Hongyang Chao. Deep joint face
hallucination and recognition. arXiv preprint arXiv:1611.08091, 2016.

Shizhan Zhu, Sifei Liu, Chen Change Loy, and Xiaoou Tang. Deep cascaded bi-
network for face hallucination. In European Conference on Computer Vision,
pages 614–630. Springer, 2016.

David Berthelot, Tom Schumm, and Luke Metz. BEGAN: boundary equilibrium
generative adversarial networks. arXiv preprint arXiv:1703.10717, 2017

Hardiansyah, B., Lu, Y. Single image super-resolution via multiple linear mapping
anchored neighborhood regression. Multimed Tools Appl 80, 28713–28730
(2021).


Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew P. Aitken,
Alykhan Tejani, Johannes Totz, Zehan Wang, and Wenzhe Shi. Photo-realistic
single image super-resolution using a generative adversarial network. arXiv
preprint arXiv:1609.04802, 2016.

Emily L Denton, Soumith Chintala, Rob Fergus, et al. Deep generative image models
using a laplacian pyramid of adversarial networks. In Advances in neural
information processing systems, pages 1486–1494, 2015.

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-
scale image recognition. International Conference on Learning
Representations(ICLR), 2015.

Junjun Jiang, Chen Chen, Jiayi Ma, Zheng Wang, Zhongyuan Wang, and Ruimin Hu.
Srlsp: A face image super-resolution algorithm using smooth regression with
local structure prior. IEEE Transactions on Multimedia, 19(1):27–40, 2017.

Xin Yu and Fatih Porikli. Ultra-Resolving Face Images by Discriminative Generative
Networks, pages 318–333. Springer International Publishing, Cham, 2016.

Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image
translation with conditional adversarial networks. CoRR, abs/1611.07004,
2016.

Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv
preprint arXiv:1411.1784, 2014.

Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein gan. arXiv preprint
arXiv:1701.07875, 2017.

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes
in the wild. In Proceedings of the IEEE International Conference on Computer
Vision, pages 3730–3738, 2015

C. Ledig et al., “Photo-realistic single image super-resolution using a generative
adversarial network,” Proc. - 30th IEEE Conf. Comput. Vis. Pattern Recognition,
CVPR 2017, vol. 2017-January, pp. 105–114, 2017, doi: 10.1109/CVPR.2017.19.


W. Shi et al., “Real-Time Single Image and Video Super-Resolution Using an Efficient
Sub-Pixel Convolutional Neural Network,” pp. 1–10, 2016
