
Journal of King Saud University – Computer and Information Sciences 34 (2022) 7236–7246


Enhanced IPCGAN-Alexnet model for new face image generating on age target

Hady Pranoto a,b,*, Yaya Heryadi a, Harco Leslie Hendric Spits Warnars a, Widodo Budiharto b

a Computer Science Department, BINUS Graduate Program - Doctor of Computer Science, Bina Nusantara University, Indonesia
b Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia

ARTICLE INFO

Article history:
Received 13 April 2021
Revised 10 August 2021
Accepted 2 September 2021
Available online 11 September 2021

Keywords:
Face aging
Aging pattern transfer
Generative adversarial network

ABSTRACT

The ability of cross-age face recognition to recognize someone's face decreases after a certain time. Adding synthetic face images at a certain age, generated by a face aging architecture, is one way to increase the performance of cross-age face recognition. A synthetic face image can be created using a Generative Adversarial Network-based architecture, but current Generative Adversarial Network-based face aging still needs heavy computation to create a model. For that reason, we propose a new optimal variant of the Identity Preserving Conditional Generative Adversarial Network (IPCGAN) to generate synthetic face images in certain age groups. In the proposed architecture, changes are made to the structure of the generator module and the age classification module, and the objective function is changed, to increase accuracy when generating realistic synthetic face images in certain age groups and to speed up training. The modified age classifier in the proposed network forces our architecture to generate better synthetic faces in certain age groups. Evaluation using Facenet and age prediction shows our method achieves 4.2% better accuracy in k-NN classification, 3.6% better accuracy in SVM classification, 8.6% better accuracy in age verification, and 4.5% lower accuracy in age prediction.

© 2021 The Authors. Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

The ability of cross-age face recognition to recognize someone's face decreases after a certain time. This decrease in performance is due to facial changes, caused by face aging or age progression, in the people to be recognized. Adding synthetic face images at a certain age, generated by a face aging architecture, is one way to increase the performance of cross-age face recognition.

Face aging or age progression is a term used to describe a process that synthesizes faces in certain age groups (Zhang et al., 2017), defined as an aesthetic process of rendering a new face with the added effects of natural aging and rejuvenation. Facial aging is the process of synthesizing a new face at the desired age.

Much research has been conducted on face aging or age progression (Wang, 2016; Kemelmacher-Shlizerman et al., 2014), but this research is still challenging because training samples of the same person spanning a long range of years are very difficult to find (Panis et al., 2016; Rothe et al., 2018), and current face aging methods still need heavy computation to create a model.

Much research in the field of facial aging has been done. The approaches can be classified into three categories: the conventional, deep generative, and generative adversarial network approaches. The conventional approach is itself divided into three categories: the model-based, prototype, and reconstruction approaches. The model-based approach is an early approach in age progression research. Active Appearance Models (AAMs) (Patterson et al., 2006), the Global Aging Function, the Appearance Specific Aging Function (ASA), the Weighted Appearance Aging Function (WAA), the Weighted Person Specific Aging Function (WSA) (Lanitis et al., 2002), Aging Pattern Subspace (AGES) (Geng et al., 2010), and the compositional and dynamic model for face aging (Suo et al., 2010) are model-based approaches. These methods usually utilize a kind of appearance model to represent the face structure and texture of the input photo.

* Corresponding author at: Computer Science Department, BINUS Graduate Program - Doctor of Computer Science, Bina Nusantara University, Indonesia.
E-mail address: hadypranoto@binus.ac.id (H. Pranoto).
Peer review under responsibility of King Saud University.

https://doi.org/10.1016/j.jksuci.2021.09.002
1319-1578/© 2021 The Authors. Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
H. Pranoto, Y. Heryadi, Harco Leslie Hendric Spits Warnars et al. Journal of King Saud University – Computer and Information Sciences 34 (2022) 7236–7246

The model-based approach produces unrealistic face images and often loses the identity of the generated face. In this model-based approach, the face shape and the texture changes due to aging processes, such as wrinkles, muscle changes, and hair color, are modeled with parameters (a parametric model).

The prototype-based approach (Kemelmacher-Shlizerman et al., 2014) applies an aging prototype to transfer differences between age groups into input face images, generating a new face image in a certain age group. This process estimates changes in age patterns based on an average face or mean face (Rowland and Perrett, 1995), one pattern for each predefined age group (Liu et al., 2008). The average face is applied to the input face image to produce a synthetic face image in a certain age group. When implementing the pattern into the input face image, the prototype-based approach needs high-precision alignment between the input image and the mean face; if not, it will produce an implausible synthetic face image. This approach is limited to modeling aging patterns and misses the global comprehension of the human face, such as personality and expression, which is needed in age progression.

The reconstruction-based approach focuses only on finding the aging bases, or basic patterns, of each age group and combining them. The result of this method is a dictionary that can be used to transform the input image into a new synthetic image at the desired age. Shu et al. (Shu et al., 2015) proposed Coupled Dictionary Learning (CDL) to model the personalized aging pattern (Tang et al., 2018) by preserving the personalized features of each individual. They formulated CDL for short-term face aging photos, not long-term aging photos, because collecting long-term dense face aging sequences is difficult.

Another method is the deep generative-based approach. This method gives better results than conventional approaches and can produce synthetic faces that are more realistic than ever before. An example using the deep generative method is age progression using the Temporal Restricted Boltzmann Machine (TRBM) (Duong et al., 2016), which embeds the temporal relationship between sequences of face images, uses the log-likelihood objective function while ignoring the l2 reconstruction error during training, captures the non-linear aging process efficiently, and automatically synthesizes a sequence of age progressions for each age group with greater detail. The Recurrent Neural Network (RNN) approach (Wang, 2016) uses two-layer Gated Recurrent Units (GRU) to utilize the information (memory) from the previous layer and create a smooth transition between age groups when synthesizing face images.

Face aging or age progression improved significantly when the Generative Adversarial Network (GAN) method was discovered (Goodfellow et al., 2014). Much research uses the GAN method to create fake characters, fake faces, and fake images. The GAN method has demonstrated its success in generating high-quality images (Goodfellow et al., 2014; Mirza and Osindero, 2014) and can also be used to map image styles from one domain to another, for example, mapping natural landscape images into photographic painting styles, or realistic face photos into sketch images (Zhu et al., 2017). A GAN architecture can create various objects depending on the training data provided (Im et al., 2016).

Generative Adversarial Networks improved when the conditional GAN (cGAN) was introduced. A cGAN can create desired outputs based on a condition given to the architecture and can produce image-to-image translation by implementing a certain characteristic condition on the image (Isola et al., 2017); it can also be used to evoke images of faces at a certain age (Antipov et al., 2018).

Identity Preserved Conditional Generative Adversarial Networks (IPCGANs) (Wang et al., 2018) have successfully generated new synthetic face images at a certain age while also preserving identity. Generating a new synthetic face and preserving identity is a challenging computational task. The main problem in GANs is that the architecture has a large number of parameters, is computationally costly, and needs a large number of samples to produce a good model. IPCGAN as a baseline architecture takes a long time to train to convergence, and its ability to maintain the identity of the input image can also be improved. Based on these problems, we propose a new, more optimal variant of the IPCGAN architecture with a simpler structure and a quicker process to generate new faces at the target age while maintaining identity.

2. Related work

Face aging or age progression using GANs borrows from the game between a generator and a discriminator, where the discriminator learns to distinguish whether the data given by the generator is a real sample or not, and the generator learns to produce real-looking samples that the discriminator cannot distinguish. This learning process continues until, at a certain point, the discriminator is unable to distinguish between real data and data generated by the generator. In the context of face aging, the data are face images. The challenge in this method is to get a sufficient number of complete sequences of images of an individual for training data. A GAN architecture can generate any object from given noise, depending on the given training data. Synthetic faces can be generated by a GAN if the architecture is trained with a face dataset.

Generative Adversarial Networks changed the face aging approach from a step-by-step age progression approach into direct age progression (Zhang et al., 2017; Antipov et al., 2018; Li et al., 2018). Aging effects such as adding wrinkles, sideburns, gray hair, and eye bags, and changing the structure of the head, such as a shrunken chin or enlarged eyes, are added directly when simulating the aging or rejuvenation process.

A certain condition can be given to a GAN to produce the desired image (Isola et al., 2017); this has inspired researchers to produce facial images with certain age conditions (Antipov et al., 2018). By utilizing a residual block, the process of change can occur (Liu et al., 2017). The GAN input that was previously derived from noise is replaced by an input image; this shortens the process of producing synthetic images because it eliminates the process of producing face images from scratch. IPCGAN (Wang et al., 2018) uses this method to produce an aging effect while maintaining identity.

The Generative Adversarial Network method can be categorized, based on how the architecture generates the synthesized face image, into three categories: translation-based, sequence-based, and condition-based.

A translation-based method is based on transferring style from one set of images into another set of images, from one domain into another domain. The translation-based method was introduced by Zhu et al. (Zhu et al., 2017) in the Cycle-GAN architecture, which captures style characteristics from one set of images and implements them in another set. The advantage of Cycle-GAN is that it does not require paired matches between the two domains, and this advantage can be utilized for face aging. But Cycle-GAN also has a disadvantage: it is only able to translate between two domains. Palsson et al. (Palsson et al., 2018) proposed a face aging architecture by adopting the style transfer architecture from Cycle-GAN (Zhu et al., 2017), using cyclic consistency between the input face image and the generated synthetic image to preserve the identity of the input face image in the generated synthetic image.

The second method is sequence-based. This method is not designed to perform direct translation between different age groups. It uses multiple networks that are trained separately to perform sequential translations between two adjacent age groups.
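As a concrete illustration of the generator-discriminator game described above, the two objectives can be written as plain functions of the discriminator's scores. This is a generic sketch of the standard GAN losses (Goodfellow et al., 2014), not the paper's implementation, and the score values below are made up:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # The discriminator ascends log D(x) + log(1 - D(G(y)));
    # equivalently, it minimizes the negated objective.
    return -(np.log(d_real) + np.log(1.0 - d_fake)).mean()

def generator_loss(d_fake):
    # The generator tries to make D label its samples as real
    # (non-saturating form of the same game).
    return -np.log(d_fake).mean()

# A confident discriminator (real scores near 1, fake scores near 0)
# has a low loss; as the generator improves, the fake scores rise
# and the discriminator's loss grows.
strong_d = discriminator_loss(np.array([0.95]), np.array([0.05]))
weak_d = discriminator_loss(np.array([0.6]), np.array([0.4]))
```

Training alternates between the two updates until, as described above, the discriminator can no longer separate real from generated samples.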


In this method, the translations are conducted sequentially and combined into a complete network. The output of the i-th network is used as input for the (i+1)-th network. The challenge in this method is to get a complete and sequenced image set of an individual. According to Wang et al., although the deep learning approach is already powerful, there are still challenges in training the age-group transition from certain age groups to other age groups in a "one-shot" way (Wang et al., 2019). But current face aging methods use the "one-shot" way and achieve the state of the art (Despois et al., 2020; Liu et al., 2019; Fang et al., 2020). Research using the sequence-based approach includes FaceFeat-GAN, a two-stage approach for identity-preserving face synthesis (Shen et al., 2018); D. Deb et al. with Child Face Age-Progression via Deep Feature Aging (Deb et al., 2020) to simulate a child's face; Q. Li et al. (Li et al., 2019) with Age Progression and Regression with Spatial Attention Modules, exploiting spatial attention mechanisms to restrict image modification to the regions of the face image closely related to age changes, producing face images of high visual fidelity; and H. Fang et al. with Triple-GAN, a progressive face aging architecture with a triple translation loss (Fang et al., 2020), adopting a triple translation loss to model the robust interrelationship of age patterns between different age groups. Duong et al. (Duong et al., 2019) introduced Inverse Reinforcement Learning (IRL) and implemented the Subject-dependent Deep Aging Path (SDAP) structure model, with an additional aging controller in the TNVP structure, based on the hypothesis that each subject should have its own facial development. Multiple training pairs increase the ability to learn the mapping between pattern and label.

The third method is condition-based, a method that utilizes a condition in the GAN (Mirza and Osindero, 2014) to control the architecture to produce synthetic faces in a certain age group using a label. This method uses a one-hot code tensor to guide the network to generate synthetic face images in the target age groups. Across existing methods, the placement of this one-hot code varies. Some methods place the one-hot code on both the generator and discriminator parts (Tang et al., 2018); other methods place the one-hot code only on the generator part, where the one-hot code is concatenated with the input images. For example, Yao et al. (Yao et al., 2020) fuse age labels with latent vectors. In summary, the condition-based method is based on the concept of guiding the generator by including extra information about the target age groups. The condition-based method has high efficiency and a significant advantage compared to the translation-based approach. Examples of research using the condition-based approach are Zhang et al. with age progression/regression by conditional adversarial autoencoder (Zhang et al., 2017), W. Wang et al. with recurrent face aging (Wang, 2016), J. Song et al. with Dual Conditional GANs for Face Aging and Rejuvenation (Song et al., 2018), G. Antipov et al. with Face Aging using Conditional Generative Adversarial Networks (Antipov et al., 2018), and S. Liu et al. with Face Aging with Contextual Generative Adversarial Networks (Liu et al., 2017). Synthetic face images from many of these age GAN methods show that they produce images that are more realistic and of higher fidelity, and that also preserve the identity of the input images.

2.1. Conditional GAN (cGAN)

The Conditional GAN (cGAN) proposed by Isola et al. (Isola et al., 2017) is an architecture based on GAN that translates images from one style domain into another style domain by implementing a condition in the architecture to obtain certain characteristics in the resulting image. In general, the conditional GAN model can solve any style transfer problem from image to image, not only for a specific style domain; the model not only maps from an input image to an output image but can also create a mapping function for certain conditions or styles. Using this method, the resulting image is more realistic because the structure of the image can be maintained. A conditional GAN can create face aging in certain age groups by using a face dataset and adding an aging condition for each age group.

The first age-conditional GAN (acGAN) architecture, proposed by Antipov et al. (Antipov et al., 2018), can generate a synthetic face in a targeted age group using an age condition. The architecture is composed of two steps: a reconstruction step and a face aging step. The reconstruction step is an optimization problem to find the optimal latent approximation z*, and the face aging step is performed by a simple change of the condition y at the input of the generator. In the reconstruction step, latent vector optimization computes the Euclidean distance between the face recognition embeddings FR(x) of the input image x and FR(x̄) of the synthesized image x̄, to create face aging images from an input image at the target age group while also preserving the original person's identity. A minimal Euclidean distance indicates a high probability that the person in both images is the same person. The face aging step is the process of computing the Euclidean distance between the input image x and images from the age groups.

2.2. Face aging with identity-preserved conditional generative adversarial networks (IPCGANs)

Identity Preserved Conditional Generative Adversarial Networks (IPCGANs) (Wang et al., 2018) is an architecture that has successfully generated new synthetic face images in a certain age group while preserving identity. IPCGANs has three modules: (1) a conditional module for generating a realistic-looking synthesized face x̄ from input image x, in target age group t, using age condition C_t; (2) an identity-preserving module so that the generated face synthesis x̄ has the same identity as the input image x; and (3) a classifier module (Yang et al., 2018), which forces the synthesized images x̄ to be created in the desired age group.

The conditional module is adapted from the cGAN architecture (Isola et al., 2017). In this module, the generator tries to generate a synthetic image x̄ in certain age groups from images y using a conditional characteristic. After the synthetic image is created, the discriminator part forces the image resulting from the generator to fall in the target age groups and to look like a real human face. This conditional module has two discriminator roles. The first discriminator D makes a decision using a loss function to determine whether the image produced by the generator appears real or fake; small loss values indicate the resulting images look real, and vice versa. The second discriminator has the task of aligning the output synthetic images ȳ with images from target age group t using age condition C_t. To complete this task, the objective function of the module is formulated as Eqn 1:

min_G max_D V(D, G) = E_{x~p_x(x)}[log D(x|C_t)] + E_{y~p_y(y)}[log(1 - D(G(y|C_t)))]    (1)

where p_x(x) and p_y(y) are the image distributions of x and y, and E_{x~p_x(x)} and E_{y~p_y(y)} are the expectations over data x and y. The loss term for discriminator D is log D(x|C_t), and for generator G it is log(1 - D(G(y|C_t))). The generator G(y|C_t) models the data distribution from the data given by y and C_t.

IPCGANs uses a Least Squares Generative Adversarial Network (LSGAN) (Shu et al., 2015) model to push the generated face images to look real and lie close to the decision boundary, making the synthesized faces indistinguishable from real face images. LSGAN can generate high-quality images, and its training process is more stable. The conditional LSGAN for the face generation task in IPCGAN is formulated in Eqn 2:

L_D = 1/2 E_{x~P_x(x)}[(D(x|C_t) - 1)^2] + 1/2 E_{y~P_y(y)}[(D(G(y|C_t)))^2]
L_G = 1/2 E_{y~P_y(y)}[(D(G(y)|C_t) - 1)^2]    (2)
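As an illustration, the least-squares losses in Eqn 2 reduce to simple expressions over discriminator scores. The following sketch assumes the conditional scores D(x|C_t) and D(G(y|C_t)) have already been collected into arrays; it is not taken from the IPCGAN code:

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    # L_D = 1/2 E[(D(x|Ct) - 1)^2] + 1/2 E[D(G(y|Ct))^2]:
    # real samples are pulled toward score 1, fakes toward 0.
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    # L_G = 1/2 E[(D(G(y)|Ct) - 1)^2]:
    # the generator pulls its samples' scores toward 1.
    return 0.5 * np.mean((d_fake - 1.0) ** 2)
```

Unlike the log-loss in Eqn 1, these quadratic penalties keep gradients informative even for samples far from the decision boundary, which is the stability property the text attributes to LSGAN.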


To optimize the conditional LSGAN, IPCGAN uses a matching-aware discriminator to effectively align conditions with generated images.

The identity-preserved module is used to preserve identity information for the synthesized image and is an important part of the architecture. A perceptual loss is used in this objective function instead of the Mean Square Error (MSE), because if MSE were used to calculate the differences between the input image x and the aged face G(x|C_t) in pixel space, the loss would force G(x|C_t) to be the same as image x. A perceptual loss instead encourages the generated synthetic image to be close to the input in the same feature space. IPCGAN uses a lower feature layer to keep the content or personality of the input image, and a higher feature layer to help keep style-related aspects such as color, texture, etc. An aged face must change in terms of hair color, wrinkles, etc. IPCGAN also uses an Alexnet classifier to force the image generated by the generator to be synthesized in a certain age group, by classifying it into an age group class; when the output image from the generator has a small loss when classified, it means the resulting image has the age pattern of a specific class. The architecture is trained under this condition until the best model is created, i.e., until it reaches a minimum specific loss value used as a threshold.

3. Research method

3.1. Proposed architecture

The proposed architecture is a new optimal variant of IPCGAN with improved performance to produce synthetic face images at certain age targets while maintaining the identity of the input face images. The aging process of the proposed architecture aims to generate a synthesized face image x̄ that lies in target age group C_t from image x, with the hope that the image x̄ has the following characteristics: (1) face image x̄ looks like a real face image; (2) face image x̄ has the same identity as image x; (3) face image x̄ has an age in age group C_t. Models are trained based on the age group C_s that face image x belongs to; the model maps C_s to any target age group C_t in a pair list. We do not train specific C_s age groups to specific C_t age groups.

Similar to IPCGANs, the proposed architecture has three modules: (1) a conditional GAN (cGAN) which generates a synthesized face with target age C_t and guarantees that image x̄ looks realistic; (2) an age classifier module that enforces that image x̄ falls into the desired age group C_t; and (3) an identity-preserving module that guarantees that image x̄ has the same identity as x.

The conditional generator module has the task of generating a synthesized face at a target age using generator G. Denote x as an input image within the source age group and y as a real face within the target age groups. We denote the distributions of x and y as p_x(x) and p_y(y). With the cGAN, the synthesized face with age condition C_t should not be distinguishable from a real image by discriminator D. For a real face sample, the probability of belonging to a real face, D(x|C_t), should be high. Discriminator D is also responsible for aligning the input label C_t with the generated images.

When using the optimization of the standard GAN (Goodfellow et al., 2014), instability occurs, with the consequence of always producing bad and unrealistic pictures. The proposed architecture uses the Least Squares GAN (LSGAN) (Mao et al., 2017), the same as IPCGAN. The LSGAN tries to push both generated faces and real faces toward the same decision boundary, making generated faces and real faces difficult to distinguish. The LSGAN is formulated as Eqn 3 and Eqn 4:

L_D = 1/2 E_{x~p_x(x)}[(D(x|C_t) - 1)^2] + 1/2 E_{y~p_y(y)}[(D(G(y|C_t)))^2]    (3)

L_G = 1/2 E_{y~p_y(y)}[(D(G(y|C_t)) - 1)^2]    (4)

The conditional LSGAN is optimized using the matching-aware discriminator in (Reed et al., 2016), where the discriminator only checks generated image samples for the corresponding age groups.

To generate an aging pattern in a certain age group, the proposed method uses an age classification module based on the Alexnet age classifier in IPCGAN; this module has the task of making the generated images fall in certain age groups. Using the classifying method, the age classifier guarantees that the generated face G(x|C_t) falls into the target age group C_t. If the generated face falls in group C_t, our age classifier gives a small penalty; otherwise, the age classifier gives a big penalty to the architecture. This age classifier tries to reach the smallest loss in order to generate faces with an aging pattern in a certain age group. The loss from the objective function of this age classifier forces the architecture to generate an aging pattern in the resulting images; minimal loss values make the generated image carry the aging pattern of each age group. The age classifier in the proposed method is formulated as Eqn 5:

L_age = Σ_{x~p_x(x)} l(G(x|C_t), C_t)    (5)

where l(·) is the softmax loss. Through backpropagation, the age classification loss forces the parameters of the generator to change and generate faces that lie in the correct age group.

To preserve the identity of the input image in the synthesized face image, the proposed method uses a preserving-identity module. This module uses a perceptual loss to guarantee that the resulting image has age pattern changes but still carries identity information from the input images. The preserving-identity module does not use an adversarial loss in the face generation process; the adversarial loss would force the generated synthesized face to follow the target data distribution. Consequently, the generated sample could look like any person in any age group, which means the adversarial loss alone cannot guarantee that the generated sample preserves identity information from the input face images.

In IPCGAN, as in our proposed network, the age classifier and the identity-preserving module share part of the same architecture, from the first layer to the seventh layer. Sharing the age classifier architecture makes synthetic face images at a certain target age while preventing the generated image from losing the identity of the input image. The feature space extracted from a specific proper layer h(x) is important for preserving the identity information, and the lower layers are good for keeping content. Style transfer experiments in (Gatys et al., 2016; Johnson et al., 2016) showed that the lower layers are good for keeping the content and the higher layers are good for keeping style-related aspects such as color, texture, etc. Even though an aged face changes in terms of hair color and wrinkles, the identity information must be kept unchanged. Based on this, we conclude that face content is a representation of the identity information, which can be provided by a lower feature layer of the age classification network, h(x).

The modified age classifier in the proposed architecture uses the same loss function as the Alexnet age classifier (Eqn 6):

L_identity = Σ_{x~p_x(x)} ||h(x) - h(G(x|C_t))||^2    (6)

where h(x) is a feature extracted by a specific feature layer in the modified identity-preserving layer based on the Alexnet age classifier architecture. In this part, a perceptual loss is used instead of the Mean Square Error between x and its aged face G(x|C_t). Using cross-entropy loss, the architecture allows the created image to still have changes in terms of hair color, beard, wrinkles, and hairlines, and to create newly generated faces different from the input face x.
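The identity loss in Eqn 6 is a squared L2 distance in feature space rather than pixel space. A minimal sketch, treating h(x) simply as a precomputed feature array (in the actual architecture, h is the seventh layer of the modified Alexnet classifier):

```python
import numpy as np

def identity_loss(h_x, h_gx):
    # L_identity = sum over the batch of ||h(x) - h(G(x|Ct))||^2:
    # the squared L2 distance between feature vectors, so the
    # generated face may change in pixels (wrinkles, hair color)
    # while staying close to the input in identity features.
    diff = h_x - h_gx
    return np.sum(diff ** 2)

# Identical features give zero loss; diverging features are penalized.
feat_in = np.array([[0.2, 0.5, 0.1]])
feat_gen = np.array([[0.2, 0.5, 0.1]])
```

Because the distance is measured on features rather than raw pixels, a low L_identity does not force the generated image to be pixel-identical to the input, which is exactly the contrast with MSE drawn in the text.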

a change in terms of hair color, bread, wrinkles, hairlines, and create ers in IPCGANs into 1 layer only, to make proposed architecture has
new generated faces differences from input face x. This perceptual faster execution time in the training process. For clearly our gener-
loss is used instead of use Mean Square Error loss as a loss function. ator architecture showed in Table 1.
Mean square error loss makes the input image and generated image The residual layer in the face aging architecture provides an
has to high similarity or identical. Age-face GðxjC t Þ function use opportunity for the architecture to form an aging pattern using
Mean Square Error will force generated image y became identically the weight parameters obtained through the training process, the
as image x. Choose the perceptual loss as loss function are the right residual layer provides an architectural opportunity to produce
choice, the perceptual loss encourages the generated image y to be synthetic images for the intended age group. However, the residual
close in feature spaces with a feature space of input image x, but layer has disadvantages. Adding the number of residual layers will
still have differential with the image. increase the number of the parameter in architecture. Increasing
To make sure the proposed architecture successfully generated the number of parameters also increases the number of computa-
faces in certain age groups and did not lose the identity from the tions processes, huge computation processes will burden the com-
input image, we formulated an objective function for the proposed architecture, shown in Eqn 7:

G_loss = λ1·L_G + λ2·L_identity + λ3·L_age    (7)

where λ1 controls to what extent the input image is aged, and λ2 and λ3 control to what extent we want to keep the identity information and let the generated samples fall into the correct age groups. In Section 4, we empirically find the optimal λ1, λ2, and λ3.

In the proposed architecture, we improved the accuracy of the age classifier module. Increasing the accuracy of the classification section makes the age classifier stronger in forcing the architecture to produce synthetic images in the intended age group, since the classification section is the part that leads the architecture to produce face images in the desired age group. We also modified the generator to reduce the number of computational processes in the architecture; reducing the number of residual layers in the generator is expected to speed up the training process.

The proposed changes to the age classifier and generator parts of IPCGAN can be seen in Fig. 1.

3.2. Architecture

The proposed architecture contains three networks: a generator network, a discriminator network, and an age classifier network, each of which has a specific task, and together they produce new face images in the desired age group.

Generator network: Our generator receives a 128 × 128 × 3 image and a 128 × 128 × 5 one-hot-coded age condition matrix as input. The one-hot age condition matrix is inserted into the architecture; in the one-hot code, only one feature map layer is filled with ones while the remaining feature map layers are filled with zeros. At training time, the input image and the one-hot code are concatenated before the first convolutional layer. In the proposed architecture, the number of generator residual layers was reduced from 6 residual layers to only 1, because a large number of residual layers demands more computation and increases training time. For this reason, the author reduces the number of residual layers in the architecture to keep the number of parameters as small as possible while still being able to produce a face aging pattern. The proposed architecture reduces the number of computation processes needed, and it can synthesize new faces at the desired target age more quickly.

Discriminator network: Our discriminator consists of a 5-layer convolutional network. The first layer is a 4 × 4 convolution with stride 2 and a leaky-ReLU activation function. A one-hot age condition feature matrix of size 64 × 64 × 5 is concatenated with the output of the first layer, and this combination is used as input for the second layer (Table 2).

Age classification network: Our proposed age classification network is modified from the Alexnet age classification architecture. In this network, the age classifier and the identity-preserving part share the same architecture from the first to the seventh layer. The same module is used both for generating synthetic images in certain target age groups and for preserving the identity information of the input face image.

This age classification network receives 227 × 227 × 3-pixel images from the generator. The generator produces 128 × 128 × 3-pixel face images; before being used in the age classification network, each image is resized to 227 × 227 × 3 pixels using bilinear interpolation. The network processes the image with convolution, activation, batch normalization, pooling, linearization, and dropout, and at the end of the classification we use a softmax layer to classify the images and prevent overfitting.

Because the proposed network uses age classification for two tasks, (1) forcing the generated image to have an aging pattern and (2) preserving identity, the network shares the first seven layers. The identity-preserving module uses the first seven layers; their output is used to compute a perceptual mean-square loss that forces similarity between the generated image and the input image, and in this way we preserve the identity of the input image. The output of the seventh layer is further classified into an age group class. If the output
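As a minimal, hypothetical sketch of Eqn 7 (not the authors' code), the generator objective is simply a weighted sum of the three loss terms; the λ weights are the hyperparameters tuned empirically in Section 4:

```python
def generator_loss(l_gan, l_identity, l_age, lambdas=(1.0, 1.0, 1.0)):
    """Combined generator objective of Eqn 7:
    G_loss = lam1*L_G + lam2*L_identity + lam3*L_age.

    `lambdas` holds (lam1, lam2, lam3); the defaults here are
    placeholders, not the values used in the paper.
    """
    lam1, lam2, lam3 = lambdas
    return lam1 * l_gan + lam2 * l_identity + lam3 * l_age
```

The same function works on plain floats or on framework loss tensors, since it only uses multiplication and addition.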

Fig. 1. Proposed architecture.
H. Pranoto, Y. Heryadi, Harco Leslie Hendric Spits Warnars et al. Journal of King Saud University – Computer and Information Sciences 34 (2022) 7236–7246

Table 1
Proposed architecture generator.

Input: 128 × 128 × 3 face image + 128 × 128 × 5 age condition
1  Concatenate 128 × 128 × 3 face image with 128 × 128 × 5 age condition
   Conv 7 × 7, stride = 1
   Batch Normalization 32, eps = 0.001
   ReLU
2  Conv 3 × 3, stride = 2
   Batch Normalization 64, eps = 0.001
   ReLU
3  Conv 3 × 3, stride = 2
   Batch Normalization 128, eps = 0.001
   ReLU
4  Residual 1×
   Conv 3 × 3, stride = 1, padding = 1
   Batch Normalization 128, eps = 0.001
   ReLU
   Conv 3 × 3, stride = 1, padding = 1
   Batch Normalization 128, eps = 0.001
   ReLU
5  Deconv 3 × 3, stride = 2, padding = 1
   Batch Normalization 64, eps = 0.001
   ReLU
6  Deconv 3 × 3, stride = 2, padding = 1
   Batch Normalization 32, eps = 0.001
   ReLU
7  Conv 7 × 7, stride = 1
   Tanh
Output: generated face image 128 × 128 × 3

Table 2
Proposed discriminator architecture.

Input: 128 × 128 × 3 face image + 64 × 64 × 5 age condition
1  Conv 4 × 4, stride = 2
   Concatenate 64 × 64 × 64 feature map with 64 × 64 × 5 age condition
2  Conv 4 × 4, stride = 2
   Batch Normalization 64, eps = 0.001
   Leaky-ReLU(0.2)
3  Conv 4 × 4, stride = 2
   Batch Normalization 128, eps = 0.001
   Leaky-ReLU(0.2)
4  Conv 4 × 4, stride = 2
   Batch Normalization 128, eps = 0.001
   Leaky-ReLU(0.2)
5  Conv 4 × 4, stride = 2
Output: feature maps 512 × 4 × 4

image is classified into the wrong target age group class, we give the architecture a big penalty, and a small penalty if the input image is classified into the right target age group class. In this way the architecture is forced to make the generated image fall into the right target age group class, which assists the architecture in producing synthetic faces at the specified target age.

To improve the accuracy of the age classifier, some parts of the Alexnet age classifier were modified in the proposed method: some max-pooling layers in the Alexnet age classifier were replaced with convolutional layers.

In a convolutional neural network, a max-pooling layer provides the ability to learn invariant features and acts as a regularizer to reduce the overfitting problem during training. A max-pooling layer selects from the convolutional computation and activation units of the previous layer, reducing the computation cost by eliminating non-maximum components. This elimination provides better performance in architectures that use sparse coding and a linear classifier. A max-pooling layer also increases training speed, because it reduces the input dimension for the next layer without adding a learnable weight parameter, reduces the memory requirement, provides robust parameters, and increases statistical efficiency. Max pooling is formulated in Eqn 8:

f_max(x) = max_i(x_i)    (8)

However, the biggest drawback of the max-pooling operation relates to its very purpose: it only passes on the maximum element. If the majority of elements in the pooling area are of high magnitude, the discerning features will disappear or be eliminated after the max-pooling operation. This leads to an unacceptable result, as a lot of information is lost before being passed on to the following layer (Sharma and Mehra, 2019).

In the proposed architecture, max-pooling layers of IPCGAN are replaced with convolutional layers until only 1 of the 4 max-pooling layers remains. Replacing these max-pooling layers with convolutional layers aims to prevent the information loss that follows a max-pooling operation. By replacing the max-pooling layers, the proposed architecture increases the number of weight parameters from 56,888,709 in the baseline Alexnet classifier to 85,908,293.

Logically, increasing the number of parameters will increase computation cost and processing time. But this replacement gives the proposed architecture some benefits: first, information loss from layer to layer is reduced, so important information from the previous layer can be preserved. This information is needed by the age classifier for the classification process, so the age classifier can classify input images into age groups with better accuracy. Replacing max-pooling with convolutional layers also allows the architecture to generate new weight parameters, which has the benefit of producing better accuracy (Sharma and Mehra, 2019). The proposed age classifier after replacing the max-pooling layers can be seen in Table 3.

When replacing a max-pooling layer with a convolutional layer, we must keep the receptive fields of the proposed architecture the same as in the baseline model. The receptive field in a convolutional neural network is defined as the important region of the input space that affects a given unit; it can also be characterized by its central location and size, and it is important for generating features in a CNN.

3.3. Dataset preparation

The author also observed several face aging datasets (datasets with age labels), such as the MORPH Album 1 dataset (Ricanek and Tesafaye, 2006), the MORPH Album 2 dataset (Ricanek and Tesafaye, 2006), FG-Net (Lanitis, 2010), the AdienceFace dataset (Eidinger et al., 2014; Levi and Hassner, 2015), the Cross-Age Celebrity Dataset (CACD) (Chen et al., 2014), IMDB-WIKI (Rothe et al., 2018), AgeDB (Moschoglou et al., 2006), and UTKFace (Zhang et al., 2017). To choose a dataset, we must consider the distribution of age groups and the number of images contained in each age group; an unbalanced dataset or an insufficient number of images makes it difficult for the architecture to find an age pattern model for face aging.

The Cross-Age Celebrity Dataset (CACD) (Chen et al., 2014) was chosen for training the architecture to generate a model and for the evaluations. CACD contains 163,238 face images of 2000 celebrities with an age range from 16 to 62; all the images are annotated with age, although not with clean labels. This number of images is less than in the original dataset because some images were removed during cleansing. The dataset has large variations in pose, illumination, expression, and even style. It also has a balanced age group distribution, which helps the architecture find the age group pattern in the dataset. The distribution of the dataset can be seen in Fig. 2.
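Eqn 8, and the information loss it causes, can be illustrated with a small NumPy sketch (illustrative only; in practice the layer is a standard framework op):

```python
import numpy as np

def max_pool2d(x, k=2):
    """Naive k x k, stride-k max pooling (Eqn 8): each output cell keeps
    only the maximum of its window, so the other k*k - 1 values are lost."""
    h2, w2 = x.shape[0] // k, x.shape[1] // k
    out = np.empty((h2, w2), dtype=x.dtype)
    for i in range(h2):
        for j in range(w2):
            out[i, j] = x[i * k:(i + 1) * k, j * k:(j + 1) * k].max()
    return out

# A window full of high-magnitude values keeps only one of them:
x = np.array([[9., 8.],
              [7., 1.]])
print(max_pool2d(x, 2))  # [[9.]]
```

This is exactly the drawback described above: replacing such a layer with a stride-2 convolution lets learned weights combine all window elements instead of discarding all but one.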


Fig. 2. CACD dataset distributions.

Before being used for training, the dataset is preprocessed: each image in the dataset is cropped tightly, making the face region approximately 80% of the image, followed by an alignment process to a frontal face using key points on the left eye, right eye, nose, left mouth corner, and right mouth corner, and resized to 400 × 400 pixels. The dataset is divided into 5 non-overlapping age groups, 11–20, 21–30, 31–40, 41–50, and 50+, respectively, and separated 9:1 into training and testing sets.

3.4. Training process

For training, we created a list of 400,000 pairs of age labels, so that each age group gets the same opportunity to train. This list is created randomly from the data while still ensuring each age group has the same number of training opportunities. The list contains a source age condition C_s and a target age condition C_t.

For each age condition C_s or C_t, we prepared a one-hot code matrix with dimension h × w × 5, where h is the height of the face image, w is the width of the face image, and 5 is the number of feature maps, one per age group. For the first age group, we fill the first map with ones and the remaining maps with zeros; for the second age group, we fill the second map with ones and the remaining maps with zeros; and so on.

The architecture is trained to generate a synthetic face at a certain age, so we prepared pairs of a source face image and a source age condition as a one-hot code matrix, used as a label in the generation process. We used two image sizes as input: 128 × 128-pixel face images, used as input images for the generator, and 227 × 227-pixel face images, used for the age classification process. For the age condition one-hot code matrix, we prepared two sizes: 64 × 64 × 5 for the discriminator and 128 × 128 × 5 for the generator. In each batch, the model was trained using 32 pairs of face images and one-hot code matrices.

Using the data loader, training data and test data are prepared with a ratio of 9:1. Our architecture is trained using the list of training age group label pairs to ensure the architecture is trained in balance, producing a model that is capable of generating synthetic face images of the same good quality in every age group.

We use the Stochastic Gradient Descent (SGD) approach because the proposed architecture requires an update and an examination of the error on each sample. SGD as the optimization algorithm adds more noise to the learning process, which helps to increase generalization. Adam, as a form of stochastic gradient descent, is chosen in many deep learning architectures: the Adam optimizer is straightforward to implement, needs little memory, is invariant to diagonal rescaling of the gradient values, can be used for large datasets or large numbers of parameters, fits non-stationary objectives, can handle very sparse or noisy gradients and hyperparameters, is easy to interpret intuitively, and is easy to configure, needing only a little tuning; the default configuration parameters work well on most problems (Xu et al., 2015; Gregor et al., 2015). Adam combines the best properties of the AdaGrad and RMSProp algorithms.

We trained the discriminator 2 times and the generator 1 time to get optimal weights for the discriminator and generator. We changed the Alexnet classifier in the proposed architecture, whose task is to guide the synthesis of age changes: we replaced max-pooling layers in the Alexnet classifier with convolutional layers, followed by batch normalization to stabilize the layers, similar to the method applied by Perarnau et al. (Perarnau et al., 2016). The proposed age classifier can be seen in Table 3.

We compare our method with IPCGAN (Wang et al., 2018), a conditional GAN achieving state-of-the-art performance for facial aging. For training the proposed architecture, we use a learning rate of 0.001 and 32 images per batch; the whole training process takes 12,500 iterations per epoch.

Both architectures are trained until the discriminator loss threshold reaches a value of 10. Both models were trained 5 times to produce 5 model weights. Each model weight is used to generate 500 synthetic images from 100 person images. The 100 person images were selected randomly, but we still considered the proportion of each age group. Both IPCGAN and the proposed architecture are trained to produce faces that look real at a certain target age group; after training produced a model, we trained each architecture 5 times.

The generated images from IPCGAN and our proposed architecture can be seen in Fig. 3.

Table 3
Proposed architecture age classifier.

Input: 227 × 227 × 3 interpolated image from the 128 × 128 × 3 generator image.
1  Conv 11 × 11, stride = 4
   Batch Normalization 32, eps = 0.001
   ReLU
2  Conv 3 × 3, stride = 2, groups = 2
   Batch Normalization 256, eps = 0.001
   ReLU
3  Conv 3 × 3, stride = 2, groups = 2
   Batch Normalization 256, eps = 0.001
   ReLU
4  Conv 3 × 3, stride = 1, groups = 2
   Batch Normalization 384, eps = 0.001
   ReLU
5  Conv 3 × 3, stride = 1, groups = 2
   Batch Normalization 384, eps = 0.001
   ReLU
6  Conv 3 × 3, stride = 2, padding = 1, groups = 2
   Batch Normalization 256, eps = 0.001
   ReLU
7  Max Pooling 3 × 3, stride = 2
8  Dropout
   Linear (16,384, 4096), fc
   ReLU
   Linear (4096, 4096), fc
   ReLU
   Linear (4096, 5), fc
Output: 5 age classification maps
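The one-hot age-condition matrices described above can be built as in the following sketch (NumPy used for illustration; `age_condition` is a hypothetical helper, not the paper's code):

```python
import numpy as np

def age_condition(h, w, group, n_groups=5):
    """h x w x n_groups one-hot condition matrix: the feature map for
    `group` is all ones, the remaining maps are all zeros."""
    cond = np.zeros((h, w, n_groups), dtype=np.float32)
    cond[:, :, group] = 1.0
    return cond

gen_cond = age_condition(128, 128, group=2)   # 128 x 128 x 5, for the generator
disc_cond = age_condition(64, 64, group=2)    # 64 x 64 x 5, for the discriminator
```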

Fig. 3. Generated image.
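The alternating update schedule of Section 3.4, two discriminator updates for every generator update, can be sketched schematically; `d_step` and `g_step` are placeholders for the real optimizer steps, not the authors' code:

```python
def train(d_step, g_step, iterations):
    """Each round applies 2 discriminator steps and 1 generator step."""
    for _ in range(iterations):
        d_step()
        d_step()
        g_step()

# Tiny usage example with counting placeholders instead of real updates.
counts = {"d": 0, "g": 0}
train(lambda: counts.__setitem__("d", counts["d"] + 1),
      lambda: counts.__setitem__("g", counts["g"] + 1),
      iterations=10)
print(counts)  # {'d': 20, 'g': 10}
```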

4. Experiment and discussion

The architecture was trained and produced a weight model. This weight model is used to create synthetic face images. One hundred (100) images of 100 identities were chosen from the CACD dataset and used as input images for the architecture to generate synthetic face images. For each input image, the architecture creates 1 image per age group, so in the end the architecture produces 500 images across 5 age groups from 100 identities. These images are used for the evaluation process.

4.1. Preserving identity ability

To evaluate the identity-preserving ability of IPCGAN and the proposed architecture when generating a new fake image, we use the FaceNet face recognition architecture introduced by Schroff et al. (Schroff et al., 2015), which uses a triplet-loss embedding as the feature in the face recognition process. FaceNet achieves state of the art in face recognition because it has high accuracy when identifying and recognizing somebody's face. If FaceNet can identify the synthetic face generated by the GAN models with the correct identity, it means the models can generate a real-looking face image while preserving the identity of the input face image.

FaceNet face recognition (Schroff et al., 2015) requires a dataset in the training process to produce its triplet-loss embedding feature, which is used in face recognition. To produce a good performance result, 15 additional face photos of the same person were prepared from the CACD dataset for each of the 100 previously selected individuals, with a separate folder for each individual. At the end of this dataset preparation, we finally had 1500 face photos of 100 identities for face recognition.

To evaluate the identity-preserving ability of IPCGAN and the proposed architecture, 3 testing schemes were prepared. In the first scheme, identity preservation is evaluated with the synthetic images included as part of the FaceNet face recognition dataset: the generated synthetic face images are put together with the FaceNet dataset, called augmented data. The total number of face images after the generated synthetic face images are put into the dataset is 2100. The evaluation tries to recognize the 100 identities chosen before. FaceNet is trained, and identity-preserving ability is evaluated via face recognition performance using SVM and k-NN classification techniques. The same treatment was applied to IPCGAN and the proposed architecture. The evaluation results of the first scheme can be seen in Table 4.

From the results of our research, we can conclude that the proposed architecture preserves identity better than the IPCGAN architecture. Table 4 shows the accuracy performance of the proposed architectures measured using the k-NN and SVM classification methods with the triplet-loss embedding value as the feature. In the k-NN classification with the augmented dataset, our proposed architecture has 4.2% better accuracy than IPCGAN (86.9% accuracy for the proposed architecture compared to 83.4% for IPCGAN), and statistical tests on the measurements show significance (p-value: 0.009). Using SVM classification with the augmented dataset, our results also show 3.6% better accuracy (89.2% accuracy for the proposed architecture compared to 86.1% for IPCGAN), and the statistical tests conducted show that these differences are significant (p-value: 0.009). The increased accuracy is caused by the reduction of the residual layers: with a large number of residual layers that must be passed, changes in the weight parameters occur more rarely than in the proposed architecture.

For the second scheme, we evaluate the generated synthetic images using the face verification method of the FaceNet architecture, with and without augmentation. Augmented data means the generated synthetic images are included in the dataset, and non-augmented data means the generated synthetic images are excluded from the dataset. In this evaluation scheme, the similarity distance between synthetic faces and the face images in the dataset is calculated using the triplet-loss embedding feature. If the similarity distance value of a synthesized face is less than the threshold value (0.3), the synthesized face is categorized as the same identity and counted as a correct prediction, and vice versa. Verification accuracy is calculated by dividing the number of correct guesses by the total number of face recognition guesses.

Evaluation using the face verification method with the augmented dataset shows the proposed architecture also has 8.6% better accuracy (93.7% accuracy for the proposed architecture compared to 86.3% for IPCGAN), and the statistical tests carried out show significance (p-value: 0.009). Measurement using the non-augmented dataset also shows that the proposed architecture is better than IPCGAN (71.2% accuracy for the proposed architecture compared to 56.5% for IPCGAN), and the statistical test also shows significance (p-value: 0.009).

The results of the evaluation using the k-NN and SVM classification measurements show that reducing the number of residual layers in the generator part increases the accuracy of identity identification in synthetic images. Residual layers provide an aging effect in the architecture, but their number can be reduced to a number of layers still sufficient to create an aging effect. In IPCGAN the number of residuals totals 6 residual layers; in the proposed
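The face verification rule of Section 4.1 (same identity if and only if the embedding distance is below the 0.3 threshold) amounts to the following sketch; the distances would come from FaceNet's triplet-loss embeddings, which are not reproduced here:

```python
import numpy as np

def verification_accuracy(distances, same_identity, threshold=0.3):
    """`distances[i]` is the embedding distance between a synthetic face and
    a reference face; `same_identity[i]` says whether they truly belong to
    the same person. A pair is predicted "same" when the distance is below
    the threshold; accuracy = correct predictions / total predictions."""
    predicted_same = np.asarray(distances) < threshold
    correct = (predicted_same == np.asarray(same_identity)).sum()
    return correct / len(distances)

print(verification_accuracy([0.12, 0.45, 0.28, 0.31],
                            [True, False, True, True]))  # 0.75
```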

Table 4
Evaluation results from k-NN, SVM, face verification, and number of parameters.

Model | Scheme 1: average k-NN accuracy (augmented) | Scheme 1: average SVM accuracy (augmented) | Scheme 2: face verification accuracy (augmented) | Scheme 2: face verification accuracy (non-augmented) | Scheme 3: average age prediction accuracy | Number of parameters | Training speed (seconds)
IPCGAN | 83.4% | 86.1% | 86.3% | 56.5% | 32.48% | 66,122,600 | 191.37
Proposed model (modified Alexnet & generator) | 86.9% (p-value 0.009) | 89.2% (p-value 0.009) | 93.7% (p-value 0.009) | 71.2% (p-value 0.009) | 31.00% (p-value 0.754) | 85,908,293 | 179.25
Proposed model (modified Alexnet) | 81.9% (p-value 0.251) | 86.5% (p-value 0.465) | 87.3% (p-value 0.251) | 58.00% (p-value 0.465) | 33.20% (p-value 0.754) | 93,665,064 | 185.25

architecture it is reduced to only 1 layer. This speeds up the training process and increases the ability to maintain identity.

The increase in accuracy in this part is caused by the reduction in the number of residual layers contained in the generator section. The author reduces the number of residual layers to a number that is still able to perform synthetic face aging, turning the input image into a synthetic face image in the desired age group. The residual layers have the task of producing the face aging effect; each parameter in a residual layer adds an aging effect to the input image, and the other convolutional layers in the architecture also add an aging effect to the input image. Reducing the number of residual layers to a certain number of layers will still leave an aging effect in the architecture, while the aging effect reduces the ability to preserve identity. The author therefore reduces the number of residual layers from the baseline architecture to the point where the remaining layers can still provide an aging effect and can also maintain identity.

4.2. Aging effect ability

The third scheme is used to evaluate the ability of the models from both architectures to generate synthetic faces in certain age groups. In this scheme, the age prediction architecture from Rothe et al. (Rothe et al., 2018) is used to evaluate the ability of the models. Age prediction is used to predict the age of a generated face image. If an image from an age group is predicted to be in the same age group as the target age group, it is called a correct prediction, and if it is predicted to be in a different age group, we call this a false prediction. The accuracy is calculated as the number of correct predictions divided by the total number of predictions made.

For the evaluation of the aging process, we use age prediction. The proposed architecture shows 4.5% worse performance (31.00% accuracy for the proposed architecture compared to 32.48% for IPCGAN), but statistical tests show these differences are not significant (p-value: 0.754). This decrease is caused by the number of residual layers being reduced from 6 to only 1: in the proposed architecture, the opportunity to generate aging effect features, which are useful for synthesizing face images in the desired age group, is reduced. The aging effect feature can be improved by strengthening the classifier section, which is in charge of leading the architecture to obtain the aging effect feature.

In particular, when the author strengthens the age classifier part to increase the aging effect, the action of replacing three max-pooling layers in the Alexnet age classifier with convolutional layers in the proposed architecture is the correct action. Replacing max-pooling with convolutional layers reduces the loss of important information, which is needed for the classification process and is also important for the formation of aging patterns in newly synthesized faces. The improvement in the aging effect in the proposed architecture, when only the max-pooling layers of the Alexnet classifier are changed (modified Alexnet), is shown by the accuracy increase from 32.48% in IPCGAN to 33.20% in the proposed architecture. The age prediction evaluation results can be seen in Table 4.

4.3. Number of parameters

We compared the number of model parameters of IPCGAN, as the baseline architecture, with our proposed architecture. The numbers of model parameters can be seen in Table 4.

4.4. Training speed

To evaluate training speed performance, we make a measurement comparison. We measure the training speed every 100 batch iterations and average the speed value over 10 measurements. We choose 100 iterations because we want large differences in training speed between measurement values, to make them easy to compare. The decrease in training time in the proposed architecture is caused by the elimination of residual layers in the generator, from 6 layers to 1 layer.

5. Conclusion

The results of the evaluation using the k-NN and SVM classification methods show that reducing the number of residual layers in the generator part increases the accuracy of identity identification in synthetic images. Residual layers provide an aging effect in the architecture; reducing the number of residual layers reduces the ability of the architecture to create an aging effect, but this factor is not significant. This statement is proved by the decreased age prediction accuracy from 32.48% to 31.0% (model 1). The decreased age prediction accuracy proves that eliminating residual layers reduces the ability to create the aging effect but increases the ability to preserve identity.

This statement is proved by the accuracy increases in proposed model 1 (modified Alexnet classifier and generator): accuracy increases from 83.4% (IPCGAN) to 86.9% (model 1) using k-NN classification, from 86.1% (IPCGAN) to 89.2% (model 1) using SVM classification, from 86.3% (IPCGAN) to 93.7% (model 1) using face verification (augmented data), and from 56.5% (IPCGAN) to 71.2% (model 1) using face verification (non-augmented data). Reducing the number of residual layers also speeds up the training process.

Strengthening the age classification ability will increase the ability of the architecture to create an aging effect in general. When the proposed architecture is strengthened by replacing three max-pooling layers in the Alexnet age classifier with convolutional layers, the aging pattern creation ability improves, because reducing the max-pooling layers prevents the loss of important information needed for creating an aging pattern.

In a face aging architecture, the aging effect ability is inversely proportional to the ability to preserve identity. However, the loss in aging effect ability can be compensated by strengthening the age classifier section; strengthening this classification ability increases the aging effect ability. The proposed architecture utilizes an improved age classifier to lead the architecture to shape the face at the intended age. This can be seen in the results of the experiments that changed only the classifier: the age prediction accuracy increased from 32.48% in IPCGAN to 33.20% (proposed, modified age classification only).

We also evaluated the training speed of the models: the results show that training the proposed architecture is 12.12 s faster (191.37 s for IPCGAN minus 179.25 s for the proposed architecture). This increase in processing speed is caused by a reduction in the number of residual layers contained in the generator, which reduces the number of parameters in the architecture so that the computation of the architecture is reduced.

From the results of the evaluation, it can be concluded that eliminating residual layers speeds up the training process because of the reduction in the number of parameters, but it reduces the ability to produce the aging effect. Residual layers help the architecture produce images for the target age groups, but too many residual layers make the training slower.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

No funding was received for this research.

References

Antipov, G., Baccouche, M., Dugelay, J.L., 2018. Face aging with conditional generative adversarial networks. Proc. Int. Conf. Image Process. (ICIP) 2017, 2089–2093.
Chen, B.C., Chen, C.S., Hsu, W.H., 2014. Cross-age reference coding for age-invariant face recognition and retrieval. Lect. Notes Comput. Sci., vol. 8694, Part 6, 768–783.
Deb, D., Aggarwal, D., Jain, A.K., 2020. Child face age-progression via deep feature aging. arXiv.
Despois, J., Flament, F., Perrot, M., 2020. AgingMapGAN (AMGAN): High-resolution controllable face aging with spatially-aware conditional GANs. Lect. Notes Comput. Sci.
Gregor, K., Danihelka, I., Graves, A., Rezende, D.J., Wierstra, D., 2015. DRAW: A recurrent neural network for image generation. 32nd Int. Conf. Mach. Learn. (ICML 2015), vol. 2, 1462–1471.
Im, D.J., Ma, H., Kim, C.D., Taylor, G., 2016. Generative adversarial parallelization. 1–20.
Isola, P., Zhu, J.-Y., Zhou, T., Efros, A.A., 2017. Image-to-image translation with conditional adversarial networks. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 5967–5976.
Johnson, J., Alahi, A., Fei-Fei, L., 2016. Perceptual losses for real-time style transfer and super-resolution. Lect. Notes Comput. Sci., vol. 9906, 694–711.
Kemelmacher-Shlizerman, I., Suwajanakorn, S., Seitz, S.M., 2014. Illumination-aware age progression. Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 3334–3341.
Lanitis, A., Taylor, C.J., Cootes, T.F., 2002. Toward automatic simulation of aging effects on face images. IEEE Trans. Pattern Anal. Mach. Intell. 24 (4), 442–455.
Lanitis, A., 2010. FG-NET Aging Database.
Levi, G., Hassner, T., 2015. Age and gender classification using convolutional neural networks. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 34–42.
Li, Q., Liu, Y., Sun, Z., 2019. Age progression and regression with spatial attention modules. arXiv.
Li, P., Hu, Y., Li, Q., He, R., Sun, Z., 2018. Global and local consistent age generative adversarial networks. Proc. Int. Conf. Pattern Recognit. 2018, 1073–1078.
Liu, C., Yuen, J., Torralba, A., Sivic, J., Freeman, W.T., 2008. SIFT Flow: Dense correspondence across different scenes. 1 (1), 28–42.
Liu, Y., Li, Q., Sun, Z., 2019. Attribute-aware face aging with wavelet-based generative adversarial networks. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 2019, 11869–11878.
Liu, S., et al., 2017. Face aging with contextual generative adversarial nets. Proc. 2017 ACM Multimed. Conf. (MM '17), 82–90.
Mao, X., Li, Q., Xie, H., Lau, R.Y.K., Wang, Z., Smolley, S.P., 2017. Least squares generative adversarial networks. Proc. IEEE Int. Conf. Comput. Vis. 2017, 2813–2821.
Mirza, M., Osindero, S., 2014. Conditional generative adversarial nets. 1–7.
Moschoglou, S., Sagonas, C., Kotsia, I., 2006. AgeDB: the first manually collected, in-the-wild age database.
Palsson, S., Agustsson, E., Timofte, R., Van Gool, L., 2018. Generative adversarial style transfer networks for face aging. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops 2018, 2165–2173.
Panis, G., Lanitis, A., Tsapatsoulis, N., Cootes, T.F., 2016. Overview of research on facial aging using the FG-NET aging database. IET Biometrics 5 (2), 37–46.
Patterson, E., Ricanek, K., Albert, M., Boone, E., 2006. Automatic representation of adult aging in facial images. Int. Conf. Vis. Imaging, Image Process., 171–176.
Perarnau, G., Van De Weijer, J., Raducanu, B., Álvarez, J.M., 2016. Invertible conditional GANs for image editing. 1–9.
Reed, S., Akata, Z., Yan, X., Logeswaran, L., 2016. Generative adversarial text to image synthesis.
Ricanek Jr., K., Tesafaye, T., 2006. MORPH: A longitudinal image database of normal adult age-progression. Proc. 7th Int. Conf. Autom. Face Gesture Recognit.
Rothe, R., Timofte, R., Van Gool, L., 2018. Deep expectation of real and apparent age from a single image without facial landmarks. Int. J. Comput. Vis. 126 (2–4), 144–157.
Rowland, D.A., Perrett, D.I., 1995. Manipulating facial appearance through shape and color. IEEE Comput. Graph. Appl. 15 (5), 70–76.
Schroff, F., Kalenichenko, D., Philbin, J., 2015. FaceNet: A unified embedding for face recognition and clustering. Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 815–823.
Sharma, S., Mehra, R., 2019. Implications of pooling strategies in convolutional neural networks: A deep insight. Found. Comput. Decis. Sci. 44 (3), 303–330.
Shen, Y., Zhou, B., Luo, P., Tang, X., 2018. FaceFeat-GAN: A two-stage approach for identity-preserving face synthesis. arXiv.
Shu, X., Tang, J., Lai, H., Liu, L., Yan, S., 2015. Personalized age progression with aging dictionary. Proc. IEEE Int. Conf. Comput. Vis. 2015, 3970–3978.
Song, J., Zhang, J., Gao, L., Liu, X., Shen, H.T., 2018. Dual conditional GANs for face aging and rejuvenation. IJCAI Int. Jt. Conf. Artif. Intell. 2018, 899–905.
Suo, J., Zhu, S.C., Shan, S., Chen, X., 2010. A compositional and dynamic model for face aging. IEEE Trans. Pattern Anal. Mach. Intell. 32 (3), 385–401.
Tang, X., Wang, Z., Luo, W., Gao, S., 2018. Face aging with identity-preserved conditional generative adversarial networks. In: 2018 IEEE/CVF
Bioinformatics) vol. 12537 LNCS, pp. 613–628.
Conference on Computer Vision and Pattern Recognition, 2018, vol. 9, no. 1,
C. N. Duong, K. G. Quach, K. Luu, T. H. N. Le, M. Savvides, and T. D. Bui, ‘‘Learning
pp. 7939–7947.
from Longitudinal Face Demonstration - Where Tractable Deep Modeling Meets
Wang, W. et al., 2016. Recurrent face aging. In: 2016 IEEE Conf. Comput. Vis. Pattern
Inverse Reinforcement Learning,” Nov. 2019.
Recognit., pp. 2378–2386.
Duong, C.N., Luu, K., Quach, K.G., Bui, T.D., 2016. Longitudinal face modeling via
Z. Wang, X. Tang, W. Luo, and S. Gao, ‘‘Face Aging with Identity-Preserved
temporal deep restricted Boltzmann machines. Proc. IEEE Comput. Soc. Conf.
Conditional Generative Adversarial Networks,” in Proceedings of the IEEE
Comput. Vis. Pattern Recognit. vol. 2016-Decem, 5772–5780.
Computer Society Conference on Computer Vision and Pattern Recognition,
Eidinger, E., Enbar, R., Hassner, T., 2014. Age and gender estimation of unfiltered
2018, vol. 9, no. 1, pp. 7939–7947.
faces. IEEE Trans. Inf. Forensics Secur. 9 (12), 2170–2179.
Wang, W., Yan, Y., Cui, Z., Feng, J., Yan, S., Sebe, N., 2019. Recurrent face aging with
Fang, H., Deng, W., Zhong, Y., Hu, J., 2020. Triple-GAN: Progressive face aging with
hierarchical autoregressive memory. IEEE Trans. Pattern Anal. Mach. Intell. 41
triple translation loss. In: IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit
(3), 654–668.
Work, pp. 3500–3509.
K. Xu et al., ‘‘Show, attend and tell: Neural image caption generation with visual
Gatys, L., Ecker, A., Bethge, M., 2016. A neural algorithm of artistic style. J. Vis. 16
attention,” 32nd Int. Conf. Mach. Learn. ICML 2015, vol. 3, pp. 2048–2057, 2015.
(12), 326. https://doi.org/10.1167/16.12.326.
Yang, H., Huang, D., Wang, Y., Jain, A.K., 2018. Learning face age progression: A
Geng, X., Fu, Y., Miles, K.S., 2010. Automatic facial age estimation. In: 11th Pacific
pyramid architecture of GANs. In: Proc. IEEE Comput. Soc. Conf. Comput. Vis
Rim Int Conf. Artif. Intell., pp. 1–130.
Pattern Recognit., pp. 31–39.
I. J. Goodfellow et al., ‘‘Generative Adversarial Networks,” pp. 1–9, 2014.

7245
Yao, X., Puy, G., Newson, A., Gousseau, Y., Hellier, P., 2020. High resolution face age editing. arXiv.
Zhang, Z., Song, Y., Qi, H., 2017. Age progression/regression by conditional adversarial autoencoder. Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 4352–4360.
Zhu, J., Park, T., Isola, P., Efros, A.A., 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. IEEE Int. Conf. Comput. Vis. (ICCV), 2242–2251.

