
This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3157617, IEEE Access


Recent Generative Adversarial Approach in Face Aging and Dataset Review
Hady Pranoto1,2, Yaya Heryadi1, Harco Leslie Hendric Spits Warnars1, and Widodo Budiharto2
1 Computer Science Department, BINUS Graduate Program – Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia
2 Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia

Corresponding author: Hady Pranoto (e-mail: hadypranoto@binus.ac.id).



ABSTRACT Many studies have been carried out in the field of face aging, from approaches that use pure image-processing algorithms to approaches that use generative adversarial networks. In this paper, we provide a review ranging from the classic approaches to the approaches using Generative Adversarial Networks. We systematically discuss the structure, formulation, learning algorithm, challenges, advantages, and disadvantages of each proposed algorithm. Generative Adversarial Networks achieve state-of-the-art results in the field of face aging by adding an aging module, paying special attention to parts of the face, and using an identity-preserving module to preserve identity. We also discuss the databases used in facial aging, along with their characteristics. A dataset used in the face aging process should satisfy the following criteria: (1) it has a fair number of age groups, each covering a small age range; (2) the age groups are evenly distributed; and (3) it contains a sufficient number of face images.

INDEX TERMS Face recognition, image generation, image database, face aging dataset, deep generative approach, generative adversarial network

I. INTRODUCTION

Face aging has recently attracted the attention of the computer vision community, and a variety of approaches have been proposed, ranging from pure algorithms in the field of computer graphics to deep learning architectures. Several successes have been reported, from approaches that use theory from anthropology to approaches that use deep learning.

In general, age-progression approaches are classified into four categories: (1) modeling, (2) reconstruction, (3) prototyping, and (4) deep learning. The first three categories usually simulate the aging process from facial features by (a) adopting anthropometric knowledge [1], or (b) representing face geometry and appearance through conventional parameters. Example methods using this approach are Active Appearance Models (AAMs) and the 3D Morphable Model (3DMM). Even though many such studies obtained inspiring results, the representations of these methods are still linear and have various limitations when modeling non-linear aging processes.

Face aging across ages is very challenging because faces do not change linearly over time, so linear methods cannot solve this problem. However, a modern approach, the Deep Generative Method (DGM) for face modeling and for mapping the aging process, achieves state-of-the-art results in face aging. Deep learning algorithms are better able to interpret and transfer non-linear aging features. Several face aging studies address how to produce superior synthetic images, as in [2][3][4][5][6][7], including the Generative Adversarial Network method.

Inspired by papers that obtained state-of-the-art results, in this paper we aim to review the current Generative Adversarial Networks for face aging progression and the datasets used by those methods. Each method is discussed both structurally and in terms of its formulation. We cover the conventional approaches in general and the Generative Adversarial Network approaches in more depth, discussing how they produce synthetic faces and the

VOLUME XX, 2017 1

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

public datasets used. The paper is structured sequentially: we discuss the datasets used in face aging, their characteristics and distributions, the approaches used in face aging, and the challenges and opportunities that arise if more information is added to the datasets.

II. FACE AGING DATASET

Collecting a dataset for face aging is a challenge, and several criteria must be met in the process of collecting it. First, each subject/identity in the dataset should have images at different ages covering a long age range. This is not an absolute requirement, because some face aging approaches do not need image sequences of the same person at different ages and can still produce the aging pattern. It is important to discuss the datasets used with Generative Adversarial Networks because the success of such a model in becoming robust depends on the characteristics of the dataset. Currently, face aging datasets are still limited in terms of age labels and the number of available datasets.

The MORPH – Album 1 dataset [8] contains 1,690 grayscale portable gray map (PGM) images from 515 individuals. Its metadata has information about subject identifier, date of birth, picture identifier, image date, race, facial hair flag, age differences, glasses flag, image filename, and gender. The age-group distribution of this dataset is not balanced and is dominated by ages 18-29.

The MORPH – Album 2 dataset [8] consists of 55,000 unique images from 13,000 subjects, ages 16 to 77, with an average age of 22. It has information about subject identifier, race, picture identifier, date of birth, image date, and age differences.
Table 1. Properties of different face aging datasets; "in-the-wild" datasets were collected under unconstrained environment conditions.

Database | #Images | #Subjects | Label Type | Alignment | Subject Type | Clean Label | Distribution
MORPH – Album 1 [8] | 1,690 | 628 | Age | Frontal | Non-famous | Yes | <18 (±157), 18-29 (±985), 30-39 (±415), 40-49 (±111), 50+ (±22)
MORPH – Album 2 [8] | 55,134 | 13,000 | Age | Frontal | Non-famous | Yes | <20 (±7,469), 20-29 (±16,325), 30-39 (±15,357), 40-49 (±12,050), 50+ (±3,993)
FG-NET [9] | 1,002 | 82 | Age | In-the-wild | Non-famous | Yes | <20 (±710), 20-29 (±144), 30-39 (±79), 40-49 (±46), 50+ (±23)
AdienceFaces [10][11] | 26,580 | 2,284 | Age groups | In-the-wild | Non-famous | Yes | 0-2 (±1,427), 4-6 (±2,162), 8-13 (±2,294), 15-20 (±1,653), 25-32 (±4,897), 38-43 (±2,350), 48-53 (±825), 60+ (±869)
CACD [12] | 163,446 | 2,000 | Age | In-the-wild | Celebrities | No | 0-10 (0), 10-19 (±7,057), 20-29 (±39,069), 30-39 (±43,104), 40-49 (±40,344), 50-59 (±30,960), 60+ (±2,912)
IMDB-WIKI [13] | 523,051 | 20,284 | Age | In-the-wild | Celebrities | No | (approx. % of images) 0-10 (2), 11-20 (6), 21-30 (15), 31-40 (45), 41-50 (18), 51-60 (7), 61-70 (3), 71-80 (2), 81-90 (1), 91-100 (1)
Age-DB [14] | 16,488 | 568 | Age | In-the-wild | Celebrities | Yes | (approx. % of images) 0-10 (1), 11-20 (4), 21-30 (15), 31-40 (30), 41-50 (20), 51-60 (13), 61-70 (10), 71-80 (4), 81-90 (2), 91-100 (1)
UTKFace [4] | >20,000 | – | Age | In-the-wild | Non-famous | Yes | 0-10 (±3,300), 10-19 (±1,600), 20-29 (±7,400), 30-39 (±4,500), 40-49 (±2,200), 50-59 (±2,300), 60+ (±2,750)
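The dataset criteria above (balanced age groups, enough images per group) can be checked mechanically before training. A minimal sketch in Python, where the bin edges and the imbalance threshold are illustrative assumptions rather than values from the surveyed papers:

```python
from collections import Counter

# Hypothetical age bins, mirroring the grouping style of Table 1.
BINS = [(0, 17), (18, 29), (30, 39), (40, 49), (50, 120)]

def bin_label(age):
    """Map an integer age to its age-group label."""
    for lo, hi in BINS:
        if lo <= age <= hi:
            return f"{lo}-{hi}"
    raise ValueError(f"age out of range: {age}")

def distribution(ages):
    """Count images per age group."""
    return Counter(bin_label(a) for a in ages)

def is_balanced(ages, max_ratio=5.0):
    """Flag a dataset whose largest group exceeds its smallest by more than max_ratio."""
    counts = distribution(ages)
    return max(counts.values()) / min(counts.values()) <= max_ratio

# Example: a MORPH-Album-1-like skew, dominated by ages 18-29.
ages = [16] * 157 + [25] * 985 + [35] * 415 + [45] * 111 + [55] * 22
print(distribution(ages)["18-29"])  # 985
print(is_balanced(ages))            # False: 985/22 far exceeds the threshold
```

A check like this makes the survey's third criterion concrete: an architecture cannot learn the 50+ pattern from ±22 images while seeing ±985 images of the 18-29 group.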

Its metadata also includes image film number, race, gender, facial hair flag, and glasses flag. The age-group distribution is more balanced than in the MORPH Album 1 dataset; a balanced distribution makes it easier for a face aging architecture to learn the aging pattern of each age group, and the number of face images in each age group must be sufficient to capture that group's pattern.

FG-NET [9] is a dataset consisting of 1,002 face images from 82 subjects, with information on identity ID and age. Its age distribution is unbalanced and not well separated, the number of images is very small, and the dataset is dominated by the young age groups, so it is not easy to obtain an aging-process pattern from it.

The AdienceFaces dataset [10][11] consists of 26,580 face images from 2,984 subjects, grouped into 8 age-group labels (0-2, 4-6, 8-12, 15-20, 25-32, 38-43, 48-53, 60+), with gender and identity labels. The age-group distribution of this dataset is dominated by the young age labels/groups.

The Cross-Age Celebrity Dataset (CACD) [12] consists of 163,446 facial images of 2,000 celebrities taken from 2004 to 2013. The dataset has information on name, ID, year of birth, celebrity ranking, and LBP features from 16 facial landmarks. The age-group distribution is relatively balanced but still dominated by ages 20 to 60.

The IMDB-WIKI [13] dataset contains 523,051 images from 20,284 subjects, taken from IMDb and Wikipedia. The dataset contains information on date of birth, date taken, gender, face location, face score, second face score, celebrity name, celebrity ID, and age calculated from the date taken and the date of birth. The distribution of age groups in this dataset is dominated by the 20-30 and 30-40 age groups and is unbalanced at the young and old age groups.

The AgeDB [14] dataset contains 16,488 images from 568 subjects, collected and annotated manually to ensure that the age labels are clean. AgeDB is an in-the-wild dataset with an average of 29 images per subject, and each image has information on ID, subject name, and age. It is dominated by grayscale images. The age distribution is dominated by the 30-40 and 40-50 age groups and is fairly balanced, and the number of images is enough for a face aging architecture to learn the age pattern.

The UTKFace [4] dataset is a large-scale face dataset with a long age span (0 to 116 years old), containing more than 20,000 images with information about age, gender, and ethnicity. UTKFace is an in-the-wild dataset with variation in pose, expression, occlusion, illumination, and resolution. Its age distribution is quite balanced and dominated by the 10-19 age group.

An unbalanced dataset distribution makes it difficult for an architecture to find good aging patterns for each age group. The ability of the architecture to learn the aging pattern of each age group depends on the number of images in that group; an insufficient number of face images prevents the architecture from getting enough pattern information from the group's images.

The characteristics and age distributions of several existing face aging datasets are summarized in Table 1 and Figure 1. Sample images for each dataset are shown in Figure A.1.

III. FACE AGING METHOD

A. Conventional Approach

1. Model-based approach

Early research in age progression utilizes an appearance model to represent the shape of the face structure and the face texture in the input face images. The aging process is represented by an aging function that applies the parameter sets of different age groups to the input image. Patterson et al. [15] and Lanitis et al. [16] use Active Appearance Models (AAMs) to simulate adult craniofacial aging in images using two approaches: (1) estimating the age in an input image, and (2)
Figure 1. Dataset distributions: (a) MORPH Album 1; (b) MORPH Album 2; (c) FG-NET; (d) AdienceFaces; (e) AgeDB; (f) IMDB-WIKI; (g) CACD; (h) UTKFace.

shifting the active appearance model parameters along a directional aging axis. This work presented an anthropological perspective on the active appearance model for facial aging and, for the first time, examined its effect on face recognition.

A different study was proposed by Geng et al. [17], who introduced the Aging Pattern Subspace (AGES) approach. Geng et al. constructed a subspace representation of the aging pattern as a chronological sequence of face images using Principal Component Analysis (PCA); facial synthesis in a certain age group then results from applying the variance of the aging effect to the facial appearance. They used a face model transformation among all subjects under one year old in the database, instead of a cardioidal strain transformation or a global aging function. The main problem encountered by this architecture is the difficulty of constructing a suitable training set for the aging transformation: a sequential age progression from different individuals.

Suo et al. [18] introduced a dynamic and compositional model for facial aging, representing each face in each age group as a hierarchical "And-Or" graph model. The "And" node decomposes the facial image into several parts to define facial details such as hair, wrinkles, etc., patterns that are crucial for the perception of age. This compositional and dynamic model seeks to build a perception of aging while maintaining the identity of the input facial image by combining parts of different faces with different aging effects. The method has problems when individual facial sequences are lacking and requires retraining when additional individual facial sequences are added; the model has to be retrained to update the weight parameters that were previously learned.

In the research conducted by Suo et al. [18], the model was not trained over a long lifespan. The resulting model also still leaves ghosting artifacts on the synthetic face, caused by the difficulty of precisely aligning the model with the original image.

In the model-based approach, face structure and texture changes such as muscle changes, hair color, and wrinkles are modeled with parameters. A general form of the face aging model is very difficult to find, which makes it difficult to apply such a facial aging model to specific faces, and the resulting facial model may not suit certain faces. Finding these parameters requires many training samples and a large computational cost. This mismatch, together with the difficulty of performing precise alignment, makes it hard to produce realistic images of aging faces without losing identity information.

2. Prototype Approach

The basic idea of the prototype approach [19] is to apply the differences between age groups to the input face image to produce a new image in the desired age group. A prototype, the estimated average or mean face [20] of each age group, is applied to the input face image to generate new faces in the desired age group. To produce a good synthetic image, precise alignment is needed; otherwise the synthetic image will not be plausible. This method produces a relightable age subspace and a novel technique for subspace-to-subspace alignment that can handle photos "in the wild," with variation in illumination, pose, and expression, and it enhances the realism of older-subject outputs progressed from an input image. The method uses a collection of head-and-upper-torso images, which takes effort to obtain. The prototype approach to facial aging is limited to producing aging patterns and loses the global understanding of the human face, such as personality information and possibly facial expressions. A sharper average face is also introduced in [21].

3. Reconstruction-based approach

This approach focuses on finding the aging pattern of each age group and combining them; the resulting dictionary is used to convert an input image into a synthetic face image of the targeted age group.

Coupled Dictionary Learning (CDL) [22] is a reconstruction-based method that models the personalized aging pattern by preserving the personalized features of each individual, formulated over short-term aging photos. Collecting long-term dense face aging sequences is difficult: a person usually has dense short-term face aging photos, but not long-term aging photos covering all age groups.

The Bi-level Dictionary Learning-based Personalized Age Progression (BDLPAP) method [23] automatically renders an aging face personally, using short-term face aging pairs. A person's face can be decomposed into a personalization layer and an aging layer in the face aging process. Using the aging-invariant pattern obtained with Coupled Dictionary Learning (CDL), a dictionary captures the aging characteristics through a personality-aware formulation and short-term coupled learning. Individual characteristics such as a mole, birthmark, or permanent scar are represented in a face aging sequence {x1, x2, ..., xn} in the aging dictionaries.

B. Deep Generative Model for Face Aging Approach

a. Temporal Restricted Boltzmann Machine-based model (TRBM)

The Temporal Restricted Boltzmann Machine-based model utilizes the embedding of temporal relationships between sequences of facial images. Duong et al. [2] proposed an age progression model using a log-likelihood objective function, ignoring the l2 reconstruction error in the training process. The model can efficiently capture the non-linear aging process and can automatically produce the sequential development of each

age group in greater detail. The combination of TRBM and RBM produces a model that simulates the age-variation model and transforms the embedding. In this approach, linear and non-linear interacting structures can exploit the face images. Facial wrinkles are enhanced, together with geometric constraints, in post-processing for more consistent results; even so, this method may still produce wrong results.

b. Recurrent Neural Network-based model

Wang et al. [3] proposed a Recurrent Neural Network (RNN) that utilizes two layers of gated recurrent units (GRUs) to model aging sequences. Recurrent connections between hidden units efficiently utilize information from the previous layer, and the "memory" obtained from the previous layer through the recurrent connections creates smooth transitions between age groups in the process of synthesizing new faces.

This approach improves on the classic approaches, whose fixed reconstructions always result in blurry and unclear images. Combining facial structure and facial aging modeling into a single unit using RBMs makes the non-linear changes properly and efficiently interpretable and produces wrinkle and texture models for each age group with more consistent output. However, this method requires a large amount of training data to produce a robust and general model.

The Temporal Non-Volume Preserving (TNVP) transformation was introduced by Duong et al. [38]. This method uses an embedding feature transformation between faces at consecutive stages, compared against CNN structures. It uses an empirical balance threshold and a Restricted Boltzmann Machine; the TNVP approach guarantees a tractable model density with the ability to perform exact inference between faces at consecutive stages.

The TNVP model has architectural advantages in improving generated-image quality and in highly non-linear feature generation. The TNVP objective can be formulated as:

z_t = F_1(x_t; θ_1)
z_{t+1} = H(z_t; x_{t+1}; θ_1, θ_2) = G(z_t; θ_2) + F_2(x_{t+1}; θ_1)    (1)

where F_1 and F_2 are the bijections mapping x_t and x_{t+1} to their latent variables z_t and z_{t+1}, respectively, and the aging transformation between latent variables is performed by the embedding function G. The framework of Temporal Non-Volume Preserving is shown in Figure 2.

Figure 2. Temporal Non-Volume Preserving framework: a chain linking age groups 1 to n, with distributions P(x_1), P(x_{i+1} | x_i) in image space and P(z_1), P(z_{i+1} | z_i, x_{i+1}) in latent space.

c. Generative Adversarial Networks Approach

Goodfellow et al. [24] proposed a new architecture named the Generative Adversarial Network (GAN), which borrows the idea of a two-player game between a generator and a discriminator. The generator tries to generate samples that deceive the discriminator into considering them original data, and the discriminator learns to discriminate whether the data produced by the generator is real or fake. The process continues until the discriminator can no longer distinguish the generated data from real data.

A GAN can produce an object image from given noise, depending on the training data; to generate an artificial face image, a GAN must be trained with face images. A GAN can produce certain characteristics [25] in the resulting object by assigning conditions to the architecture [26], or by applying residual blocks so that changes in the aging pattern can happen [27]. A GAN can also create a new synthetic image from an input image, rather than from scratch or from noise, to speed up the generation process [28].

The step-by-step aging process is changed by GANs into a direct aging process [4][26][29]. Aging patterns such as sideburns, wrinkles, eye bags, and gray hair, and structural changes such as the structure of the head, enlarged eyes, and a shrunken chin, are implemented directly when the aging or rejuvenation process happens.

GANs can be categorized into three categories: translation-based, sequence-based, and condition-based. The translation-based method is based on transferring style from one image domain to another. CycleGAN [30] is one translation-based method; it captures the style characteristics of one image set to implement them in another set. CycleGAN does not require paired samples from the two domains, and this advantage can be utilized in face aging to translate images from one age group to another. However, CycleGAN can only translate between two age groups in pairs, which is a drawback of this architecture. Figure 3 illustrates how CycleGAN generates an image y in domain Y from an image x, while the cycle-consistency process tries to restore image y to image x in domain X. The forward process, the backward process, and the cycle-consistency loss of this process enable the architecture to obtain a style map of both domains. The translation-based framework can be seen in Figure 3.

The sequence-based model is implemented as a step-by-step process, and each model is trained independently to produce a sequential translation between two neighboring age groups. In this model, the translation is carried out sequentially, and the resulting models are combined into a complete network: the output of the i-th network is employed as the input of the (i+1)-th network. The sequence-based method is trained to produce faces of each age group at each stage.
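The cycle-consistency idea behind the translation-based methods can be illustrated numerically. In the sketch below, the two "generators" are toy invertible pixel maps standing in for trained CycleGAN networks (an assumption for illustration only), but the loss is computed exactly as in the cycle-consistency term:

```python
import numpy as np

# Stand-in "generators": g_age maps domain X (young) to Y (old), g_rej maps back.
# Real CycleGAN generators are trained CNNs; these toy affine maps only
# illustrate how the cycle-consistency loss is evaluated.
def g_age(x):
    return 0.8 * x + 0.1    # hypothetical aging map X -> Y

def g_rej(y):
    return (y - 0.1) / 0.8  # hypothetical rejuvenation map Y -> X

def cycle_consistency_loss(x_batch):
    """Mean L1 distance between x and g_rej(g_age(x)); zero when the cycle restores x."""
    restored = g_rej(g_age(x_batch))
    return float(np.mean(np.abs(x_batch - restored)))

x = np.linspace(0.0, 1.0, 16)  # a toy "image" batch of pixel intensities
print(round(cycle_consistency_loss(x), 6))  # 0.0 — this toy cycle is exactly invertible
```

For trained networks the loss is not exactly zero; minimizing it is what forces the aged image to keep enough of the input's content (and identity) to be mapped back.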

The challenge of this method is to produce images of an individual sequentially and completely. The framework can be seen in Figure 4.

The conditional-based method [31] uses a condition in its architecture to produce an artificial face image in a certain age group. The condition is a label encoded as a one-hot tensor, which is used to force the network to generate a synthetic face image at the desired age. The location of this one-hot encoding varies across architectures: some place it on both the generator and the discriminator [6], while others put it only on the generator [32].

The concept of the conditional-based method is to drive the generator by providing extra information about the target age to be produced. This method has a significant advantage and higher efficiency compared with the translation-based and sequence-based approaches. The conditional-based framework is shown in Figure 5.

The one-step approach to facial aging is still a top priority, and there are still many challenges in producing synthetic faces for certain age groups using only one training course [33]; the current face aging methods use "one-shot" synthesis and achieve state-of-the-art results [34][35][36]. Research on the current generative face aging categories is illustrated in Figure A.3 in the Appendix.

Figure 3. Translation-based approach.

Figure 4. Sequence-based approach.

Figure 5. Conditional-based approach: the generator receives an input image and a target-age condition; an L2 loss increases pixel-wise similarity, an age classification loss penalizes large deviations from the target age, and the discriminator judges real versus synthetic.

C. Generative Adversarial Networks Approach

1. Translation-based Approach

a. Generative Adversarial Style Transfer Networks for Face Aging

Style transfer as in CycleGAN [30] is used in a face aging architecture by implementing a style-transfer architecture [37]. By utilizing cyclic consistency between the input image and the generated image, face aging can keep the identity of the generated image the same as that of the input face. The aging effect is produced by the mapping functions of this style-transfer architecture, which minimize the age loss and cycle-consistency loss in equation (2):

L_age(G_+, G_−) = E_{x∼p(x)} [ |DEX(G_+(x, k)) − (DEX(x) + k)| + |DEX(G_−(x, k)) − (DEX(x) − k)| ]
L_cyc(G_+, G_−) = E_{x∼p(x)} [ |G_−(G_+(x, k)) − x| + |G_+(G_−(x, k)) − x| ]    (2)

where G_+ and G_− are the aging and rejuvenation generators, k is the desired age shift, and DEX(·) is an age estimator.

Stability in training is produced using the LSGAN loss, which is formulated in equation (3):

L_LSGAN(G, D) = E_{y∼p(y)} [ (D(y) − 1)^2 ] + E_{x∼p(x)} [ D(G(x))^2 ]    (3)

To create the aging effect and preserve the identity, the final objective function is set up as equation (4):

L(G_+, G_−, D) = L_LSGAN(G_+, G_−, D) + L_cyc(G_+, G_−) + L_age(G_+, G_−) + L_id(G_+, G_−)    (4)

The style-transfer method as embodied in CycleGAN utilizes pairwise training between age groups to create artificial face images at the desired age. The CycleGAN framework can be seen in Figure 3.

b. Triple-GAN: Progressive Face Aging with Triple Translation Loss

A triple translation loss is utilized in the Triple Generative Adversarial Network (Triple-GAN) by Fang et al. [36] to model the strong interrelationship between the aging patterns of different age groups. The ability to learn the mapping between labels offered by multiple training pairs uses the triple translation loss, formulated as:

TL = ‖ G(x, L_j) − G(G(x, L_i), L_j) ‖    (5)

Three kinds of face images are generated, G(x, L_i), G(x, L_j), and G(G(x, L_i), L_j), and all synthesized faces are used in identity preservation and age classification.

The final objective function can be formulated as:

L = α L_gan + β L_id + γ L_age + λ L_triple    (6)

where α, β, γ, and λ are values controlling the weights of the four objectives. The triple translation loss reduces the distance between synthetic faces with the same target age produced along different paths. The Triple-GAN framework can be seen in Figure 6.

Figure 6. Triple-GAN framework.

c. Child Face Age-Progression via Deep Feature Aging

Deb et al. [39] proposed a feature aging module that can simulate age progression in the deep face features output by a face matcher, to guide age progression in image space. It synthesizes aged faces and enhances longitudinal face recognition without requiring any explicit training. The model can increase closed-set face recognition over a 10-year time lapse and enhances the ability to identify young children who are possible victims of child trafficking or abduction. Instead of separating an age-related component from the identity feature, the authors automatically learn a projection within the latent space.

Let us assume x ∈ X and y ∈ Y, where X and Y are two face domains of images acquired at ages e and t, with e the source age and t the target age. Domains X and Y differ in aging and also in noise, quality, and pose. This architecture simplifies the modeling of the aging transformation in the deep feature space by an operator F, formulated as:

y = F(ψ(x), e, t) = W × (ψ(x) ⊕ e ⊕ t) + b    (7)

where ψ(x) is a function that encodes features in the latent space. F learns the feature-space projection and generates an image in X with the age features of Y, from source age e to target age t. The representation lies in a d-dimensional Euclidean space Z that is highly linear; the output of F is a linear shift in the deep space, W ∈ R^{d×(d+2)} and b ∈ R^d are the learned parameters of F, and ⊕ denotes concatenation in the layer. The scale parameter permits the feature to be scaled directly from the registered source age to the target age, since features such as wrinkles or eye color do not change extremely during the aging process. This architecture has the advantage of projecting the aging of faces of young people or children, which can be used to find children who are victims of human trafficking. The framework is illustrated in Figure 7.

Figure 7. Child age-progression framework: an ID encoder extracts an enrollment feature at the source age, which an aging module, conditioned on the target age through concatenation, transforms under identity-preservation and age-classifier losses.

2. Sequence-based Approach

a. FaceFeat-GAN: a Two-Stage Approach for Identity-Preserving Face Synthesis

FaceFeat-GAN [40] solves the problem of preserving identity with two stages of synthesis, namely feature generation and feature-to-image rendering. The first stage works in the feature domain to produce a synthesis of various facial features, and the second stage works in the image domain to render realistic photos with high diversity while maintaining identity information.

To reconstruct the input image with more pixel-wise accuracy, extractors {E_i} extract the real features {f_i} of the input face image x, and a face recognition network extracts the identity feature f_id. The identity feature is used as input to the generator G to generate an image x̂, which becomes the input of the discriminator D as a synthetic image. Generator G tries to learn the mapping from feature space to image space under identity-preserving constraints. The real features f_i extracted by E_i are used by the discriminators to force the generator G to produce realistic features.

Generator G and discriminator D are trained with these functions:

min_G L_G = φ_img(x̂) + φ_recon(x, x̂) + λ_1 φ_id(x̂) + λ_2 φ_feat(x̂)    (8)
min_D L_D = φ_img(x̂) − λ_3 φ_id(x̂) − λ_4 φ_feat(x̂)    (9)

where φ_recon(x, x̂) = |x − x̂|^2 is the l2 reconstruction loss, φ_id(·) is the loss function measuring identity-preserving quality, φ_feat is an energy function determining whether the facial features are real or fake, φ_img(·) determines whether the generated face is real or fake, and the λ's are the strengths of the different terms.
VOLUME XX, 2017
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
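The linear feature-aging projection of Deb et al. in Eq. (7) can be sketched in a few lines of numpy: the deep face feature ψ(x) is concatenated with codes for the source age e and target age t and then linearly projected. The dimensions, the one-hot age encoding, and the function names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_feat, n_ages = 8, 5                     # hypothetical feature size / age bins

def age_feature_projection(psi_x, e, t, W, b):
    """Concatenate the feature with source/target age codes, then apply W, b."""
    e_code = np.eye(n_ages)[e]            # one-hot source age
    t_code = np.eye(n_ages)[t]            # one-hot target age
    z = np.concatenate([psi_x, e_code, t_code])   # psi(x) (+) e (+) t
    return W @ z + b                      # linear shift in deep feature space

W = rng.normal(size=(d_feat, d_feat + 2 * n_ages))
b = rng.normal(size=d_feat)
psi_x = rng.normal(size=d_feat)

aged_feature = age_feature_projection(psi_x, e=1, t=4, W=W, b=b)
print(aged_feature.shape)                 # -> (8,)
```

Because the projection is a single linear layer over the concatenation, changing only the target age code already shifts the output feature, which is exactly the "linear shift in deep space" the text describes.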
Face Feat utilizes an identity feature and a 3DMM feature. The identity feature f_id of the input image x is obtained from a face recognition module trained as a classification task with cross-entropy loss. The 3DMM feature comes from a 3D Morphable Model used to model 2D images in 3D with a set of basic shapes A_id and a set of basic expressions A_exp. Face Feat uses a two-stage generator, where the output of the first generator is the input for the second generator. By applying these two generator levels, competition between the two generators produces synthetic images with diverse patterns in the output. For clear understanding, the Face Feat framework is illustrated in Figure 8.

Figure 8. Face Feat GAN Framework

b. Subject-dependent Deep aging path (SDAP) model

An additional aging controller is used in the TNVP structure in the research by [7], based on the hypothesis that each individual has their own facial development. Rather than simply embedding aging transformations into pairs linked between successive age groups, the Subject-Dependent Aging Policy (SDAP) structure studies age transformations within the entire facial sequence to produce better synthetic ages. The SDAP network architecture ensures the availability of an appropriately planned aging path to produce a face aging controller related to subject features. It is important to note that SDAP is a pioneer of the IRL framework in age progression.

SDAP uses ς_i = {x_1, a_1, …, x_n} as the age sequence of the i-th subject, where {x_1, …, x_n} are face sequences representative of the face development of the i-th subject and a_t is a control variable for how much aging effect will be added to an image x_t to become x_{t+1}. The probability of ς_i can be formulated using an energy function Er(ς_i):

P(ς_i) = (1/Z) exp(−Er(ς_i)) (10)

where Z is the partition function, similar to the joint distribution between variables of an RBM. The goal is to predict a_t for each x_t while synthesizing images. The SDAP objective function can be formulated as:

Γ* = arg max_Γ ℒ(ς_i; Γ) = arg max_Γ (1/M) Σ_i log P(ς_i) (11)

SDAP optimizes this tractable log-likelihood objective function with a convolutional neural network based on a deep neural network, providing appropriate facial aging development for individual subjects by optimizing the reward process. This method allows multiple age images as input. By considering all information from a subject at various ages, it seeks an optimal aging path for the given subject, producing an efficient face aging process by utilizing the power of the generative probabilistic model under the IRL approach in a deep neural network. For a clear understanding of the Subject-dependent Deep aging path (SDAP) model, the framework is illustrated in Figure 9.

Figure 9. Subject-dependent Deep aging path (SDAP) Framework

3. Conditional-based Approach

a. Conditional Adversarial Autoencoder (CAAE)

The Conditional Adversarial Autoencoder (CAAE)[4], adopting the conditional GAN approach, adds an age label as a condition on the encoding to lead the GAN to produce a synthetic face at a certain age from an input image. CAAE changes the objective function of the original GAN to:

min_{E,G} max_{D_z,D_img} λ ℒ(x, G(E(x), l)) + γ TV(G(E(x), l))
+ E_{z*∼p(z)}[log D_z(z*)]
+ E_{x∼p_data(x)}[log(1 − D_z(E(x)))] (12)
+ E_{x,l∼p_data(x,l)}[log D_img(x, l)]
+ E_{x,l∼p_data(x,l)}[log(1 − D_img(G(E(x), l), l))]

where l is the vector representing the age level, z is the latent feature vector, and E is the encoder function, i.e., E(x) = z. ℒ(·,·) and TV(·) are the ℓ2 norm and the total variation function, which is effective for removing ghosting artifacts. The coefficients λ and γ are intended to balance smoothness and high resolution. The latent vector is used to personalize face features, and the age condition controls progression by learning the manifold, making age progression/regression more flexible and easy to manipulate.
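The total-variation term TV(G(E(x), l)) in the CAAE objective of Eq. (12) penalizes differences between neighboring pixels, which is what suppresses ghosting artifacts. A minimal numpy sketch on a toy "image" (the anisotropic TV definition used here is a common choice, assumed rather than taken from the CAAE paper):

```python
import numpy as np

def total_variation(img):
    """Sum of absolute differences between vertically and horizontally
    neighboring pixels (anisotropic total variation)."""
    dh = np.abs(np.diff(img, axis=0)).sum()   # vertical neighbors
    dw = np.abs(np.diff(img, axis=1)).sum()   # horizontal neighbors
    return dh + dw

smooth = np.ones((4, 4))                      # flat image -> zero TV
noisy = np.array([[0., 1.], [1., 0.]])        # checkerboard -> high TV

print(total_variation(smooth))                # -> 0.0
print(total_variation(noisy))                 # -> 4.0
```

A generator minimizing this term is pushed toward locally smooth outputs, trading a small amount of detail for fewer high-frequency ghosting patterns.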
The framework of the Conditional Adversarial Autoencoder (CAAE) can be seen in Figure 10.

Figure 10. Conditional Adversarial Autoencoder (CAAE)

b. Age Conditional Generative Adversarial Neural Network Model

The conditional GAN (cGAN)[25] produces one-to-one image translation implementing certain characteristics by embedding a condition. Antipov, Baccouche, and Dugelay [26] proposed an Age Conditional Generative Adversarial Network (acGAN) by implementing cGAN, which is capable of generating a synthetic face image in the required age category. This method reconstructs an input into a new face at a certain age group while preserving the identity of the face image through latent vector optimization. The Euclidean distance between the embeddings FR(x) of the input image x and FR(x̄) of the reconstructed image x̄ is minimized to preserve identity. acGAN creates a new face with high quality; the resulting image is optimized using a latent vector along with the adversarial process of acGAN.

Given an input face image x with age y_0, an optimal latent vector z* is found so that the reconstructed face x̄ = G(z*, y_0) is as close as possible to the initial one; with a given target age y_target, the new synthetic face image is generated by x_target = G(z*, y_target), switching the age at the generator input. The objective function of acGAN is as follows:

min_G max_D v(θ_G, θ_D) = E_{x,y∼p_data}[log D(x, y)]
+ E_{z∼p_z(z), y∼p_y}[log(1 − D(G(z, y), y))] (13)

The placement of conditionals in this architecture ensures that it can produce changes in the aging pattern of the resulting synthetic face and makes the modeling process more focused and convergent. This framework is illustrated in Figure 11.

Figure 11. Age Conditional Generative Adversarial Neural Network Model

c. Contextual Generative Adversarial Nets (C-GANs)

Contextual Generative Adversarial Nets (C-GANs)[27] consist of a conditional transformation network and two discriminative networks. The conditional transformation network imitates the aging procedure with several specially designed residual blocks. The transition pattern discrimination in this architecture is its best part: its task is to distinguish real transition patterns from fake ones, guide the synthetic face to fit the real conditional distribution, and provide extra regularization for the conditional transformation network, ensuring that image pairs fit the real transition pattern distribution.

The transition pattern discriminative network D_2(x_y, x_{y+1}, y, y+1) handles transition patterns between x_y, with age in group y, and the image x_{y+1} in the next age group y+1. The task of D_2 is to distinguish the real joint distribution (x_y, x_{y+1}, y) ∼ P(x_y, x_{y+1}, y) from the fake one, forcing the generator to obey the real transition pattern distribution when generating the fake pair (x_y, G(x_y, y+1)). Considering the losses of the conditional transformation generator G, the age discriminative network D_1, and the transition pattern network D_2, the objective function is formulated as:

min_G max_{D_1} max_{D_2} E(θ_G, θ_{D_1}, θ_{D_2}) = E_age + E_trans + λ TV
= E_{x,y∼P(x,y)}[log D_1(x, y)]
+ E_{x∼P(x), y∼P(y)}[log(1 − D_1(G(x, y), y))]
+ E_{x_y, x_{y+1}, y∼P}[log D_2(x_y, x_{y+1}, y)]
+ (1/2) E_{x_y, y∼P}[log(1 − D_2(x_y, G(x_y, y+1), y))] (14)
+ (1/2) E_{x_y, y∼P}[log(1 − D_2(G(x_y, y−1), x_y, y−1))]
+ λ(TV(G(x_y, y−1)) + TV(G(x_y, y+1)) + TV(G(x, y)))

Figure 12. Contextual Generative Adversarial Nets (C-GANs)
d. Dual Conditional GAN for Face Aging and Rejuvenation

Song et al. [41] proposed a novel dual conditional GAN mechanism which enables face aging and rejuvenation to be trained on multiple sets of face images of different ages without paired identities. This framework contains two conditional GANs: the primal GAN transforms a face into other ages based on an age condition, and the dual GAN, or second network, learns to invert the task, so that a reconstruction-error loss function can preserve identity. The generator learns transition patterns such as shape and texture between groups to create realistic face photos. With multiple training image sets F_1, F_2, …, F_n of different ages, the network learns a face aging or rejuvenation model G_A and a face reconstruction model G_R, given a facial image X_i with age C_i and target age condition C_t. This model can predict the face G_A(X_i, C_t) of a person at different ages, with identity preserved through X_i ≈ G_R(G_A(X_i, C_t), C_i). This framework uses an adversarial loss (L_adv) and a generation loss (L_gen) to match the data distribution and the similarity between the synthetic image and the original image; with this mechanism the framework can produce a specific synthetic image at a certain age. A reconstruction loss (L_rec) is used to evaluate the consistency between the synthetic images and the original ones; with this mechanism, identity can be preserved. L_adv, L_rec, and L_gen are formulated as:

L_adv = E[log(1 − D_A(G_A(X_i, C_t), C_t))] + E[log D_A(X_t, C_t)]
+ E[log(1 − D_R(G_R(X_t, C_i), C_i))] + E[log D_R(X_i, C_i)] (15)
L_rec = ‖G_R(G_A(X_i, C_t), C_i) − X_i‖ + ‖G_A(G_R(X_t, C_i), C_t) − X_t‖
L_gen = ‖G_A(X_i, C_i) − X_i‖ + ‖G_R(X_t, C_t) − X_t‖

And the final objective:

L(G_A, G_R, D_A, D_R) = L_adv(G_A, D_A, G_R, D_R) + α L_rec(G_A, G_R) + β L_gen(G_A, G_R) (16)

where α and β are hyperparameters to balance the objective function. For a clear understanding of the dual conditional GAN framework, it is illustrated in Figure 13.

Figure 13. Dual Conditional GAN for Face Aging and Rejuvenation

e. Identity-Preserved Conditional Generative Adversarial Network (IPCGANs)

IPCGANs [28] has three modules: a conditional GAN (cGAN) module that generates synthetic faces according to the expected age target C_t, producing realistic-looking photos ẍ that successfully model face aging at age stages in a short period of time; an identity-preserving module which ensures ẍ has the same identity as x; and a classifier module [42] that pushes the image ẍ to the desired target age. The conditional module of IPCGAN adopts the cGAN architecture derived from Isola et al. [25].

With the generator, a synthetic image is generated from the input image under age condition C_t. The discriminator D determines whether the generated images are real or fake. The generator G tries to increase the probability that the generated image is mistaken by the discriminator D for an original image D(x|C_t); the discriminator's task is to align the label input C_t with the generated image. To make this task succeed, the objective function is formulated as:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x|C_t)] + E_{y∼p_data(y)}[log(1 − D(G(y|C_t)))] (17)

IPCGAN uses the Least Squares Generative Adversarial Network (LSGAN) [43] in the discriminator to force generated images to look real and be difficult to distinguish. The conditional LSGAN can be formulated as follows:

L_D = (1/2) E_{x∼p_data(x)}[(D(x|C_t) − 1)²] + (1/2) E_{y∼p_data(y)}[D(G(y|C_t))²] (18)

and the generator loss (L_G):

L_G = (1/2) E_{y∼p_data(y)}[(D(G(y)|C_t) − 1)²] (19)

To preserve identity, IPCGAN uses a perceptual loss instead of an adversarial loss. An adversarial loss makes the generated sample data follow the desired distribution, but the resulting image could be any person within the age target. The perceptual identity loss (L_identity) is formulated as:

L_identity = Σ_{x∈p_data(x)} |h(x) − h(G(x|C_t))| (20)

where h(·) is the feature extracted by a specific layer in a pre-trained neural network. This loss function does not use the Mean Square Error (MSE) to calculate losses between the image x and the generated aged face G(x|C_t) in pixel space, because the generated face has changes in hair color, sideburns, wrinkles, gray hair, etc., which cause a big
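The conditional LSGAN losses of Eqs. (18)-(19) replace log-losses with squared distances to target scores (1 for real, 0 for fake from the discriminator's side, 1 from the generator's side). A minimal numpy sketch with toy scalar discriminator scores (the score values are illustrative):

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """0.5 E[(D(x|Ct) - 1)^2] + 0.5 E[D(G(y|Ct))^2]  (Eq. 18)."""
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """0.5 E[(D(G(y)|Ct) - 1)^2]  (Eq. 19)."""
    return 0.5 * np.mean((d_fake - 1.0) ** 2)

d_real = np.array([0.9, 0.8])     # discriminator scores on real faces
d_fake = np.array([0.2, 0.1])     # discriminator scores on generated faces

print(lsgan_d_loss(d_real, d_fake))   # -> 0.025 (D already separates well)
print(lsgan_g_loss(d_fake))           # -> 0.3625 (G still fools D poorly)
```

Unlike the saturating log-loss, the squared penalty keeps a usable gradient even for samples the discriminator classifies confidently, which is the usual motivation for LSGAN.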
difference between the input image x and the generated face. An MSE loss would force the generated face G(x|C_t) to be an image identical to x. IPCGAN uses a proper layer h(·) to preserve identity; experiments in style transfer show that a lower feature layer is good for preserving the face content (identity), while a higher layer preserves styles such as face texture and wrinkles; identity information certainly should not change.

In IPCGAN, age classification is used to force the generated aged face into the desired age target. If the resulting generated face is in the correct category, the architecture gives a small penalty, and vice versa. The age classification loss used for this is formulated as:

L_age = Σ_{x∈p_data(x)} ℓ(G(x|C_t), C_t) (21)

where ℓ(·) is a softmax loss; in back-propagation the age classification loss L_age forces the parameters to change and ensures the generated face is in the right age group.

To produce a new face with the same identity at the target age using the conditional GAN (cGAN), a final objective function is formulated as follows:

G = λ_1 L_cGAN + λ_2 L_identity + λ_3 L_age (22)

where λ_1 controls how far the input image is aged, and λ_2 and λ_3 control the extent to which the identity information is kept and how well the resulting image falls into the appropriate age group.

The advantage of this method is that the architecture is trained in one go to produce a single model that can synthesize new faces in many age groups. For a clear understanding, the IPCGAN framework is illustrated in Figure 14.

Figure 14. IPCGAN Framework

f. Age Progression and Regression with Spatial Attention Modules

Spatial attention mechanisms are exploited in Li et al. [44][45] to restrict image modification to areas closely related to age changes, so that the image has high visual fidelity when synthesized in in-the-wild cases. This model uses an adversarial loss formulated as:

ℒ_adv = E[(D_p(G_p(I_y, α_o)) − 1)²] + E[(D_p(I_o) − 1)²] + E[D_p(G_r(I_o, α_y))²]
+ E[(D_r(G_r(I_o, α_y)) − 1)²] + E[(D_r(I_y) − 1)²] + E[D_r(G_p(I_y, α_o))²] (23)

where G_p and G_r are the age progressor and regressor, D_p and D_r their discriminators, I_y and I_o young and old input images, and α_y and α_o the young and old age conditions. To penalize the difference from the input image, a reconstruction loss is used, formulated as:

ℒ_rec = E[‖G_r(G_p(I_y, α_o), α_y) − I_y‖] + E[‖G_p(G_r(I_o, α_y), α_o) − I_o‖] (24)

An activation loss uses the total activation of the attention mask, formulated as:

ℒ_act = E[‖A_p(I_y, α_o)‖] + E[‖A_r(I_o, α_y)‖] (25)

where A_p and A_r denote the attention masks produced by the progressor and the regressor. To force the generator to reduce the error between the estimated age and the target age, an age regression loss is used, formulated as:

ℒ_age = E[|D_p^a(G_p(I_y, α_o)) − α_o|] + E[|D_p^a(I_o) − α_o|]
+ E[|D_r^a(G_r(I_o, α_y)) − α_y|] + E[|D_r^a(I_y) − α_y|] (26)

By optimizing this equation, the auxiliary regression network D^a gains the age estimation ability, and the generator G is encouraged to produce fake faces at the desired age target.

The final loss function in this model is a linear combination of all defined losses, formulated as follows:

ℒ = ℒ_adv + λ_rec ℒ_rec + λ_act ℒ_act + λ_age ℒ_age (27)

where λ_rec, λ_act, and λ_age are coefficients to balance each loss.

By adding a spatial attention mechanism to the architecture, the learning process in training focuses only on the part that receives attention; strengthening the attention part has a significant impact on the aging pattern that is formed. The effects of aging or rejuvenation are formed more in the parts that are of concern.
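The identity-preserving perceptual loss of Eq. (20) and the age classification loss of Eq. (21) can be sketched together in numpy. Here h(·), standing for the features of a chosen layer of a pre-trained network, is replaced by a toy fixed projection, and the class logits are made up; both are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.normal(size=(6, 12))                 # stand-in "pre-trained layer"

def h(x):
    """Toy feature extractor playing the role of a pre-trained layer."""
    return np.tanh(P @ x)

def identity_loss(x, x_aged):
    """|h(x) - h(G(x|Ct))| summed over feature dims (Eq. 20)."""
    return np.abs(h(x) - h(x_aged)).sum()

def age_class_loss(logits, target):
    """Softmax cross-entropy pushing the generated face to age group `target`."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.log(p[target])

x = rng.normal(size=12)
x_aged = x + 0.01 * rng.normal(size=12)      # small aging change in pixel space

print(identity_loss(x, x))                   # identical features -> 0.0
print(identity_loss(x, x_aged))              # small feature-space distance
print(age_class_loss(np.array([0.1, 2.0, 0.3]), target=1))
```

Measuring distance in feature space rather than pixel space is what lets wrinkles or gray hair change while the identity-bearing features stay close, matching the text's argument against MSE in pixel space.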
The Age Progression and Regression with Spatial Attention Modules framework is illustrated in Figure 15.

Figure 15. Age Progression and Regression with Spatial Attention Modules & Age Loss

g. S2GAN: Share Aging Factors Across Ages and Share Aging Trends Among Individuals

He et al. [46] proposed continuous face aging with favorable accuracy, identity preservation, and fidelity by using interpolation coefficients between any pair of adjacent age groups. It consists of three parts: (1) a personalized aging basis; (2) age-specific transforms to age representations; and (3) a decoder from the aged representation to the face.

The personalized aging basis processes each individual's personalized aging factor using a neural network encoder E, mapping the input image to a personalized basis B_i = [b_{i1}, b_{i2}, …, b_{im}] via B_i = E(X_i). Given the aging basis B_i, the age representation for the k-th age group can be obtained using an age-specific transform, formulated by:

r_{ik} = W_k b_i = B_i w_k (28)

To make this framework achieve its aims, they use the following objectives. The age loss for accurate face aging is formulated as:

l_age = −Σ_k log(C_k(x̂_k)) (29)

where C_k(·) denotes the probability that a sample falls into the k-th age group, predicted by the classifier C. The L1 loss for identity preservation is formulated as:

l_id = Σ_k δ(y = k) ‖x − x̂_k‖_1 (30)

The adversarial loss for image fidelity is formulated as:

l_D = max(1 − D(x, y), 0) + max(1 + D(x̂_k), 0) (31)
l_G = −D(x̂_k, k)

where D is the discriminator, the real/fake regulator. Finally, the objective functions for this framework can be formulated as:

min_{E, {w_k}, G} Σ_k (λ_age l_age + λ_id l_id + l_G) (32)
min_D l_D

where λ_age and λ_id are hyperparameters balancing the losses, and the two objectives are optimized alternately. The S2GAN framework is illustrated in Figure 16.

Figure 16. S2GAN: Share Aging Factor Across Age and Share Aging Trends Among Individual Framework

h. Automatic Face Aging in Video via Deep Reinforcement Learning

Duong et al. [47] proposed a novel approach to the synthesis of automatically age-progressed facial images in video sequences using deep reinforcement learning, modifying the face structure and the longitudinal face aging process of a given subject across video frames. The deep reinforcement learning formulation guarantees that the visual identity of the input face is preserved.

The embedding function ℱ maps X into a latent representation ℱ(X) for high-quality synthesis, with two main properties: (1) linear separability and (2) detail preservation. Age progression can be interpreted as a linear traversal from the younger region ℱ(X_young) toward the older region ℱ(X_old), using the formula:

ℱ(X_old) = ℳ(ℱ(x); X_young→old) = ℱ(X_young) + α Δ_young→old (33)

where Δ_young→old is learned from neighbors containing the aging effect only, without the presence of other factors, i.e., identity, pose, etc., estimated by:

Δ_young→old = (1/K) Σ_{k∈𝒩} [ℱ(𝒜(x, x_k^old)) − ℱ(𝒜(x, x_k^young))] (34)
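S2GAN's age representation in Eq. (28) is just the personalized basis B_i (one per subject) combined with an age-group-specific weight vector w_k shared across subjects. A minimal numpy sketch, where the sizes, the number of age groups, and the random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
m, d = 5, 8                      # basis vectors per subject, feature size

# Personalized aging basis B_i = E(X_i): one matrix per subject.
B_i = rng.normal(size=(d, m))

# One shared weight vector w_k per age group (6 hypothetical groups).
w = {k: rng.normal(size=m) for k in range(6)}

r_30s = B_i @ w[2]               # age representation of this subject, group 2
r_50s = B_i @ w[4]               # same subject, different age group

print(r_30s.shape)               # -> (8,)
```

The design separates "aging trend" (the shared w_k) from "individual traits" (the per-subject B_i), which is exactly the sharing the method's name describes; continuous ages can then be obtained by interpolating between the w_k of adjacent groups.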
The framework of Automatic Face Aging in Video via Deep Reinforcement Learning can be seen in Figure 17.

Figure 17. Automatic Face Aging in Video via Deep Reinforcement Learning

i. Conditioned-Attention Normalization GAN (CAN-GAN)

Shi et al. [48] introduced the Conditioned-Attention Normalization GAN (CAN-GAN), an architecture for age synthesis from input face images that leverages the aging differences between two age groups to capture face aging regions with different attention factors. This architecture can freely translate an input face to an aged face in a certain age group with strong identity preservation, a satisfying aging effect, and authentic visualization. The CAN-GAN layer is designed to increase aging-related information on the face while smoothing unrelated information by using an attention map.

In training, CAN-GAN uses the following formulation for the adversarial loss:

L_adv = E_x[D_adv(x)] − E_{x,age_t}[D_adv(G(x, age_t))] − λ_gp E_{x̄}[(‖∇_{x̄} D_adv(x̄)‖ − 1)²] (35)

where x denotes the real image and x̄ is sampled between pairs of real and synthetic images. A reconstruction loss is used to reconstruct the image at the target age, formulated as:

L_rec = ‖x − G(x, age_t = 0)‖ (36)

The generator G is optimized by L_cls^G, with L_cls^D optimizing D, when generating synthetic images for a target group:

L_cls^D = E_{x,c}[−log D_cls(c|x)] (37)
L_cls^G = E_{x,c̄}[−log D_cls(c̄|G(x, age_t))]

where c and c̄ denote the original age class label and the target age class label. The final losses are formulated as:

L_D = L_adv + λ_1 L_cls^D (38)
L_G = L_adv^G + λ_1 L_cls^G + λ_2 L_rec

where λ_1 and λ_2 are trade-off parameters.

By paying attention to the important parts of the face that characterize the change in age, this architecture is more assertive in defining the attributes that change at the desired age.

Figure 18. Conditioned-Attention Normalization GAN (CAN-GAN)

j. Hierarchical Face Aging through Disentangled Latent Characteristics

Lie et al. [49] proposed a Disentangled Adversarial Auto-Encoder (DAAE) to disentangle face images into three independent factors: age, identity, and extraneous information. A hierarchical conditional generator passes the disentangled identity and age embeddings into high and low layers with class-conditional batch normalization to prevent loss of identity and age information in this architecture. A disentangled adversarial learning mechanism is introduced to boost the quality of age progression.

By employing the age distribution, DAAE can synthesize faces with aging effects at an arbitrary age. Given an input image, the architecture learns the age posterior distribution and can be treated as an age estimator; DAAE can efficiently and accurately estimate the age distribution.

DAAE consists of two components, the inference network E and the hierarchical generative network G; X, X_r, and X_p denote the real sample, the reconstruction sample, and the new sample. This framework is based on the original variational autoencoder (VAE)[50] and inspired by IntroVAE[51], and aims to self-estimate age accuracy, identity preservation, and image quality. To preserve identity and age accurately, two regularizations are used for the generator G, formulated as:

ℒ^(ip) = (1/C_z)‖Z_r − Z‖ + (1/C_z)‖Z_p − Z‖ (39)
ℒ^(age) = (1/C_a)‖Z_r^a − Z^a‖ + (1/C_a)‖Z_p^a − Z^a‖

where Z_r and Z_r^a are the representations inferred from the reconstructed image X_r, and Z_p and Z_p^a are those inferred from the generated image X_p. To avoid blurry images in the VAE, they make the inference network E and the generator adversarial, formulated as:

ℒ_E^(adv) = ℒ_REG(μ, σ) + α[m − ℒ_REG(μ_r, σ_r)]⁺ + α[m − ℒ_REG(μ_p, σ_p)]⁺ (40)
ℒ_G^(adv) = ℒ_REG(μ_r, σ_r) + ℒ_REG(μ_p, σ_p)
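The last term of CAN-GAN's adversarial loss in Eq. (35) is a WGAN-GP-style gradient penalty: x̄ is sampled on lines between real and synthetic images, and (‖∇D(x̄)‖ − 1)² pushes the critic toward 1-Lipschitz behavior. A toy numpy sketch with a linear critic, whose gradient is known in closed form (the critic and sampling here are assumptions for illustration, not the paper's network):

```python
import numpy as np

rng = np.random.default_rng(4)
w = rng.normal(size=16)

def critic(x):
    """Toy linear critic D(x) = w . x."""
    return w @ x

def gradient_penalty(x_real, x_fake):
    """(||grad D(x_bar)|| - 1)^2 at a random interpolation point x_bar."""
    eps = rng.uniform()                   # random mixing coefficient
    x_bar = eps * x_real + (1 - eps) * x_fake
    grad = w                              # grad of a linear critic is constant
    return (np.linalg.norm(grad) - 1.0) ** 2

x_real = rng.normal(size=16)
x_fake = rng.normal(size=16)
print(gradient_penalty(x_real, x_fake))   # zero only if ||w|| == 1
```

For a unit-norm w the penalty vanishes, which is the 1-Lipschitz condition the regularizer enforces on a trained critic.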
where m is a positive margin, α is a weight coefficient, and (μ, σ), (μ_r, σ_r), and (μ_p, σ_p) are computed from the real data X, the reconstruction sample X_r, and the new sample X_p. The final objectives of this framework are:

ℒ_E = ℒ_AE + λ_adv ℒ_E^(adv) + λ_ip ℒ^(ip) + λ_age ℒ^(age) (41)
ℒ_G = ℒ_AE + λ_adv ℒ_G^(adv) + λ_ip ℒ^(ip) + λ_age ℒ^(age)

Figure 19. Hierarchical Face Aging through Disentangled Latent Characteristics Framework

k. Lifespan Age Transformation Synthesis

Or-El et al. [52] proposed a framework that addresses the problem of single-photo age progression and regression: predicting how a person will look in the future or looked in the past. It is a novel multi-domain image-to-image generation approach using a GAN. The learned latent space models a continuous bidirectional aging process; the network is trained to approximate continuous age transformation from 0 to 70 years old, modifying shape and texture with six anchor age classes.

This framework, based on a GAN, contains a conditional generator and a single discriminator. The conditional generator is responsible for transitions among age groups. The framework preserves identity by encoding identity and age in separate paths.

A single generator is used for all ages, consisting of an identity encoder, a latent mapping network, and a decoder. The training process uses an age encoder to embed both real and generated images into the age latent space. The decoder takes an age code with the identity feature and generates an output image in the age style using convolutional blocks. In general, the generator maps an input image X and a target age vector Z_t into an output image y_t, formulated as:

y_t = G(X, Z_t) = F(E_id(X), M(Z_t)) (42)

The age encoder maps the input image X into the correct location in the vector space Z, producing an age vector that corresponds to the target age group. The discriminator, a StyleGAN discriminator with minibatch standard deviation in the last layer, has the task of forcing the synthesized image to be at the target age t.

In the training scheme, they process a source image of cluster s into a target cluster t, where s ≠ t, and perform three forward phases:

Y_t = G(X, Z_t)
Y_s = G(X, Z_s) (43)
Y_c = G(Y_t, Z_s)

where Y_t is the generated image at target age t, Y_s is a reconstructed image at source age s, and Y_c is a cycle-reconstructed image at source age s, generated from the generated image at target age t.

They use a conditional adversarial loss ℒ_adv to generate an image at the target age t, and a self-reconstruction loss ℒ_rec and cycle consistency loss ℒ_cyc to force the generator to learn the identity translation, formulated as:

ℒ_adv(G, D) = E_{x,t}[log D_t(x)] + E_{x,t}[log(1 − D_t(Y_t))]
ℒ_rec(G) = ‖x − Y_s‖ (44)
ℒ_cyc(G) = ‖x − Y_c‖

And to keep identity preserved they use:

ℒ_id(G) = ‖E_id(X) − E_id(Y_t)‖ (45)

An age vector loss corrects the embedding of real and generated images by penalizing the distance between the age encoder output and the age vectors Z_s and Z_t that generated a sample by the generator. The age vector loss is formulated as:

ℒ_age(G) = ‖E_age(X) − Z_s‖ + ‖E_age(Y_t) − Z_t‖ (46)

The framework is optimized by:

min_G max_D ℒ_adv(G, D) + λ_rec ℒ_rec(G) + λ_cyc ℒ_cyc(G) + λ_id ℒ_id(G) + λ_age ℒ_age(G) (47)

How the framework works can be seen in Figure 20.

Figure 20. Lifespan Age Transformation Synthesis
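The three forward phases of Eq. (43) and the reconstruction and cycle-consistency losses of Eq. (44) can be illustrated with a toy invertible "generator" that simply shifts an image by the difference of two age codes, so both losses vanish exactly. The generator signature and the age codes are simplifying assumptions (the actual model conditions only on the target age code).

```python
import numpy as np

age_code = {"s": 0.0, "t": 5.0}          # toy age vectors for clusters s, t

def G(x, z_from, z_to):
    """Toy generator: shift the image by the age-code difference."""
    return x + (age_code[z_to] - age_code[z_from])

x = np.array([1.0, 2.0, 3.0])            # input image from source cluster s
y_t = G(x, "s", "t")                     # Y_t: generated image at target age t
y_s = G(x, "s", "s")                     # Y_s: self-reconstruction at source age s
y_c = G(y_t, "t", "s")                   # Y_c: cycle reconstruction back to s

rec_loss = np.abs(x - y_s).sum()         # L_rec = ||x - Y_s||
cyc_loss = np.abs(x - y_c).sum()         # L_cyc = ||x - Y_c||
print(rec_loss, cyc_loss)                # -> 0.0 0.0
```

A real generator is not exactly invertible, so these two losses stay positive and act as the training signal that ties the aged image Y_t back to the identity of the input x.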
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3157617, IEEE Access

l. Progressive Face Aging with Generative Adversarial Network (PFA-GAN)

Huang et al. [53] proposed Progressive Face Aging with a Generative Adversarial Network (PFA-GAN) to remove the ghosting artifacts that arise when the distance between age groups becomes large. The network is trained to reduce accumulated artifacts and blurriness. It consists of several subnetworks whose job is to imitate the aging process, from young to old or vice versa; each subnetwork learns the specific aging effect between two adjacent age groups. An age estimation loss is used to increase aging accuracy, and the Pearson correlation is used as an evaluation metric to measure aging smoothness.

The architecture has four generator subnetworks. Each subnetwork $G_i$ is responsible for generating aging faces from age group $i$ to $i+1$ and consists of a residual skip connection, a binary gate $\lambda_i$, and the subnetwork body $G_i$. The binary gate $\lambda_i$ controls the aging flow and decides whether the subnetwork is involved in a given aging mapping. Each subnetwork can be formulated as $X_{i+1} = G_i([X_i; C_i])$, and the progressive aging framework can be formulated as:

$X_t = G_{t-1} \circ G_{t-2} \circ \cdots \circ G_s(X_s)$  (48)

where the symbol $\circ$ denotes function composition.

To prevent the architecture from simply reproducing the input image, a residual skip connection is added to each subnetwork, providing an identity mapping from input to output. With this connection and the binary gate, the transition from age group $i$ to $i+1$ can be rewritten as:

$X_{i+1} = G_i(X_i) = X_i + \lambda_i G_i(X_i)$  (49)

where $\lambda_i \in \{0, 1\}$ is the binary gate that controls whether the subnetwork $G_i$ participates in the path between age groups: $\lambda_i = 1$ if $G_i$ lies between the source age group $s$ and the target age group $t$, i.e., $s \le i < t$, and $\lambda_i = 0$ otherwise. The condition tensor $C$ used in cGAN-based methods is converted into a binary gate vector $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_{N-1})$ that regulates the aging flow in the progression framework (Figure 21), expanded as:

$X_t = X_{t-1} + \lambda_{t-1} G_{t-1}(X_{t-1}) = X_{t-2} + \lambda_{t-2} G_{t-2}(X_{t-2}) + \lambda_{t-1} G_{t-1}(X_{t-1}) = X_s + \lambda_s G_s(X_s) + \cdots + \lambda_{t-1} G_{t-1}(X_{t-1})$  (50)

The network is therefore very flexible in modeling the age progression between two different age groups. In the end, given a young face image $X_s$ from the source age group $s$, the aging process to the target age group $t$ can be formulated as:

$X_t = G(X_s; \lambda_{s:t})$  (51)

The least-squares adversarial loss used for generator $G$ is formulated as:

$\mathcal{L}_{adv} = \frac{1}{2} E\big[\big(D([G(X_s, \lambda_{s:t}); C_t]) - 1\big)^2\big]$  (52)

And the age estimation loss for progressive face aging can be formulated as:

$\mathcal{L}_{age} = \frac{1}{2} E\big[\,|\hat{y} - y| + \ell(A(X)W, C_t)\big]$  (53)

For identity consistency, to keep the identity and discard unrelated information, they adapt a mixed identity loss based on the Structural Similarity Index (SSIM), formulated as three identity-related loss functions. The final generative and discriminative losses are formulated as equation 54:

$\mathcal{L}_G = \lambda_{adv}\mathcal{L}_{adv} + \lambda_{age}\mathcal{L}_{age} + \lambda_{ide}\mathcal{L}_{ide}, \qquad \mathcal{L}_D = \frac{1}{2} E\big[\big(D([X_t; C_t]) - 1\big)^2\big] + \frac{1}{2} E\big[\big(D([G(X_s, \lambda_{s:t}); C_t])\big)^2\big]$  (54)

The Progressive Face Aging with Generative Adversarial Network (PFA-GAN) framework is illustrated in Figure 21.

Figure 21. PFA-GAN Framework
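The gated residual composition of Eqs. (48)–(50) can be sketched as follows. The subnetworks here are toy residual perturbations and the sizes are illustrative assumptions; the gating logic is the point:

```python
import numpy as np

rng = np.random.default_rng(0)

N_GROUPS, DIM = 4, 8
# Toy aging subnetworks G_i, each a small residual perturbation.
subnets = [rng.normal(0, 0.05, (DIM, DIM)) for _ in range(N_GROUPS - 1)]

def age_progress(x, source, target):
    """X_t = X_s + sum_i lambda_i * G_i(X_i): gated residual chain (Eq. 50)."""
    # lambda_i = 1 only for subnetworks between source and target (s <= i < t).
    gates = [1 if source <= i < target else 0 for i in range(N_GROUPS - 1)]
    for gate, W in zip(gates, subnets):
        if gate:                    # subnetwork lies on the active aging path
            x = x + np.tanh(x @ W)  # residual skip keeps an identity mapping
    return x

x_young = rng.normal(size=DIM)
x_old = age_progress(x_young, source=0, target=3)  # chains G_0, G_1, G_2
same = age_progress(x_young, source=1, target=1)   # empty path: all gates 0
assert np.allclose(same, x_young)  # no active gates -> input passes unchanged
```

The residual skip makes an inactive chain an exact identity mapping, which is what lets one network serve every source/target pair.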
m. AgeFlow: Conditional Age Progression and Regression with Normalizing Flow

Huang et al. [54] proposed a framework that integrates the advantages of both flow-based models and GANs. It consists of three parts: (1) an encoder that maps a given face image into a latent space; (2) an Invertible Conditional Translation Module (ICTM) that translates the source latent vector into the target latent vector; and (3) a decoder that reconstructs the resulting face from the target latent vector using the same (inverted) encoder. All parts are invertible, which achieves a bijective aging mapping.

The novelty of the ICTM is the ability to manipulate the direction of change in age progression while keeping the other attributes unchanged, so that unrelated changes remain insignificant. Second, they use the latent space to ensure that the resulting latent vector is indistinguishable from the original. The experimental results demonstrate superior performance. The flow-based encoder G maps input images into the latent space, and the decoder performs the inverse using the function G⁻¹.

The encoding process is formulated as $z = G(I)$ and the generative reverse procedure as $I = G^{-1}(z)$, where $z$ follows a Gaussian distribution $p(z)$. The generator $G$ is optimized by:

$\mathcal{L}_{flow} = -E_{I \sim p(I)}\Big[\log p_\theta(z) + \log \Big|\det \frac{dG}{dI}\Big|\Big]$  (55)

where $G$ extracts the latent vector and $G^{-1}$ is its inverse. The Invertible Conditional Translation Module is given a latent vector $z_s$ from the source age group $C_s$ and, for age progression, translates $z_s$ to $z_t$ (target age $C_t$). Cycle consistency is achieved through the reversibility of the ICTM. Each flow contains two convolutional networks with a channel attention module that makes the model focus only on the necessary latents.

The discriminator is a simple multilayer perceptron (MLP) with 512 neurons, followed by spectral normalization [55] and leaky ReLU activations with a negative slope of 0.2; the bottom layer has one neuron, and the remaining outputs perform age classification to improve age accuracy within the framework.

To make this framework work, the loss function combines an attribute-aware knowledge distillation loss ($\mathcal{L}_{akd}$), an adversarial loss ($\mathcal{L}_{adv}$), an age classification loss ($\mathcal{L}_{cls}$), and a consistency loss ($\mathcal{L}_{con}$), formulated as:

$\mathcal{L}_{akd} = E\big[\,\big|z_t - (z_s + s \times (\bar{z}_t - \bar{z}_s))\big|\,\big], \quad \mathcal{L}_{adv} = \frac{1}{2}E\big[(D(Z_t) - 1)^2\big], \quad \mathcal{L}_{cls} = E\big[\ell(A(Z_t), t)\big], \quad \mathcal{L}_{con} = -E_{z \sim p(z)}\big[\log p_\theta(\mu_t, \sigma_t, z_t)\big]$  (56)

And the final loss is formulated as:

$\mathcal{L} = \lambda_{akd}\mathcal{L}_{akd} + \lambda_{adv}\mathcal{L}_{adv} + \lambda_{cls}\mathcal{L}_{cls} + \lambda_{con}\mathcal{L}_{con}$  (57)

The AgeFlow framework can be seen in Figure 22.

Figure 22. AgeFlow: Conditional Age Progression and Regression with Normalizing Flow
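AgeFlow's parts are all invertible. A flow step of the standard affine-coupling family (a generic normalizing-flow building block, not the paper's exact ICTM) shows how a translation can be exactly reversed, which is what gives the cycle consistency for free:

```python
import numpy as np

rng = np.random.default_rng(0)

class AffineCoupling:
    """One invertible coupling step: transforms half of z conditioned on the other half."""
    def __init__(self, dim):
        self.half = dim // 2
        # Toy "networks": fixed random linear maps producing scale and shift.
        self.Ws = rng.normal(0, 0.1, (self.half, self.half))
        self.Wt = rng.normal(0, 0.1, (self.half, self.half))

    def forward(self, z):
        z1, z2 = z[: self.half], z[self.half :]
        s, t = z1 @ self.Ws, z1 @ self.Wt
        return np.concatenate([z1, z2 * np.exp(s) + t])  # z1 passes through unchanged

    def inverse(self, y):
        y1, y2 = y[: self.half], y[self.half :]
        s, t = y1 @ self.Ws, y1 @ self.Wt                 # recompute s, t from y1 = z1
        return np.concatenate([y1, (y2 - t) * np.exp(-s)])

z = rng.normal(size=8)
layer = AffineCoupling(8)
assert np.allclose(layer.inverse(layer.forward(z)), z)  # bijective by construction
```

Because the scale and shift depend only on the untouched half, the inverse can recompute them exactly; stacking such steps keeps the whole translation bijective.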
n. Disentangled Lifespan Face Synthesis

He et al. [56] proposed a lifespan face synthesis (LFS) model to generate a set of photo-realistic face images of a person over a whole lifespan, with only one image as reference. The generated face image at a given target age undergoes a non-linear transformation of both structure and texture.

LFS is a conditional GAN in which the condition is applied to the latent face representation. LFS proposes a disentangled face synthesis in which structure, identity, and age transformation can be modeled effectively. The shape, texture, and identity features are extracted separately by the encoder. The model has two transformer modules, one conditional-convolution based and the other channel-attention based, to model non-linear shape and texture change, accommodate distinct aging processes, and ensure that the synthesized image is age-sensitive and identity-preserved. Three distinct feature sets, shape $f_s$, texture $f_t$, and identity $f_{id}$, are extracted from the encoder and formulated as:

$f_s = R_s(\varepsilon_s(I_r)), \quad f_t = \mathcal{T}(\varepsilon_t(I_r)), \quad f_{id} = \mathcal{T}_{id}(\varepsilon_{id}(I_r))$  (58)

where $R_s$ is a residual block that extracts shape information, $\mathcal{T}$ is a convolutional projection module that extracts texture information and pools it into a vector, and the identity feature $f_{id}$ is extracted by another convolutional projection module.

The age-conditional shape transformation uses conditional convolutions whose filters are modulated by the target age information. For the texture transformation, an age-conditional channel attention is used, designed as:

$f_t(Z_t) = \mathcal{T}_t(f_t, Z_t) = f_t \circ P_t(A_t(Z_t))$  (59)

where $\circ$ is element-wise multiplication and $P_t$ is a linear projection layer. The generator $G$ produces the image $I_t = G(f_s(Z_t), f_t(Z_t))$, transforming the reference face image into an older age group with the same shape information, i.e., the shape difference $\mathcal{L}_{shape} = \|R_s(\varepsilon_s(I_r)) - R_s(\varepsilon_s(I_t))\|$ is minimized. To make the framework achieve this aim, they use an identity loss $\mathcal{L}_{id}$, a cycle consistency loss $\mathcal{L}_{cyc}$, a reconstruction loss $\mathcal{L}_{rec}$, and a conditional adversarial loss $\mathcal{L}_{adv}$, formulated as:

$\mathcal{L}_{id} = \big\|ID(\varepsilon(I_r)) - ID(\varepsilon(I_t))\big\|, \quad \mathcal{L}_{cyc} = \|I_r - f(I_t, Z_s)\|, \quad \mathcal{L}_{rec} = \|I_r - G(f_s(Z_s), f_t(Z_s))\|, \quad \mathcal{L}_{adv} = E_{I \sim p(I)}\big[\log D(I\,|\,Z)\big] + E_{I_t}\big[1 - \log D(I_t\,|\,Z)\big]$  (60)

The overall training objective is formulated as:

$\mathcal{L} = \lambda_{id}\mathcal{L}_{id} + \lambda_{cyc}\mathcal{L}_{cyc} + \lambda_{rec}\mathcal{L}_{rec} + \lambda_{adv}\mathcal{L}_{adv} + \lambda_{shape}\mathcal{L}_{shape}$  (61)

where $\lambda_{id}$, $\lambda_{cyc}$, $\lambda_{rec}$, $\lambda_{adv}$, and $\lambda_{shape}$ are hyperparameters that balance the objective function. The framework for disentangled lifespan face synthesis can be seen in Figure 23.
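The age-conditional channel attention of Eq. (59) can be sketched as follows. The linear embedding for $A_t$, the sigmoid squashing for $P_t$, and the one-hot age code are assumptions made for the sketch, not the paper's exact design:

```python
import numpy as np

rng = np.random.default_rng(1)

def age_conditional_channel_attention(f_tex, age_code, W_embed, W_proj):
    """Scale texture channels by attention weights derived from the target age.

    Mirrors Eq. (59): f_t o P_t(A_t(Z_t)), with o as element-wise multiplication.
    """
    a = age_code @ W_embed                        # A_t(Z_t): age embedding
    attn = 1.0 / (1.0 + np.exp(-(a @ W_proj)))    # P_t(...): per-channel weights in (0, 1)
    return f_tex * attn                           # element-wise channel modulation

channels = 16
f_tex = rng.normal(size=channels)     # pooled texture feature vector
age_code = np.eye(10)[7]              # one-hot target age group (hypothetical encoding)
W_embed = rng.normal(0, 0.5, (10, 8))
W_proj = rng.normal(0, 0.5, (8, channels))

out = age_conditional_channel_attention(f_tex, age_code, W_embed, W_proj)
assert out.shape == f_tex.shape
```

The attention only rescales channels, so the texture representation keeps its layout while the target age decides which channels are emphasized.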

Figure 23. Disentangled Lifespan Face Synthesis

Figure 24. Age-Invariant Face Recognition and Age Synthesis: Multitask Learning Framework (panels: A. Feature Extraction; B. Attention-based feature decomposition; C. Identity conditional module; D. Age Estimation; E. AIFR; F. Face Age Synthesis)
o. Age-Invariant Face Recognition and Age Synthesis: Multitask Learning Framework

Huang et al. [57] proposed a multitask framework, termed MTL-Face, that handles two tasks: it can learn identity representations and generate pleasing face synthesis. It decomposes the features into two unrelated components, identity-related and age-related, through an attention mechanism.

To optimize this framework, they conduct the following processes:

(1) An age-invariant face recognition task for preserving identity, formulated as:

$\mathcal{L}_{AIFR} = \mathcal{L}_{id}\big(L(X_{id}), Y_{id}\big) + \lambda_1\,\mathcal{L}_{age}\big(A(X_{age})\big) + \lambda_2\,\mathcal{L}_{age}\big(GRL(X_{id})\big)$  (62)

(2) A face synthesis task that generates a synthesized image at the target age using the age label:

$I_t = D\big(\{E_k(I)\},\, ICM(X_{id}, t)\big)$  (63)

(3) An improved and stable training process using the Least Squares GAN (LSGAN):

$\mathcal{L}_{adv} = \frac{1}{2} E\big[\big(D([I_t; C_t])\big)^2\big]$  (64)

(4) Face aging and rejuvenation, holistically formulated as:

$X_{id}, X_{age} = AFD\big(E(I_s)\big), \quad \mathcal{L}_{age} = \ell_{ce}(X_{age}, t), \quad \mathcal{L}_{id} = E\,\big\|X_{id} - \hat{X}_{id}\big\|_F$  (65)

where $\|\cdot\|_F$ represents the Frobenius norm.

(5) The final loss:

$\mathcal{L} = \lambda_1\mathcal{L}_{AIFR} + \lambda_2\mathcal{L}_{age} + \lambda_3\mathcal{L}_{id}$  (66)

With a weight-sharing strategy, this framework succeeds in increasing the smoothness of synthesizing new faces in the desired age group under face-in-the-wild conditions. The Age-Invariant Face Recognition and Age Synthesis multitask learning framework is illustrated in Figure 24.
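The attention-based feature decomposition of Eq. (65), which splits a face feature into age-related and identity-related parts, can be sketched as follows; the sigmoid mask and toy linear attention below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def attention_feature_decomposition(feat, W_attn):
    """Split a feature map into age-related and identity-related components.

    A sigmoid attention mask selects the age-related part; the residual
    keeps identity. W_attn is a toy stand-in for the attention subnetwork.
    """
    mask = 1.0 / (1.0 + np.exp(-(feat @ W_attn)))  # attention values in (0, 1)
    x_age = feat * mask          # age-related component
    x_id = feat * (1.0 - mask)   # identity-related component
    return x_age, x_id

feat = rng.normal(size=(4, 32))        # batch of pooled face features
W_attn = rng.normal(0, 0.2, (32, 32))
x_age, x_id = attention_feature_decomposition(feat, W_attn)
assert np.allclose(x_age + x_id, feat)  # the two parts sum back to the input
```

Because the two components are exactly complementary, the recognition branch can consume x_id while the synthesis branch manipulates x_age without losing information.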
IV. CHALLENGE IN FACE AGING DATASET

A challenge in deep neural networks is the quality and quantity of datasets: the success of a network rests on the availability of data that allows it to be properly trained into a model. Dataset availability currently relies on public datasets, and datasets relating to children are minimal; most datasets used in facial aging research related to children are private. This difficulty may be caused by legal regulations that protect a child's right to privacy. Ethnicity is another challenge in finding the pattern of face aging: between one ethnic group and another, the aging pattern is not linear. Given more diverse datasets, the algorithms could of course be retrained to produce aging patterns across races.

Face aging is also challenging when a dataset has a large age range within its age groups, which makes it difficult for an architecture to find a pattern. High-resolution generated faces are more attractive, and producing high resolution creates a challenge for the algorithms to be built. Incorrect labels are a further challenge: assigning the correct age label to a dataset is not easy, there is still no clear benchmark for determining the age label of an image, and many datasets are collected with incorrect labels.

Finally, there are external factors: the aging pattern of each individual is not linear, being influenced by lifestyle, nutrition, environment, and disease. A dataset carrying this information would certainly provide a new direction for research.

A. Method

A model-based approach using an anthropological perspective in the active appearance model is represented by an aging function that applies the mean face parameters of different age groups [15][16] or variant aging effects [17]. The difficulty of this method is constructing a suitable training set, where a sequential age progression from different individuals is needed to create the face model

transformation. Suo et al. [18] used a compositional and dynamic model for face aging, representing the face in each age group with a hierarchical "And-Or" graph. This compositional and dynamic model can build the perception of aging effects and preserve the identity of an input image well, but the model is difficult to develop when image sequences are lacking; it cannot be trained over long age ranges, and it needs precise alignment to produce fine synthetic images without losing identity information.

Prototype-based approaches can produce a reliable age subspace and age alignment, which can handle in-the-wild photos with a lot of variation in pose, expression, and lighting. This method uses a collection of head and torso images in an attempt to obtain a synthetic image at a certain age. The prototype approach requires very precise image alignment to avoid producing implausible facial images.

Reconstruction-based approaches such as Coupled Dictionary Learning (CDL) model aging patterns while maintaining the personalization of each person's face; they use the aging basis, or basic pattern, of each age group and combine it into the input image. Producing short-term, neighboring patterns of age change requires a complete and sequential set of individuals in the dataset, which is a difficult task.

Deep neural network approaches such as the TRBM and recurrent neural networks are better approaches to face aging. These methods utilize information from the past to find soft transition patterns between age groups. Using a single unit model, the interpretation of facial changes can be achieved. The facial structure and the changes of face aging are handled in one training run, which makes it easier for the architecture to form a robust and simple model. However, this approach requires long, large-scale computation and a large number of datasets to produce a robust model.

The Generative Adversarial Network (GAN) is another approach to face aging. The translation method is based on transferring style from one group of images to another. For example, CycleGAN, which uses this method, tries to capture the style of one age group in other age groups. CycleGAN has the advantage that it does not require consecutive photos of the same individual in each age group domain; it only requires that each age group in the dataset has a sufficient number of images. The sequential-based approach seeks facial aging patterns between adjacent age groups; in contrast to approaches that seek a direct model, it requires a sequential dataset for each age group. Training is carried out gradually from the young age group to the old, or vice versa; the transition is sequential, and each stage is trained independently to obtain the complete aging process. The conditional-based approach embeds a label as a one-hot code into the generator and/or discriminator. This method requires a dataset with clean labels (correct labels). A GAN is a convolutional neural network, so the architecture requires a long training time and large computational cost. The advantage is that this approach can produce a single model applicable to all age groups, and the resulting model can make a smooth and realistic transition from one age group to another while leaving only a few artifacts. An attention mechanism can enhance this architecture by emphasizing the parts that determine the aging pattern.

Following Figure 6, we describe the timeline of the development of face aging using generative approaches, and to describe the performance of each method we provide its results in Table A.1.

V. CONCLUSION

The dataset used in the facial aging process also plays an important role in the success of finding aging patterns in each age group. A smaller age range is better than a larger one: a 5-year range is better than a 10-year range, because a 5-year range contains fewer differences, which makes it easier for a face aging architecture to find an aging pattern. A 20-year-old teenager differs greatly from a 30-year-old adult; they cannot be grouped into the same age range, so the authors suggest that the age range be narrowed to 5 years. More young-age images must be added to increase the ability of the architecture to generate young aging patterns. The dataset labels must be clean, with no wrongly labeled images (a photo must be grouped into the correct group). The dataset distribution across age groups must be balanced to prevent bias in the training process, and the number of images per age group must be sufficient so that the architecture can produce a good model. Diversity in the dataset can improve the quality of the face aging pattern found.

Based on the comparison of Mean Absolute Error (MAE) or accuracy of each method in Table A.1, face aging architectures that use the conditional GAN approach are still superior. The current trend is how to apply state-of-the-art models to mobile or edge computing.

The synthetic face image quality of the face aging process depends on which algorithm is used. A model-based approach finds it difficult to learn a general model for a certain age group, and a badly aligned face model produces blurry, implausible images and loses identity information. Using a dynamic model, the resulting synthetic photos vary a lot. The prototyping approach eliminates the individual's global identity information in synthetic face images, producing images that lose identity information. In the reconstruction approach, such as CDL, identity information can be maintained even though the reconstruction process must be sequenced from one aging group to another, which makes the generated


synthetic faces have a lot of variation. The generative adversarial approach is still the best: by finding the minimum of the mean square error for images of a certain age group, the pattern of each age group can be found, and by finding the minimum of the perceptual loss with respect to the original image, the aging pattern can be applied to the input face image while identity information is maintained. An alternative method uses cycle consistency, as in the research of Fang et al. [36], to maintain identity, or uses two stages of synthesis (feature generation and feature-to-image rendering) in a GAN: the feature generation task synthesizes various facial features, and the rendering task produces photorealistic images with high diversity while preserving identity.

For future research, face aging at high resolution is a challenge, because processing high-resolution images requires heavy computation, and finding algorithms or methods with lower computational requirements and higher speed remains open. Improving the quality of datasets in terms of diversity is also very challenging. A dataset with a variety of races, living environments, nutrition, and lifestyles can open opportunities for research topics such as the effect of nutrition, living environment, and lifestyle, or face aging in Asian people. The effect of disease on face aging can also be researched.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

ACKNOWLEDGMENT

No funding was received for this research.

REFERENCES

[1] N. Ramanathan and R. Chellappa, “Modeling age progression in young faces,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 1, pp. 387–394, 2006, doi: 10.1109/CVPR.2006.187.
[2] C. N. Duong, K. Luu, K. G. Quach, and T. D. Bui, “Longitudinal face modeling via temporal deep restricted Boltzmann machines,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 5772–5780, 2016, doi: 10.1109/CVPR.2016.622.
[3] W. Wang et al., “Recurrent Face Aging,” 2016 IEEE Conf. Comput. Vis. Pattern Recognit., pp. 2378–2386, 2016, doi: 10.1109/CVPR.2016.261.
[4] Z. Zhang, Y. Song, and H. Qi, “Age progression/regression by conditional adversarial autoencoder,” Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR 2017), pp. 4352–4360, 2017, doi: 10.1109/CVPR.2017.463.
[5] C. N. Duong, K. G. Quach, K. Luu, T. Hoang Ngan Le, and M. Savvides, “Temporal non-volume preserving approach to facial age-progression and age-invariant face recognition,” arXiv, pp. 3735–3743, 2017.
[6] X. Tang, Z. Wang, W. Luo, and S. Gao, “Face Aging with Identity-Preserved Conditional Generative Adversarial Networks,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 7939–7947, doi: 10.1109/CVPR.2018.00828.
[7] C. N. Duong, K. G. Quach, K. Luu, T. H. N. Le, M. Savvides, and T. D. Bui, “Learning from Longitudinal Face Demonstration - Where Tractable Deep Modeling Meets Inverse Reinforcement Learning,” Nov. 2019, [Online]. Available: http://arxiv.org/abs/1711.10520.
[8] K. Ricanek Jr. and T. Tesafaye, “MORPH: A longitudinal image database of normal adult age-progression,” Proc. 7th Int. Conf. Autom. Face Gesture Recognit., 2006.
[9] A. Lanitis, “FG-NET Aging Database,” Oct. 2010, [Online]. Available: http://www.fgnet.rsunit.com.
[10] E. Eidinger, R. Enbar, and T. Hassner, “Age and gender estimation of unfiltered faces,” IEEE Trans. Inf. Forensics Secur., vol. 9, no. 12, pp. 2170–2179, 2014, doi: 10.1109/TIFS.2014.2359646.
[11] G. Levi and T. Hassner, “Age and gender classification using convolutional neural networks,” 2015 IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, pp. 34–42, 2015, doi: 10.1109/CVPRW.2015.7301352.
[12] B.-C. Chen, C.-S. Chen, and W. H. Hsu, “Cross-Age Reference Coding for Age-Invariant Face Recognition and Retrieval,” in Computer Vision – ECCV 2014, 2014, pp. 768–783.
[13] R. Rothe, R. Timofte, and L. Van Gool, “Deep expectation of real and apparent age from a single image without facial landmarks,” Int. J. Comput. Vis., vol. 126, no. 2–4, pp. 144–157, 2016, doi: 10.1007/s11263-016-0940-3.
[14] S. Moschoglou, C. Sagonas, and I. Kotsia, “AgeDB: the first manually collected, in-the-wild age database,” 2017.
[15] E. Patterson, K. Ricanek, M. Albert, and E. Boone, “Automatic representation of adult aging in facial images,” Int. Conf. Vis. Imaging, Image Process., pp. 171–176, 2006.
[16] A. Lanitis, C. J. Taylor, and T. F. Cootes, “Toward automatic simulation of aging effects on face images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 4, pp. 442–455, 2002, doi: 10.1109/34.993553.
[17] X. Geng, Y. Fu, and K. Smith-Miles, “Automatic Facial Age Estimation,” 11th Pacific Rim Int. Conf. Artif. Intell., pp. 1–130, 2010.
[18] J. Suo, S. C. Zhu, S. Shan, and X. Chen, “A compositional and dynamic model for face aging,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 3, pp. 385–401, 2010.
[19] I. Kemelmacher-Shlizerman, S. Suwajanakorn, and S. M. Seitz, “Illumination-aware age progression,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 3334–3341, 2014, doi: 10.1109/CVPR.2014.426.
[20] D. A. Rowland, “Manipulating Facial Appearance through Shape and Color,” IEEE Comput. Graph. Appl., vol. 15, no. 5, pp. 70–76, 1995, doi: 10.1109/38.403830.
[21] C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman, “SIFT Flow: Dense Correspondence across Different Scenes,” pp. 28–42, 2008, doi: 10.1007/978-3-540-88690-7_3.
[22] X. Shu, J. Tang, H. Lai, L. Liu, and S. Yan, “Personalized age progression with aging dictionary,” Proc. IEEE Int. Conf. Comput. Vis., pp. 3970–3978, 2015, doi: 10.1109/ICCV.2015.452.
[23] X. Shu, J. Tang, Z. Li, H. Lai, L. Zhang, and S. Yan, “Personalized Age Progression with Bi-Level Aging Dictionary Learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 905–917, 2018, doi: 10.1109/TPAMI.2017.2705122.
[24] I. J. Goodfellow et al., “Generative Adversarial Networks,” pp. 1–9, 2014.
[25] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-Image Translation with Conditional Adversarial Networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul. 2017, pp. 5967–5976, doi: 10.1109/CVPR.2017.632.
[26] G. Antipov, M. Baccouche, and J.-L. Dugelay, “Face aging with conditional generative adversarial networks,” Proc. Int. Conf. Image Process. (ICIP), pp. 2089–2093, 2017, doi: 10.1109/ICIP.2017.8296650.


[27] S. Liu et al., “Face Aging with Contextual Generative Adversarial Nets,” Proc. 2017 ACM Multimed. Conf. (MM ’17), pp. 82–90, 2017, doi: 10.1145/3123266.3123431.
[28] Z. Wang, X. Tang, W. Luo, and S. Gao, “Face Aging with Identity-Preserved Conditional Generative Adversarial Networks,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 7939–7947, doi: 10.1109/CVPR.2018.00828.
[29] P. Li, Y. Hu, Q. Li, R. He, and Z. Sun, “Global and Local Consistent Age Generative Adversarial Networks,” Proc. Int. Conf. Pattern Recognit., pp. 1073–1078, 2018, doi: 10.1109/ICPR.2018.8545119.
[30] J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks,” Proc. IEEE Int. Conf. Comput. Vis., pp. 2242–2251, Oct. 2017, doi: 10.1109/ICCV.2017.244.
[31] M. Mirza and S. Osindero, “Conditional Generative Adversarial Nets,” pp. 1–7, 2014, [Online]. Available: http://arxiv.org/abs/1411.1784.
[32] X. Yao, G. Puy, A. Newson, Y. Gousseau, and P. Hellier, “High resolution face age editing,” arXiv, 2020.
[33] W. Wang, Y. Yan, Z. Cui, J. Feng, S. Yan, and N. Sebe, “Recurrent Face Aging with Hierarchical AutoRegressive Memory,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 3, pp. 654–668, 2019, doi: 10.1109/TPAMI.2018.2803166.
[34] J. Despois, F. Flament, and M. Perrot, “AgingMapGAN (AMGAN): High-Resolution Controllable Face Aging with Spatially-Aware Conditional GANs,” Lect. Notes Comput. Sci., vol. 12537 LNCS, pp. 613–628, 2020, doi: 10.1007/978-3-030-67070-2_37.
[35] Y. Liu, Q. Li, and Z. Sun, “Attribute-aware face aging with wavelet-based generative adversarial networks,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 11869–11878, 2019, doi: 10.1109/CVPR.2019.01215.
[36] H. Fang, W. Deng, Y. Zhong, and J. Hu, “Triple-GAN: Progressive face aging with triple translation loss,” IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Workshops, pp. 3500–3509, 2020, doi: 10.1109/CVPRW50498.2020.00410.
[37] S. Palsson, E. Agustsson, R. Timofte, and L. Van Gool, “Generative adversarial style transfer networks for face aging,” IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Workshops, pp. 2165–2173, 2018, doi: 10.1109/CVPRW.2018.00282.
[38] C. N. Duong, K. G. Quach, K. Luu, T. H. N. Le, and M. Savvides, “Temporal Non-volume Preserving Approach to Facial Age-Progression and Age-Invariant Face Recognition,” Proc. IEEE Int. Conf. Comput. Vis., pp. 3755–3763, 2017, doi: 10.1109/ICCV.2017.403.
[39] D. Deb, D. Aggarwal, and A. K. Jain, “Child Face Age-Progression via Deep Feature Aging,” arXiv, 2020.
[40] Y. Shen, B. Zhou, P. Luo, and X. Tang, “FaceFeat-GAN: a Two-stage approach for identity-preserving face synthesis,” arXiv, 2018.
[41] J. Song, J. Zhang, L. Gao, X. Liu, and H. T. Shen, “Dual conditional GANs for face aging and rejuvenation,” IJCAI Int. Jt. Conf. Artif. Intell., pp. 899–905, 2018.
[42] H. Yang, D. Huang, Y. Wang, and A. K. Jain, “Learning Face Age Progression: A Pyramid Architecture of GANs,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 31–39, 2018, doi: 10.1109/CVPR.2018.00011.
[43] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley, “Least Squares Generative Adversarial Networks,” Proc. IEEE Int. Conf. Comput. Vis., pp. 2813–2821, 2017, doi: 10.1109/ICCV.2017.304.
[44] Q. Li, Y. Liu, and Z. Sun, “Age Progression and Regression with Spatial Attention Modules,” arXiv, 2019, doi: 10.1609/aaai.v34i07.6800.
[45] H. Zhu, Z. Huang, H. Shan, and J. Zhang, “Look Globally, Age Locally: Face Aging With an Attention Mechanism,” ICASSP 2020 - 2020 IEEE Int. Conf. Acoust. Speech Signal Process., pp. 1963–1967, 2020.
[46] Z. He, M. Kan, S. Shan, and X. Chen, “S2GAN: Share aging factors across ages and share aging trends among individuals,” Proc. IEEE Int. Conf. Comput. Vis., pp. 9439–9448, 2019, doi: 10.1109/ICCV.2019.00953.
[47] C. N. Duong et al., “Automatic face aging in videos via deep reinforcement learning,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 10005–10014, 2019, doi: 10.1109/CVPR.2019.01025.
[48] C. Shi, J. Zhang, Y. Yao, Y. Sun, H. Rao, and X. Shu, “CAN-GAN: Conditioned-attention normalized GAN for face age synthesis,” Pattern Recognit. Lett., vol. 138, pp. 520–526, 2020, doi: 10.1016/j.patrec.2020.08.021.
[49] P. Li, H. Huang, Y. Hu, X. Wu, R. He, and Z. Sun, “Hierarchical Face Aging Through Disentangled Latent Characteristics,” Lect. Notes Comput. Sci., vol. 12348 LNCS, pp. 86–101, 2020, doi: 10.1007/978-3-030-58580-8_6.
[50] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” 2nd Int. Conf. Learn. Represent. (ICLR 2014), pp. 1–14, 2014.
[51] H. Huang, Z. Li, R. He, Z. Sun, and T. Tan, “IntroVAE: Introspective variational autoencoders for photographic image synthesis,” Adv. Neural Inf. Process. Syst., pp. 52–63, 2018.
[52] R. Or-El, S. Sengupta, O. Fried, E. Shechtman, and I. Kemelmacher-Shlizerman, “Lifespan Age Transformation Synthesis,” Lect. Notes Comput. Sci., vol. 12351 LNCS, pp. 739–755, 2020, doi: 10.1007/978-3-030-58539-6_44.
[53] Z. Huang, S. Chen, J. Zhang, and H. Shan, “PFA-GAN: Progressive Face Aging with Generative Adversarial Network,” IEEE Trans. Inf. Forensics Secur., vol. 16, pp. 2031–2045, 2021, doi: 10.1109/TIFS.2020.3047753.
[54] Z. Huang, S. Chen, J. Zhang, and H. Shan, “AgeFlow: Conditional Age Progression and Regression with Normalizing Flows,” pp. 743–750, 2021, doi: 10.24963/ijcai.2021/103.
[55] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” 6th Int. Conf. Learn. Represent. (ICLR 2018), 2018.
[56] S. He, W. Liao, M. Y. Yang, Y.-Z. Song, B. Rosenhahn, and T. Xiang, “Disentangled Lifespan Face Synthesis,” pp. 3877–3886, 2021, [Online]. Available: http://arxiv.org/abs/2108.02874.
[57] Z. Huang, J. Zhang, and H. Shan, “When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework,” pp. 7278–7287, 2021, doi: 10.1109/cvpr46437.2021.00720.

VOLUME XX, 2017


20 9

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3157617, IEEE Access

APPENDIX

A.1. Score, Categories, and Method of Face Aging GANs

Table A.1 Score, categories, and method of face aging GANs

Category | Method | Advanced Criteria | Accuracy or MAE score

Conditional-based | Conditional Adversarial Autoencoder (CAAE) [4] | Smooth age regression and progression, producing more photorealistic results | Identity preservation: same person as ground truth 43.38%, not same as ground truth 22.04%. Age progression: from 235 paired images of 79 subjects and 47 respondents, 52.77% voted CAAE better, 28.99% voted prior work better.

Conditional-based | Age Conditional Generative Adversarial Neural Network Model [26] | Identity preservation via face recognition optimizations | Face recognition: initial reconstruction 53.2% accuracy, “pixelwise” optimization, identity-preserving optimization 82.9% accuracy.

Conditional-based | Contextual Generative Adversarial Nets (C-GANs) [27] | Correctly synthesizes real age progression across different age ranges | Face recognition: identity preserved, with good results while the input–target age gap is at most 8 years. Cross-age face verification: equal error rate (EER) on cGANs synthetic pairs 8.7…%/11.05%, original pairs 17.41%.

Conditional-based | Identity-Preserved Conditional Generative Adversarial Network (IPCGANs) [28] | Realistic face synthesis, good identity preservation, and age consistency | Face verification 96.90%; image quality 71.74%; age classification 31.74%; VGG-Face score 36.33±1.85; time cost 0.28 s.

Translation-based | Subject-dependent Deep Aging Path (SDAP) [7] | Efficient in synthesizing in-the-wild aging faces | Face identification (SF+SDAP) 64.4%; age estimation: MAE 3.94 on SDAP’s synthesized faces.

Sequence-based | Two-Stage Approach for Identity-Preserving Face Synthesis [40] | Photo-realistic images with high identity-preservation quality | Identity preservation 97.62%±0.78; similarity 0.693; user score 22.4%.

Translation-based | Child Face Age-Progression via Deep Feature Aging [39] | Ability to identify young children | FaceNet with Feature Aging Module: CFA Rank-1 55.30%, 53.58%; ITWCC Rank-1 21.44%. CosFace with Feature Aging Module: face identification CFA Rank-1 @1% FAR 94.24%, ITWCC Rank-1 66.12%, 25.04%. Face recognition: 95.91% on the FG-NET dataset.

Conditional-based | Age Progression and Regression with Spatial Attention Modules [44] | Synthesizes lifelike face images at the desired age with personalized features while keeping age-irrelevant information unchanged | Age estimation error 1.53±6.50 (MORPH), 1.78±7.53 (CACD), 4.77±10.5… (UTKFace); verification rate 100% (MORPH), 99.92% (CACD), 98.10% (UTKFace).

Translation-based | Triple-GAN: Progressive Face Aging with Triple Translation Loss [36] | Identity classification and age classification | Age estimation (Face++): average 94.01% (MORPH), average 93.38% (CACD); age classification accuracy: average 71.26% (MORPH), average 69.43% (CACD).

Translation-based | Generative Adversarial Style Transfer Networks for Face Aging [37] | Aging effect | Age progression (Likert scale 1–5): 3.17±0.016; age regression (Likert scale 1–5): 3.28±0.075.

Conditional-based | Conditioned-Attention Normalization GAN (CAN-GAN) [48][54] | Aging-relevant information | Age estimation, synthetic faces with CAAC: MORPH (30-40 year) 3…, 46.43%±5.89, (51-77 year) 58.52%±5.93; CACD (30…-50 year) 47.73%±7.22, (51-77 year) 57.04%±8.45. Synthetic faces w/o CAAC: MORPH (30-40 year) 36…, 47.38%±6.27, (51-77 year) 64.23%±5.96; CACD
Real faces: MORPH (16-30 year) 27.86%±6.12, (30-40 year) 39.11%±7.43, (41-50 year) 48.34%±8.37, (51-77 year) 58.09%±8.90; CACD (16-30 year) 30.93%±7.80, (30-40 year) 38.40±9.03, (41-50 year) 46.63±10.34, (51-77 year) 53.99±10.93.
Face verification, test face: MORPH synthetic face1 96.45%, synthetic face2 95.04%, synthetic face3 90.09%; CACD synthetic face1 95.58%, synthetic face2 93.72%, synthetic face3 88.14%. Synthetic face1: MORPH synthetic face2 96.32%, synthetic face3 92.20%; CACD synthetic face2 95.65%, synthetic face3 89.03%. Synthetic face2: MORPH synthetic face3 95.28%; CACD synthetic face3 91.48%.
Conditional-based | Progressive Face Aging with Generative Adversarial Network (PFA-GAN) [53] | Age estimation loss and Pearson correlation for aging smoothness | Face verification rate: MORPH 31-40 (100%), 41-50 (100%), 50+ (99.7%); CACD 31-40 (99.97%), 41-50 (99.89%), 50+ (99.69%).

Conditional-based | AgeFlow: Conditional Age Progression and Regression with Normalizing Flow [54] | Attribute-aware knowledge distillation to learn the manipulation direction of age progression while keeping unrelated attributes (gender, race) unchanged, alleviating unexpected changes in facial attributes | Age accuracy: MORPH 30- (87.19%), 31-40 (91.85%), 41-50 (90.68%), 50+ (90.71%); CACD 30- (86.49%), 31-40 (78.06%), 41-50 (85.91%), 50+ (88.15%). Cosine similarity: MORPH 30- (0.897), 31-40 (0.923), 41-50 (0.903), 50+ (0.767); CACD 30- (0.905), 31-40 (0.926), 41-50 (0.919), 50+ (0.847). Gender accuracy: MORPH 31-40 (98.19%), 41-50 (99.26%), 51+ (98.21%); CACD 31-40 (98.63%), 41-50 (98.02%), 51+ (98.49%). Race accuracy: MORPH 31-40 (98.80%), 51+ (97.72%).

Conditional-based | Disentangled Lifespan Face Synthesis [56] | Lifespan face synthesis (LFS): generates a set of photo-realistic face images across a person’s lifespan | Identity preservation (3.07±0.19), shape transformation (3.18±0.35), texture transformation (3.30±0.21), reconfiguration (4.07±0.27), age error (3.53±2.81), age accuracy (65.6%).

Conditional-based | Hierarchical Face Aging through Disentangled Latent Characteristics [49] | Accurate estimated age distribution for synthetic faces | Aging accuracy: MORPH 31-40 (99.48%), 41-50 (99.36%), 50+ (99.36%); CACD 31-40 (99.24%), 41-50 (99.19%), 50+ (99.19%).

Conditional-based | Lifespan Age Transformation Synthesis [52] | Multi-domain image-to-image generative adversarial network architecture | From 50 images, same identity: 15-19 (50), 30-39 (45), 50-59 (41), all (136); age difference: 15-19 (12.7%), 30-39 (11.6%), 50-59 (9.%), all (11.3).

Conditional-based | Age-Invariant Face Recognition and Age Synthesis: Multitask Learning Framework [57] | Weight-sharing strategy improves the smoothness of synthesized faces | Accuracy: 96.23% (AgeDB-30), 95.62% (CALFW), 99.55% (CACD-VS), 94.78% (FG-NET).

Conditional-based | S2GAN: Share Aging Factors Across Age and Share Aging Trends Among Individuals [46] | Personalized aging basis for the synthesis of aging factors with age-specific transforms | Age accuracy: MORPH average of all pairs 99.69%, hardest pair (11-20, 50+) 96.08%, easiest pair (11-20, 21-30) 100%; CACD average of all pairs 98.91%, hardest pair (11-20, 50+) 94.08%, easiest pair (11-20, 21-30) 99.96%.

Conditional-based | Automatic Face Aging in Video via Deep Reinforcement Learning [47] | Age-progressed faces, temporal smoothness, and cross-age face verification | Aging consistency 245.64; temporal smoothness 61.80; matching accuracy 83.67%.

Conditional-based | Dual Conditional GANs for Face Aging and Rejuvenation [41] | Photo-realistic faces | Image scores: 4+→10+ (0.86), 8+→30+ (0.82), 10+→20+ (0.81), 10+→40+ (0.80), 20+→60+ (0.80), 30+→60+ (0.72), 40+→60+ (0.84); average 0.81.
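Several metrics recur across Table A.1: the mean absolute error (MAE) of an estimated age, the cosine similarity between face embeddings, and a verification rate over synthetic/real face pairs. The sketch below illustrates how such scores are typically computed; it is a minimal illustration only, not the exact protocol of any surveyed paper, and the 0.5 similarity threshold is an assumed placeholder (each paper fixes its own threshold via its face-recognition backend).

```python
import math

def age_mae(predicted_ages, target_ages):
    """Mean absolute error (MAE) between estimated and target ages, in years."""
    assert len(predicted_ages) == len(target_ages)
    return sum(abs(p - t) for p, t in zip(predicted_ages, target_ages)) / len(predicted_ages)

def cosine_similarity(emb_a, emb_b):
    """Cosine similarity between two face-embedding vectors (higher = more likely same identity)."""
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    return dot / (norm_a * norm_b)

def verification_rate(pair_similarities, threshold=0.5):
    """Fraction of (synthesized, source) pairs whose similarity clears the threshold."""
    return sum(s >= threshold for s in pair_similarities) / len(pair_similarities)
```

The embeddings themselves would come from a pretrained face recognizer (e.g. the VGG-Face, FaceNet, or CosFace models named in the table); only the scoring step is shown here.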
A.2. Sample Images from Each GAN Method

Table A.2 Sample images from each GAN method

Category | Method (sample images as shown in the published table)

Conditional-based | Conditional Adversarial Autoencoder (CAAE) [4]
Conditional-based | Age Conditional Generative Adversarial Neural Network Model [26]
Conditional-based | Contextual Generative Adversarial Nets (C-GANs) [27]
Conditional-based | Identity-Preserved Conditional Generative Adversarial Network (IPCGANs) [28]
Translation-based | Subject-dependent Deep Aging Path (SDAP) [7] (sample ages: input, 15, 25, 30, 43, 55, 60)
Sequence-based | Two-Stage Approach for Identity-Preserving Face Synthesis [40]
Translation-based | Child Face Age-Progression via Deep Feature Aging [39]
Conditional-based | Age Progression and Regression with Spatial Attention Modules [44]
Translation-based | Triple-GAN: Progressive Face Aging with Triple Translation Loss [36]
Translation-based | Generative Adversarial Style Transfer Networks for Face Aging [37]
Conditional-based | Conditioned-Attention Normalization GAN (CAN-GAN) [48][54]
Conditional-based | Progressive Face Aging with Generative Adversarial Network (PFA-GAN) [53]
Conditional-based | AgeFlow: Conditional Age Progression and Regression with Normalizing Flow [54]
Conditional-based | Disentangled Lifespan Face Synthesis [56]
Conditional-based | Hierarchical Face Aging through Disentangled Latent Characteristics [49]
Conditional-based | Lifespan Age Transformation Synthesis [52]
Conditional-based | Age-Invariant Face Recognition and Age Synthesis: Multitask Learning Framework [57]
Conditional-based | S2GAN: Share Aging Factors Across Age and Share Aging Trends Among Individuals [46]
Conditional-based | Automatic Face Aging in Video via Deep Reinforcement Learning [47]
Conditional-based | Dual Conditional GANs for Face Aging and Rejuvenation [41]
A.3. Dataset Sample Images

Morph Album 1 & 2 | FGNET | AdienceFaces | AGE-DB
IMDB-WIKI | CACD | UTK-Face

Figure A.3 Dataset sample images
A.4. Generative Adversarial Network Approach Category

Generative Adversarial Network-based Approach
  Translation-based approach:
    Generative Adversarial Style Transfer Networks for Face Aging
    Triple-GAN: Progressive Face Aging with Triple Translation Loss
    Child Face Age-Progression via Deep Feature Aging
  Sequence-based approach:
    FaceFeat-GAN: a Two-Stage Approach for Identity-Preserving Face Synthesis
    Subject-dependent Deep Aging Path (SDAP) model
  Conditional-based approach:
    Conditional Adversarial Autoencoder (CAAE)
    Age Conditional Generative Adversarial Neural Network Model
    Contextual Generative Adversarial Nets (C-GANs)
    Dual Conditional GAN for Face Aging and Rejuvenation
    Identity-Preserved Conditional Generative Adversarial Network (IPCGANs)
    Age Progression and Regression with Spatial Attention Modules
    S2GAN: Share Aging Factors Across Age and Share Aging Trends Among Individuals
    Automatic Face Aging in Video via Deep Reinforcement Learning
    Conditioned-Attention Normalization GAN (CAN-GAN)
    Hierarchical Face Aging through Disentangled Latent Characteristics
    Lifespan Age Transformation Synthesis
    Progressive Face Aging with Generative Adversarial Network (PFA-GAN)
    AgeFlow: Conditional Age Progression and Regression with Normalizing Flow
    Disentangled Lifespan Face Synthesis
    Age-Invariant Face Recognition and Age Synthesis: Multitask Learning Framework

Figure A.4. Generative Adversarial Network Approach Category
A.5. Generative Adversarial Network Approach Timeline

2016: Conditional Adversarial Autoencoder (CAAE)
2017: Age Conditional Generative Adversarial Neural Network Model; Temporal Non-volume Preserving transformation; Contextual Generative Adversarial Nets
2018: Subject-dependent Deep Aging Path (SDAP); Identity-Preserved Conditional Generative Adversarial Network (IPCGANs); Two-Stage Approach for Identity-Preserving Face Synthesis; Generative Adversarial Style Transfer Networks for Face Aging; Dual Conditional GAN for Face Aging and Rejuvenation
2019: Age Progression and Regression with Spatial Attention Modules; S2GAN: Share Aging Factors Across Age and Share Aging Trends Among Individuals; Automatic Face Aging in Video via Deep Reinforcement Learning
2020: Child Face Age-Progression via Deep Feature Aging; Triple-GAN: Progressive Face Aging with Triple Translation Loss; Conditioned-Attention Normalization GAN (CAN-GAN); Hierarchical Face Aging through Disentangled Latent Characteristics; Lifespan Age Transformation Synthesis
2021: Progressive Face Aging with Generative Adversarial Network (PFA-GAN); AgeFlow: Conditional Age Progression and Regression with Normalizing Flow; Disentangled Lifespan Face Synthesis; Age-Invariant Face Recognition and Age Synthesis: Multitask Learning Framework

Figure A.5. Generative Adversarial Network Approach timeline