

FISS GAN: A Generative Adversarial Network for Foggy Image Semantic Segmentation
Kunhua Liu, Zihao Ye, Hongyan Guo, Member, IEEE, Dongpu Cao, Member, IEEE,
Long Chen, Senior Member, IEEE, and Fei-Yue Wang, Fellow, IEEE

Abstract—Because pixel values of foggy images are irregularly higher than those of images captured in normal weather (clear images), it is difficult to extract and express their texture. No method has previously been developed to directly explore the relationship between foggy images and semantic segmentation images. We investigated this relationship and propose a generative adversarial network (GAN) for foggy image semantic segmentation (FISS GAN), which contains two parts: an edge GAN and a semantic segmentation GAN. The edge GAN is designed to generate edge information from foggy images to provide auxiliary information to the semantic segmentation GAN. The semantic segmentation GAN is designed to extract and express the texture of foggy images and generate semantic segmentation images. Experiments on foggy cityscapes datasets and foggy driving datasets indicated that FISS GAN achieved state-of-the-art performance.

Index Terms—Edge GAN, foggy images, foggy image semantic segmentation GAN, semantic segmentation.

Manuscript received March 11, 2021; accepted April 17, 2021. This work was supported in part by the National Key Research and Development Program of China (2018YFB1305002), the National Natural Science Foundation of China (62006256), the Postdoctoral Science Foundation of China (2020M683050), the Key Research and Development Program of Guangzhou (202007050002), and the Fundamental Research Funds for the Central Universities (67000-31610134). Recommended by Associate Editor Xin Luo. (Corresponding author: Long Chen.)

Citation: K. H. Liu, Z. H. Ye, H. Y. Guo, D. P. Cao, L. Chen, and F.-Y. Wang, "FISS GAN: A generative adversarial network for foggy image semantic segmentation," IEEE/CAA J. Autom. Sinica, vol. 8, no. 8, pp. 1428–1439, Aug. 2021.

K. H. Liu, Z. H. Ye, and L. Chen are with the School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China (e-mail: liukh5@mail.sysu.edu.cn; yezh9@mail2.sysu.edu.cn; chenl46@mail.sysu.edu.cn).

H. Y. Guo is with the State Key Laboratory of Automotive Simulation and Control and the Department of Control Science and Engineering, Jilin University (Campus Nanling), Changchun 130025, China (e-mail: guohy11@jlu.edu.cn).

D. P. Cao is with the Mechanical and Mechatronics Engineering Department at the University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada (e-mail: dongpu@uwaterloo.ca).

F.-Y. Wang is with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, and also with the Institute of Systems Engineering, Macau University of Science and Technology, Macau 999078, China (e-mail: feiyue.wang@ia.ac.cn).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JAS.2021.1004057

I. Introduction

ENVIRONMENTAL perception plays a vital role in the fields of autonomous driving [1], robotics [2], etc., and this perception influences the subsequent decisions and control of such devices [3]–[5]. Fog is a common form of weather, and when fog exists, the pixel values of foggy images are irregularly higher than those of clear images. As a result, foggy images have less texture than clear images. There are already many methods for the semantic segmentation of clear images, which can extract and express the features of clear images and achieve good semantic segmentation results. However, the performance of these methods on foggy images is poor. This poor performance occurs because current methods cannot efficiently extract and express the features of foggy images. Moreover, foggy image data are not sparse, so the current excellent work [6], [7] on sparse data cannot be used. Therefore, to date, researchers have developed two ways to address this problem:

A. Defogging-Segmentation Methods

In this method, first, a foggy image is converted to a fog-free image by defogging algorithms, and then the restored image is segmented by a semantic segmentation algorithm. Therefore, the defogging-segmentation method can be separated into two steps.

Step 1: Fog removal. According to the classic atmosphere scattering model [8], [9], a fog-free image can be recovered from a foggy image as

J(x) = (I(x) − A) / t(x) + A    (1)

where J(x) is the fog-free image; I(x) is the foggy image; t(x) is the transmission map; and A is the global atmospheric light.

Step 2: Semantic segmentation of fog-free images. When semantic segmentation is performed, the algorithms' inputs may be the fog-free image and its auxiliary information or only the fog-free image. Therefore, the problem of semantic image segmentation after defogging can be expressed as

S(x) = F(f(J(x), g(x)))    (2)

where g(x) is auxiliary information (if there is no auxiliary information, g(x) is self-mapping); f(·) is the relation between J(x) and g(x); F(·) is the mapping from f(J(x), g(x)) to S(x); and S(x) is the semantic segmentation image.
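For concreteness, the following is a minimal NumPy sketch of the recovery in (1). It assumes that the transmission map t(x) and the global atmospheric light A have already been estimated by some prior- or learning-based method (as in the works cited in Section II); the lower bound on t(x) and the value range are implementation conveniences, not part of the model.

import numpy as np

def defog(foggy, transmission, atmospheric_light, t_min=0.1):
    # Recover a fog-free estimate J(x) from a foggy image I(x) via (1):
    # J(x) = (I(x) - A) / t(x) + A.
    I = foggy.astype(np.float64)           # H x W x 3, values in [0, 1]
    t = np.clip(transmission, t_min, 1.0)  # H x W, clipped to avoid division by ~0
    A = np.asarray(atmospheric_light, dtype=np.float64)  # per-channel global light
    J = (I - A) / t[..., None] + A
    return np.clip(J, 0.0, 1.0)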


B. Semantic Segmentation Method Based on Transfer Learning

In this method, first, a semantic segmentation model is trained based on clear images. Then, based on the trained semantic segmentation model and transfer learning, the semantic segmentation model is trained on foggy images. The semantic segmentation method based on transfer learning can also be separated into two steps.

Step 1: Training the semantic segmentation model with clear images. The method used to obtain the semantic segmentation model is the same as that shown in (2). However, the inputs for this method are clear images and their auxiliary information or only clear images. The training model can be expressed as

M = F(f(C(x), g(x)))    (3)

where C(x) are the clear images, M is the semantic segmentation model of clear images, and g(x) is the auxiliary information mentioned above.

Step 2: Training the transfer learning model with foggy images. Using the clear images as the source domain and foggy images as the target domain, the semantic segmentation model can be trained with foggy images based on the model above

S(x) = T(M, I(x)) = T(F(f(C(x), g(x))), I(x))    (4)

where T(·) is a transfer learning method, and the other terms are the same as defined above.

These two methods can achieve semantic segmentation results for foggy images; however, they are based on defogged images or semantic segmentation models trained with clear images. Without this information, these two methods are useless. This study focuses on a new semantic segmentation method that directly explores the mapping relationship between foggy images and the resulting semantic segmentation images. The mathematical model can be expressed as follows:

S(x) = F(f(I(x), g(x))).    (5)

It is challenging to solve (5). The motivation of this paper is to explore a semantic segmentation method that can efficiently solve (5), i.e., an efficient method to express the mapping relationship between foggy images and the resulting semantic segmentation images.

A generative adversarial network (GAN) is an efficient semantic segmentation method. Luc et al. [10] first explored the use of a GAN for clear image semantic segmentation because a GAN could enforce forms of higher-order consistency [11]. Subsequently, [12] and [13] also provided GANs for the semantic segmentation of clear images and achieved state-of-the-art performance. In this paper, we also explore the semantic segmentation method for foggy images based on a GAN. Additionally, based on the "lines first, color next" approach, edge images are used to provide auxiliary information for clear image inpainting [14]. This method has been shown to greatly improve the quality of clear image inpainting. In this paper, we also analyze the foggy image semantic segmentation (FISS) problem using the "lines first, color next" approach and use edge images as auxiliary information. Specifically, we first obtain the edge information of foggy images and then obtain the semantic segmentation results for foggy images under the guidance of this edge information. Based on the above ideas, a two-stage FISS GAN is provided in this paper. The main contributions of this paper are as follows:

1) We propose a novel efficient network architecture based on a combination of concepts from U_Net [15], called a dilated convolution U_Net. By incorporating dilated convolution layers and adjusting the feature size in the convolutional layer, dilated convolution U_Net has shown improved feature extraction and expression ability.

2) A direct FISS method (FISS GAN) that generates semantic segmentation images under edge information guidance is proposed. We show our method's effectiveness through extensive experiments on foggy cityscapes datasets and foggy driving datasets and achieve state-of-the-art performance. To the best of our knowledge, this is the first paper to explore a direct FISS method.

The structure of this paper is as follows: Section I is the introduction; Section II introduces the work related to foggy images and semantic segmentation methods; Section III describes FISS GAN in detail; Section IV describes the experiments designed to verify the performance of FISS GAN; and Section V summarizes the full paper.

II. Related Work

A. Foggy Images

Most studies on foggy images are based on defogging methods. Image defogging methods can be divided into traditional defogging methods and deep learning-based defogging methods. Meanwhile, according to the different processing methods, traditional defogging methods can be divided into image enhancement defogging methods and physical model-based defogging methods. The methods based on image enhancement [16]–[18] do not consider the fog in the image and directly improve contrast or highlight image features to make the image clearer and achieve the purpose of image defogging. However, when contrast is improved or image features are highlighted, some image information will be lost, and images defogged by this method will be obviously distorted.

The methods based on atmospheric scattering models [19]–[25] consider the fog in the image and study the image defogging mechanism or add other prior knowledge (scene depth information [26], [27]) to produce a clear image. Among these methods, the classic algorithms are the dark channel defogging method proposed by He et al. [23], an approach based on Markov random fields presented by Tan [21], and a visibility restoration algorithm proposed by Tarel and Hautière [28]. The image defogging methods based on atmospheric scattering models provide better defogging results than those obtained by image enhancement. However, the parameters used in the methods that utilize atmospheric scattering models to defog an image, such as the defogging coefficient and transmittance, are selected according to experience, so the resulting image exhibits some distortion.

With the development of deep learning (DL), recent research has increasingly explored defogging methods based on DL. Some researchers obtained the transmission map of a fog image through a DL network and then defogged the image based on an atmospheric scattering model [29]–[32]. This kind of method does not need prior knowledge, but its dependence on parameters and models will also cause slight

image distortion. Other researchers designed neural networks to study end-to-end defogging methods [33]–[38]. Moreover, with the development of GANs in image inpainting and image enhancement, researchers have also proposed image defogging methods based on GANs [39]–[44], which greatly improve the quality of image defogging. In addition to studies on defogging, researchers have studied methods for obtaining optical flow data from foggy images [45].

B. Semantic Segmentation

Semantic segmentation is a high-level perception task for robotics and autonomous driving. Prior semantic segmentation methods include color slices and conditional random fields (CRFs). With the development of DL, traditional DL-based semantic segmentation methods have greatly improved the accuracy of semantic segmentation. The fully convolutional network (FCN) [46] is the first semantic segmentation method based on traditional DL. However, due to its pooling operation, some information may be lost. Therefore, the accuracy of semantic segmentation with this method is low. To increase the accuracy of semantic segmentation, many improved semantic segmentation frameworks [15], [47]–[56] and improved loss functions [51] were subsequently proposed. Most traditional DL-based semantic segmentation methods are supervised. Supervised semantic segmentation methods can achieve good segmentation results, but they require a large amount of segmentation data. To solve this problem, Hoffman et al. [57] and Zhang et al. [58] proposed training semantic segmentation models through a synthetic dataset where the new model is trained to predict real data by transfer learning.

Luc et al. [10] introduced GAN into the field of semantic segmentation. The generator's input is the image that needs to be segmented, and the output is the semantic segmentation classification of the image. The discriminator's input is the ground truth of the semantic segmentation classification or the generated semantic segmentation classification, and the output is the judgment result of whether the input is a true value. In addition, considering GAN's outstanding performance in transfer learning, researchers proposed a series of semantic segmentation GANs based on transfer learning. Pix2Pix [12] is a typical GAN model for semantic segmentation that considers semantic segmentation as one image-to-image translation problem and builds a general conditional GAN to solve it. Because domain adaptation cannot capture pixel-level and low-level domain shifts, Hoffman et al. [13] proposed cycle-consistent adversarial domain adaptation (CYCADA), which can adapt representations at both the pixel level and feature level and improve the precision of semantic segmentation.

An unsupervised general framework that extracts the same features of the source domain and target domain was proposed by Murez et al. [59]. To address the domain mismatch problem between real images and synthetic images, Hong et al. [60] proposed a network that integrates GAN into the FCN framework to reduce the gap between the source and target domains; Luo et al. [61] proposed a category-level adversarial network that enforces local semantic consistency during the trend of global alignment. To improve performance and solve the limited dataset problem of domain adaptation, Li et al. [62] presented a bidirectional learning framework of semantic segmentation in which the image translation model and the segmentation adaptation model were trained alternately while promoting each other.

The approaches above can directly address clear images and achieve state-of-the-art performance. However, these methods cannot handle foggy images very well because of their weak texture characteristics. To the best of our knowledge, there has been no research on a direct semantic segmentation method for foggy images.

III. Foggy Image Semantic Segmentation GAN

Unlike current semantic segmentation GANs [10], [12], which handle clear images and contain one part, FISS GAN (Fig. 1) handles foggy images and contains two parts: the edge GAN and the semantic segmentation GAN. The purpose of the edge GAN is to obtain the edge information of foggy images to assist with the semantic segmentation tasks. The edge directly achieved from foggy images contains all detailed edge information, while the edge information used for semantic segmentation is only its boundary information. Therefore, we use the edge information achieved from the ground truth of the semantic segmentation image as the ground truth in our edge GAN instead of the edge information from the clear image.

To clarify, we tested these two kinds of edges by the Canny algorithm [63]. The visual differences between the two edges are shown in Fig. 2. As seen in Fig. 2, the edge achieved directly from the foggy image contains too much useless information for semantic segmentation. In contrast, the other edge is just the boundary of its semantic segmentation, which is appropriate for semantic segmentation.

The purpose of the semantic segmentation GAN is to accomplish the semantic segmentation of foggy images. The inputs of the semantic segmentation GAN are foggy images and edge images achieved from the edge GAN, and its outputs are the semantic segmentation results of foggy images. Therefore, based on the mathematical model for the semantic segmentation of foggy images (formula (5)), the mathematical model of FISS GAN can be expressed as follows:

S(x) = F(f(I(x), E_gan(x)))    (6)

where F(·) is the semantic segmentation GAN; f(·) is the concatenate function; I(x) is the foggy image; and E_gan(x) is the edge information achieved from the edge GAN.

A. Dilated Convolution U_Net

To further improve feature extraction and expression abilities, we learn convolution and deconvolution features by combining thoughts from U_Net [15] and propose a new network architecture, namely, dilated convolution U_Net (Fig. 3). Dilated convolution U_Net consists of three convolution layers (C1, C2, and C3), four dilated convolution layers (DC), and three fusion layers (f(C3, DC), f(C2, CT1), and f(C1, CT2)). The dilated convolution U_Net contains 4 dilated convolution layers and can result in a receptive field with a dilation factor of 19.
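The paper does not list the dilation rates of the four DC layers. As a small illustration only, if the statement above is read as a 19×19 receptive field for the stack of four 3×3, stride-1 dilated convolutions, one rate schedule consistent with that figure is (1, 2, 2, 4); these rates are an assumption for the arithmetic below, not taken from the paper.

def stack_receptive_field(kernel=3, dilations=(1, 2, 2, 4)):
    # For stride-1 layers, each dilated convolution enlarges the receptive
    # field of the stack by (kernel - 1) * dilation.
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

print(stack_receptive_field())  # 1 + 2*(1 + 2 + 2 + 4) = 19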


Fig. 1. The pipeline of FISS GAN. Stage 1 (edge GAN): a dilated convolution U_Net edge generator G1 maps a 3×H×W foggy image to a generated edge image through the output layer G1_C3, and the edge discriminator D1 (D1_C1–D1_C5) compares the generated edge images with the ground truth of edge images. Stage 2 (semantic segmentation GAN): foggy images together with the generated edge images are fed to the semantic segmentation generator G2 (with output layer G2_C3), and the discriminator D2 (D2_C1–D2_C5) compares the generated semantic segmentation images with the ground truth of semantic segmentation images.
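To make the data flow in Fig. 1 and (6) concrete, below is a minimal inference sketch, assuming a PyTorch implementation in which g1 and g2 are the two generators described in this section (a sketch of their internal architecture follows the fusion discussion below). The channel-wise concatenation plays the role of f(·) in (6); the 4-channel input to G2 is an assumption drawn from Fig. 1.

import torch

def fiss_forward(foggy, g1, g2):
    # Stage 1: the edge generator predicts a single-channel edge map.
    edge = g1(foggy)                                  # N x 3 x H x W -> N x 1 x H x W
    # Stage 2: f(.) in (6) concatenates the foggy image with the generated
    # edge map along the channel dimension before the segmentation generator.
    seg_logits = g2(torch.cat([foggy, edge], dim=1))  # N x n x H x W
    return seg_logits.argmax(dim=1)                   # per-pixel class indices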

Fig. 2. The visual differences: (a) foggy image; (b) edge achieved directly from the foggy image; (c) edge achieved from the semantic segmentation of the foggy image.

Fig. 3. Structure of dilated convolution U_Net: the convolution layers C1 (64×H×W), C2 (128×(H/2)×(W/2)), and C3 (256×(H/4)×(W/4)); the dilated convolution layers DC (256×(H/4)×(W/4)); the transposed convolution layers CT1 and CT2; and the fusion layers f(C3, DC) ((256+256)×(H/4)×(W/4)), f(C2, CT1) ((128+128)×(H/2)×(W/2)), and f(C1, CT2) ((64+64)×H×W), which concatenate the corresponding features.

Fusion layers are the layers that concatenate features from the dilated convolution results or transposed convolution results with the corresponding convolution layer. Similar to the fusion approach of U_Net [15], we divided the fusion operation into three steps:

Step 1: Fuse C3 and DC to obtain f(C3, DC) and deconvolute f(C3, DC) to obtain CT1;

Step 2: Fuse C2 and CT1 to obtain f(C2, CT1) and deconvolute f(C2, CT1) to obtain CT2;

Step 3: Fuse C1 and CT2 to obtain f(C1, CT2).

The fusion approach of this paper is a concatenation operation. Three convolution layers and four dilated convolution layers are used to extract input features, and two deconvolution layers are used to express the extracted features. The size of each layer feature is shown in Fig. 3.
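Putting the layers and fusion steps above together, a rough PyTorch-style sketch of the dilated convolution U_Net generator (G1 or G2 up to its output layer G1_C3/G2_C3) is given below, with feature sizes following Fig. 3 and kernel/stride/padding following Table I. The dilation rates, the use of ReLU after every intermediate layer, and padding equal to the dilation in the DC layers are assumptions made for the sketch; the paper applies a sigmoid at the generator output (Section IV-A), which is left outside this module.

import torch
import torch.nn as nn

class DilatedConvUNet(nn.Module):
    # Sketch of the dilated convolution U_Net; dilation rates are assumed.
    def __init__(self, in_channels=3, out_channels=1, dilations=(1, 2, 2, 4)):
        super().__init__()
        relu = nn.ReLU(inplace=True)
        self.c1 = nn.Sequential(nn.Conv2d(in_channels, 64, 7, 1, 3), relu)   # 64  x H   x W
        self.c2 = nn.Sequential(nn.Conv2d(64, 128, 4, 2, 1), relu)           # 128 x H/2 x W/2
        self.c3 = nn.Sequential(nn.Conv2d(128, 256, 4, 2, 1), relu)          # 256 x H/4 x W/4
        # Four 3x3 dilated convolutions; padding = dilation keeps H/4 x W/4.
        self.dc = nn.Sequential(*[nn.Sequential(
            nn.Conv2d(256, 256, 3, 1, padding=d, dilation=d), relu) for d in dilations])
        # Transposed convolutions take the concatenated (fused) features as input.
        self.ct1 = nn.Sequential(nn.ConvTranspose2d(256 + 256, 128, 4, 2, 1), relu)  # 128 x H/2 x W/2
        self.ct2 = nn.Sequential(nn.ConvTranspose2d(128 + 128, 64, 4, 2, 1), relu)   # 64  x H   x W
        self.head = nn.Conv2d(64 + 64, out_channels, 7, 1, 3)  # G1_C3 (1 channel) or G2_C3 (n channels)

    def forward(self, x):
        c1 = self.c1(x)
        c2 = self.c2(c1)
        c3 = self.c3(c2)
        dc = self.dc(c3)
        ct1 = self.ct1(torch.cat([c3, dc], dim=1))     # Step 1: f(C3, DC) -> CT1
        ct2 = self.ct2(torch.cat([c2, ct1], dim=1))    # Step 2: f(C2, CT1) -> CT2
        return self.head(torch.cat([c1, ct2], dim=1))  # Step 3: f(C1, CT2) -> output layer

Following the description in this section, G1 would use out_channels=1 and G2 would use out_channels=n for the n segmentation classes.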


The differences between dilated convolution U_Net and U_Net [15] are as follows: 1) Dilated convolution U_Net incorporates dilated convolution layers to improve feature extraction ability. 2) In feature fusion, because the feature sizes of the convolution layers and deconvolution layers in U_Net [15] are different, the features of the convolution layers are randomly cropped, and this operation leads to features that do not correspond. Thus, some information from the fusion image might be lost in the fusion step. In the dilated convolution U_Net proposed in this study, the feature sizes of the convolution layers and their corresponding deconvolution layers are the same, which means that the features of the convolution layers can be directly fused with the features of the deconvolution layers. Thus, no information will be lost in the fusion step. 3) U_Net achieves image feature extraction and expression from the convolution layers, maximum pooling layers, upsampling layers (first the bilinear layer, then the convolution layer or transformed convolution layers) and convolution layers. U_Net consists of 23 convolution layers, 4 maximum pooling layers and 4 upsampling layers. According to the convolution kernel and step size of U_Net, the number of parameters that need to be trained is 17 268 563. The dilated convolution U_Net proposed in this paper achieves image feature extraction and expression by convolution layers, dilated convolution layers, transformed convolution layers and convolution layers. This method consists of 3 convolution layers, 4 dilated convolution layers and 2 transformed convolution layers. With the convolution kernel and step size of dilated convolution U_Net (Table I), the number of parameters that need to be trained is 4 335 424. The more parameters that need to be trained, the more computations that are required. Therefore, dilated convolution U_Net has fewer network layers, fewer parameters, and less computation than U_Net.

TABLE I
Parameters of G1 and G2 (C1–CT2 are the layers of the dilated convolution U_Net; G1_C3 and G2_C3 are the output layers of G1 and G2)

Input    Kernel       Stride   Padding
C1       [7×7, 64]    1        3
C2       [4×4, 128]   2        1
C3       [4×4, 256]   2        1
DC       [3×3, 256]   1        1
CT1      [4×4, 128]   2        1
CT2      [4×4, 64]    2        1
G1_C3    [7×7, 1]     1        3
G2_C3    [7×7, 19]    1        3

B. Edge GAN

The architecture of the edge GAN, as shown in Fig. 1, includes the edge generator G1 and the edge discriminator D1. The purpose of G1 is to generate an edge image similar to the ground truth edge image. G1 is composed of the dilated convolution U_Net and one convolution layer (G1_C3). Because the edge image is a set of 0 or 255 pixel values, it can be expressed by single-channel image data. Therefore, the size of G1_C3 is 1×H×W. The purpose of D1 is to determine whether the generated edge image is the ground truth image and provide feedback (please refer to "the false binary cross entropy (BCE) loss from D1Loss" below) to the edge generator G1 to improve the accuracy of the generated image. The design of D1 is similar to that of PatchGAN [64], which contains five standard convolution layers.

The loss function plays an important role in the neural network model. This function determines whether the neural network model converges or achieves good accuracy. The edge GAN includes G1 and D1, and its loss function includes the loss function of G1 and that of D1. The inputs of D1 are the ground truth of the edge images and the edge images generated from G1, where the ground truth of the edge image is achieved by the Canny algorithm [63] from the semantic segmentation image. In addition, the output of D1 indicates whether its input is true. Specifically, the output is a probability matrix (0 ~ 1).

The value of the probability matrix is expected to be close to 1 after the ground truth passes through D1, which means that this edge image is the ground truth (the size of the label matrix is the same as the size of the output matrix, and the label value is 1). In contrast, the value of the probability matrix of the generated edge image after passing through D1 is close to 0, which means that this edge image is a generated edge image (the size of the label matrix is the same as the size of the output matrix, and the label value is 0). Therefore, the discriminator loss function of the edge GAN (D1 loss) is designed as the BCE loss of the discriminator output and its corresponding label.

Since the output of D1 includes the true value probability obtained by taking the ground truth of the edge image as the input and the false value probability obtained by taking the generated edge image as the input, the D1 loss has two parts: the BCE loss between the true value probability and 1, namely, true BCE loss, and the BCE loss between the false value probability and 0, namely, false BCE loss. Specifically, D1 loss is the average of true BCE loss and false BCE loss.

Let the foggy image be F, and the generated edge image be FE^g; let the ground truth of the edge images be FE, and the true value probability be FEL; let the false value probability be FEL^g, and let the D1 loss be D1Loss. D1Loss can be formulated as

D1Loss = [BCELoss(FEL, 1) + BCELoss(FEL^g, 0)] / 2 = [−log(FEL) − log(FEL^g)] / 2.    (7)

The features of the D1 convolution layers can adequately express the ground truth of the edge image or the generated edge image. Therefore, we achieve G1's ability to generate images by narrowing the gap between the features of the ground truth edge image and the features of the generated edge image. The gap between the feature of the ground truth edge image and the feature of the generated edge image is calculated by L1 losses. Meanwhile, the false BCE loss from D1Loss indicates the quality of the image generated by G1. A large false BCE loss indicates that the generated edge image is different from the ground truth image. A small false BCE loss indicates that the generated edge image is close to the ground truth image.
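As an illustration, a minimal PyTorch-style sketch of a five-layer discriminator following Table II (with LeakyReLU of slope 0.25 on the first four layers and a sigmoid on the last, per Section IV-A) and of the D1 loss in (7) is given below. The number of input channels and the use of all-ones/all-zeros label maps matching the output size are assumptions consistent with the description above, not details published by the paper.

import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    # Sketch of D1/D2: five 4x4 convolutions producing a probability map in (0, 1).
    def __init__(self, in_channels):
        super().__init__()
        cfg = [(in_channels, 64, 2), (64, 128, 2), (128, 256, 2), (256, 512, 1)]
        layers = []
        for cin, cout, stride in cfg:
            layers += [nn.Conv2d(cin, cout, 4, stride=stride, padding=1),
                       nn.LeakyReLU(0.25, inplace=True)]
        layers += [nn.Conv2d(512, 1, 4, stride=1, padding=1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

def d1_loss(d1, real_edge, fake_edge):
    # D1 loss in the spirit of (7): average of the BCE between the real-input
    # probability map and an all-ones label, and the BCE between the
    # fake-input probability map and an all-zeros label.
    bce = nn.BCELoss()
    p_real = d1(real_edge)
    p_fake = d1(fake_edge.detach())   # do not backpropagate into the generator here
    true_bce = bce(p_real, torch.ones_like(p_real))
    false_bce = bce(p_fake, torch.zeros_like(p_fake))
    return 0.5 * (true_bce + false_bce)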


A false BCE loss partly reflects the quality of the generator, and its optimization goal is consistent with that of the generator, which is to reduce its value. Therefore, it is considered part of the generator loss function.

Hence, the G1 loss is the sum of the L1 losses from each layer feature and the false BCE loss. If the input is FE, the convolution layer output of D1 is DC_i (i = 1, 2, 3, 4, 5); if the input is FE^g, the convolution layer output of D1 is DC_i^g (i = 1, 2, 3, 4, 5). G1 loss is expressed as G1Loss, and it can be formulated as

G1Loss = Σ_{i=1}^{5} L1Loss(DC_i, DC_i^g) + BCELoss(FEL^g, 0) = Σ_{i=1}^{5} |DC_i − DC_i^g| − log(FEL^g).    (8)

C. Semantic Segmentation GAN

Similar to the edge GAN, the semantic segmentation GAN includes the semantic segmentation generator G2 and the semantic segmentation discriminator D2. The goal of G2 is to generate semantic segmentation classifications that are the same as the ground truth semantic segmentation classifications. G2 is composed of the dilated convolution U_Net and one convolution layer (G2_C3). The goal of the semantic segmentation GAN is to divide the foggy images into n classes. Therefore, the size of G2_C3 is n×H×W. The purpose of D2 is to judge whether the generated semantic segmentation image is the ground truth image and provide feedback (please refer to "the false BCE loss from D2Loss" below) to the semantic segmentation generator G2 so that it can improve the accuracy of the generated image. The structure of D2 is the same as that of D1, which contains 5 standard convolution layers.

The input of D2 is the ground truth of the semantic segmentation image of the foggy image or the semantic segmentation image generated by G2, and its output is the probability matrix (0 ~ 1), which indicates whether the input is ground truth. Therefore, similar to the D1 loss of the edge GAN, the discriminator loss function of the semantic segmentation GAN (D2 loss) includes two parts: the BCE loss between the true value probability and 1, namely, true BCE loss, and the BCE loss between the false value probability and 0, namely, false BCE loss. Specifically, D2 loss is the average of true BCE loss and false BCE loss.

Let the semantic segmentation image generated by G2 be FS_F^g, and the generated semantic segmentation classification be FS^g, FS^g = [FS_0^g, FS_1^g, ..., FS_{n−1}^g], where FS_j^g is the jth generated semantic segmentation classification; let the ground truth of the semantic segmentation (n class) classification be FS, FS = [FS_0, FS_1, ..., FS_j, ..., FS_{n−1}], where FS_j is the jth ground truth of the semantic segmentation classification; let the ground truth of the semantic segmentation image be FS_F, and the label obtained by FS be FS_L; let the label obtained by FS^g be FS_L^g, and the discriminator loss function of the semantic segmentation GAN be D2Loss, which can also be formulated as

D2Loss = [BCELoss(FS_L, 1) + BCELoss(FS_L^g, 0)] / 2 = [−log(FS_L) − log(FS_L^g)] / 2.    (9)

In this paper, the loss function of G2 is designed as the sum of three parts: the L1 loss between FS_L and FS_L^g (L1Loss(FS_L, FS_L^g)), the cross-entropy loss between FS and FS^g (CrossEntropyLoss(FS^g, FS)), and the false BCE loss from D2Loss. Let the loss function of G2 be G2Loss, which can be formulated as

G2Loss = L1Loss(FS_L, FS_L^g) + BCELoss(FS_L^g, 0) + CrossEntropyLoss(FS^g, FS)
       = |FS_L − FS_L^g| − Σ_{j=0}^{n−1} log(e^{FS_j^g} / Σ_{j=0}^{n−1} e^{FS_j^g}) × FS_j − log(FS_L^g).    (10)

The loss functions of the edge GAN are mathematical operations (linear operations) of several existing loss functions, which have all been proven to be convergent when proposed and are commonly used in GANs. Therefore, mathematical operations (linear operations) of several existing loss functions are also convergent, as are the loss functions of the semantic segmentation GAN.

IV. Experiments

A. Experimental Setting

The foggy cityscapes dataset [65] is a synthetic foggy dataset with 19 classifications (road, sidewalk, building, wall, etc.) for semantic foggy scene understanding (SFSU). It contains 2975 training images and 500 validation images with β = 0.005 (β is the attenuation coefficient; the higher the attenuation coefficient is, the more fog there is in the image), 2975 training images and 500 validation images with β = 0.01, and 2975 training images and 500 validation images with β = 0.02. Due to the differences in the attenuation coefficients, we separate the foggy cityscapes dataset into three datasets. Dataset 1 is composed of the 2975 training images and 500 validation images with β = 0.005. Dataset 2 is composed of the 2975 training images and 500 validation images with β = 0.01, and Dataset 3 is composed of the 2975 training images and 500 validation images with β = 0.02. The corresponding semantic segmentation ground truth contains semantic segmentation images with color, semantic segmentation images with labels, images with instance labels and label files with polygon data. The ground truth of edge images is obtained from the semantic segmentation images with color and by the Canny algorithm [63].

The foggy driving dataset [65] is a dataset with 101 real-world images that can be used to evaluate the trained models. We separately use Dataset 1, Dataset 2, and Dataset 3 to train the models and use the foggy driving dataset [65] as the test set to test the trained models.
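A minimal sketch of how the edge ground truth described above can be produced with OpenCV's Canny detector is shown below; the grayscale conversion and the two thresholds are illustrative assumptions, since the paper only states that the Canny algorithm [63] is applied to the color semantic segmentation images.

import cv2

def edge_ground_truth(color_seg_path, low=100, high=200):
    # Load the color-coded semantic segmentation ground truth and extract
    # class-boundary edges with the Canny detector; output pixels are 0 or 255.
    seg = cv2.imread(color_seg_path)              # H x W x 3 (BGR)
    gray = cv2.cvtColor(seg, cv2.COLOR_BGR2GRAY)  # assumption: detect edges on a grayscale view
    return cv2.Canny(gray, low, high)

Note that two classes whose colors happen to map to similar gray values could lose their shared boundary under this simplification; detecting label changes directly on the label map would avoid that, but the sketch above stays close to the procedure described in the text.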


Due to the lack of training data and validation data, we carry out random flip, random crop, rotation, and translation operations on the data during training and validation to avoid overfitting.

The activation function of the dilated convolution U_Net is ReLU [66], while that of G1_C3 and G2_C3 is sigmoid. The activation function of the first four layers in D1 and D2 is LeakyReLU [67] with a parameter value of 0.25, while that of the last layer is sigmoid. The optimization algorithm of the edge GAN and semantic segmentation GAN is Adam [68]. The experiment's input size is 256 × 256, and the number of training epochs is 100. The edge GAN and semantic segmentation GAN architecture parameters are shown in Tables I and II.

TABLE II
Parameters of D1 and D2

Input           Kernel       Stride   Padding
D1_C1, D2_C1    [4×4, 64]    2        1
D1_C2, D2_C2    [4×4, 128]   2        1
D1_C3, D2_C3    [4×4, 256]   2        1
D1_C4, D2_C4    [4×4, 512]   1        1
D1_C5, D2_C5    [4×4, 1]     1        1

B. Qualitative and Quantitative Experimental Results

To the best of our knowledge, there is no direct semantic segmentation method for foggy images for comparison; however, OCR [48] and HANet [49] have achieved remarkable results on the public cityscapes dataset without additional training data. Among them, HANet [49] achieved the best performance. To verify the performance of FISS GAN, we compare it with OCR [48] and HANet [49]. Our training and validation data come from the foggy cityscapes dataset mentioned above, and we separately train OCR [48], HANet [49] and FISS GAN on Dataset 1, Dataset 2, and Dataset 3. Meanwhile, we use the foggy driving dataset as the test data.

The qualitative experimental results on Dataset 1, Dataset 2, and Dataset 3 are separately shown in Figs. 4–6. The semantic segmentation effect of FISS GAN is better than that of OCR [48] and HANet [49] on each dataset. To further determine the performance of each model, the mean intersection over union (IoU) score of each model is calculated in this paper (Table III). As shown in Table III, the mean IoU scores of FISS GAN on Dataset 1, Dataset 2, and Dataset 3 are 69.37%, 65.94%, and 64.01%, respectively, which are all higher than the corresponding scores of OCR [48] and HANet [49], and FISS GAN achieved state-of-the-art performance. These results indicate that FISS GAN can extract more features from a foggy image than OCR [48] and HANet [49]. Meanwhile, regardless of the method, the mean IoU score on Dataset 1 was higher than that on Dataset 2 and Dataset 3. According to our analysis, the ultimate reason for this difference is that images in Dataset 1 have a small attenuation coefficient, which means the pixel values of Dataset 1 images are smaller than those of Dataset 2 and Dataset 3, and the images in Dataset 1 have more texture than those in Dataset 2 and Dataset 3. Therefore, it is easier to extract and express the features of images in Dataset 1 than those of Dataset 2 and Dataset 3.

Additionally, we test the pixel accuracy of the edge GAN on each dataset. The qualitative experimental results of each dataset are shown in Fig. 7, and the quantitative experimental results of each dataset are shown in Table IV. The pixel accuracy of Dataset 1 is 87.79%, which is slightly larger than that of Dataset 2 and Dataset 3, which indicates that the edge GAN can efficiently generate edge images, and more edge features can be extracted from the dataset with less fog.

C. Convergence Process

We use the validation data of OCR [48], HANet [49] and FISS GAN to create a mean IoU diagram (Fig. 8) and a loss diagram (Fig. 9) for each model. The X-axis of both Fig. 8 and Fig. 9 is the epoch. The Y-axis of Fig. 8 is the mean IoU value, while the Y-axis of Fig. 9 is the loss value. To be more specific, the loss values of OCR [48] and HANet [49] were obtained from their open-source code, while the loss value of FISS GAN is G2Loss. As seen in Fig. 8, the mean IoU value of the validation data is not significantly different from that of the test data. Meanwhile, Fig. 9 shows that the losses of OCR [48], HANet [49] and FISS GAN tend to decrease or stabilize. Therefore, the OCR model [48], HANet model [49] and FISS GAN model are all convergent models.

D. Ablation Study

To verify that the dilated convolution in the dilated convolution U_Net can extract more features than the standard convolution, we separately use the dilated convolution and standard convolution (standard convolution U_Net) to train and test the FISS GAN (edge GAN and semantic segmentation GAN). The datasets (training and test datasets), FISS GAN parameters, and epoch numbers are the same as in the above experiments. The pixel accuracy and mean IoU are shown in Table V. As seen in Table V, regardless of the dataset, the pixel accuracy and the mean IoU achieved through dilated convolution U_Net are higher than those of standard convolution U_Net.

Additionally, to verify the edge effect on FISS GAN, we replace the edges achieved from the semantic segmentation images with edges achieved from foggy images and train FISS GAN (edge GAN and semantic segmentation GAN) with the same experimental settings as above. The pixel accuracy and mean IoU are shown in Table VI. As seen in Table VI, on the same dataset, the pixel accuracy and mean IoU achieved from the edges of semantic segmentation images are slightly higher than those obtained from the edges of foggy images. This experiment indicates that the edges achieved from the semantic segmentation images could provide more guided information than the edges achieved from foggy images.
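For reference, a minimal sketch of the two metrics reported in Tables III–VI is given below, assuming NumPy label maps. The paper does not publish its exact evaluation code, so details such as ignore-label handling are omitted here.

import numpy as np

def pixel_accuracy(pred, gt):
    # Fraction of pixels whose predicted class matches the ground truth.
    return float((pred == gt).mean())

def mean_iou(pred, gt, num_classes=19):
    # Mean intersection over union across the 19 foggy cityscapes classes.
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                 # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))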


Fig. 4. The qualitative experimental results of each model on Dataset 1: (a) foggy images; (b) ground truth; (c) OCR [48]; (d) HANet [49]; (e) FISS GAN.

Fig. 5. The qualitative experimental results of each model on Dataset 2: (a) foggy images; (b) ground truth; (c) OCR [48]; (d) HANet [49]; (e) FISS GAN.

Fig. 6. The qualitative experimental results of each model on Dataset 3: (a) foggy images; (b) ground truth; (c) OCR [48]; (d) HANet [49]; (e) FISS GAN.

V. Conclusions and Future Work

Currently, semantic segmentation methods for foggy images are based on fog-free images or clear images, which do not explore the relation between foggy images and their semantic segmentation images. A semantic segmentation method (FISS GAN) has been proposed in this paper that can directly process foggy images. FISS GAN is composed of an edge GAN and a semantic segmentation GAN. Specifically, FISS GAN first obtained edge information from foggy images with the edge GAN and then achieved semantic segmentation results with the semantic segmentation GAN using foggy images and their edge information as inputs. Experiments based on the foggy cityscapes and foggy driving datasets have shown that FISS GAN can directly extract the features from foggy images and achieve state-of-the-art results for semantic segmentation. Although FISS GAN can directly extract the features from a foggy image and realize its semantic segmentation, it cannot accurately segment a foggy image with limited texture. In the future, we will focus on designing a more efficient feature extraction network to improve the accuracy of the semantic segmentation of foggy images.


TABLE III
The Mean IoU Score of Each Model

Dataset     Models        Mean IoU (%)
Dataset 1   OCR [48]      24.44
            HANet [49]    43.85
            FISS GAN      69.37
Dataset 2   OCR [48]      20.88
            HANet [49]    41.45
            FISS GAN      65.94
Dataset 3   OCR [48]      19.94
            HANet [49]    39.73
            FISS GAN      64.01

TABLE IV
The Quantitative Experimental Results of Each Dataset

Datasets    Pixel accuracy (%)
Dataset 1   87.79
Dataset 2   87.35
Dataset 3   86.92

TABLE V
Comparison Results of Standard Convolution U_Net and Dilated Convolution U_Net (%)

            Pixel accuracy                                        Mean IoU
Datasets    Standard convolution U_Net  Dilated convolution U_Net  Standard convolution U_Net  Dilated convolution U_Net
Dataset 1   66.35                       87.79                      58.95                       69.37
Dataset 2   65.97                       87.35                      56.49                       65.94
Dataset 3   64.91                       86.92                      54.36                       64.01

TABLE VI
Comparison Results of Different Edges (%)

            Pixel accuracy                                                   Mean IoU
Datasets    Edges of foggy images  Edges of semantic segmentation images     Edges of foggy images  Edges of semantic segmentation images
Dataset 1   74.41                  87.79                                     60.07                  69.37
Dataset 2   74.25                  87.35                                     58.42                  65.94
Dataset 3   73.68                  86.92                                     57.85                  64.01

Fig. 7. The qualitative experimental results of each dataset (rows: Dataset 1, Dataset 2, Dataset 3): (a) foggy image; (b) edge GAN; (c) ground truth.

Fig. 8. Validation mean IoU for OCR [48], HANet [49] and FISS GAN (mean IoU (%) versus epoch for each model on Dataset 1, Dataset 2, and Dataset 3).

Fig. 9. Validation loss for OCR [48], HANet [49], and FISS GAN (loss versus epoch for each model on Dataset 1, Dataset 2, and Dataset 3).

References

[1] L. Chen, W. J. Zhan, W. Tian, Y. H. He, and Q. Zou, "Deep integration: A multi-label architecture for road scene recognition," IEEE Trans. Image Process., vol. 28, no. 10, pp. 4883–4898, Oct. 2019.
[2] K. Wada, K. Okada, and M. Inaba, "Joint learning of instance and semantic segmentation for robotic pick-and-place with heavy occlusions in clutter," in Proc. IEEE Int. Conf. Robotics and Autom., Montreal, Canada, 2019, pp. 9558–9564.
[3] Y. C. Ouyang, L. Dong, L. Xue, and C. Y. Sun, "Adaptive control based on neural networks for an uncertain 2-DOF helicopter system with input deadzone and output constraints," IEEE/CAA J. Autom. Sinica, vol. 6, no. 3, pp. 807–815, May 2019.
[4] Y. H. Luo, S. N. Zhao, D. S. Yang, and H. W. Zhang, "A new robust adaptive neural network backstepping control for single machine infinite power system with TCSC," IEEE/CAA J. Autom. Sinica, vol. 7, no. 1, pp. 48–56, Jan. 2020.
[5] N. Zerari, M. Chemachema, and N. Essounbouli, "Neural network based adaptive tracking control for a class of pure feedback nonlinear systems with input saturation," IEEE/CAA J. Autom. Sinica, vol. 6, no. 1, pp. 278–290, Jan. 2019.
[6] D. Wu and X. Luo, "Robust latent factor analysis for precise representation of high-dimensional and sparse data," IEEE/CAA J. Autom. Sinica, pp. 766–805, Dec. 2019.
[7] X. Luo, Y. Yuan, S. L. Chen, N. Y. Zeng, and Z. D. Wang, "Position-transitional particle swarm optimization-incorporated latent factor analysis," IEEE Trans. Knowl. Data En., pp. 1–1, Oct. 2019.


[8] A. Cantor, "Optics of the atmosphere: Scattering by molecules and particles," IEEE J. Quantum. Elect., vol. 14, no. 9, pp. 698–699, Sept. 1978.
[9] S. G. Narasimhan and S. K. Nayar, "Vision and the atmosphere," Int. J. Comput. Vision, vol. 48, no. 3, pp. 233–254, Jul. 2002.
[10] P. Luc, C. Couprie, S. Chintala, and J. Verbeek, "Semantic segmentation using adversarial networks," arXiv preprint arXiv:1611.08408, Dec. 2016.
[11] A. Arnab, S. Jayasumana, S. Zheng, and P. H. S. Torr, "Higher order conditional random fields in deep neural networks," in Proc. European Conf. Computer Vision, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham, Germany: Springer, 2016, pp. 524–540.
[12] P. Isola, J. Y. Zhu, T. H. Zhou, and A. Efros, "Image-to-image translation with conditional adversarial networks," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Honolulu, USA, 2017, pp. 1125–1134.
[13] J. Hoffman, E. Tzeng, T. Park, Y. J. Zhu, P. Isola, K. Saenko, A. Efros, and T. Darrell, "Cycada: Cycle-consistent adversarial domain adaptation," in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 1989–1998.
[14] K. Nazeri, E. Ng, T. Joseph, F. Z. Qureshi, and M. Ebrahimi, "Edgeconnect: Generative image inpainting with adversarial edge learning," arXiv preprint arXiv:1901.00212, Jan. 2019.
[15] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Medical Image Computing and Computer-assisted Intervention. Cham, Germany: Springer, 2015, pp. 234–241.
[16] J. Y. Kim, L. S. Kim, and S. H. Hwang, "An advanced contrast enhancement using partially overlapped sub-block histogram equalization," in Proc. Int. Conf. IEEE Symposium on Circuits and Systems, Geneva, Switzerland, 2000, pp. 537–540.
[17] A. Eriksson, G. Capi, and K. Doya, "Evolution of meta-parameters in reinforcement learning algorithm," in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and System, Las Vegas, USA, 2003, pp. 412–417.
[18] M. J. Seow and V. K. Asari, "Ratio rule and homomorphic filter for enhancement of digital colour image," Neurocomputing, vol. 69, no. 7–9, pp. 954–958, Mar. 2006.
[19] S. Shwartz, E. Namer, and Y. Y. Schechner, "Blind haze separation," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, New York, USA, 2006, pp. 1984–1991.
[20] Y. Y. Schechner and Y. Averbuch, "Regularized image recovery in scattering media," IEEE Trans. Pattern Anal., vol. 29, no. 9, pp. 1655–1660, Sept. 2007.
[21] R. T. Tan, "Visibility in bad weather from a single image," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Anchorage, USA, 2008, pp. 1–8.
[22] R. Fattal, "Single image dehazing," ACM Trans. Graphic., vol. 27, no. 3, pp. 1–9, Aug. 2008.
[23] K. M. He, J. Sun, and X. O. Tang, "Single image haze removal using dark channel prior," IEEE Trans. Pattern Anal., vol. 33, no. 12, pp. 2341–2353, Dec. 2011.
[24] K. B. Gibson and T. Q. Nguyen, "On the effectiveness of the dark channel prior for single image dehazing by approximating with minimum volume ellipsoids," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Prague, Czech Republic, 2011, pp. 1253–1256.
[25] D. F. Shi, B. Li, W. Ding, and Q. M. Chen, "Haze removal and enhancement using transmittance-dark channel prior based on object spectral characteristic," Acta Autom. Sinica, vol. 39, no. 12, pp. 2064–2070, Dec. 2013.
[26] S. G. Narasimhan and S. K. Nayar, "Interactive (de) weathering of an image using physical models," in Proc. IEEE Workshop on Color and Photometric Methods in Computer Vision, vol. 6, no. 4, Article No. 1, Jan. 2003.
[27] S. G. Narasimhan and S. K. Nayar, "Chromatic framework for vision in bad weather," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Hilton Head, USA, 2000, pp. 598–605.
[28] J. Tarel and N. Hautière, "Fast visibility restoration from a single color or gray level image," in Proc. 12th IEEE Int. Conf. Computer Vision, Kyoto, Japan, 2009, pp. 2201–2208.
[29] H. Zhang, V. Sindagi, and V. M. Patel, "Joint transmission map estimation and dehazing using deep networks," IEEE Trans. Circ. Syst. Vid., vol. 30, no. 7, Jul. 2020.
[30] W. Q. Ren, S. Liu, H. Zhang, J. S. Pan, X. C. Cao, and M. H. Yang, "Single image dehazing via multi-scale convolutional neural networks," in Proc. European Conf. Computer Vision, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham, Germany: Springer, 2016, pp. 154–169.
[31] H. Zhang and V. M. Patel, "Densely connected pyramid defogging network," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 3194–3203.
[32] B. L. Cai, X. M. Xu, K. Jia, C. M. Qing, and D. C. Tao, "DehazeNet: An end-to-end system for single image haze removal," IEEE Trans. Image Process., vol. 25, no. 11, pp. 5187–5198, Nov. 2016.
[33] D. D. Chen, M. M. He, Q. N. Fan, J. Liao, L. H. Zhang, D. D. Hou, L. Yuan, and G. Hua, "Gated context aggregation network for image dehazing and deraining," in Proc. IEEE Winter Conf. Applications of Computer Vision, Waikoloa, USA, 2019, pp. 1375–1383.
[34] S. Y. Huang, H. X. Li, Y. Yang, B. Wang, and N. N. Rao, "An end-to-end dehazing network with transitional convolution layer," Multidim. Syst. Sign P., vol. 31, no. 4, pp. 1603–1623, Mar. 2020.
[35] H. Zhang, V. Sindagi, and V. M. Patel, "Multi-scale single image dehazing using perceptual pyramid deep network," in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Salt Lake City, USA, 2018, pp. 902–911.
[36] X. Qin, Z. L. Wang, Y. C. Bai, X. D. Xie, and H. Z. Jia, "FFA-Net: Feature fusion attention network for single image defogging," arXiv preprint arXiv:1911.07559, Nov. 2019.
[37] Q. S. Yi, A. W. Jiang, J. C. Li, J. Y. Wan, and M. W. Wang, "Progressive back-traced dehazing network based on multi-resolution recurrent reconstruction," IEEE Access, vol. 8, pp. 54514–54521, Mar. 2020.
[38] B. Y. Li, X. L. Peng, Z. Y. Wang, J. Z. Xu, and D. Feng, "An all-in-one network for defogging and beyond," arXiv preprint arXiv:1707.06543, Jul. 2017.
[39] H. Y. Zhu, X. Peng, V. Chandrasekhar, L. Y. Li, and J. H. Lim, "DehazeGAN: When image dehazing meets differential programming," in Proc. 27th Int. Joint Conf. Artificial Intelligence, Stockholm, Sweden, 2018, pp. 1234–1240.
[40] R. D. Li, J. S. Pan, Z. C. Li, and J. H. Tang, "Single image dehazing via conditional generative adversarial network," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 8202–8211.
[41] D. Engin, A. Genc, and H. K. Ekenel, "Cycle-dehaze: Enhanced cycleGAN for single image dehazing," in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Salt Lake City, USA, 2018, pp. 825–833.
[42] G. Kim, J. Park, S. Ha, and J. Kwon, "Bidirectional deep residual learning for haze removal," in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Long Beach Convention and Entertainment Center, USA, 2018, pp. 46–54.
[43] A. Dudhane and S. Murala, "CDNet: Single image de-hazing using unpaired adversarial training," in Proc. IEEE Winter Conf. Applications of Computer Vision, Waikoloa, USA, 2019, pp. 1147–1155.
[44] P. Sharma, P. Jain, and A. Sur, "Scale-aware conditional generative adversarial network for image defogging," in Proc. IEEE Winter Conf. Applications of Computer Vision, Snowmass, USA, 2020, pp. 2355–2365.
[45] W. D. Yan, A. Sharma, and R. T. Tan, "Optical flow in dense foggy scenes using semi-supervised learning," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2020, pp. 13259–13268.


[46] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Boston, USA, 2015, pp. 3431–3440.
[47] L. Chen, W. J. Zhan, J. J. Liu, W. Tian, and D. P. Cao, "Semantic segmentation via structured refined prediction and dual global priors," in Proc. IEEE Int. Conf. Advanced Robotics and Mechatronics, Toyonaka, Japan, 2019, pp. 53–58.
[48] Y. H. Yuan, X. L. Chen, and J. D. Wang, "Object-contextual representations for semantic segmentation," arXiv preprint arXiv:1909.11065, Sept. 2019.
[49] S. Choi, J. T. Kim, and J. Choo, "Cars can't fly up in the sky: Improving urban-scene segmentation via height-driven attention networks," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2020, pp. 9373–9383.
[50] M. H. Yin, Z. L. Yao, Y. Cao, X. Li, Z. Zhang, S. Lin, and H. Hu, "Disentangled non-local neural networks," arXiv preprint arXiv:2006.06668, Sept. 2020.
[51] H. S. Zhao, J. P. Shi, X. J. Qi, X. G. Wang, and J. Y. Jia, "Pyramid scene parsing network," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Honolulu, USA, 2017, pp. 2881–2890.
[52] L. C. Chen, Y. K. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proc. European Conf. Computer Vision, Munich, Germany, 2018, pp. 801–818.
[53] H. Y. Chen, L. H. Tsai, S. C. Chang, J. Y. Pan, Y. T. Chen, W. Wei, and D. C. Juan, "Learning with hierarchical complement objective," arXiv preprint arXiv:1911.07257, Nov. 2019.
[54] J. Fu, J. Liu, H. J. Tian, Y. Li, Y. J. Bao, Z. W. Fang, and H. Q. Lu, "Dual attention network for scene segmentation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 3146–3154.
[55] C. Zhang, G. S. Lin, F. Y. Liu, R. Yao, and C. H. Chen, "CANet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 5217–5226.
[56] J. J. He, Z. Y. Deng, L. Zhou, Y. L. Wang, and Y. Qiao, "Adaptive pyramid context network for semantic segmentation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 7519–7528.
[57] J. Hoffman, D. Q. Wang, F. Yu, and T. Darrell, "FCNs in the wild: Pixel-level adversarial and constraint-based adaptation," arXiv preprint arXiv:1612.02649, Dec. 2016.
[58] Y. H. Zhang, Z. F. Qiu, T. Yao, D. Liu, and T. Mei, "Fully convolutional adaptation networks for semantic segmentation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 6810–6818.
[59] Z. Murez, S. Kolouri, D. Kriegman, R. Ramamoorthi, and K. Kim, "Image to image translation for domain adaptation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 4500–4509.
[60] W. X. Hong, Z. Z. Wang, M. Yang, and J. S. Yuan, "Conditional generative adversarial network for structured domain adaptation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 1335–1344.
[61] Y. W. Luo, L. Zheng, T. Guan, J. Q. Yu, and Y. Yang, "Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 2502–2511.
[62] Y. S. Li, L. Yuan, and N. Vasconcelos, "Bidirectional learning for domain adaptation of semantic segmentation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 6936–6945.
[63] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal., vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986.
[64] J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proc. IEEE Int. Conf. Computer Vision, Venice, Italy, 2017, pp. 2223–2232.
[65] C. Sakaridis, D. X. Dai, and L. Van Gool, "Semantic foggy scene understanding with synthetic data," Int. J. Comput. Vision, vol. 126, no. 9, pp. 973–992, Mar. 2018.
[66] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. 27th Int. Conf. Machine Learning, F. Johannes and J. Thorsten, Eds., 2010, pp. 807–814.
[67] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, vol. 30, no. 1, p. 3, 2013.
[68] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, Dec. 2014.

Kunhua Liu received the Ph.D. degree from Shandong University of Science and Technology in 2019. She is currently a post-doctoral student with the School of Computer Science and Engineering, Sun Yat-sen University. Her research interests include computer vision and SLAM.

Zihao Ye received the B.E. degree in software engineering from Sun Yat-sen University in 2020. He is currently a master student in software engineering at Sun Yat-sen University. His research interests include simultaneous localization and mapping and computer vision.

Hongyan Guo (M'17) received the Ph.D. degree from Jilin University in 2010. She joined Jilin University, Changchun, China, in 2011, where she became an Associate Professor in 2014, and has been a Professor since 2018. She was a Visiting Scholar at Cranfield University in 2017. She became the Tang Aoqing Professor at Jilin University in 2020. Her research interests include stability control, path planning and decision making of intelligent vehicles.

Dongpu Cao (M'08) received the Ph.D. degree from Concordia University, Canada, in 2008. He is the Canada Research Chair in Driver Cognition and Automated Driving, and currently an Associate Professor and Director of the Waterloo Cognitive Autonomous Driving (CogDrive) Laboratory at the University of Waterloo, Canada. His current research focuses on driver cognition, automated driving and cognitive autonomous driving. He has contributed more than 200 papers and 3 books. He received the SAE Arch T. Colwell Merit Award in 2012, the IEEE VTS 2020 Best Vehicular Electronics Paper Award and three Best Paper Awards from the ASME and IEEE conferences. Prof. Cao serves as Deputy Editor-in-Chief for IET Intelligent Transport Systems Journal, and an Associate Editor for IEEE Transactions on Vehicular Technology, IEEE Transactions on Intelligent Transportation Systems, IEEE/ASME Transactions on Mechatronics, IEEE Transactions on Industrial Electronics, IEEE/CAA Journal of Automatica Sinica, IEEE Transactions on Computational Social Systems, and ASME Journal of Dynamic Systems, Measurement and Control. He was a Guest Editor for Vehicle System Dynamics, IEEE Transactions on SMC: Systems


and IEEE Internet of Things Journal. He serves on the SAE Vehicle Dynamics Standards Committee and acts as the Co-Chair of the IEEE ITSS Technical Committee on Cooperative Driving. Prof. Cao is an IEEE VTS Distinguished Lecturer.

Long Chen (SM'18) received the Ph.D. degree in signal and information processing from Wuhan University in 2013. From October 2010 to November 2012, he was a co-trained Ph.D. candidate at the National University of Singapore. He is currently an Associate Professor at the School of Computer Science and Engineering, Sun Yat-sen University. He received the IEEE Vehicular Technology Society 2018 Best Land Transportation Paper Award, the IEEE Intelligent Vehicle Symposium 2018 Best Student Paper Award and Best Workshop Paper Award. His research interests include autonomous driving, robotics, and artificial intelligence, where he has contributed more than 110 publications. He serves as an Associate Editor for IEEE Transactions on Intelligent Transportation Systems, IEEE Transactions on Intelligent Vehicles, IEEE/CAA Journal of Automatica Sinica and the IEEE Technical Committee on Cyber-Physical Systems Newsletter.

Fei-Yue Wang (S'87–M'89–SM'94–F'03) received the Ph.D. degree in computer and systems engineering from the Rensselaer Polytechnic Institute, Troy, NY, USA, in 1990. He joined the University of Arizona in 1990 and became a Professor and the Director of the Robotics and Automation Laboratory and the Program in Advanced Research for Complex Systems. In 1999, he founded the Intelligent Control and Systems Engineering Center at the Institute of Automation, Chinese Academy of Sciences (CAS), Beijing, China, under the support of the Outstanding Chinese Talents Program from the State Planning Council, and in 2002, was appointed as the Director of the Key Laboratory of Complex Systems and Intelligence Science, CAS, and the Vice President of the Institute of Automation, CAS in 2006. In 2011, he became the State Specially Appointed Expert and the Founding Director of the State Key Laboratory for Management and Control of Complex Systems. Dr. Wang has been the Chief Judge of the Intelligent Vehicles Future Challenge (IVFC) since 2009 and the Director of the China Intelligent Vehicles Proving Center (IVPC) at Changshu since 2015. Currently, he is the Director of Intel's International Collaborative Research Institute on Parallel Driving with CAS and Tsinghua University. His current research focuses on methods and applications for parallel intelligence, social computing, and knowledge automation. He is a Fellow of INCOSE, IFAC, ASME, and AAAS. In 2007, he received the National Prize in Natural Sciences of China, numerous best paper awards from IEEE Transactions, and became an Outstanding Scientist of ACM for his work in intelligent control and social computing. He received the IEEE ITS Outstanding Application and Research Awards in 2009, 2011, and 2015, respectively, and the IEEE SMC Norbert Wiener Award in 2014, and became the IFAC Pavel J. Nowacki Distinguished Lecturer in 2021. Since 1997, he has been serving as the General or Program Chair of over 30 IEEE, INFORMS, IFAC, ACM, and ASME conferences. He was the President of the IEEE ITS Society from 2005 to 2007, the IEEE Council of RFID from 2019 to 2021, the Chinese Association for Science and Technology, USA, in 2005, the American Zhu Kezhen Education Foundation from 2007 to 2008, the Vice President of the ACM China Council from 2010 to 2011, and the Vice President and Secretary General of the Chinese Association of Automation from 2008 to 2018. He was the Founding Editor-in-Chief (EiC) of the International Journal of Intelligent Control and Systems from 1995 to 2000, the IEEE ITS Magazine from 2006 to 2007, the IEEE/CAA Journal of Automatica Sinica from 2014 to 2017, China's Journal of Command and Control from 2015 to 2021, and China's Journal of Intelligent Science and Technology from 2019 to 2021. He was the EiC of the IEEE Intelligent Systems from 2009 to 2012, the IEEE Transactions on Intelligent Transportation Systems from 2009 to 2016, and the IEEE Transactions on Computational Social Systems from 2017 to 2020. Currently, he is the President of CAA's Supervision Council and Vice President of the IEEE Systems, Man, and Cybernetics Society.

