
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 17, NO. 5, MAY 2020

Classifier-Constrained Deep Adversarial Domain Adaptation for Cross-Domain Semisupervised Classification in Remote Sensing Images

Wenxiu Teng, Student Member, IEEE, Ni Wang, Huihui Shi, Yuchan Liu, and Jing Wang

Abstract—This letter presents a classifier-constrained deep adversarial domain adaptation (CDADA) method for cross-domain semisupervised classification in remote sensing (RS) images. A deep convolutional neural network (DCNN) is used to build feature representations that describe the semantic content of scenes before the adaptation process. Adversarial domain adaptation is then used to align the feature distributions of the source and the target. Specifically, two different land-cover classifiers are used as a discriminator so that the land-cover decision boundaries between classes are taken into account, and the distance between the classifiers is increased to separate them from the original land-cover class boundaries. Under this classifier constraint, the generator creates robust transferable features far from the original land-cover class boundaries. The experimental results of six scenarios built from three benchmark RS scene data sets (AID, Merced, and RSI-CB) are reported and discussed.

Index Terms—Cross-domain classification, deep convolutional neural networks (DCNNs), domain adaptation (DA), generative adversarial networks (GANs), remote sensing (RS).

Manuscript received March 11, 2019; revised May 24, 2019 and July 3, 2019; accepted July 23, 2019. Date of publication August 12, 2019; date of current version April 22, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 41601455 and in part by the Key Projects of Anhui Natural Science Research in Universities under Grant KJ2017A416. (Corresponding authors: Wenxiu Teng; Ni Wang.)
W. Teng is with the College of Forestry, Nanjing Forestry University, Nanjing 210037, China (e-mail: wenxiu.teng@ieee.org).
N. Wang, H. Shi, Y. Liu, and J. Wang are with the College of Geographic Information and Tourism, Chuzhou University, Chuzhou 239000, China (e-mail: wnstrive@163.com; shihuihui899@163.com; 403536729@qq.com; 71536409@qq.com).
Digital Object Identifier 10.1109/LGRS.2019.2931305

I. INTRODUCTION

Scene classification in very high-resolution (VHR) images is a fundamental step in the interpretation of remote sensing (RS) images and has become an active research topic in the RS community. The goal of scene classification is to categorize scene images into a discrete set of meaningful classes according to the image contents [1].

Over the years, great efforts have been made to develop powerful approaches for the scene classification of RS images, from the pioneering work that introduced the bag-of-visual-words (BoVW) model [2], [3] to current methods based on deep convolutional neural networks (DCNNs) [4]–[8]. Studies have found that intermediate features extracted from a DCNN pretrained on a large-scale data set, such as ImageNet, can be used as a generic image representation. Therefore, almost all of these works focus on acquiring strong image representations by transferring a pretrained DCNN to their tasks.

However, RS images are inevitably affected by various human and natural factors, such as sensors, camera perspectives, geographic locations, seasons, and weather conditions. Therefore, when the source data set is far from the target data set (a situation also known as data shift), transfer strategies for pretrained DCNNs are likely to yield unsatisfactory results. Domain adaptation (DA) can help solve this problem [9].

DA, as it pertains to transfer learning (TL), is the process of adapting one or more source domains to transfer information that improves the performance of a target learner [10]. DA applied to deep features is called deep DA, and it can generally be categorized into discrepancy-based and adversarial-based methods. Discrepancy-based methods typically minimize a loss that measures the difference between the source and target distributions. For example, Othman et al. [11] proposed a DA network (DAN) for cross-scene classification that optimizes a loss function composed of three terms related to discrimination, the distance between the source and target data distributions, and geometric structure. Ammour et al. [12] proposed an asymmetric adaptation neural network for cross-domain classification in RS images; this network addresses the data-shift problem via an asymmetric adaptation layer and learns its weights by jointly minimizing two losses related to distribution discrepancy and class discrimination. Adversarial-based methods work as a two-player game similar to generative adversarial networks (GANs) [13]. These methods divide the base network into a feature generator network G and a classifier C and add a separate domain classifier (discriminator) network D. The discriminator is learned by minimizing the classification error associated with distinguishing the source from the target domain, while the generator learns transferable representations that confound the domain discriminator and thus align features across domains. For example, Ganin et al. [14] proposed the domain adversarial neural network (DANN), which matches the source and target features by making them indistinguishable to a domain discriminator. Tzeng et al. [15] proposed a unified framework for DA techniques based on adversarial learning objectives that combine adversarial and discriminative learning.
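For readers unfamiliar with this adversarial formulation, the following minimal PyTorch sketch (our illustration, not code from [14] or [15]) shows the two competing objectives: a domain discriminator D trained to separate source features from target features, and a generator G trained to fool it. The toy networks and batch shapes are placeholders.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the feature generator G and domain discriminator D;
# real methods use a deep CNN backbone.
G = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
D = nn.Linear(256, 1)  # outputs a source-vs.-target logit

bce = nn.BCEWithLogitsLoss()
x_s = torch.randn(8, 3, 32, 32)  # toy source batch
x_t = torch.randn(8, 3, 32, 32)  # toy target batch

# Discriminator objective: separate source (label 1) from target (label 0).
d_loss = bce(D(G(x_s)), torch.ones(8, 1)) + bce(D(G(x_t)), torch.zeros(8, 1))

# Generator objective: make target features indistinguishable from source.
g_loss = bce(D(G(x_t)), torch.ones(8, 1))
```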

Fig. 1. Proposed CDADA method.

Elshamli et al. [16] applied a DANN to both hyperspectral and multispectral images under different DA scenarios. Bashmal et al. [17] developed an approach that learns invariant representations for aerial vehicle image categorization using adversarial networks. Adversarial methods, in particular, have become increasingly popular because they are simple to train and successful at minimizing the domain shift.

However, a major drawback of deep adversarial DA is that the discriminator only attempts to distinguish the features as belonging to either the source or the target and aims to match the feature distributions between the domains globally, without considering the land-cover decision boundaries between classes. This process generates ambiguous features near the land-cover class boundaries and reduces the classification accuracy. To solve this problem, two different land-cover classifiers are used as a discriminator so that the land-cover class boundaries are considered when aligning the feature distributions of the source and the target. Under this classifier constraint, the method attempts to create robust transferable features far from the original land-cover class boundaries and thereby improve the classification accuracy.
classification accuracy. To solve this problem, two different function is given as follows:
land-cover classifiers are used as a discriminator to consider
land-cover class boundaries when aligning the feature distri- 
K

bution of the source and the target. This method attempts min Lcls (X s , Ys ) = −E(xs ,ys )∼(X s ,Ys ) I[k=ys ] log C(G(x s ))
G,C
k=1
to create robust transferable features far from the original
(1)
land-cover class boundaries under the classifier constraint to
improve the classification accuracy. where I is an indicator function that is equal to 1 if a statement
The main contributions of this letter can be summarized is true and 0 otherwise; log represents the log-likelihood cost
based on two major aspects. function; and K is the number of categories.
1) We propose a novel DA method called classifier- The adversarial DA method is then used to align the feature
constrained deep adversarial DA (CDADA) for cross- distribution of the source and the target. Two different land-
domain semisupervised classification in RS images. This cover classifiers C1 , C2 are used as a discriminator. To extend
method uses two different land-cover classifiers as a their distance and separate them from the original land-cover
discriminator to consider land-cover decision boundaries class boundaries, the Manhattan distance output by the class
between classes in the process of aligning the feature probabilities of the two classifiers is used to measure their
distribution of the source and target. distance. Specifically, generator G is used to extract the deep
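As a concrete illustration of (1), the following PyTorch sketch performs one source-supervised step with the Adam settings reported in Section III-B. The small networks are placeholders; the letter builds G and C from a pretrained VGG16.

```python
import torch
import torch.nn as nn

K = 7  # illustrative number of shared land-cover classes
G = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())  # stand-in generator
C = nn.Linear(256, K)  # stand-in land-cover classifier

# nn.CrossEntropyLoss applies log-softmax internally, matching the
# negative log-likelihood form of (1).
criterion = nn.CrossEntropyLoss()
opt = torch.optim.Adam(list(G.parameters()) + list(C.parameters()),
                       lr=2e-4, betas=(0.9, 0.999), eps=1e-8)

x_s = torch.randn(32, 3, 64, 64)   # toy labeled source mini-batch
y_s = torch.randint(0, K, (32,))   # toy source labels

opt.zero_grad()
loss_cls = criterion(C(G(x_s)), y_s)  # cross-entropy loss of (1)
loss_cls.backward()
opt.step()
```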
The adversarial DA method is then used to align the feature distributions of the source and the target. The two land-cover classifiers C1 and C2 are used as a discriminator. To extend their distance and separate them from the original land-cover class boundaries, the Manhattan distance between the class-probability outputs of the two classifiers is used to measure their distance. Specifically, the generator G extracts the deep features of the target images, and the classifiers C1 and C2 then classify the images into K classes; that is, they each output a K-dimensional vector of logits.
thorized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on December 07,2023 at 02:31:24 UTC from IEEE Xplore. Restrictions app
TENG et al.: CDADA FOR CROSS-DOMAIN SEMISUPERVISED CLASSIFICATION 791

Fig. 2. Sample images of the common classes extracted from the AID, Merced, and RSI-CB data sets.

The softmax function is applied to this vector to obtain the class probabilities

$$p^{(1)} = \mathrm{softmax}(C_1(G(x_t))) \qquad (2)$$

$$p^{(2)} = \mathrm{softmax}(C_2(G(x_t))) \qquad (3)$$

where $p^{(1)}$ and $p^{(2)}$ are the K-dimensional probabilistic outputs of C1 and C2, respectively. The absolute values of the differences between the two classifiers' probabilistic outputs (the Manhattan distance) are used as the distance between the two classifiers. The adversarial loss is given as follows:

$$\mathcal{L}_{\mathrm{adv}}(X_t) = \mathbb{E}_{x_t \sim X_t} \left[ \frac{1}{K} \sum_{k=1}^{K} \left| p_k^{(1)} - p_k^{(2)} \right| \right] \qquad (4)$$

where $p_k^{(1)}$ and $p_k^{(2)}$ are the probability outputs of $p^{(1)}$ and $p^{(2)}$ for class k, respectively.
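The discrepancy loss in (4) reduces to a mean absolute difference between the two softmax outputs. A minimal PyTorch sketch (the function and tensor names are ours, not from released code):

```python
import torch
import torch.nn.functional as F

def adversarial_loss(logits1: torch.Tensor, logits2: torch.Tensor) -> torch.Tensor:
    """Manhattan distance between the class-probability outputs of C1 and C2,
    averaged over the K classes and the target batch, as in (4)."""
    p1 = F.softmax(logits1, dim=1)  # p^(1), shape (batch, K), as in (2)
    p2 = F.softmax(logits2, dim=1)  # p^(2), shape (batch, K), as in (3)
    return torch.mean(torch.abs(p1 - p2))  # mean over batch and classes
```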
For adversarial training, we first maximize the distance between the probabilistic outputs of the two classifiers so that they move far from the original land-cover class boundaries. Specifically, we train the classifiers C1 and C2 as a discriminator while the generator G is kept fixed. The objective function is given as follows:

$$\max_{C_1, C_2} \mathcal{L}_{\mathrm{adv}}(X_t). \qquad (5)$$

Then, to minimize the distance between the probabilistic outputs of the two classifiers and thereby confound the discriminator, the generator is driven to create robust features far from the original land-cover class boundaries. Specifically, we train the generator G to minimize the adversarial loss while the classifiers are kept fixed. The objective function is given as follows:

$$\min_{G} \mathcal{L}_{\mathrm{adv}}(X_t). \qquad (6)$$
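Because standard optimizers only descend, the maximization in (5) can be implemented by negating the loss. Below is a sketch of one alternating round, reusing adversarial_loss from the previous snippet; the toy networks, optimizer names, and mini-batch are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

K = 7  # illustrative number of shared land-cover classes
G = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())  # toy generator
C1, C2 = nn.Linear(256, K), nn.Linear(256, K)  # toy classifier heads
opt_c = torch.optim.Adam([*C1.parameters(), *C2.parameters()], lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
x_t = torch.randn(32, 3, 64, 64)  # toy unlabeled target mini-batch

# Eq. (5): train C1, C2 to maximize L_adv by descending its negation.
# G stays fixed because opt_c holds no generator parameters (the stray
# generator gradients are cleared by opt_g.zero_grad() below).
opt_c.zero_grad()
(-adversarial_loss(C1(G(x_t)), C2(G(x_t)))).backward()
opt_c.step()

# Eq. (6): train G to minimize L_adv with the classifiers fixed.
opt_g.zero_grad()
adversarial_loss(C1(G(x_t)), C2(G(x_t))).backward()
opt_g.step()
```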
The discriminator and the generator are iteratively optimized in a two-player game akin to the original GAN setting: the goal of the discriminator is to extend the distance between the two classifiers to keep them away from the original land-cover class boundaries, whereas the goal of the generator is to confound the discriminator so that robust transferable features are created far from those boundaries.

Algorithm 1 presents the steps of the proposed method.

Algorithm 1 Proposed Method
Input: labeled source images $X_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$; unlabeled target images $X_t = \{x_j^t\}_{j=1}^{n_t}$
Output: target class labels $\{y_j^t\}_{j=1}^{n_t}$
1: Set parameters: number of epochs num_epoch = 100; mini-batch size b = 32; Adam parameters: learning rate η = 2.0 × 10^−4, exponential decay rates for the first and second moments β1 = 0.9 and β2 = 0.999, and ε = 1.0 × 10^−8
2: Use the VGG16 network trained on the ImageNet data set as a pretrained DCNN to initialize a feature generator network G and the classifiers C, C1, and C2 individually
3: for epoch = 1 : num_epoch do
4:   Randomly shuffle the labeled source images and the unlabeled target images and organize them into N_b groups, each of size b
5:   for k = 1 : N_b do
6:     Pick mini-batches X_s^k and X_t^k from X_s and X_t
7:     Train G, C, C1, and C2 on X_s^k by optimizing the objective function in (1) with the Adam method
8:     Train C1 and C2 on X_t^k by optimizing the objective function in (5) with the Adam method
9:     Train G on X_t^k by optimizing the objective function in (6) with the Adam method
10:   end for
11: end for
12: Classify the target domain $\{x_j^t\}_{j=1}^{n_t}$ using G and C
13: return $\{y_j^t\}_{j=1}^{n_t}$


Fig. 3. Relationship between adversarial loss and accuracy during training of AID ↔ Merced, AID ↔ RSI-CB, and Merced ↔ RSI-CB cross-domain scene
data sets. (a) AID → Merced. (b) AID → RSI-CB. (c) Merced → RSI-CB. (d) Merced → AID. (e) RSI-CB → AID. (f) RSI-CB → Merced.

III. EXPERIMENTAL RESULTS

A. Data Set Description

The following three heterogeneous RS scene data sets, which were collected and labeled by different experts, are used to build the cross-domain scene data sets: the RSI-CB [18], AID [7], and UC Merced [2] data sets.

The RSI-CB data set is a worldwide large-scale benchmark for RS image classification and was constructed by collecting sample images from Google Earth imagery and Bing Maps. The data set is annotated with crowdsourced data, including OpenStreetMap (OSM) data. The benchmark comprises two subsets with image sizes of 256 × 256 and 128 × 128; the former contains six categories with 35 subclasses and more than 24 000 images. In this letter, we use only the 256 × 256 subset. The AID data set consists of large-scale aerial images of size 600 × 600 with multiple resolutions (8–0.5 m) and was constructed by collecting sample images from Google Earth imagery. The UC Merced data set is a 21-class land-use image data set that contains RGB images of size 256 × 256 with a pixel resolution of 0.3 m.

From the above data sets, we build six cross-domain scene data sets, termed AID ↔ Merced, AID ↔ RSI-CB, and Merced ↔ RSI-CB, by extracting the most common classes through visual inspection. Fig. 2 shows sample images from these cross-domain scene data sets. Table I lists the number of images per class extracted from each data set, and Table II gives the number of training and testing images used for each scenario.

TABLE I
COMMON CLASSES EXTRACTED FROM THE AID, MERCED, AND RSI-CB DATA SETS

B. Experimental Setup

We use the VGG16 network trained on the ImageNet data set as the pretrained DCNN and fine-tune only the final feature layer and the fully connected layers. We set the batch size to 32 and use Adam with a learning rate η = 2.0 × 10^−4, exponential decay rates for the first and second moments β1 = 0.9 and β2 = 0.999, and ε = 1.0 × 10^−8 as the optimizer. We report the accuracy after 100 epochs. In addition, we compare our results with those of the fine-tuned VGG16 without adaptation; DAN [11], which optimizes a loss function composed of three terms related to discrimination, the distance between the source and target data distributions, and geometric structure; DANN [14], which matches the source and target features by making them indistinguishable to a domain discriminator; and adversarial discriminative DA (ADDA) [15], which combines adversarial and discriminative learning. DAN, DANN, and ADDA use the same network and parameters as the method in this letter.
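One possible way to instantiate the generator and classifier heads from a pretrained VGG16 as described above is sketched below. The exact layer split, the freezing boundary, and the class count are our reading of the setup, not released code, and inputs are assumed to be resized to 224 × 224.

```python
import torch.nn as nn
from torchvision import models

def build_networks(num_classes: int):
    """Hedged sketch: generator G and three classifier heads from VGG16."""
    vgg = models.vgg16(pretrained=True)
    # Generator G: convolutional features plus the fully connected layers
    # up to the 4096-D penultimate output of the VGG16 classifier.
    G = nn.Sequential(vgg.features, nn.Flatten(),
                      *list(vgg.classifier.children())[:-1])
    # Freeze everything before the last convolutional layer so that only
    # the final feature layer and the fully connected layers are fine-tuned
    # (our interpretation of the setup described in the text).
    for p in vgg.features[:-3].parameters():
        p.requires_grad = False
    # Three land-cover classifier heads C, C1, C2 on the 4096-D features.
    C, C1, C2 = (nn.Linear(4096, num_classes) for _ in range(3))
    return G, C, C1, C2

G, C, C1, C2 = build_networks(7)  # e.g., seven common classes in a scenario
```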


TABLE II
LABELED AND UNLABELED IMAGES USED FOR EACH CLASSIFICATION SCENARIO

TABLE III
PERFORMANCES IN TERMS OF AVERAGE ACCURACY FOR SIX CROSS-DOMAIN SCENE DATA SETS

C. Results

Fig. 3 shows the relationship between the adversarial loss and the accuracy during the training of the six cross-domain scene data sets. For the AID ↔ Merced, AID ↔ RSI-CB, and Merced ↔ RSI-CB cross-domain scene data sets, the adversarial loss diminishes as the accuracy improves, confirming that minimizing the adversarial loss on the target images increases the accuracy of the adaptation.

Table III shows the classification accuracies for the six data sets. The average accuracy of the proposed method over the six cross-domain scene data sets is 92.17%, which is 13.45% higher than that of the fine-tuned VGG16 without adaptation. Among the competing adaptation methods, DAN achieves the highest average accuracy, although it is still 3.75% lower than that of the proposed method. The accuracy of the proposed method is the highest for all data sets except Merced → RSI-CB, for which it is slightly lower than that of DAN. The experimental results show that our method generates more robust transferable features and improves the classification accuracy.

IV. CONCLUSION

In this letter, a CDADA method is proposed for cross-domain semisupervised classification in RS images. The method uses DCNN-built feature representations to describe the semantic content of scenes and uses two different land-cover classifiers as a constraint that compels the generator to create robust transferable features far from the original land-cover class boundaries when aligning the feature distributions of the source and the target. The experimental results on six cross-domain scene data sets demonstrate the effectiveness of the proposed method.

ACKNOWLEDGMENT

The authors would like to thank the two anonymous reviewers for carefully reviewing this letter and giving valuable comments to improve it.

REFERENCES

[1] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: Benchmark and state of the art," Proc. IEEE, vol. 105, no. 10, pp. 1865–1883, Oct. 2017.
[2] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proc. ACM, New York, NY, USA, 2010, pp. 270–279.
[3] S. Chen and Y. Tian, "Pyramid of spatial relatons for scene-level land use classification," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 4, pp. 1947–1957, Apr. 2015.
[4] F. Hu, G.-S. Xia, J. Hu, and L. Zhang, "Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery," Remote Sens., vol. 7, no. 11, pp. 14680–14707, Nov. 2015.
[5] D. Marmanis, M. Datcu, T. Esch, and U. Stilla, "Deep learning earth observation classification using ImageNet pretrained networks," IEEE Geosci. Remote Sens. Lett., vol. 13, no. 1, pp. 105–109, Jan. 2016.
[6] G. J. Scott, M. R. England, W. A. Starms, R. A. Marcum, and C. H. Davis, "Training deep convolutional neural networks for land-cover classification of high-resolution imagery," IEEE Geosci. Remote Sens. Lett., vol. 14, no. 4, pp. 549–553, Apr. 2017.
[7] G.-S. Xia et al., "AID: A benchmark data set for performance evaluation of aerial scene classification," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 7, pp. 3965–3981, Jul. 2017.
[8] F. Hu, G.-S. Xia, W. Yang, and L. Zhang, "Recent advances and opportunities in scene classification of aerial images with deep models," in Proc. IEEE Int. Geosci. Remote Sens. Symp., Jul. 2018, pp. 4371–4374.
[9] D. Tuia, C. Persello, and L. Bruzzone, "Domain adaptation for the classification of remote sensing data: An overview of recent advances," IEEE Geosci. Remote Sens. Mag., vol. 4, no. 2, pp. 41–57, Jun. 2016.
[10] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[11] E. Othman, Y. Bazi, F. Melgani, H. Alhichri, N. Alajlan, and M. Zuair, "Domain adaptation network for cross-scene classification," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 8, pp. 4441–4456, Aug. 2017.
[12] N. Ammour, L. Bashmal, Y. Bazi, M. M. A. Rahhal, and M. Zuair, "Asymmetric adaptation of deep features for cross-domain classification in remote sensing imagery," IEEE Geosci. Remote Sens. Lett., vol. 15, no. 4, pp. 597–601, Apr. 2018.
[13] I. J. Goodfellow et al., "Generative adversarial nets," in Proc. NIPS, Cambridge, MA, USA, 2014, pp. 2672–2680.
[14] Y. Ganin et al., "Domain-adversarial training of neural networks," J. Mach. Learn. Res., vol. 17, no. 1, pp. 2030–2096, Jan. 2016.
[15] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, "Adversarial discriminative domain adaptation," in Proc. CVPR, Jul. 2017, pp. 2962–2971.
[16] A. Elshamli, G. W. Taylor, A. Berg, and S. Areibi, "Domain adaptation using representation learning for the classification of remote sensing images," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 9, pp. 4198–4209, Sep. 2017.
[17] L. Bashmal, Y. Bazi, H. AlHichri, M. M. Al Rahhal, N. Ammour, and N. Alajlan, "Siamese-GAN: Learning invariant representations for aerial vehicle image categorization," Remote Sens., vol. 10, no. 2, p. 351, Feb. 2018.
[18] H. Li, C. Tao, Z. Wu, J. Chen, J. Gong, and M. Deng, "RSI-CB: A large scale remote sensing image classification benchmark via crowdsource data," 2017. [Online]. Available: https://arxiv.org/abs/1705.10450

