Classifier-Constrained Deep Adversarial Domain Adaptation For Cross-Domain Semisupervised Classification in Remote Sensing Images
Abstract: This letter presents a classifier-constrained deep adversarial domain adaptation (CDADA) method for cross-domain semisupervised classification in remote sensing (RS) images. A deep convolutional neural network (DCNN) is used to build feature representations to describe the semantic content of scenes before the adaptation process. Then, adversarial domain adaptation is used to align the feature distribution of the source and the target. Specifically, two different land-cover classifiers are used as a discriminator to consider land-cover decision boundaries between classes and increase their distance to separate them from the original land-cover class boundaries. The generator then creates robust transferable features far from the original land-cover class boundaries under the classifier constraint. The experimental results of six scenarios built from three benchmark RS scene data sets (AID, Merced, and RSI-CB data sets) are reported and discussed.

Index Terms: Cross-domain classification, deep convolutional neural networks (DCNNs), domain adaptation (DA), generative adversarial networks (GANs), remote sensing (RS).

I. INTRODUCTION

[...] used as a generic image representation. Therefore, almost all these works focus on the methods of acquiring strong image representations by transferring a pretrained DCNN to their tasks.

However, RS images are inevitably affected by various human and natural factors, such as sensors, camera perspectives, geographic locations, seasons, and weather conditions. Therefore, when the source data set is far from the target data set (also known as data shift), transferring strategies for pretrained DCNNs are likely to yield unsatisfactory results. Domain adaptation (DA) can be helpful to solve this problem [9].

DA, as it pertains to transfer learning (TL), is the process of adapting one or more source domains for transferring information to improve the performance of a target learner [10]. The DA method for deep features is called deep DA, and it can generally be categorized into discrepancy-based or adversarial-based methods. Discrepancy-based methods typically minimize the loss of difference between the source and target [...]
thorized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on December 07,2023 at 02:31:24 UTC from IEEE Xplore. Restrictions app
790 IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 17, NO. 5, MAY 2020
[...] learning. Elshamli et al. [16] applied a DANN method to both hyperspectral and multispectral images under different DA scenarios. Bashmal et al. [17] developed an approach to learn invariant representations for aerial vehicle image categorization using adversarial networks. Adversarial methods, in particular, have become increasingly popular because of their simplicity in training and success in minimizing the domain shift.

However, a major drawback of deep adversarial DA is that the discriminator only attempts to distinguish the features as either a source or a target and aims to match the feature distributions between different domains globally without considering the land-cover decision boundaries between classes. This process results in the generation of ambiguous features near land-cover class boundaries and a reduction in classification accuracy. To solve this problem, two different land-cover classifiers are used as a discriminator to consider land-cover class boundaries when aligning the feature distribution of the source and the target. This method attempts to create robust transferable features far from the original land-cover class boundaries under the classifier constraint to improve the classification accuracy.

The main contributions of this letter can be summarized based on two major aspects.
1) We propose a novel DA method called classifier-constrained deep adversarial DA (CDADA) for cross-domain semisupervised classification in RS images. This method uses two different land-cover classifiers as a discriminator to consider land-cover decision boundaries between classes in the process of aligning the feature distribution of the source and target.
2) Our method is applied to six scenarios and compared with recently proposed advanced DA techniques. The results show that CDADA is effective for cross-domain classification in RS images.

II. CLASSIFIER-CONSTRAINED DEEP ADVERSARIAL DOMAIN ADAPTATION

Let us consider $X_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ as a source domain of $n_s$ labeled images and $X_t = \{(x_j^t)\}_{j=1}^{n_t}$ as a target domain of $n_t$ unlabeled images. Fig. 1 shows the flowchart of the proposed method. The network architecture is composed of a generator network $G$, which takes inputs $X_s$ and $X_t$, and three classifiers $C$, $C_1$, $C_2$, which take features from $G$. Note that the two classifiers $C_1$, $C_2$ are used as a discriminator.

For the network training, we first train the generator network and classifiers on the labeled source images by optimizing the cross-entropy loss using the Adam method. The objective function is given as follows:

$\min_{G,C} L_{cls}(X_s, Y_s) = -\mathbb{E}_{(x_s, y_s) \sim (X_s, Y_s)} \sum_{k=1}^{K} \mathbb{I}_{[k = y_s]} \log C(G(x_s))$    (1)

where $\mathbb{I}$ is an indicator function that is equal to 1 if a statement is true and 0 otherwise; log represents the log-likelihood cost function; and $K$ is the number of categories.

The adversarial DA method is then used to align the feature distribution of the source and the target. Two different land-cover classifiers $C_1$, $C_2$ are used as a discriminator. To extend their distance and separate them from the original land-cover class boundaries, the Manhattan distance between the class probabilities output by the two classifiers is used to measure their distance. Specifically, generator $G$ is used to extract the deep feature of the target images, and then, classifiers $C_1$, $C_2$ are used to classify images into $K$ classes; that is, they output
TENG et al.: CDADA FOR CROSS-DOMAIN SEMISUPERVISED CLASSIFICATION 791
Fig. 2. Sample images of the common classes extracted from the AID, Merced, and RSI-CB data sets.
a $K$-dimensional vector of logits. The softmax function is used to obtain class probabilities from this vector, and it is given as follows:

$p^{(1)} = \mathrm{softmax}(C_1(G(x_t)))$    (2)

$p^{(2)} = \mathrm{softmax}(C_2(G(x_t)))$    (3)

where $p^{(1)}$, $p^{(2)}$ are the $K$-dimensional probabilistic outputs for $C_1$, $C_2$, respectively, and softmax is the softmax function. The absolute values of the difference between the two classifiers' probabilistic outputs (Manhattan distance) are used as the distance between two classifiers. The adversarial loss is given as follows:

$L_{adv}(X_t) = \mathbb{E}_{x_t \sim X_t} \left[ \frac{1}{K} \sum_{k=1}^{K} \left| p_k^{(1)} - p_k^{(2)} \right| \right]$    (4)

where $p_k^{(1)}$ and $p_k^{(2)}$ are the probability outputs of $p^{(1)}$ and $p^{(2)}$ for class $k$, respectively.

For adversarial training, we first maximize the distance between the probabilistic outputs of the two classifiers so that they are far from the original land-cover class boundaries. Specifically, we train the classifiers $C_1$, $C_2$ as a discriminator for a fixed generator $G$. The objective function is given as follows:

$\max_{C_1, C_2} L_{adv}(X_t)$.    (5)

Then, to minimize the distance between the probabilistic outputs of the two classifiers and confound the discriminator, the generator creates robust features far from the original land-cover class boundaries. Specifically, we train the generator $G$ to minimize the adversarial loss for fixed classifiers. The objective function is given as follows:

$\min_{G} L_{adv}(X_t)$.    (6)

Algorithm 1 Proposed Method
Input:
    Labeled source images, $X_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$;
    Unlabeled target images, $X_t = \{(x_j^t)\}_{j=1}^{n_t}$
Output:
    Target class labels, $Y_t = \{(y_j^t)\}_{j=1}^{n_t}$
1: Set parameters:
    Epoch: num_epoch = 100;
    Mini-batch size: $b = 32$;
    Adam parameters: learning rate $\eta = 2.0 \times 10^{-4}$, exponential decay rate for the first and second moments $\beta_1 = 0.9$, $\beta_2 = 0.999$, and epsilon $= 1.0 \times 10^{-8}$
2: Use the VGG16 network trained on the ImageNet data set as a pretrained DCNN to initialize a feature generator network $G$ and classifiers $C$, $C_1$, $C_2$ individually
3: for epoch = 1 : num_epoch do
4:     Randomly shuffle the labeled source images and unlabeled target images and organize them into $N_b$ groups, each of size $b$
5:     for k = 1 : $N_b$ do
6:         Pick mini-batches $X_s^k$, $X_t^k$ from $X_s$ and $X_t$
7:         Train $G$, $C$, $C_1$, $C_2$ on $X_s^k$ by optimizing the objective function in (1) using the Adam method
8:         Train $C_1$, $C_2$ on $X_t^k$ by optimizing the objective function in (5) using the Adam method
9:         Train $G$ on $X_t^k$ by optimizing the objective function in (6) using the Adam method
10:     end for
11: end for
12: Classify the target domain $\{(x_j^t)\}_{j=1}^{n_t}$ using $G$ and $C$.
13: return $\{(y_j^t)\}_{j=1}^{n_t}$
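The adversarial loss in (2)-(4) can be illustrated with a small standard-library Python sketch. It is not the authors' implementation: the function names (`softmax`, `discrepancy`, `adversarial_loss`) and the toy logits are assumptions made for illustration, and the expectation over target images is approximated by a mini-batch mean.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of K logits, as in (2) and (3)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discrepancy(logits1, logits2):
    """Term inside the expectation of (4) for one target image:
    the Manhattan distance between the two probability vectors, divided by K."""
    p1, p2 = softmax(logits1), softmax(logits2)
    return sum(abs(a - b) for a, b in zip(p1, p2)) / len(p1)


def adversarial_loss(batch_logits1, batch_logits2):
    """Mini-batch estimate of (4): mean discrepancy over target images."""
    values = [discrepancy(l1, l2) for l1, l2 in zip(batch_logits1, batch_logits2)]
    return sum(values) / len(values)


# Hypothetical logits from C1 and C2 for two target images and K = 3 classes.
c1_logits = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
c2_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(adversarial_loss(c1_logits, c2_logits))
```

In this toy batch the two classifiers agree on the first image and disagree on the second, so only the second image contributes to the loss. Under (5) the classifiers would be updated to increase this value, and under (6) the generator would be updated to decrease it.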
Fig. 3. Relationship between adversarial loss and accuracy during training of AID ↔ Merced, AID ↔ RSI-CB, and Merced ↔ RSI-CB cross-domain scene
data sets. (a) AID → Merced. (b) AID → RSI-CB. (c) Merced → RSI-CB. (d) Merced → AID. (e) RSI-CB → AID. (f) RSI-CB → Merced.
[...] sample images from Google Earth imagery and Bing Maps. This data set is annotated with crowdsource data, including OpenStreetMap (OSM) data. This benchmark has two subdata sets with image sizes of 256 × 256 and 128 × 128. The former contains six categories with 35 subclasses of more than 24 000 images. In this letter, we only use the subdata set of size 256 × 256. The AID data set consists of large-scale aerial images of size 600 × 600 with multiple resolutions (8–0.5 m), and it was constructed by collecting sample images from Google Earth imagery. The UC Merced data set is a 21-class land-use image data set that contains RGB images of size 256 × 256 with a pixel resolution of 0.3 m.

TABLE I
COMMON CLASSES EXTRACTED FROM THE AID, MERCED, AND RSI-CB DATA SETS

From the above data sets, we build six cross-domain scene data sets termed AID ↔ Merced, AID ↔ RSI-CB, and Merced ↔ RSI-CB by extracting the most common classes through visual inspection. Fig. 2 shows some sample images of these cross-domain scene data sets. Table I shows the number of images per class extracted from each data set, and Table II provides the number of training and testing images used for each scenario.
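As a small illustration of how three data sets yield six scenarios, the sketch below enumerates every ordered (source, target) pair, since each ↔ pairing is used in both directions. The variable names are assumptions made for illustration.

```python
from itertools import permutations

# The three benchmark data sets named in this letter.
DATA_SETS = ["AID", "Merced", "RSI-CB"]

# Every ordered (source, target) pair of distinct data sets is one scenario.
scenarios = [f"{source} -> {target}" for source, target in permutations(DATA_SETS, 2)]

for name in scenarios:
    print(name)
```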
B. Experimental Setup
We use the VGG16 network trained on the ImageNet data set as a pretrained DCNN and fine-tune only the final feature layer and the fully connected layer. We set the batch size as 32 and use Adam with a learning rate $\eta = 2.0 \times 10^{-4}$, an exponential decay rate for the first and second moments $\beta_1 = 0.9$, $\beta_2 = 0.999$, and an epsilon $= 1.0 \times 10^{-8}$ as an optimizer. We report the accuracy after 100 epochs. In addition, we compare our results with the fine-tuned VGG16 without adaptation; the DAN [11], which optimizes a loss function composed of three terms related to discrimination, distance between source and target data distributions, and geometric structure; the DANN [14], which matches the source and target features by making them indistinguishable for a domain discriminator; and the adversarial discriminative DA (ADDA) [15], which combines adversarial and discriminative learning. DAN, DANN, and ADDA use the same network and parameters as the method in this letter.
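To make the optimizer settings concrete, the following sketch applies the standard Adam update rule to a single scalar parameter with the hyperparameters stated above. The function name `adam_step` and the toy objective f(w) = w^2 are assumptions made for illustration; the letter does not say which framework or Adam implementation was used.

```python
import math


def adam_step(param, grad, m, v, t, lr=2.0e-4, beta1=0.9, beta2=0.999, eps=1.0e-8):
    """One standard Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem (an assumption): minimize f(w) = w^2, whose gradient is 2w.
w, m, v = 1.0, 0.0, 0.0
for step in range(1, 101):
    w, m, v = adam_step(w, 2.0 * w, m, v, step)
print(w)
```

With these settings each early update moves the parameter by roughly the learning rate, regardless of the gradient's magnitude. This is one reason a small rate such as 2.0 × 10^-4 is a common choice when fine-tuning pretrained layers.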
C. Results

Fig. 3 shows the relationship between the adversarial loss and the accuracy during the training of the six cross-domain scene data sets. For the AID ↔ Merced, AID ↔ RSI-CB, and Merced ↔ RSI-CB cross-domain scene data sets, the adversarial loss diminishes and the accuracy improves, thus confirming that minimizing the adversarial loss for target images can increase the accuracy of the adaptation.

TABLE III
PERFORMANCES IN TERMS OF AVERAGE ACCURACY FOR SIX CROSS-DOMAIN SCENE DATA SETS

Table III shows the classification accuracies for the six data sets. The average accuracy of this method for six cross-domain scene data sets is 92.17%, which is 13.45% higher than that of fine-tuned VGG16 without adaptation. DAN has the highest average accuracy among the other adaptation methods compared, although it is still 3.75% lower than the method in this letter. The accuracy of the method in this letter is the highest for all data sets except Merced → RSI-CB, for which the accuracy is slightly lower than that of DAN. The experimental results show that our method can generate more robust transferable features and improve classification accuracy.

IV. CONCLUSION

In this letter, a CDADA method is proposed for cross-domain semisupervised classification in RS images. This method uses DCNN-built feature representations to describe the semantic content of scenes and uses two different land-cover classifiers as a constraint to compel a generator to create robust transferable features far from the original land-cover [...]

ACKNOWLEDGMENT

The authors would like to thank two anonymous reviewers for carefully reviewing this letter and for their valuable comments, which helped to improve it.

REFERENCES

[1] G. Cheng, J. Han, and X. Lu, “Remote sensing image scene classification: Benchmark and state of the art,” Proc. IEEE, vol. 105, no. 10, pp. 1865–1883, Oct. 2017.
[2] Y. Yang and S. Newsam, “Bag-of-visual-words and spatial extensions for land-use classification,” in Proc. ACM, New York, NY, USA, 2010, pp. 270–279.
[3] S. Chen and Y. Tian, “Pyramid of spatial relatons for scene-level land use classification,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 4, pp. 1947–1957, Apr. 2015.
[4] F. Hu, G.-S. Xia, J. Hu, and L. Zhang, “Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery,” Remote Sens., vol. 7, no. 11, pp. 14680–14707, Nov. 2015.
[5] D. Marmanis, M. Datcu, T. Esch, and U. Stilla, “Deep learning earth observation classification using ImageNet pretrained networks,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 1, pp. 105–109, Jan. 2016.
[6] G. J. Scott, M. R. England, W. A. Starms, R. A. Marcum, and C. H. Davis, “Training deep convolutional neural networks for land-cover classification of high-resolution imagery,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 4, pp. 549–553, Apr. 2017.
[7] G.-S. Xia et al., “AID: A benchmark data set for performance evaluation of aerial scene classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 7, pp. 3965–3981, Jul. 2017.
[8] F. Hu, G.-S. Xia, W. Yang, and L. Zhang, “Recent advances and opportunities in scene classification of aerial images with deep models,” in Proc. IEEE Int. Geosci. Remote Sens. Symp., Jul. 2018, pp. 4371–4374.
[9] D. Tuia, C. Persello, and L. Bruzzone, “Domain adaptation for the classification of remote sensing data: An overview of recent advances,” IEEE Geosci. Remote Sens. Mag., vol. 4, no. 2, pp. 41–57, Jun. 2016.
[10] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[11] E. Othman, Y. Bazi, F. Melgani, H. Alhichri, N. Alajlan, and M. Zuair, “Domain adaptation network for cross-scene classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 8, pp. 4441–4456, Aug. 2017.
[12] N. Ammour, L. Bashmal, Y. Bazi, M. M. A. Rahhal, and M. Zuair, “Asymmetric adaptation of deep features for cross-domain classification in remote sensing imagery,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 4, pp. 597–601, Apr. 2018.
[13] I. J. Goodfellow et al., “Generative adversarial nets,” in Proc. NIPS, Cambridge, MA, USA, 2014, pp. 2672–2680.
[14] Y. Ganin et al., “Domain-adversarial training of neural networks,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 2030–2096, Jan. 2016.
[15] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, “Adversarial discriminative domain adaptation,” in Proc. CVPR, Jul. 2017, pp. 2962–2971.
[16] A. Elshamli, G. W. Taylor, A. Berg, and S. Areibi, “Domain adaptation using representation learning for the classification of remote sensing images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 9, pp. 4198–4209, Sep. 2017.
[17] L. Bashmal, Y. Bazi, H. AlHichri, M. M. Al Rahhal, N. Ammour, and N. Alajlan, “Siamese-GAN: Learning invariant representations for aerial vehicle image categorization,” Remote Sens., vol. 10, no. 2, p. 351, Feb. 2018.
[18] H. Li, C. Tao, Z. Wu, J. Chen, J. Gong, and M. Deng, “RSI-CB: A large scale remote sensing image classification benchmark via crowdsource data,” 2017, [Online]. Available: https://arxiv.org/abs/1705.10450