Airplane Detection Based On Unsupervised Deep
Research Article
Keywords: Deep Domain Adaptation, Deep Convolutional Neural Network, Computer Vision, Object
Detection, Deep Transfer Learning
DOI: https://doi.org/10.21203/rs.3.rs-2088221/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Springer Nature 2021 LATEX template
Abstract
In this work, we use the detection and localization capabilities of the
pre-trained Faster Region-based Convolutional Neural Network (Faster R-CNN)
model with ResNet50 as the backbone of the architecture. The model was
pre-trained on the large benchmark dataset Common Objects in Context (MSCOCO),
which serves as the source domain. The pre-trained model is then used in
Unsupervised Deep Domain Adaptation (UDDA) for airplane detection and
localization in Remote Sensing Images (RSI), which form the target domain. We
evaluate the proposed approach on images containing multiple airplanes of
different scales and types, collected from the public Dataset for Object
Detection in Aerial Images (DOTA) and from airport images extracted from
Google Earth. Extensive experiments in a cloud environment show the usefulness
of the proposed approach in terms of detection score. The proposed UDDA
algorithm is a nondeterministic machine learning approach for detecting
airplanes in RSI.
1 Introduction
With the growing number of satellites and the volume of data they produce,
extracting meaningful information from Remote Sensing Images (RSI) requires
tools based on computer vision and machine learning. Object detection is one
of the most fundamental and challenging problems in computer vision. Recently,
several studies have used neural networks to detect and localize objects in
RSI for many applications [1], and several neural network architectures have
been developed in response to this multitude of applications [2]. Improving a
neural network mainly consists of optimizing the structure of its hidden
layers for a given task [3]. Neural network architectures first achieved
success in image classification. In object detection, the model must
recognize one or several objects in a single image, draw a bounding box
around each object, and label it. Recent developments in deep learning have
produced detection and localization systems that address these issues. Among
deep neural network architectures, CNNs are the most relevant for this
problem and have driven progress in image and video processing. An early,
influential CNN design was proposed by LeCun et al. [4]. Object detection
combines two separate tasks: classification and localization. A detection
algorithm uses extracted features and a learning algorithm to recognize an
object and find its location. Object detection in RSI supports several civil
and military applications that require specific information, including
aircraft detection, airport security, and flight tracking. In previous work,
we used deep CNNs to classify aircraft [5, 6]. Image object detectors fall
into two categories: one-stage methods, which directly predict the bounding
boxes of the regions of interest, and two-stage methods, which first propose
candidate object bounding boxes.
Two-stage detectors identify image regions that might contain an object
(region proposals) and then classify each proposed region. The advantage of
this second category is its high detection accuracy [7].
A deep CNN consists of many concatenated layers; the basic architecture
contains i) convolutional layers, ii) pooling layers, and iii) fully
connected layers, which together map the input to the output. A softmax
activation in the output layer gives a posterior distribution over the class
labels, while object locations are predicted by a separate bounding-box
regression output. The Residual Network (ResNet) is a CNN commonly used as
the feature extractor (backbone) of two-stage detectors. In a ResNet,
shortcut connections let the input of a block skip one or more layers and be
added to the block's output [8]. Depending on application needs, several
CNN-based detectors have been proposed in the literature, such as R-CNN
[9, 10], Fast R-CNN, which is faster than R-CNN [11], and Faster R-CNN,
which overcomes some of their limitations by generating region proposals with
a Region Proposal Network [12]. We use ResNet in this work because of the
performance and popularity of state-of-the-art detection models built on it
and because it mitigates the vanishing gradient problem in deep CNNs. We
selected Faster R-CNN because it is the most representative two-stage object
detector, combining region proposal and classification. In this study, ResNet
serves as the core architecture of the generic feature extraction module.
Among ResNet variants, ResNet50 is a popular 50-layer-deep model originally
designed for image classification. The network begins with a convolutional
layer of 64 filters with a 7 × 7 kernel and a stride of 2, followed by a
3 × 3 max-pooling layer with a stride of 2. The rest of the network is
divided into four stages of bottleneck blocks; each block has three
convolutional layers with kernel sizes 1 × 1, 3 × 3, and 1 × 1, connected
through a shortcut connection. The second, third, fourth, and fifth stages
repeat their block 3, 4, 6, and 3 times, respectively, which gives
1 + 3 × (3 + 4 + 6 + 3) + 1 = 50 weighted layers once the initial
convolution and the final layer are counted. The network ends with a global
average pooling layer followed by a fully connected layer with 1000 nodes
and a softmax.
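As a quick check of the layer arithmetic in the ResNet50 description, the following Python sketch counts the weighted layers from the stage configuration. It assumes the usual convention behind the name "ResNet50": only convolutions and the final fully connected layer are counted, while pooling, batch normalization, and the 1 × 1 projection convolutions on shortcut paths are not. The names `RESNET50_STAGES` and `count_weighted_layers` are illustrative, not part of any library.

```python
# Number of bottleneck blocks in each ResNet50 stage (conv2_x .. conv5_x).
RESNET50_STAGES = (3, 4, 6, 3)
# Kernel sizes of the three convolutions inside one bottleneck block.
BOTTLENECK_KERNELS = (1, 3, 1)


def count_weighted_layers(stages, kernels_per_block=BOTTLENECK_KERNELS):
    """Count convolutional + fully connected layers, the convention behind '50'.

    Shortcut projection convolutions, pooling and batch norm are not counted.
    """
    stem = 1  # initial 7x7 convolution with 64 filters, stride 2
    body = sum(stages) * len(kernels_per_block)  # convolutions in all blocks
    head = 1  # 1000-way fully connected layer after global average pooling
    return stem + body + head


if __name__ == "__main__":
    print(count_weighted_layers(RESNET50_STAGES))  # 50
```

Only the stage configuration changes between the bottleneck variants: passing (3, 4, 23, 3) gives 101, the depth of ResNet101.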
Supervised deep learning assumes that the test data are randomly sampled
from the same distribution as the annotated training data, known as the
source domain. A deep learning model goes through the same processing stages
as a conventional machine learning model: training, then testing or
validation. Training on a small dataset can cause overfitting, in which case
the model fails to generalize. Furthermore, acquiring and annotating a huge
number of RSI for deep learning algorithms would be expensive. Despite their
significant impact on a variety of tasks, deep learning models are difficult
to use in practice because they need a huge amount of annotated data and
cannot generalize well to data with shifted distributions. To overcome these
problems, deep transfer learning transfers knowledge learned from prior
training data to new training data, which accelerates training and uses less
memory [13]. Owing to its popularity and compelling results in many domains,
deep transfer learning has attracted a lot of attention. Domain adaptation
can be viewed as a form of deep transfer learning in which the knowledge of
a learner is used to solve a related but distinct problem [14]. For object
detection, the main goal of domain adaptation is to derive a detector for the
unlabeled (target) data from the labeled (source) data [15]. Several object
detection algorithms have been presented in the past decade, and Faster R-CNN
is a popular base detector for domain-adaptive object detection. In this
context, the UDDA formulation assumes that the source domain is labeled
while no labels are available for the target domain. UDDA is commonly used
in learning tasks without target labels since it transfers the source
features into the target feature space [16, 17]. Thus, it can be used to
adapt deep networks to smaller, unlabeled datasets. Because labeled datasets
are scarce in RSI, we apply UDDA, based on deep transfer learning with a
pre-trained model, to detect airplanes in a given image.
The significant contributions of this study are as follows:
• We used UDDA, which provides full supervision in the source domain and no
supervision in the target domain.
• Faster R-CNN was chosen as the detector in the source domain, and we
improved its object detection ability in the target domain.
• We shared network weights and applied unsupervised domain adaptation for
airplane detection and localization in RSI.
• Our model detects and localizes airplanes at different scales and
orientations in complex backgrounds without modifying the model's parameters.
The remainder of this paper is organized as follows. Section 2 reviews the
related work. Section 3 presents the basic theory, and Section 4 details the
proposed methodology. Experimental results and discussion are presented in
Section 5. Finally, Section 6 concludes the paper.
2 Related Work
Several papers in the literature have developed object detectors based on
deep domain adaptation. Shen et al. proposed DSOD, a novel algorithm for
learning deeply supervised object detectors from scratch; the authors argued
that deep supervision with a densely connected network architecture could
significantly reduce optimization problems [18]. Rochan et al. used word
vectors to build a relationship between the weakly annotated source domain
and the target domain, then transferred information from the source bounding
boxes to the target objects based on this relationship [19]. Many researchers
have applied deep neural networks to object detection in RSI, including
airplane and ship detection [20]. Wu et al. used an R-CNN-based method to
detect airplanes in a selective search framework; the authors extracted
region proposals and then classified them [21]. A model for detecting
unlabeled samples in RSI has also been proposed based on the Local Binary
Pattern (LBP) algorithm; its authors use LBP to extract feature vectors in
the target domain and apply hybrid regularization in transfer learning.
Cai et al. proposed a feature-shared transform network that uses the general
information in the bottom layers to boost performance. Furthermore, the
authors added a special regularization term to the loss function to
alleviate the negative effect of the vanishing gradient phenomenon [14].
Deep transfer learning is an important machine learning method for
addressing the lack of training data. It includes all techniques that aim to
reduce the effort of developing new models by transferring knowledge instead
of training the models from scratch. UDDA addresses learning tasks in which
knowledge is transferred from a source domain to an unlabeled target domain
[22]; this method can provide faster results because the object detectors
have already been trained on a large number of images in the source domain.
We focused on airplane detection, which is more challenging than
classification because both the class and the location of each object must
be predicted. In the last decade, UDDA has been identified as a prominent
part of machine learning advancement, and it is already being applied in a
variety of fields, including computer vision and natural language processing
[23]. Unsupervised deep transfer learning is a form of deep transfer learning
that uses unannotated data from the target domain [24]. The authors in [25]
introduced a generic unsupervised deep learning approach that trains deep
models without manual label supervision and learns the underlying class
decision boundaries iteratively. Deep domain adaptation scenarios, organized
according to the properties of the data and the approaches used, are
presented in [15, 26]. Hosang et al. studied the performance of region
proposal algorithms in depth on various datasets and revealed that these
algorithms have low repeatability and are therefore not robust to noise and
perturbations; the architecture and parameters of models learned from
natural or synthetic scenes are transferred to object detection in the
remote sensing target domain [27]. Han et al. proposed a semi-supervised
generative framework that generates a new training set by combining a
pre-trained CNN and SVM classifiers [28]. Wei et al. proposed X-LineNet, a
one-stage, anchor-free aircraft detector; it transforms aircraft detection in
RSI into the prediction and grouping of pairs of intersecting line segments,
allowing the network to learn visual grammar information [29]. Teng et al.
proposed adversarial domain adaptation in RSI to align the feature
distributions of the source and target domains for classification [30]. To
improve the precision of airplane and car detection in RSI, Ding et al.
enhanced the structure of the VGG16 network [31]. Chen et al. proposed a
domain adaptation Faster R-CNN algorithm to detect aircraft in RSI; the
authors used images extracted from the DOTA dataset as the source domain and
images with different brightness conditions as the target domain, and
achieved an average precision of 54.28% [32].
3 Basic theory
The domain adaptation concept is based on the similarity of two domains: a
model is applied to a different but related new area. Consider the source
domain $D_s = \{(X_s^i, Y_s^i)\}_{i=1}^{N_s}$, where $N_s$ is the number of
images, $X_s^i$ denotes the $i$-th image, and $Y_s^i$ denotes the bounding
boxes and corresponding object labels in that image. In domain adaptation
(DA), we denote the target dataset as $D_t = \{(X_t^i, Y_t^i)\}_{i=1}^{N_t}$,
with $N_t$ labeled target-domain images $X_t^i$, where $Y_t^i$ denotes the
object labels in the corresponding image. In UDDA, we denote the target
dataset as $D_t = \{X_t^i\}_{i=1}^{N_t}$, with $N_t$ target-domain images
$X_t^i$ without labels. Each domain is characterized by four parts: the
feature space $\mathcal{X}$, the label space $\mathcal{Y}$, the marginal
probability distribution
where $\mathrm{err}_s$ and $\mathrm{err}_t$ denote the error probabilities on
the source and target domains, respectively. The distance between the two
domains is expressed as
$$d_{\mathcal{H}}(s,t) = 2\left(1 - \min_{h\in\mathcal{H}}\left[\mathrm{err}_s(h(x_s)) + \mathrm{err}_t(h(x_t))\right]\right) \tag{3}$$
To align the distributions of the two domains, the distance
$d_{\mathcal{H}}(s,t)$ should be minimized with respect to the feature
extractor $f$:
$$\min_f d_{\mathcal{H}}(s,t) \iff \max_f \min_{h\in\mathcal{H}}\left[\mathrm{err}_s(h(x_s)) + \mathrm{err}_t(h(x_t))\right] \tag{4}$$
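Equation (3) can be estimated from data by searching for the domain classifier $h$ that best separates source samples from target samples. The Python sketch below is a toy version under two simplifying assumptions that are not part of this work: each sample is reduced to a single scalar feature, and $\mathcal{H}$ is restricted to threshold classifiers. Here $\mathrm{err}_s$ is the fraction of source samples that $h$ labels as target and $\mathrm{err}_t$ the fraction of target samples that $h$ labels as source; the function names are hypothetical.

```python
from itertools import product


def domain_error(threshold, flip, source, target):
    """Return err_s + err_t for one threshold classifier h.

    h predicts 'target' when x > threshold (or when x <= threshold if flip).
    """
    def predicts_target(x):
        return x <= threshold if flip else x > threshold

    err_s = sum(predicts_target(x) for x in source) / len(source)      # source called target
    err_t = sum(not predicts_target(x) for x in target) / len(target)  # target called source
    return err_s + err_t


def proxy_domain_distance(source, target):
    """Eq. (3): d_H(s, t) = 2 * (1 - min_h [err_s + err_t]) over 1-D thresholds."""
    values = sorted(set(source) | set(target))
    thresholds = [values[0] - 1.0] + values  # the first threshold lies below every sample
    best = min(domain_error(th, flip, source, target)
               for th, flip in product(thresholds, (False, True)))
    return 2.0 * (1.0 - best)


if __name__ == "__main__":
    print(proxy_domain_distance([0.1, 0.2, 0.3], [0.9, 1.0, 1.1]))  # 2.0
    print(proxy_domain_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```

Two well-separated domains give the maximum distance of 2, while identical samples give 0 because every threshold then trades source errors for target errors one-for-one. With real detectors, $h$ would act on high-dimensional backbone features, and minimizing the distance over $f$ as in Eq. (4) amounts to making those features hard to separate by domain.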
4 Design Methodology
This section details the proposed methodology. Many recent, effective deep
learning algorithms require a large number of natural images for training,
and overfitting is a common issue when the dataset is limited. To overcome
this problem in RSI, we adopt deep transfer learning. The final stage relies
on UDDA, which yields a label-free airplane detection model.
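The label-free detection step can be sketched as follows: the detector trained on the source domain is applied unchanged (shared weights, no target labels, no fine-tuning) to each target image, and only airplane detections above a confidence threshold are kept. This Python sketch assumes the per-image output format of torchvision detection models, a dict with `boxes`, `labels`, and `scores`, and the label index 5 that torchvision's COCO category list assigns to "airplane"; the threshold of 0.5 and the function name `detect_airplanes` are hypothetical choices, not values reported in this paper. The detector is passed in as a plain callable so the sketch runs without PyTorch.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Dict[str, list]              # {"boxes": [...], "labels": [...], "scores": [...]}

AIRPLANE_LABEL = 5     # "airplane" in torchvision's COCO label list (assumption)
SCORE_THRESHOLD = 0.5  # hypothetical confidence threshold


def detect_airplanes(detector: Callable[[object], Detection],
                     target_images: Sequence[object],
                     threshold: float = SCORE_THRESHOLD) -> List[List[Tuple[Box, float]]]:
    """Run a source-trained detector on unlabeled target images.

    No target labels and no parameter updates are used; for each image, only
    airplane boxes whose score reaches the threshold are kept.
    """
    results = []
    for image in target_images:
        out = detector(image)
        kept = [(tuple(box), float(score))
                for box, label, score in zip(out["boxes"], out["labels"], out["scores"])
                if int(label) == AIRPLANE_LABEL and float(score) >= threshold]
        results.append(kept)
    return results


if __name__ == "__main__":
    def fake_detector(image):
        # Stand-in for a real model: one confident airplane, one car, one weak airplane.
        return {"boxes": [(0, 0, 10, 10), (5, 5, 20, 20), (1, 1, 4, 4)],
                "labels": [5, 3, 5],
                "scores": [0.92, 0.88, 0.30]}

    print(detect_airplanes(fake_detector, ["image"]))  # [[((0, 0, 10, 10), 0.92)]]
```

With torchvision, `detector` could wrap a model such as `torchvision.models.detection.fasterrcnn_resnet50_fpn` in evaluation mode, call it on `[image]`, and convert the tensors of the first output with `.tolist()`; this wrapper is an assumption of the sketch.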
$$\mathrm{mAP} = \frac{1}{|\mathrm{classes}|}\sum_{c\in \mathrm{classes}} \frac{TP_c}{FP_c + TP_c} \tag{5}$$
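Equation (5) averages the per-class precision $TP_c/(FP_c + TP_c)$ over all classes. The Python sketch below shows one way to obtain the counts $TP_c$ and $FP_c$ for a class. It assumes, since the section does not state it, that a detection is a true positive when it overlaps a not-yet-matched ground-truth box with an intersection over union (IoU) of at least 0.5, and that detections are matched greedily in decreasing score order. The function names are illustrative.

```python
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp(detections: Sequence[Tuple[Box, float]], ground_truth: Sequence[Box],
                iou_threshold: float = 0.5) -> Tuple[int, int]:
    """Greedy matching: each ground-truth box validates at most one detection."""
    matched = set()
    tp = fp = 0
    for box, _score in sorted(detections, key=lambda d: d[1], reverse=True):
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            fp += 1  # no free ground-truth box overlaps enough
        else:
            matched.add(best_j)
            tp += 1
    return tp, fp


def mean_precision(counts: Dict[str, Tuple[int, int]]) -> float:
    """Eq. (5): mean over classes of TP_c / (FP_c + TP_c); no detections counts as 0."""
    precisions = [tp / (tp + fp) if tp + fp else 0.0 for tp, fp in counts.values()]
    return sum(precisions) / len(precisions)


if __name__ == "__main__":
    dets = [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.8), ((1, 1, 10, 10), 0.7)]
    tp, fp = count_tp_fp(dets, [(0, 0, 10, 10)])
    print(tp, fp)  # 1 2: the duplicate box finds its ground truth already matched
    print(mean_precision({"airplane": (tp, fp)}))
```

Because Eq. (5) is computed from counts at a single score threshold, it is simpler than the precision-recall-curve-based average precision used by the MSCOCO benchmark.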
To assess our approach, we compared it with recent work. Our model improves
airplane detection and localization and also detects other objects. From
Table 1, the ResNet50-based model used in this work provides the highest score.
Table 1 Comparison of our approach with Ref. [32]
Method          Backbone    Score
Ref. [32]       ZF          88.13%
Ref. [32]       VGGM        89.76%
Ref. [32]       VGG16       90.17%
Our approach    ResNet50    99.41%
6 Conclusion
This paper presents an unsupervised domain adaptation-based approach for
airplane detection and localization. The model is pre-trained on the large
labeled MSCOCO dataset and then transfers the learned information to the
unlabeled target domain to detect and localize airplanes in RSI. The approach
was applied to unannotated RSI, and the results obtained on these images were
reported. Moreover, a comparison with other approaches shows that our method
improves the detection and localization of airplanes in remote sensing
images. Although the proposed approach yields a high score, several aspects
remain to be improved in future work.
Declarations
Ethics approval
Not applicable.
Competing interests
The authors affirm that the research was conducted in the absence of any
commercial or financial relationships that could be construed as a potential
conflict of interest.
Funding
This research received no specific grant from any funding agency in the public,
commercial, or not-for-profit sectors.
Author’s contributions
YBY: Visualization, Investigation, Software, Validation, Writing - review &
editing.
SL and KF: Participated in the associated basic theory and the implementation.
EA: Supervised the research.
All authors contributed to the article revision and read and approved the
submitted version.
References
[1] Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., Alsaadi, F.: A survey of deep
neural network architectures and their applications. Neurocomputing 234
(2017). https://doi.org/10.1016/j.neucom.2016.12.038
[2] Amirian, S., Wang, Z., Taha, T.R., Arabnia, H.R.: Dissection of deep
learning with applications in image recognition. In: 2018 International
Conference on Computational Science and Computational Intelligence
(CSCI), pp. 1142–1148 (2018). https://doi.org/10.1109/CSCI46756.2018.00221
[3] Wu, X., Sahoo, D., Hoi, S.C.: Recent advances in deep learning for object
detection. Neurocomputing 396, 39–64 (2020)
[4] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning
applied to document recognition. Proceedings of the IEEE 86(11), 2278–
2324 (1998)
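[8] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image
recognition. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 770–778 (2016)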
[9] Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies
for accurate object detection and semantic segmentation. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 580–587 (2014)
[10] Agrawal, P., Girshick, R., Malik, J.: Analyzing the performance of multi-
layer neural networks for object recognition. In: European Conference on
Computer Vision, pp. 329–344 (2014). Springer
[11] Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International
Conference on Computer Vision, pp. 1440–1448 (2015)
[12] Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time
object detection with region proposal networks. Advances in neural
information processing systems 28, pp. 91–99 (2015)
[13] Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Transactions
on Knowledge and Data Engineering 22(10), 1345–1359 (2010).
https://doi.org/10.1109/TKDE.2009.191
[14] Cai, G., Wang, Y., He, L., Zhou, M.: Unsupervised domain adaptation
with adversarial residual transform networks. IEEE transactions on neural
networks and learning systems 31(8), 3073–3086 (2020).
https://doi.org/10.1109/tnnls.2019.2935384
[15] Wang, M., Deng, W.: Deep visual domain adaptation: A survey.
Neurocomputing 312, 135–153 (2018)
[16] Kouw, W.M., Loog, M.: A review of domain adaptation without target
labels. IEEE transactions on pattern analysis and machine intelligence
43(3), 766–785 (2019). https://doi.org/10.1109/TPAMI.2019.2945942
[17] Zhang, Y., Deng, B., Tang, H., Zhang, L., Jia, K.: Unsupervised
multi-class domain adaptation: Theory, algorithms, and practice. IEEE
Transactions on Pattern Analysis and Machine Intelligence (2020)
[18] Shen, Z., Liu, Z., Li, J., Jiang, Y.-G., Chen, Y., Xue, X.: DSOD: Learning
deeply supervised object detectors from scratch. In: Proceedings of
the IEEE International Conference on Computer Vision, pp. 1919–1927
(2017)
[19] Rochan, M., Wang, Y.: Weakly supervised localization of novel objects
using appearance transfer. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (2015)
[20] Tang, J., Deng, C., Huang, G.-B., Zhao, B.: Compressed-domain ship
detection on spaceborne optical image using deep neural network and
extreme learning machine. IEEE Transactions on Geoscience and Remote
Sensing 53(3), 1174–1185 (2015). https://doi.org/10.1109/TGRS.2014.2335751
[21] Wu, H., Zhang, H., Zhang, J., Xu, F.: Fast aircraft detection in satellite
images based on convolutional neural networks. In: 2015 IEEE Interna-
tional Conference on Image Processing (ICIP), pp. 4210–4214 (2015).
https://doi.org/10.1109/ICIP.2015.7351599
[23] Andersson, L., Lupu, M., Hanbury, A.: Domain adaptation of general
natural language processing tools for a patent claim visualization system.
In: Information Retrieval Facility Conference, pp. 70–82 (2013). Springer
[24] Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C.: A survey on
deep transfer learning. In: International Conference on Artificial Neural
Networks, pp. 270–279 (2018). Springer
[25] Huang, J., Dong, Q., Gong, S., Zhu, X.: Unsupervised deep learning
by neighbourhood discovery. In: International Conference on Machine
Learning, pp. 2849–2858 (2019). PMLR
[27] Hosang, J., Benenson, R., Dollar, P., Schiele, B.: What makes for effective
detection proposals? IEEE transactions on pattern analysis and machine
intelligence 38(4), 814–830 (2015)
[28] Han, W., Feng, R., Wang, L., Cheng, Y.: A semi-supervised generative
framework with deep learning features for high-resolution remote sensing
image scene classification. ISPRS Journal of Photogrammetry and Remote
Sensing 145, 23–43 (2018)
[29] Wei, H., Zhang, Y., Wang, B., Yang, Y., Li, H., Wang, H.: X-LineNet:
Detecting aircraft in remote sensing images by a pair of intersecting line
segments. IEEE Transactions on Geoscience and Remote Sensing 59(2),
1645–1659 (2020)
[30] Teng, W., Wang, N., Shi, H., Liu, Y., Wang, J.: Classifier-constrained
deep adversarial domain adaptation for cross-domain semisupervised clas-
sification in remote sensing images. IEEE Geoscience and Remote Sensing
Letters 17(5), 789–793 (2019)
[31] Ding, P., Zhang, Y., Deng, W.-J., Jia, P., Kuijper, A.: A light and
faster regional convolutional neural network for object detection in optical
remote sensing images. ISPRS journal of photogrammetry and remote
sensing 141, 208–218 (2018)
[32] Chen, J., Li, Y., Sun, J., Hou, C.: Object detection in remote sensing images
based on deep transfer learning. Multimedia Tools and Applications 81,
12093–12109 (2021). https://doi.org/10.1007/s11042-021-10833-z
[33] Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F.,
Vaughan, J.W.: A theory of learning from different domains. Machine
learning 79(1), 151–175 (2010)
[34] Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning
with deep convolutional generative adversarial networks. arXiv preprint
arXiv:1511.06434 (2015)
[36] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D.,
Dollár, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In:
European Conference on Computer Vision, pp. 740–755 (2014). Springer
[37] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G.,
Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: PyTorch: An
imperative style, high-performance deep learning library. Advances in neural
information processing systems 32 (2019)
[39] Xia, G.-S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., Datcu, M.,
Pelillo, M., Zhang, L.: DOTA: A large-scale dataset for object detection
in aerial images. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR) (2018)