
A Deep Learning-Based SAR Ship Detection

Chushi Yu
School of Electronic Engineering, Soongsil University, Seoul 06978, Korea
Email: csyu@soongsil.ac.kr

Yoan Shin*
School of Electronic Engineering, Soongsil University, Seoul 06978, Korea
Email: yashin@ssu.ac.kr

2023 International Conference on Artificial Intelligence in Information and Communication (ICAIIC) | 978-1-6654-5645-6/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICAIIC57133.2023.10067131

*Corresponding author
This work was supported in part by the NRF of Korea grant funded by the MSIT (2020R1A2C2010006), and by the MSIT under the ITRC support program (IITP-2022-2018-0-01424) supervised by the IITP.

Abstract—In recent years, with the in-depth development of remote sensing technology, ship detection based on remote sensing images has become an important task for coastal countries. Synthetic aperture radar (SAR) is one of the most important active imaging sensors in remote sensing, since it is unaffected by clouds and works both day and night. However, ship targets in SAR images suffer from unclear contour information, complex backgrounds, and strong scattering. Ship detection algorithms based on convolutional neural networks have achieved good results, but they still produce many missed and false detections. You only look once (YOLO) is a single-stage target detection algorithm characterized by fast speed and high accuracy. In this paper, a YOLOv7-based ship detection scheme for SAR images is proposed. Numerical experiments on the high-resolution SAR images dataset (HRSID) and the SAR ship detection dataset (SSDD) prove the effectiveness of the method, showing that it can increase the speed without reducing detection accuracy and recall. In addition, the experimental results demonstrate the robustness of the model and its ability to detect ship targets in complex marine environments.

Keywords—Synthetic aperture radar, remote sensing, ship detection, deep learning, YOLOv7

I. INTRODUCTION
Synthetic aperture radar (SAR) is a microwave imaging sensor that can actively image the scene in all-day and all-weather conditions, which makes it well suited to monitoring oceans under changing climates [1]. In ship detection, SAR is not affected by changeable ocean weather, and it can monitor ship targets in all directions in real time [2].

Deep learning techniques have been incorporated into many remote sensing applications, such as object and oil spill detection, traffic monitoring, terrain mapping, coastline monitoring, and marine fisheries management. Among them, object detection has long been a research hotspot in remote sensing. With the rapid development of deep learning theory and technology, the performance of deep learning-based object detection now far exceeds that of traditional methods. General deep learning-based object detection algorithms are divided into two-stage algorithms, such as the faster region-based convolutional neural network (Faster R-CNN) [3], and single-stage algorithms, such as RetinaNet [4] and you only look once (YOLO) [5]. YOLO is a commonly used single-stage target detection algorithm characterized by fast speed and high accuracy. It performs satisfactorily on small and occluded targets in complex environments, and its detection speed is higher than that of other deep learning algorithms.

YOLOv7 [6] is the latest detector in the YOLO series. The network is designed with a trainable bag-of-freebies that enables real-time detectors to greatly improve accuracy without increasing the cost of inference. It also uses extend and compound scaling, so the object detector can effectively reduce the number of parameters and the amount of computation, and thus greatly improve detection speed. In this paper, based on YOLOv7, we optimize training for the characteristics of SAR ship detection. The high-resolution SAR images dataset (HRSID) [7] and the SAR ship detection dataset (SSDD) [8] are used to verify the effectiveness and applicability of the proposed scheme.

II. METHODOLOGY
A. Overview of the Proposed Method
In this work, the latest version of the YOLO object detection method is used to detect SAR ship targets. Figure 1 shows the overall process of the SAR ship detection scheme based on YOLOv7.

Fig. 1. The proposed YOLOv7-based SAR ship detection scheme.

B. Architecture of the Modified Network
YOLOv7 is a single-stage method that consists of four parts: input, backbone, head, and prediction, as shown in Fig. 2.

The backbone is the feature extraction network of YOLOv7. The features extracted from the input image are called feature layers; together they form the feature set of the input image. The backbone produces three feature layers, which are used in the next step of network construction.

The feature pyramid network (FPN) [9] is the enhanced feature extraction network of YOLOv7. The three effective feature layers obtained from the backbone are fused in this part, with the purpose of combining feature information at different scales. YOLOv7 uses the path aggregation network (PANet) [10] structure, which both up-samples and down-samples the features to achieve feature fusion.

The head is the classifier and regressor of YOLOv7. Through the backbone and FPN, three enhanced effective feature layers are obtained, each with its own width, height, and number of channels. Each feature map can be regarded as a grid of feature points. Each feature point has three prior boxes, and each prior box is described by its own group of channels. The head judges, for every feature point, whether an object corresponds to each of its prior boxes.
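To make the head's output layout concrete, the following minimal Python sketch counts the predictions per feature layer. It assumes a 640 × 640 input, output strides of 8, 16, and 32 (the usual setting for the P5 YOLOv7 models), three prior boxes per feature point, and a single "ship" class, so each prior box carries 4 box-regression values, 1 objectness score, and 1 class score. These values, and the helper name `head_layout`, are illustrative assumptions, not the configuration reported in this paper.

```python
"""Sketch: output layout of a YOLO-style detection head (assumed settings)."""

from dataclasses import dataclass


@dataclass(frozen=True)
class HeadLevel:
    stride: int       # down-sampling factor of this feature layer
    grid: int         # feature points per side (input_size // stride)
    channels: int     # anchors * (4 box + 1 objectness + num_classes)
    predictions: int  # grid * grid * anchors candidate boxes


def head_layout(input_size=640, strides=(8, 16, 32), num_anchors=3, num_classes=1):
    """Hypothetical helper: per-level head sizes for a square input.

    Each feature point carries `num_anchors` prior boxes; each prior box
    predicts 4 box values, 1 objectness score and `num_classes` class scores.
    """
    per_anchor = 4 + 1 + num_classes
    levels = []
    for s in strides:
        if input_size % s:
            raise ValueError(f"input size {input_size} is not divisible by stride {s}")
        g = input_size // s
        levels.append(HeadLevel(s, g, num_anchors * per_anchor, g * g * num_anchors))
    return levels


if __name__ == "__main__":
    levels = head_layout()
    for lv in levels:
        print(f"stride {lv.stride:2d}: {lv.grid}x{lv.grid} grid, "
              f"{lv.channels} channels, {lv.predictions} candidate boxes")
    print("total:", sum(lv.predictions for lv in levels))
```

With these assumed settings the three layers are 80 × 80, 40 × 40, and 20 × 20 grids with 18 channels each, giving 25,200 candidate boxes in total; with the 80 COCO classes the per-level width becomes 3 × 85 = 255 channels.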



[Fig. 2 diagram: Input, Backbone, and Head blocks; Components panel: CBS (CONV, BN, SiLU), MP1, MP2, ELAN, ELAN-H, SPPCSPC, UP, REP CONV]

Fig. 2. The network architecture based on YOLOv7, which consists of four blocks: input, backbone, head, and prediction. The Components panel details CBS, MP, ELAN, ELAN-H, and SPPCSPC.
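The REP CONV blocks in Fig. 2 follow the structural re-parameterization idea of RepVGG [13]: a block trained with parallel 3 × 3, 1 × 1, and (when input and output shapes match) identity branches, each followed by batch normalization, can be folded into a single 3 × 3 convolution for inference. As a minimal sketch, the Python below checks this folding on a deliberately simplified toy: a single-channel 1-D signal with kernel sizes 3 and 1 and hypothetical BN statistics. It is not the YOLOv7 RepConv implementation, which operates on multi-channel 2-D tensors.

```python
"""Sketch: RepVGG-style re-parameterization on a 1-D, single-channel toy."""

import math


def conv1d(x, kernel, bias=0.0):
    """'Same' 1-D cross-correlation with zero padding (odd kernel size)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) + bias
            for i in range(len(x))]


def batch_norm(y, mean, var, gamma, beta, eps=1e-5):
    """Inference-mode batch norm with fixed running statistics."""
    scale = gamma / math.sqrt(var + eps)
    return [(v - mean) * scale + beta for v in y]


def fold_bn(kernel, bias, mean, var, gamma, beta, eps=1e-5):
    """Fold a following batch norm into the conv kernel and bias."""
    scale = gamma / math.sqrt(var + eps)
    return [k * scale for k in kernel], (bias - mean) * scale + beta


def pad_to_3(kernel):
    """Zero-pad a size-1 kernel to size 3 so it can be added to a 3-tap one."""
    return [0.0, kernel[0], 0.0]


# Hypothetical trained parameters; BN tuples are (mean, var, gamma, beta).
k3, bn3 = [0.2, -0.5, 0.7], (0.1, 0.9, 1.2, -0.3)
k1, bn1 = [0.8], (-0.2, 1.1, 0.7, 0.05)
bn_id = (0.05, 0.5, 0.9, 0.1)  # identity branch: BN only (needs in == out shape)


def multi_branch(x):
    """Training-time form: three branches, each with its own BN, summed."""
    a = batch_norm(conv1d(x, k3), *bn3)
    b = batch_norm(conv1d(x, k1), *bn1)
    c = batch_norm(x, *bn_id)
    return [p + q + r for p, q, r in zip(a, b, c)]


def fused_kernel():
    """Inference-time form: one 3-tap kernel plus one bias."""
    w3, b3 = fold_bn(k3, 0.0, *bn3)
    w1, b1 = fold_bn(pad_to_3(k1), 0.0, *bn1)
    # The identity branch is a kernel with a single 1 at the centre tap.
    wi, bi = fold_bn([0.0, 1.0, 0.0], 0.0, *bn_id)
    return [p + q + r for p, q, r in zip(w3, w1, wi)], b3 + b1 + bi


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5]
    w, b = fused_kernel()
    diff = max(abs(p - q) for p, q in zip(multi_branch(x), conv1d(x, w, b)))
    print("max difference between branch sum and fused conv:", diff)
```

The fusion works because inference-time batch normalization is an affine map, and a sum of convolutions that share stride and padding equals one convolution with the summed kernels and biases, so the fused block gives the same outputs with a single kernel.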
As shown in Fig. 2, an innovative multi-branch stacking structure is used for feature extraction. Compared with previous YOLO models, the model has denser skip connections. An innovative down-sampling structure is also adopted: max pooling and a convolution with a stride of 2 are applied in parallel to extract and compress features. Auxiliary-branch-assisted convergence is used, except in the smaller YOLOv7 and YOLOv7-x models.

In addition, YOLOv7 adopts an improved spatial pyramid pooling (SPP) structure. The SPP [11], which expands the receptive field, is combined with the cross stage partial network (CSP) [12] mechanism by introducing the CSP structure into the SPP structure. This module has a large residual edge that assists optimization and feature extraction.

Drawing on the re-parameterized VGG (RepVGG) [13] structure, YOLOv7 introduces RepConv into specific parts of the network; its branches are fused after training, which reduces the number of parameters while preserving network performance. Like previous versions of YOLO, YOLOv7 uses a coupled rather than a decoupled head, that is, classification and regression are implemented together in a 1 × 1 convolution.

Therefore, the work of the whole YOLOv7 network consists of feature extraction, feature enhancement, and prediction of the object corresponding to each prior box.

C. Loss Function
The overall loss function is consistent with that of YOLOv5 [14]. The network loss is composed of a classification loss, a regression loss, and an objectness loss. The classification loss concerns the class of the object contained in a feature point, the regression loss concerns the box regression parameters of the feature point, and the objectness loss concerns whether the feature point contains an object. During training, the overall objective function can be described as

$$\mathrm{Loss} = L_{cls} + L_{reg} + L_{obj} \tag{1}$$

In YOLOv7, the matching of positive samples during training is divided into two steps. First, for each ground truth box, prior boxes and feature points are roughly matched by coordinates, width, and height. Second, SimOTA [15] is used to adaptively select the number of prior boxes corresponding to each ground truth box. Positive sample matching determines which prior boxes are considered to have corresponding ground truth boxes; those prior boxes are responsible for predicting the ground truth boxes.

To improve the training efficiency of the model, YOLOv7 increases the number of positive samples: during training, each ground truth box can be predicted by multiple prior boxes. In addition, for each ground truth box, the intersection over union (IoU) and classification cost are calculated from the adjusted prediction box of each prior box to obtain a loss value, and the prior boxes most suitable for that ground truth box are then selected.

III. EXPERIMENTAL RESULTS
A. Dataset and Training Strategy
We use HRSID [7] and SSDD [8] to verify the performance of the proposed method. HRSID is widely used for ship detection and instance segmentation. It consists of Sentinel-1 and TerraSAR-X images with resolutions of 0.5 m, 1.0 m, and 3.0 m, and contains 5,604 high-resolution SAR images and 16,591 ship instances. Its construction follows the process used for the COCO [16] dataset, and it includes SAR images with different resolutions, polarizations, sea conditions, sea areas, and coastal ports, which makes it a benchmark for researchers to evaluate their approaches. We randomly divide the images into a training set, a validation set, and a test set in a 7:1:2 ratio. SSDD is the first open dataset in the field of SAR ship detection and consists of 1,160 images with 2,456 ship

TABLE I. RESULTS ON HRSID

Model        Precision  Recall  mAP@0.5  mAP@0.5:0.9  Model size (MB)  Speed (s)
YOLOv7       0.844      0.700   0.786    0.481        74.7             0.279
YOLOv7-tiny  0.819      0.713   0.773    0.456        12.2             0.191
YOLOv7-d6    0.479      0.691   0.399    0.194        306.5            6.700
YOLOv7-e6    0.668      0.695   0.568    0.302        221.3            0.548
YOLOv7-w6    0.848      0.692   0.784    0.449        162.3            0.789
YOLOv7-x     0.803      0.730   0.784    0.464        142.0            0.348

[Fig. 3 image grid: one column per model (YOLOv7, YOLOv7-tiny, YOLOv7-d6, YOLOv7-e6, YOLOv7-w6, YOLOv7-x), panels (a-1) to (f-5), five sample scenes per model]

Fig. 3. Comparison of sample ship detections on HRSID for the proposed YOLOv7-based scheme and its extended models.

[Fig. 4 image panels: columns Ground truth, RetinaNet, YOLOv5, YOLOv7, YOLOv7-tiny; rows inshore and offshore]

Fig. 4. Inshore and offshore ship detection results on SSDD obtained by YOLOv7, YOLOv7-tiny, and other state-of-the-art methods.
targets. All experiments are conducted on an NVIDIA RTX 2070 graphics processing unit (GPU) using PyTorch. The batch size is set to 8 and the number of training epochs to 300. We use stochastic gradient descent (SGD) as the optimizer, with an initial learning rate of 0.01 and a momentum of 0.9.

B. Results and Discussion
In this experiment, precision, recall, and mean average precision (mAP) are used to evaluate the detection performance of the models.

Table I lists the results on HRSID for YOLOv7 and several extended models. Among them, the original YOLOv7 model achieves good evaluation values (precision of 0.844 and mAP@0.5 of 0.786), and the lightweight YOLOv7-tiny completes detection quickly while maintaining accuracy (mAP@0.5 of 0.773 at 0.191 s, with a 12.2 MB model), which makes real-time detection possible. Figure 3 shows several sample results that visualize the detections of the proposed scheme.

To examine the feasibility of the proposed scheme, the trained model and parameters are used for ship detection on real SAR images. Figure 4 presents the inshore and

offshore ship detection sample results obtained by several state-of-the-art methods, such as RetinaNet and YOLOv5. The experimental results show that most small-scale ships can be recognized even without further training. At the same time, large-scale ship recognition is relatively inaccurate, although some ship features can still be extracted.

IV. CONCLUSION
In this work, a YOLOv7-based SAR ship detection scheme is proposed after an analysis of existing state-of-the-art object detection algorithms. Using real SAR datasets, we aimed to detect ship targets quickly and accurately in complex marine environments. For this purpose, different versions of the YOLOv7 model were trained and evaluated by comparing their test results. According to the analysis of the experimental results, this method has good application prospects in SAR ship detection, and the study also provides a theoretical reference for the detection of other maritime objects. The model meets the ship detection requirements in terms of accuracy and real-time performance; its disadvantage is the low recognition accuracy for large-scale ships, owing to the characteristics of the dataset and of the algorithm itself. In future work, we plan to combine the proposed method with attention mechanisms or transformers to achieve more accurate detection and localization of ships at different scales.

REFERENCES
[1] A. Moreira, P. Prats-Iraola, M. Younis, G. Krieger, I. Hajnsek, and K. P. Papathanassiou, “A tutorial on synthetic aperture radar,” IEEE Geosci. & Remote Sensing Mag., vol. 1, no. 1, pp. 6-43, Mar. 2013.
[2] J. Li, C. Xu, H. Su, L. Gao, and T. Wang, “Deep learning for SAR ship detection: Past, present and future,” Remote Sensing, vol. 14, no. 11, p. 2712, June 2022.
[3] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137-1149, June 2017.
[4] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 318-327, Feb. 2020.
[5] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” Proc. IEEE CVPR 2016, pp. 779-788, Las Vegas, USA, Dec. 2016.
[6] C. Wang, A. Bochkovskiy, and H. Liao, “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” arXiv preprint arXiv:2207.02696, July 2022.
[7] S. Wei, X. Zeng, Q. Qu, M. Wang, H. Su, and J. Shi, “HRSID: A high-resolution SAR images dataset for ship detection and instance segmentation,” IEEE Access, vol. 8, pp. 120234-120254, June 2020.
[8] T. Zhang, X. Zhang, J. Li, X. Xu, B. Wang, X. Zhan, Y. Xu, X. Ke, T. Zeng, H. Su, I. Ahmad, D. Pan, C. Liu, Y. Zhou, J. Shi, and S. Wei, “SAR ship detection dataset (SSDD): Official release and comprehensive data analysis,” Remote Sensing, vol. 13, no. 18, p. 3690, Sept. 2021.
[9] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” Proc. IEEE CVPR 2017, pp. 2117-2125, Honolulu, USA, July 2017.
[10] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for instance segmentation,” Proc. IEEE CVPR 2018, pp. 8759-8768, Salt Lake City, USA, June 2018.
[11] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904-1916, Sept. 2015.
[12] C. Y. Wang, H. Y. M. Liao, Y. H. Wu, P. Y. Chen, J. W. Hsieh, and I. H. Yeh, “CSPNet: A new backbone that can enhance learning capability of CNN,” Proc. IEEE CVPR 2020, pp. 1571-1580, virtual conference, June 2020.
[13] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, “RepVGG: Making VGG-style ConvNets great again,” arXiv preprint arXiv:2101.03697, Jan. 2021.
[14] G. Jocher, “YOLOv5,” available online at https://github.com/ultralytics/yolov5, 2020.
[15] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, “YOLOX: Exceeding YOLO series in 2021,” arXiv preprint arXiv:2107.08430, July 2021.
[16] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollár, “Microsoft COCO: Common objects in context,” arXiv preprint arXiv:1405.0312, May 1.

