
Optical Remote Sensing Image Waters Extraction Technology Based on Deep Learning Context-Unet
Ruoda Yan1, Shan Dong2*

1 Beijing Key Laboratory of Embedded Real-time Information Processing Technology, Beijing Institute of Technology, Beijing 100081, China
2 Engineering Center of Digital Audio and Video, Communication University of China, Beijing 100024, China
* Corresponding author, e-mail: 13718516311@163.com

Abstract—Sea-land segmentation from optical remote sensing images is a difficult task. General deep learning based methods require a large number of refined annotations, but large-scale annotation of optical remote sensing images is very expensive. Therefore, Context-Unet is proposed to produce accurate sea-land segmentation results from only a few annotated training samples. In this paper, we apply the Context-Unet network to water area extraction and, on this basis, redesign the loss function to improve the extraction accuracy. Finally, data collected from the Google Earth service are used to train and test the proposed Context-Unet and state-of-the-art methods. The experiments prove that the proposed method outperforms the other methods, achieving 98% precision and 97% recall.

Keywords—Remote Sensing Image, Water Extraction, Object Detection, Deep Learning, Unet Network
I. INTRODUCTION

Target recognition has always been an important issue in the field of computer vision, and many mature algorithms have been developed. Target recognition in remote sensing images means determining whether a given remote sensing image contains one or more objects of the categories of interest, and locating each detected object in the image. Target recognition in optical remote sensing images often faces great challenges, including dramatic changes in visual appearance caused by changes in perspective, occlusion, background interference and illumination, the explosive growth in the quantity and quality of remote sensing images, and the diversified requirements of new application fields. Different from object classification, some applications need pixel-level classification results, such as semantic image segmentation, which ultimately assigns a class to each pixel, and edge detection, which is equivalent to a binary classification of each pixel (edge or non-edge). For semantic segmentation and edge detection, the classical method is to take an image block centered on each pixel and train a classifier on the features of that block. In the test phase, image blocks are likewise selected to classify each pixel of the test image, and the classification results are used as the predicted values of the pixels. However, this pixel-by-pixel image-block classification is very time-consuming; it is also limited by the block size and cannot model large context, which hurts the performance of the algorithm. The Context-Unet network solves this problem well. Unlike a classic CNN, which uses fully connected layers after the convolutional layers to obtain fixed-length feature vectors for classification, it accepts input images of any size and uses deconvolution layers to upsample the last convolutional feature map back to the size of the input image, so that a prediction is generated for each pixel while the spatial layout of the original input is preserved. Finally, the extracted feature map is classified pixel by pixel.

The main aspect of water extraction is land-sea segmentation. With the development of remote sensing technology, satellite remote sensing images have become our main means of territorial detection and complex-environment exploration. Correspondingly, the use of remote sensing for land-sea segmentation has become a hot topic, mainly concerned with segmentation accuracy. Land-sea segmentation has developed rapidly in recent years, and there are many classification algorithms for remote sensing images; because of the complexity of remote sensing image data, none of them is best in all cases.

This paper first reviews several commonly used water extraction methods, starting with the simplest feasible one, the single-band threshold method. This method determines a gray value as the threshold that separates water from other objects. The gray value is found by repeated experiments on a single infrared band, exploiting the low reflectivity of water in the near-infrared band, which makes it easy to distinguish from other objects. However, its geographical applicability is limited: it is difficult to separate water from shadows near the shore, which causes the extracted water area to be too large. Therefore, the single-band threshold method is not accurate enough for land-sea segmentation.
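For illustration, the single-band threshold method reduces to a single comparison. The following is a minimal sketch of our own, not the paper's code; the band loading and the threshold value 0.05 are assumptions:

    import numpy as np

    def threshold_water_mask(nir_band: np.ndarray, threshold: float) -> np.ndarray:
        # Water absorbs strongly in the near-infrared, so low NIR values
        # indicate water; the threshold itself is found by repeated trials.
        return nir_band < threshold  # boolean mask: True = water

    # Hypothetical usage ('nir' would be a 2-D array of NIR reflectance;
    # loading it is outside this sketch):
    # water = threshold_water_mask(nir, threshold=0.05)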
Since the single-band threshold method cannot achieve the desired results, it is natural to consider the multi-spectral hybrid analysis method. This method builds a water extraction model by analyzing the features of the selected area and the surrounding objects. Because the model is not universal, it too is unsuitable for general land-sea segmentation.
Another commonly used method for water segmentation is the spectral relationship method. It can build many models, combine bands, feed formulas into band calculations, and cut polygons manually. The manual workload of this approach is heavy, so it is not suitable for processing large amounts of data.
To sum up, we need a network model that suits many situations when segmenting water from remote sensing images, and, in order to handle large training sets, we need deep learning methods.
Figure 1: Network Architecture of Unet
II. SEMANTIC SEGMENTATION

In the image domain, semantics refers to the content of the image and the understanding of its meaning. Segmentation means that different objects in the image are separated at the pixel level, with each pixel of the original image labeled. At present, the main application fields of semantic segmentation are geographic information systems, autonomous driving, medical image analysis, robotics and other fields.

There are many deep learning techniques for semantic segmentation, such as CNN, FCN, and the U-net network used in this paper. A CNN learns an effective representation of the original image, which lets it recognize visual regularities directly from raw pixels with very little pre-processing. One of its important characteristics is its light head (fewer weights near the output than near the input), an inverted-triangle shape that helps avoid the problem of vanishing gradients in back-propagation. But it also has obvious problems, chiefly that it needs careful parameter tuning and a large number of samples.

Compared with traditional CNN-based image segmentation, FCN (Ref. [9]) has two clear advantages: first, it accepts input images of any size, without requiring all training and test images to have the same size; second, it is more efficient, because it avoids the repeated storage and convolution caused by using pixel blocks. FCN is a pioneering work of deep learning in image segmentation, but its segmentation results lack fine detail. The U-net network is an improvement on FCN; in our view there are two improvements: it suits multi-scale processing, and it suits very large images. U-net consists of two parts, as shown in Fig. 1: the first is the feature-extraction part, and the second is the up-sampling part. Because the network structure is U-shaped, it is called U-net. In the feature-extraction part there are five scales, one per pooling layer, including the original image scale. In the up-sampling part, each up-sampled feature map is fused with the feature-extraction map of matching scale and channel count, with a crop needed before the fusion; the fusion here is concatenation.
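To make this two-part structure concrete, the following PyTorch code is a minimal sketch of our own, not the authors' implementation: a one-scale U-net in which channels double on downsampling, halve on upsampling, and feature maps are fused by concatenation. The network of Fig. 1 uses five scales rather than one.

    import torch
    import torch.nn as nn

    def double_conv(in_ch, out_ch):
        # Two 3x3 convolutions, as in U-net blocks. padding=1 keeps the
        # spatial size, a common simplification of "valid" convolutions.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    class MiniUNet(nn.Module):
        # A two-scale sketch: contract, expand, fuse by concatenation.
        def __init__(self, in_ch=3, num_classes=2):
            super().__init__()
            self.enc1 = double_conv(in_ch, 64)
            self.enc2 = double_conv(64, 128)   # channels double per downsampling
            self.pool = nn.MaxPool2d(2)
            self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)  # channels halve
            self.dec1 = double_conv(128, 64)   # 128 = 64 (skip) + 64 (upsampled)
            self.head = nn.Conv2d(64, num_classes, 1)

        def forward(self, x):
            f1 = self.enc1(x)                  # high-resolution features
            f2 = self.enc2(self.pool(f1))      # low-resolution features
            up = self.up(f2)                   # upsample back to full size
            fused = torch.cat([f1, up], dim=1) # skip connection ("stitching")
            return self.head(self.dec1(fused))

    # logits = MiniUNet()(torch.randn(1, 3, 256, 256))  # -> (1, 2, 256, 256)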
III. THE SIGNIFICANCE OF DEEP LEARNING FOR REMOTE SENSING IMAGES

People can obtain a large amount of remote sensing data by satellite, then analyze and process the rich information in the images, deduce the types of targets, and calculate their related attributes and coordinates. Although such information can be acquired conveniently and quickly, our ability to process large amounts of data lags far behind our ability to acquire it; obviously, specific targets cannot be detected through manual interpretation alone. Deep learning is a machine learning model built from many hidden layers: the massive input data are processed, each level learns automatically from the bottom up, and finally feature descriptions matching the input data are extracted, which achieves good results on the data set. In addition, U-net networks are currently used mostly for segmenting medical images, while satellite image data are generally quite large, so we believe that applying U-net networks to satellite images can achieve good results. Because the size of a satellite image makes it impossible to feed the original image into the network, it must be cut into small patches one by one. When cutting into small patches, U-net suits overlapping tiles because of its network structure: when the map is cut, the surrounding area is included, and the overlapping margin provides context information for the edge portion of each divided area. The Unet network consists of a contracting path and an expanding path. The contracting path follows the typical structure of a convolutional network, and the number of feature channels doubles with each downsampling step. Each step in the expanding path upsamples the feature map and halves the number of feature channels by convolution. The architecture uses successive layers to complement the contracting network, increasing the resolution of the output and combining the high-resolution features of the contracting path with the upsampled output. In the upsampling section, the large number of feature channels allows the network to propagate context information to higher-resolution layers. It is worth noting that the network uses only the valid part of each convolution, i.e., only pixels for which full context is available in the input image.
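The overlapping tiling described above can be sketched as follows. This is our illustration, not the paper's code; the patch size and overlap width are assumed parameters, and a full implementation would also pad or mirror the image borders:

    import numpy as np

    def cut_patches(image: np.ndarray, patch: int = 256, overlap: int = 32):
        # Cut a large (H, W, C) image into overlapping square patches.
        # Adjacent patches share an `overlap`-pixel margin so the network
        # has context for pixels near each patch border.
        stride = patch - overlap
        h, w = image.shape[:2]
        tiles = []
        for top in range(0, h - patch + 1, stride):
            for left in range(0, w - patch + 1, stride):
                tiles.append(image[top:top + patch, left:left + patch])
        return tiles

    # tiles = cut_patches(satellite_image)  # each tile is fed to U-net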
A. Neural network training process

Acquisition of the training sample set includes training data collection, analysis, selection and preprocessing. First, the most important input quantities should be determined from a large amount of measurement data; that is, correlation analysis of the measurement data is carried out to find the most important quantities to use as input. After determining the main input quantities, we preprocess them: rescale the data to a certain range, such as [-1, 1], and exclude outliers. At the same time, we can check for periodic behavior, fixed trends, or other relationships. The purpose of data preprocessing and analysis is to make the data easy for the neural network to learn from. When choosing the network type and structure, the structure and parameters (the number of layers, the number of nodes in each layer, the initial weights, the learning algorithm, etc.) are determined after the network type is fixed.

Training and testing: the training samples are used to train the network repeatedly until an appropriate mapping is obtained.
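As a small illustration of this preprocessing step (our sketch, not the authors' pipeline; the three-sigma clipping rule is an assumed stand-in for the outlier exclusion mentioned above):

    import numpy as np

    def normalize_to_unit_range(x: np.ndarray) -> np.ndarray:
        # Linearly rescale the data to [-1, 1], as suggested above.
        lo, hi = x.min(), x.max()
        return 2.0 * (x - lo) / (hi - lo) - 1.0

    def clip_outliers(x: np.ndarray, k: float = 3.0) -> np.ndarray:
        # Clamp values more than k standard deviations from the mean.
        mu, sigma = x.mean(), x.std()
        return np.clip(x, mu - k * sigma, mu + k * sigma)

    # sample = normalize_to_unit_range(clip_outliers(sample))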
B. Loss function

The loss function is the sum of two terms: the original Unet loss and our improved part. The Unet loss is the cross-entropy of the prediction obtained by Softmax. The specific process is as follows. The cross-entropy is

H(p, q) = -\sum_x p(x) \log q(x),    (1)

where p(x) is the actual value and q(x) is the predicted value. The Softmax output

q_i = e^{z_i} / \sum_k e^{z_k}    (2)

is the predicted value of point i, where the z_k are the per-class scores of the network. Because the labels are one-hot, the cross-entropy can be simplified to

L_i = -\log q_{i,c},    (3)

where c is the true class of pixel i.
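A minimal NumPy rendering of Eqs. (1)-(3) as written above (our illustration, not the authors' code; the two-class scores are invented example values):

    import numpy as np

    def softmax(z: np.ndarray) -> np.ndarray:
        # Eq. (2): turn per-class scores into predicted probabilities q.
        e = np.exp(z - z.max())          # subtract max for numerical stability
        return e / e.sum()

    def cross_entropy(p: np.ndarray, q: np.ndarray) -> float:
        # Eq. (1): H(p, q) = -sum_x p(x) log q(x).
        return float(-(p * np.log(q)).sum())

    # With a one-hot label p, Eq. (3) reduces to -log q[true_class]:
    q = softmax(np.array([2.0, 0.5]))    # scores for (land, water)
    p = np.array([1.0, 0.0])             # one-hot: true class is land
    assert np.isclose(cross_entropy(p, q), -np.log(q[0]))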
In addition, based on this loss function, a local smoothing regularization is proposed to obtain results with better spatial consistency, which frees us from the complex morphological operations commonly used in traditional methods. A multi-task loss is used to obtain segmentation and edge detection results at the same time; the additional structured edge detection branch further refines the segmentation results and significantly improves edge accuracy.

At the boundary of sea and land there are often fine structures, and it is difficult for a segmentation network to recover them. In this paper, the loss function of the U-net network is changed following the research results of Dongcai Cheng et al. [1]: the network predicts the probability that each point is an edge point, and then two sets of edge points are identified, one on land and one on the ocean.

Select the edge points in the picture: each pixel is denoted i, and the eight pixels adjacent to it are denoted i(j) (j = 1, ..., 8). The probabilities that these pixels are edge points are p_i and p_{i(j)}, respectively.
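To make this notation concrete, a small sketch of ours (`prob` is an assumed H x W map of per-pixel edge probabilities) that gathers p_i and its eight neighbours p_{i(j)}:

    import numpy as np

    def neighbor_probs(prob: np.ndarray, row: int, col: int) -> np.ndarray:
        # Return the edge probabilities p_i(j), j = 1..8, of pixel i = (row, col).
        # Assumes (row, col) is not on the image border, matching the
        # eight-neighbour case described above.
        window = prob[row - 1:row + 2, col - 1:col + 2]   # 3x3 window around i
        return np.delete(window.ravel(), 4)               # drop the centre p_i

    # p_i, p_ij = prob[r, c], neighbor_probs(prob, r, c)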
For edge points on land, the loss function is:

(4)

For edge points at sea, the loss function is:

(5)

If a point is not an edge point, it has eight adjacent pixels:

(6)

Therefore, the added loss term is:

(7)

The final loss function is:

(8)

C. Results

The experiment was conducted under Linux with Python 3.7; 179 pictures were used for training and 36 for testing. The precision is

Precision = TP / (TP + FP),    (9)

where TP counts true positives and FP counts false positives. The recall is

Recall = TP / (TP + FN),    (10)

where FN counts false negatives.
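Eqs. (9) and (10) are the usual pixel-wise precision and recall; a NumPy sketch of our own over boolean water masks:

    import numpy as np

    def precision_recall(pred: np.ndarray, truth: np.ndarray):
        # Pixel-wise precision TP/(TP+FP) and recall TP/(TP+FN) for
        # boolean masks where True marks water.
        tp = np.logical_and(pred, truth).sum()
        fp = np.logical_and(pred, ~truth).sum()
        fn = np.logical_and(~pred, truth).sum()
        return tp / (tp + fp), tp / (tp + fn)

    # precision, recall = precision_recall(pred_mask, gt_mask)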

The final image comparison results are shown in the following figures; here we take one of the pictures as an example.

Figure 2: Satellite imagery

Figure 3: Image after Unet network segmentation (the white area is land and the black area is water)

Figure 4: Processed image after improving the Unet network loss function (the white area is land and the black area is water)

Figure 5: Loss function change curve

It can be seen that the segmentation obtained after changing the loss function is better than that of the U-net alone, and the result is more detailed. Moreover, the loss curve shows that the loss settles at a relatively low value, so this is an improvement for sea-land segmentation and water extraction.

D. Currently existing problems

• Acquisition of remote sensing image samples. The scarcity of remote sensing scene data makes fair comparison between algorithms difficult, which limits research progress.

• At present there is no good solution to the influence of illumination. The visual errors caused by lighting and the difficulty of pixel recognition make the extraction of edge points harder; different viewing angles and different times change the segmentation of the same coastline.

IV. CONCLUSION

In this paper, water area extraction is mainly based on the Unet network, and fine edge segmentation is achieved by modifying the loss function. An advantage of the Unet network is that it does not need a large amount of data, so it greatly relaxes the requirement of traditional convolutional networks for large numbers of training images. In addition, the Unet network can better process large images, so satellite remote sensing images can be better handled. The modification of the loss function is, in a sense, a refinement of the original network's loss. By using a multi-task loss to obtain segmentation and edge detection results at the same time, edge points can be judged more accurately, and thus more precise results can be obtained.
V. ACKNOWLEDGMENTS
This work was supported in part by the Chang Jiang
Scholars Program under Grant T2012122, and in part by the
Hundred Leading Talent Project of Beijing Science and
Technology under Grant Z141101001514005.
REFERENCES

[1] Cheng D, Meng G, Cheng G, et al. SeNet: Structured Edge Network for Sea-Land Segmentation. IEEE Geoscience and Remote Sensing Letters, 2017, 14(2): 247-251.
[2] Yu L, Wang Z, Tian S, et al. Convolutional Neural Networks for Water Body Extraction from Landsat Imagery. International Journal of Computational Intelligence and Applications, 2017, 16(01): 6.
[3] Meng Weican. Research on Intelligent Extraction of Water Boundary from Remote Sensing Images. PLA University of Information Engineering, 2012.
[4] Arnab A, Zheng S, Jayasumana S, et al. Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation: Combining Probabilistic Graphical Models with Deep Learning for Structured Prediction. IEEE Signal Processing Magazine, 2018, 35(1): 37-52.
[5] Teichmann M T T, Cipolla R. Convolutional CRFs for Semantic Segmentation. 2018.
[6] Liu Z, Li X, Luo P, et al. Deep Learning Markov Random Field for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017: 1-1.
[7] Chen L C, Papandreou G, Schroff F, et al. Rethinking Atrous Convolution for Semantic Image Segmentation. 2017.
[8] Zhou Z, Siddiquee M M R, Tajbakhsh N, et al. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. 2018.
[9] Long J, Shelhamer E, Darrell T. Fully Convolutional Networks for Semantic Segmentation. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
