Comparing U-Net Convolutional Network With Mask R-CNN in Agricultural Area Segmentation On Satellite Images

2020 7th NAFOSTED Conference on Information and
Computer Science (NICS)
Thinh Tran Pham Quoc
Faculty of Information Technology
VNUHCM – University of Science
Ho Chi Minh City, Viet Nam
1612659@student.hcmus.edu.vn

Tam Tran Linh
Faculty of Information Technology
VNUHCM – University of Science
Ho Chi Minh City, Viet Nam
1612589@student.hcmus.edu.vn

Thu Nguyen Tran Minh
Faculty of Information Technology
VNUHCM – University of Science
Ho Chi Minh City, Viet Nam
ntmthu@fit.hcmus.edu.vn
978-0-7381-0553-6/20/$31.00 ©2020 IEEE

Abstract—Deep learning is the fastest-growing trend in the statistical analysis of remote sensing data. Deep learning models are used for spectral information processing, identification statistics, and the segmentation and classification of objects in satellite images, etc. Image segmentation can make object statistics more accurate by separating the objects from the background. In this paper, we present background on Mask R-CNN and U-Net in satellite imagery segmentation and run an experiment with both models to show their suitability for this field. In our experiments, the mean average precision (mAP) reaches 95.21% for Mask R-CNN and 92.69% for U-Net on the NonVietnam dataset, and 85.74% and 79.24%, respectively, on the dataset of Vietnam satellite images.

Keywords—Deep learning, segmentation, Mask R-CNN, U-Net, agricultural areas, satellite images.

I. INTRODUCTION

Satellite imagery, or remote sensing imagery, usually refers to data obtained from satellites, civil aircraft, dedicated aircraft, or drones. Its purpose is to show objects on the surface of the earth through sensors and video cameras. Typically, satellite imagery is used for taking measurements, gathering objective information, and studying the surface of the Earth or other planets. The results of those studies affect many areas of life, such as weather and natural disaster forecasting in meteorology; monitoring the rate of desertification and the rate of coastal erosion in geology; and monitoring forest cover and warning of and monitoring forest fires in forestry, among many other fields.

In agriculture, remote sensing is also used in management, statistics, and farming. Its applications include forecasting and managing agricultural production, remote sensing and geographic information, surveying and producing agricultural maps, deforestation warning, monitoring the normalized difference vegetation index (NDVI), etc. [1][2]. The emergence of Agriculture 4.0 is closely tied to practical applications of agricultural monitoring through satellite imagery.

Deep learning, a branch of artificial intelligence [3] that has seen many works and studies since 2012, has affected different aspects of life by solving many practical problems [4] in areas such as health care, data processing, stock analysis, physical recognition, etc. Deep learning models are often faster and more accurate than other methods in solving problems of artificial intelligence. The reason is that they are modeled on the human neural network, an architecture that has many advantages in information transmission and processing [5]. The areas of science that can be well addressed by deep learning include automatic speech recognition, image processing and identification, natural language processing, bioinformatics, etc.

Deep learning in agricultural satellite image processing applies deep learning methods to identifying and segmenting agricultural objects on satellite images. Along with the development of deep learning models in image processing [6][7], this topic has long been studied in many scientific works. Previous studies include Land Use and Land Cover Classification [8], Forest Classification and Structural Estimation [9], Agricultural Land Detection in Insular Areas using an improved AlexNet network model [10], etc. These studies indicate that the identification and segmentation of agricultural areas with deep learning models is highly applicable in various fields, such as agricultural mapping, calculation of harvested agricultural yields, and estimation of the amount of fertilizer. However, the experimental works mentioned were tested only on datasets from Western countries (the Americas or Europe), where science and technology are well developed and modern, high-resolution equipment is available. This raises a big question about the effectiveness of deep learning on agricultural satellite imagery in developing countries, especially Vietnam. Agriculture accounted for 17.4% of Vietnam's GDP in 2015, and Vietnam is also the second-largest rice exporter in the world. Vietnam may therefore be representative of the agriculture of Eastern countries.

Regarding the segmentation of agricultural areas on satellite imagery, very few experimental studies cover both Mask R-CNN and U-Net and compare their effectiveness, advantages, and disadvantages. The Mask R-CNN model [13], proposed by K. He et al., was developed from Faster R-CNN with an additional branch in the architecture that creates a mask layer for object segmentation. Since then, Mask R-CNN has been tried on many different satellite imagery processing problems, achieving high efficiency in each [14][15]. Mask R-CNN surpasses the two earlier winners of the COCO Challenge [16]: MNC [17] (2015) and FCIS [18] (2016). Moreover, given its effectiveness on many satellite image segmentation problems [14][15], Mask R-CNN is expected to bring remarkable advances in agricultural area segmentation on satellite imagery. In addition, U-Net [19], the model proposed by O. Ronneberger et al. originally for the segmentation of biomedical objects, was quickly tried on different types of objects in various fields after showing its high efficiency

[20][21][22]. In processing satellite imagery, U-Net has also been applied to the segmentation of buildings [23], urban areas [24], etc. These experiments yield high-precision predictions, which suggests that U-Net can also be a good model for agricultural area segmentation on satellite images. For these reasons, we decided to experiment with and compare Mask R-CNN and U-Net in agricultural area segmentation on satellite images. This work helps to highlight the effectiveness of Mask R-CNN and U-Net in supporting agricultural remote sensing. In addition, the experiment is conducted on two datasets: one with images obtained in Vietnam, and the other with images obtained in other regions (the Americas or Europe). This approach not only confirms the accuracy of Mask R-CNN and U-Net on the high-quality satellite imagery of Western countries used in previous works but also tests their effectiveness on Eastern agricultural areas (especially in Vietnam).

In this paper, we present the results of our research and experiments on the Mask R-CNN and U-Net models in agricultural image segmentation. The paper has six main parts: Part I gives an overview of the work; Part II reviews the current state of deep learning applications in satellite image processing for agriculture; Part III describes the process of applying the Mask R-CNN and U-Net models to agricultural area segmentation on satellite images; Part IV details the training process for the two test models; Part V presents the predictions of Mask R-CNN and U-Net on real images, the accuracy of each model, and a comparison of their effectiveness in agricultural area segmentation on satellite images; Part VI gives the conclusion and future development of this work.

II. DEEP LEARNING IN AGRICULTURAL SATELLITE IMAGE SEGMENTATION

The processing of agricultural satellite imagery consists of problems at different levels of complexity. The first level of image processing is object classification. In their study, P. Helber et al. used a deep learning model to classify 10 land objects and achieved more than 98% classification accuracy. The next level of image processing combines object classification and object detection. The study of T. Chang et al. applied a deep learning model to classify and identify four different types of forests. Synthesizing, measuring, and comparing models, Charou [25] studied and compared various deep learning methods for identifying agricultural area objects on satellite images. Most of the methods that Charou tested performed well. However, the above works only highlight the effectiveness of some deep learning models in recognizing objects from satellite images, without addressing the ability to segment objects.

At the higher level of agricultural satellite imagery processing, object segmentation concerns many researchers. Object segmentation is the process of separating objects from the background data by defining their boundaries. It allows objects on images to be observed and analyzed more closely and accurately. Regarding agricultural area segmentation on satellite imagery, the study of K.M. Masoud et al. [11] experimentally applied an improved FCN model to the problem. The authors conclude that this model is reasonably effective in agricultural area segmentation. The drawback of this model is that its architecture is relatively complex and requires a large number of convolution layers, so training and testing on a large area demand large computing resources (at least 16GB of RAM for wide-area training and testing). When comparing FCN with another model (SegNet), the experimental study by M. Yang and colleagues [12] showed that FCN is more effective in the segmentation of rice areas. However, both models have limitations on images of large areas, where the objects that need segmenting become smaller. Studies that go beyond the original purpose of U-Net have found it to be a simple yet effective model. U-Net therefore appears to be more efficient to train than FCN while remaining competitively accurate in object segmentation. Its U-shaped architecture of symmetric encoder and decoder helps U-Net to aggregate the feature information of objects more accurately. U-Net has also been tried in processing agricultural satellite imagery. A. Stoian et al. [26] used U-Net to segment many land types on satellite images and compared the traditional U-Net with a fine-tuned U-Net. However, because the work focuses on many types of objects (17 types), its highest segmentation accuracy is achieved on water areas, not agricultural areas.

Mask R-CNN is by nature an object recognition model, like those in Charou's research [25]. However, it goes beyond Charou's models thanks to an additional branch used for object segmentation. Mask R-CNN has proved effective in many object segmentation problems [15][19][27][28], including segmentation in satellite imagery. W. Zhang et al. [14] applied this model to the segmentation of Arctic ice-wedge polygons with an mAP of 70%. Another work, by L. Chen et al. [15], tested Mask R-CNN on the segmentation of urban villages in Xiamen (China), with 90% precision and 87.18% recall. For the segmentation of agricultural areas on satellite images in particular, Mask R-CNN does not seem to have attracted much attention, and no dedicated work tests its effectiveness. Judging from what Mask R-CNN demonstrates in other studies, the model has the potential for high accuracy in segmenting agricultural area objects on satellite imagery. In addition, Mask R-CNN is highly applicable because its object location predictions (bounding boxes) can be combined with its object segmentation predictions (mask layer).

Comparing the effectiveness of Mask R-CNN and U-Net, T. Zhao's experiment [29] evaluated the two models on agricultural satellite imagery segmentation, with pomegranate canopy as the object. This comparative test indicates that Mask R-CNN gives better results than U-Net. However, the test object has low complexity and the test dataset does not contain enough noise, so in our opinion the assessment offers limited guidance.

This review of the current state of deep learning in image processing shows that many models have been tested for effectiveness, from low-level image processing tasks such as classification and object detection to higher-level ones such as object segmentation. In satellite image segmentation especially, many popular models have been tested on many types of
agricultural objects. However, among them, the verification and comparison of the effectiveness of Mask R-CNN and U-Net in agricultural area segmentation seems to be less studied and tested. In our research and experiments, we therefore evaluate and compare the effectiveness of the Mask R-CNN and U-Net models in agricultural satellite image segmentation, particularly on agricultural areas that are more general and more complex.

III. EXPERIMENTAL PROCESS

Fig. 1. Process of satellite image segmentation

The experimental process consists of 6 main steps as shown in “Fig. 1”.

Step 1 - Data collection: Data is one of the important factors and directly affects the accuracy of the model. Therefore, we created our own dataset to best suit the requirements of our experiment. The dataset was collected using Google Maps and Microsoft's Snipping Tool, with a total of 2400 images for the two datasets of agricultural areas in Vietnam and in other countries. Besides the quantity of each sample object, the variety of categories should be sufficient for the model to learn the necessary attributes.

Step 2 - Assigning data labels: In this step, the VGG Image Annotator tool [30] was used to assign location labels to the objects in the dataset images. Data labeling is also a very important stage, as the model relies on these locations to train. If the data is labeled incorrectly, the model fails to learn the object's properties, which leads to inaccurate predictions.

Step 3 - Model training: The core task of the training process is to extract the features of the labeled objects from the provided data. These features are stored in the model so that it "remembers" them. As a result, when tested on another picture (one not used in training), the model can recognize and confirm the objects. Different models learn in different ways, so it is important to adjust the learning parameters so that the training process achieves the best results.

Step 4 - Model testing: Testing helps to identify whether the model has learned properly. If not, the data must be refined or the learning parameters adjusted more appropriately, and the model is then retrained to achieve the best results. This can be judged from the predicted results on a picture or from the learning curve. Because each training state reflects a certain behavior of the model, it is necessary to rely on different factors, such as the requirements of the problem, the criteria of the dataset, and the features of the object, to choose the model that is most suitable for each separate problem.

Step 5 - Commenting and evaluating the model: Based on the data obtained in step 4, as well as the accuracy measurements of the models, we comment on the results in comparison with the model theory. At the same time, the two experimental models are compared in terms of the quality and effectiveness of their segmentation on the same images, and the results are then interpreted. Thus, the performance of the two models when solving segmentation problems of agricultural areas on satellite imagery is evaluated.

Step 6 - Processing predicted results: In this step, we post-process the results of the test model to get more accurate results. False predictions are removed, missing predictions are added, and those that are not completely accurate are refined. The refinement makes the segmentation values more accurate, so that they can be used for various purposes such as creating new training datasets or producing agricultural area statistics. The more accurate an image is, the more reliably it can be used for statistics.

IV. IMPLEMENT AND TRAINING

A. Training data

In remote sensing, information collection devices play an extremely important role because their accuracy affects the reliability of later analysis and statistics. However, these devices are relatively expensive. Therefore, given the small scale and limited funding of our experimental research, we use available and free satellite image data sources. Google Maps is a popular application with an integrated satellite view function, which we use for data collection. It offers a relatively wide range of viewing altitudes (5m to 2000km) and is extremely flexible, as the view can be adjusted depending on the characteristics of the desired scene. Our dataset is a small, self-collected set taken from the publicly available Google Maps imagery. During data collection, every selected image contains at least one object. In addition, the object to be segmented (agricultural land) usually covers over 50% of each image. In the labeling process, the objects are separated based on the boundary of each type of land. In addition, confusing items such as tree shadows, tractors, and wells are excluded as far as possible during labeling. The number of objects ranges from 1 to 20 depending on the image.

The dataset we extracted includes 2400 images, which are divided into two datasets: satellite images of agricultural areas in Vietnam (Vietnam) and satellite images of agricultural areas in other regions (NonVietnam). The images are mostly concentrated in key agricultural areas. The images in each dataset are distributed as follows: in the Vietnam dataset, around 75% come from the South, around 20% from the North, and around 5% from the Central region; in the NonVietnam dataset, around 40% come from America, around 40% from Europe, around 15% from Africa, and around 5% from Asia. Compared to other regions, Vietnam is a country with an intricate and dense network of rivers. Therefore, the agricultural satellite images of this area often contain many interfering objects (houses, rivers, lakes, canals, ...). In addition, the characteristics of the river network also affect the planning of agricultural land. This causes diversity in the shape and size of agricultural lands in Vietnam. Moreover, the application of high technology, such as segmenting agricultural land using commercial satellites, is limited, so obtaining a full set of agricultural images of the different regions in Vietnam is an obstacle. Details of the training data division are shown in “TABLE I”.

TABLE I. DETAIL OF TRAINING DATA

Dataset       Total   Training   Validation   Testing
Vietnam        650      480        120          50
NonVietnam    1750     1280        320         150
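As a quick consistency check on TABLE I, the following minimal sketch recomputes the share of each subset. The counts are copied from the table; the names `SPLITS` and `split_fractions` are illustrative only and do not come from the original experiment.

```python
# Image counts copied from TABLE I: (total, training, validation, testing).
SPLITS = {
    "Vietnam": (650, 480, 120, 50),
    "NonVietnam": (1750, 1280, 320, 150),
}


def split_fractions(total, train, val, test):
    """Return the share of images in each subset.

    Raises ValueError if the three subsets do not add up to the total.
    """
    if train + val + test != total:
        raise ValueError("subset sizes do not add up to the total")
    return {"train": train / total, "val": val / total, "test": test / total}


if __name__ == "__main__":
    for name, counts in SPLITS.items():
        shares = split_fractions(*counts)
        print(name, {key: round(value, 3) for key, value in shares.items()})
    print("all images:", sum(counts[0] for counts in SPLITS.values()))
```

In both datasets the training and validation subsets stand in a 4:1 ratio, and the two totals add up to the 2400 collected images.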
B. Model training

Fig. 2. Labeling data

After data collection, we labeled the data using the VGG Image Annotator software (VIA) [30], drawing multi-point polygons to mark the agricultural areas on each image. The difficulty in labeling is to define the right areas to identify, because an agricultural area on an image is easily confused with other objects (lawns, soccer fields, ponds, etc.). For this research, we define agricultural areas as including wet rice and wheat areas (usually green or yellow), upland areas (divided crops), newly tilled muddy areas (usually brown), etc. After the agricultural partitions are defined, VIA exports a .json file that stores the coordinates of the objects; the training of the Mask R-CNN model then uses this file to extract the partitions and learn their properties. U-Net, in contrast, requires training data in the form of images corresponding to the model's output (a black-and-white image showing the mask of an object). Therefore, taking advantage of the Mask R-CNN labels, we added a step that converts the partitions into black-and-white images for U-Net training (in “Fig. 2”).

In Mask R-CNN, during training, the model relies on the .json file to locate the objects in the image, crops each partition, and feeds it into the network architecture [13] to extract and aggregate features. By repeatedly learning over the entire dataset, the training produces models that have been "taught" to identify agricultural areas (in “Fig. 3”).

Our training of the Mask R-CNN and U-Net models adjusts the hyper-parameters depending on each dataset, as shown in “TABLE II”.

TABLE II. VALUES OF HYPER-PARAMETERS IN TRAINING FOR MASK R-CNN AND U-NET

Hyper-parameter    Model         Vietnam   NonVietnam
Epoch              Mask R-CNN    200       200
Epoch              U-Net         50        100
Batch_size         both          1         1
Step_per_epoch     both          500       1300
Learning_rate      Mask R-CNN    0.0001    0.0001
Learning_rate      U-Net         0.001     0.001
Validation_step    both          150       350
Weight_decay       both          0.0001    0.0001

V. RESULT AND EVALUATION

A. Mask R-CNN

Fig. 4. The Error of Mask R-CNN on the two datasets

After the training process, we obtained the error curve of Mask R-CNN shown in “Fig. 4”. The curve indicates that after falling to its best value, the error begins to increase because the model starts to memorize the features of the training dataset (overfitting). This happens early because the learning rate is too high while the training runs for too many steps (epochs). In addition, the large number of parameters and the complex error function of Mask R-CNN lead to large fluctuations in the error between consecutive training steps.

To verify the effectiveness of the model, early stopping was applied, and the model at the training step with the lowest error was chosen for the evaluation. On Vietnam, we chose the model at training step 46 (loss = 1.049) and on NonVietnam at training step 78 (loss = 0.923). Mask R-CNN was evaluated by Intersection over Union (IoU), precision, recall and mAP. The results are given in “TABLE III”. In particular, IoU shows the accuracy of a prediction through the overlap of the predicted values and the correct label of the object (the parameter 0.5 means that a prediction is considered correct when the overlap is over 50%); Precision shows how accurate the predictions generated by the model are; Recall reflects how many correct objects the model fails to predict; and mAP is the mean of the Average Precision, the parameter that measures the model's prediction accuracy (details in [31]).

“TABLE III” shows that Mask_NonVietnam achieved better results on IoU, recall and mAP, while its precision is marginally lower. This is because the amount of data used in the training process of NonVietnam is greater than that of Vietnam.

TABLE III. THE PARAMETERS EVALUATE THE ACCURACY OF MASK R-CNN MODEL

Model              IoU (0.5)   Precision   Recall   mAP
Mask_Vietnam       0.7980      0.7521      0.5875   0.8574
Mask_NonVietnam    0.8070      0.7423      0.6352   0.9521
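To make the measures in TABLE III concrete, here is a minimal sketch of IoU, precision, and recall for instance masks at the 0.5 overlap threshold. It is not the evaluation code used in the experiments: the greedy one-to-one matching rule, the representation of masks as sets of (row, col) pixels, and all function names are assumptions made for illustration. mAP is left out because it also requires ranking the predictions by confidence score.

```python
def mask_iou(pred, truth):
    """Intersection over Union of two masks given as sets of (row, col) pixels."""
    union = pred | truth
    return len(pred & truth) / len(union) if union else 0.0


def precision_recall(pred_masks, truth_masks, threshold=0.5):
    """Greedy one-to-one matching (an assumed rule, for illustration).

    A prediction counts as correct when its IoU with a ground-truth object
    that has not been matched yet is above the threshold.
    """
    unmatched = list(range(len(truth_masks)))
    true_positives = 0
    for pred in pred_masks:
        # Best still-unmatched ground-truth object for this prediction.
        best = max(unmatched,
                   key=lambda i: mask_iou(pred, truth_masks[i]),
                   default=None)
        if best is not None and mask_iou(pred, truth_masks[best]) > threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(pred_masks) if pred_masks else 0.0
    recall = true_positives / len(truth_masks) if truth_masks else 0.0
    return precision, recall


def rect(top, left, bottom, right):
    """Helper: filled rectangle as a pixel set (bottom and right exclusive)."""
    return {(r, c) for r in range(top, bottom) for c in range(left, right)}


if __name__ == "__main__":
    truth = [rect(0, 0, 4, 4), rect(10, 10, 14, 14)]
    preds = [rect(0, 1, 4, 5), rect(20, 20, 22, 22)]
    print("IoU of first pair:", mask_iou(preds[0], truth[0]))
    print("precision, recall:", precision_recall(preds, truth))
```

In this toy case one of the two predictions overlaps a field well enough to count as correct and one of the two fields is missed, so precision and recall are both one half.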
Fig. 3. The Mask R-CNN model to be processed during the training

B. U-Net

The error of U-Net is shown in “Fig. 5”. The error of the training process clearly tends to decrease, although more slowly in later steps. The model achieved its best error value on Vietnam at the 11th training step (loss = 0.2902) and on NonVietnam at the 34th training step (loss = 0.1720). Evaluating these two models at their best training steps by the values of IoU and Dice, we get the results in “TABLE IV”. The Dice parameter (the Dice
coefficient) is another accuracy measure; like IoU, it is calculated from the overlap of the predicted value and the correct label of the object, but with a different formula.

Similar to Mask R-CNN, “TABLE IV” shows that U-Net_NonVietnam achieves better values, as its amount of training data is greater than that of U-Net_Vietnam.

Fig. 5. The Error of U-Net on the two datasets

TABLE IV. THE PARAMETERS EVALUATE THE ACCURACY OF U-NET

Model              IoU (0.5)   Dice     mAP
UNet_Vietnam       0.7284      0.8368   0.7924
UNet_NonVietnam    0.8302      0.9048   0.9269

C. Comparison and discussion

After experimenting with and evaluating the Mask R-CNN and U-Net models on the two datasets of agricultural areas in Vietnam and in other regions, we can highlight several observations; both models achieved high precision in the experiments. The amount of data used in training greatly affects the learning process of the models. The values of the evaluation parameters on the Vietnam data are almost always lower than on the NonVietnam data (the precision of Mask R-CNN is the only exception). This is also affected by the quality of the training data, because the sharpness of the Vietnam data is lower than that of the NonVietnam data. Low resolution makes it difficult for the models to learn the attributes of the objects to identify and makes confusion with other objects more likely. However, within each model the differences in the evaluation parameters between the two datasets are not large. It can be concluded that deep learning models are well suited to Vietnam's agricultural areas and are not much less effective there than on data from Western countries. In addition, our experiment has shown that Mask R-CNN and U-Net can be applied to agricultural area segmentation on satellite images.

We selected the two best models from the training of Mask R-CNN and U-Net to compare their predictions. As shown in “TABLE III” and “TABLE IV”, we choose Mask_NonVietnam at epoch 78 and U-Net_NonVietnam at epoch 99. Predicted images of the models are shown in “Fig. 6, 7, 8” and “Fig. 9”.

In “Fig. 6”, the original image shows an agricultural area in Vietnam. Mask R-CNN is more effective in identifying and segmenting agricultural areas with similar properties and blurred boundaries. However, the mask values of Mask R-CNN are not highly accurate.

In “Fig. 7”, the original image shows a European agricultural area (Russia). Mask R-CNN once again proves more effective when the agricultural areas are easy to identify and the zoning is relatively clear. For U-Net, the segmentation values are still noisy. However, the object boundaries produced by U-Net prove to be more effective and accurate than those of Mask R-CNN.

In “Fig. 8”, the original image shows an agricultural area in the Americas. U-Net shows superiority in object segmentation, although there is still noise in a small green area (upper right). In contrast, Mask R-CNN partitions poorly here: it wrongly splits the area into many objects, and the mask of the correctly identified object is not very accurate.

In “Fig. 9”, the original image contains many objects to identify, including many small objects with relatively blurred boundaries. As a result, both U-Net and Mask R-CNN identify and partition poorly. Although U-Net segments most of the field areas in the upper part of the image well, it merges several objects into one (lower right) because the object boundaries are hard to recognize. In general, however, the areas labeled as agricultural are correctly identified and segmented by U-Net. For Mask R-CNN, some agricultural area objects are not identified (lower left) or an interfering object is mistakenly identified (middle), and the resulting segments (masks) still contain several mistakes. However, Mask R-CNN performs well in partitioning the image into separate objects, without merging multiple objects into one as U-Net does.

In many cases, the segmentation predictions of U-Net are more accurate than those of Mask R-CNN when the boundaries of the object are clearly defined. However, the boundaries determined by U-Net become noisy if there are many obstructions. The mask layer of Mask R-CNN is ineffective and inaccurate in some cases. In addition, because Mask R-CNN defines segmented objects as individual mask layers (shown in different colors in the figures), its predictions are more valuable in practical applications than those of U-Net. In general, agricultural area zoning on satellite imagery can be handled well by both the Mask R-CNN and U-Net models.

Fig. 6. a. Original image - b. U-Net’s prediction - c. Mask R-CNN’s prediction - d. Mask R-CNN’s mask

Fig. 7. a. Original image - b. U-Net’s prediction - c. Mask R-CNN’s prediction - d. Mask R-CNN’s mask

Fig. 8. a. Original image - b. U-Net’s prediction - c. Mask R-CNN’s prediction - d. Mask R-CNN’s mask
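The Dice coefficient reported in TABLE IV measures the same overlap as IoU but normalizes it differently. The sketch below uses the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), on masks represented as sets of pixels; the function names and the toy masks are illustrative assumptions, not the evaluation code of the experiments.

```python
def iou(pred, truth):
    """Intersection over Union for masks given as sets of (row, col) pixels."""
    union = pred | truth
    return len(pred & truth) / len(union) if union else 0.0


def dice(pred, truth):
    """Dice coefficient: twice the overlap divided by the sum of both mask sizes."""
    total = len(pred) + len(truth)
    return 2 * len(pred & truth) / total if total else 0.0


def dice_from_iou(value):
    """For a single pair of masks, Dice = 2 * IoU / (1 + IoU)."""
    return 2 * value / (1 + value)


if __name__ == "__main__":
    # Two 4x4 squares shifted by one column: 12 shared pixels out of 16 each.
    truth = {(r, c) for r in range(4) for c in range(4)}
    pred = {(r, c) for r in range(4) for c in range(1, 5)}
    print("IoU: ", iou(pred, truth))
    print("Dice:", dice(pred, truth))
    print("Dice from IoU:", dice_from_iou(iou(pred, truth)))
```

The conversion between the two holds for one pair of masks at a time. Values averaged over a whole test set, such as those in TABLE IV, need not satisfy it exactly.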
Fig. 9. a. Original image - b. U-Net’s prediction - c. Mask R-CNN’s prediction - d. Mask R-CNN’s mask

VI. CONCLUSION AND FUTURE WORK

In this paper, we experimented with two image segmentation models (Mask R-CNN and U-Net) for segmenting Vietnam's agricultural land. The experiment used a satellite image dataset that we extracted from Google Maps. It showed that the quality of the images affects the effectiveness of the models. U-Net was originally proposed for use in the medical field, with high-quality data. Therefore, when U-Net is trained on the NonVietnam dataset, whose images are of higher quality than those of the Vietnam dataset, its performance improves considerably more than that of Mask R-CNN. The models achieved high accuracy when applied to data of agricultural areas in Vietnam and other regions of the world. This is shown clearly by the values of the model evaluation parameters. However, to be able to use the segmentation results of the models in real-life applications, it is necessary to improve the accuracy of the models' predictions.

In the future, we will process the predicted values of Mask R-CNN by clearing false predictions, redefining inaccurate predictions and adding missing ones. The combination of model predictions and manual corrections will create a highly accurate and reliable dataset. These data can improve the trained model, reduce the effort of manual data labeling, and be applied to remote sensing support such as land area statistics, mapping, etc.

REFERENCES

[1] Satellite Imaging Corporation, https://www.satimagingcorp.com/applications/natural-resources/agriculture/, last accessed 2020/02/20.
[2] Planet, https://www.planet.com/markets/monitoring-for-precision-agriculture/, last accessed 2020/02/20.
[10] E. Charou: Deep Learning for Agricultural Land Detection in Insular Areas. 10th International Conference on Information, Intelligence, Systems and Applications (IISA), PATRAS, Greece, pp. 1-4. IEEE (2019).
[11] Masoud, K.M.; Persello, C.; Tolpekin, V.A: Delineation of Agricultural Field Boundaries from Sentinel-2 Images Using a Novel Super-Resolution Contour Detector Based on Fully Convolutional Networks. Remote Sens vol 12 (1), pp. 59 (2020).
[12] Yang, M.-D.; Tseng, H.-H.; Hsu, Y.-C.; Tsai, H.P: Semantic Segmentation Using Deep Learning with Vegetation Indices for Rice Lodging Identification in Multi-date UAV Visible Images. Remote Sens vol 12, pp. 633 (2020).
[13] He, K., Gkioxari, G., Dollár, P., & Girshick, R.B.: Mask R-CNN. 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980-2988 (2017).
[14] Zhang, W.; Witharana, C.; Liljedahl, A.K.; Kanevskiy, M. Deep Convolutional Neural Networks for Automated Characterization of Arctic Ice-Wedge Polygons in Very High Spatial Resolution Aerial Imagery. Remote Sens, vol 10, pp. 1487 (2018)
[15] Chen, L., Xie, T., Wang, X., & Wang, C.: Identifying urban villages from city-wide satellite imagery leveraging mask R-CNN. In: Ju Z., Yang L., Yang C., Gegov A., Zhou D. (eds) Advances in Computational Intelligence Systems. UKCI 2019. Advances in Intelligent Systems and Computing, vol 1043, pp 29. Springer, Cham (2019).
[16] International Conf. on Computer Vision ICCV 2019, http://cocodataset.org/workshop/coco-mapillary-iccv-2019.html, last accessed 2020/02/20.
[17] Dai, J., He, K., & Sun, J.: Instance-Aware Semantic Segmentation via Multi-task Network Cascades. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3150-3158 (2015).
[18] Li, Y., Qi, H., Dai, J., Ji, X., & Wei, Y.: Fully Convolutional Instance-Aware Semantic Segmentation. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4438-4446 (2016).
[19] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N., Hornegger J., Wells W., Frangi A. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham (2015)
[20] Fang, W., Fu, L., Zhang, M., & Li, Z.: Seismic data interpolation based on U-Net with texture loss. ArXiv, abs/1911.04092 (2019).
[21] Chen, J., Viquerat, J., & Hachem, E.: U-Net architectures for fast prediction of incompressible laminar flows. Semanticscholar (2019).
[22] Souza, R., Bento, M.P., Nogovitsyn, N., Chung, K., Lebel, R.M., & Frayne, R.: Dual-domain Cascade of U-Nets for Multi-channel Magnetic Resonance Image Reconstruction. ArXiv, abs/1911.01458 (2019).
[23] Iglovikov, V.I., & Shvets, A: TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation. ArXiv, abs/1801.05746 (2018).
[24] Yi, Y., Zhang, Z., Zhang, W., Zhang, C., Li, W., Zhao, T.: Semantic Segmentation of Urban Buildings from VHR Remote Sensing Imagery Using a Deep Convolutional Neural Network. Remote Sen vol 11 (5), pp. 1774 (2019).
[25] E. Charou: Deep Learning for Agricultural Land Detection in Insular Areas: In 10th International Conference on Information, Intelligence, Systems and Applications (IISA), pp. 1-4, PATRAS, Greece (2019).
[26] Stoian, A., Poulain, V., Inglada, J., Poughon, V., Derksen, D.: Land Cover Maps Production with High Resolution Satellite Image Time
Series and Convolutional Neural Networks: Adaptations and Limits for
[3] Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., Operational Systems. Remote Sens vol 11 (17), pp. 1986 (2019).
Pietikäinen, M: Deep learning for generic object detection: A survey.
International Journal of Computer Vision vol 128, pp. 261-318. [27] Qadir, H., Shin, Y., Solhusvik, J., Bergsland, J., Aabakken, L.,
Semanticscholar (2020). Balasingham, I.:Polyp Detection and Segmentation using Mask R-
CNN: Does a Deeper Feature Extractor CNN Always Perform Better?.
[4] Dargan, S., Kumar, M., Ayyagari, M.R. et al: A Survey of Deep In 13th International Symposium on Medical Information and
Learning and Its Applications: A New Paradigm to Machine Learning. Communication Technology (ISMICT), pp. 1-6 (2019).
Arch Computat Methods Eng (2019).
[28] Minkesh, A., Worranitta, K., & Taizo, M.: Human Extraction and
[5] Marktechpost, Scene Transition utilizing Mask R-CNN. ArXiv, abs/1907.08884
https://www.marktechpost.com/2019/04/18/introduction-to-neural- (2019).
networks-advantages-and-applications/, last accessed 2020/02/20
[29] Tiebiao Zhao; Yonghuan Yang; Haoyu Niu; Dong Wang; YangQuan
[6] L. Jiao et al.: A Survey of Deep Learning-Based Object Detection: In Chen: Comparing U-Net convolutional network with mask R-CNN in
IEEE Access, vol. 7, pp. 128837-128868. IEEE (2019) the performances of pomegranate tree canopy segmentation.
[7] Minaee, S., Boykov, Y., Porikli, F., Plaza, A., Kehtarnavaz, N., Proceedings Volume 10780, Multispectral, Hyperspectral, and
Terzopoulos, D., Image Segmentation Using Deep Learning: A Survey. Ultraspectral Remote Sensing Technology, Techniques and
ArXiv, abs/2001.05566 (2020) Applications VII; 107801J (2018). SPIE Asia-Pacific Remote Sensing,
[8] Helber, P., Bischke, B., Dengel, A., & Borth, D.: EuroSAT: A Novel Honolulu, Hawaii, United States (2018).
Dataset and Deep Learning Benchmark for Land Use and Land Cover [30] Dutta, A., Gupta, A., Zisserman, A.: VGG Image Annotator (VIA),
Classification. IEEE Journal of Selected Topics in Applied Earth http://www.robots.ox.ac.uk/~vgg/software/via/, last accessed
Observations and Remote Sensing, vol 12, pp. 2217-2226 (2017). 2020/02/20.
[9] Chang, T., Rasmussen, B.P., Dickson, B.G., & Zachmann, L.J.: [31] Hui, J.: mAP (mean Average Precision) for Object Detection,
Chimera: A Multi-Task Recurrent Convolutional Neural Network for https://medium.com/@jonathan_hui/map-mean-average-precision-
Forest Classification and Structural Estimation. Remote Sensing, vol for-object-detection-45c121a31173, last accessed 2020/02/20.
11 (7), pp.768 (2019).
