
International Journal of Applied Earth Observation and Geoinformation 104 (2021) 102544

Contents lists available at ScienceDirect

International Journal of Applied Earth Observation and Geoinformation

journal homepage: www.elsevier.com/locate/jag

Exploring multiple crowdsourced data to learn deep convolutional neural networks for road extraction☆

Panle Li a, Xiaohui He b,c,*, Mengjia Qiao a, Disheng Miao a, Xijie Cheng a, Dingjun Song a, Mingyang Chen a, Jiamian Li a, Tao Zhou a, Xiaoyu Guo b,c, Xinyu Yan b, Zhihui Tian b,c
a School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
b School of Geoscience and Technology, Zhengzhou University, Zhengzhou 450001, China
c Ecometeorology Joint Laboratory of Zhengzhou University and Chinese Academy of Meteorological Science, Zhengzhou 450001, China

A R T I C L E  I N F O

Keywords:
Road extraction
Deep convolutional neural networks
Multiple crowdsourced data
Multi-map integration model
Refined labels

A B S T R A C T

Road extraction from high-resolution remote sensing images (HRSIs) is essential for applications in various areas. Although deep convolutional neural networks (DCNNs) have exhibited remarkable success in road extraction, their performance relies on a large number of training samples, which are hard to obtain. To address this issue, multiple crowdsourced data are used in this study, including OpenStreetMap (OSM), Zmap and GPS, and a multi-map integration model (MMIM) is developed to improve the noise robustness of DCNNs for road extraction tasks. Specifically, rich geographical road information is obtained from the multiple crowdsourced data, including main roads, newly constructed roads, and midsize and small roads, which can generate complete road training samples and reduce label noise. Meanwhile, by exploring the true road label information hidden in the different crowdsourced data, the MMIM is used to generate high-quality refined labels for learning DCNNs. In this case, DCNN-based road extraction methods have more opportunities to learn the true road distribution and avoid overfitting to label noise. Experiments on a real road extraction dataset indicate that the proposed method achieves strong performance, and the road extraction results are smoother and more complete.

1. Introduction

Road extraction from high-resolution remote sensing images (HRSIs) has long been sought for many real-world applications. These include city planning (Máttyus et al., 2017; Gevaert et al., 2017), traffic management (Li et al., 2018) and automated navigation (Grinias et al., 2016). However, road extraction at high quality is difficult in remote sensing image processing (Liu et al., 2019; Ding and Bruzzone, 2020). There are many reasons for this, including the complex and confusing background, diverse road shapes and poor image perspective. Furthermore, as the economy has expanded, the road topology has grown extremely complex, and even most road regions are occluded by increasing numbers of buildings. As a result, extracting roads from HRSIs is still a challenging task and has been extensively studied over the last few decades (Zang et al., 2016; B et al., 2016; Ali et al., 2017).

With the development of deep learning, deep convolutional neural networks (DCNNs) have been widely and successfully applied for the understanding of HRSIs (Martins et al., 2020; Huang et al., 2019). Unlike traditional methods, DCNNs can learn multiscale semantic features efficiently and automatically through a series of convolutional layers (Tao et al., 2019). This allows the most reliable and representative road features to be extracted from HRSIs. Besides, most DCNNs consist of multiple hidden layers that can learn hierarchical feature representations from raw pixel data. This is a particularly valuable characteristic for mapping original HRSIs to road maps with an end-to-end solution. Due to the advantages of DCNNs mentioned above, researchers have proposed numerous exciting DCNN-based road extraction methods, such as CasNet (Cheng et al., 2017), MSMT-RE (Lu et al., 2019), BT-RoadNet (Zhou et al., 2020), and GCB-Net (Zhu et al., 2021). Although DCNN-based methods have enormous potential for road extraction from HRSIs, they still face huge challenges concerning road extraction in practice, especially in areas covering thousands of square kilometers. The reason for this is that learning DCNN-based methods for road extraction requires substantial training samples with sufficient diversity


☆ This work is funded by the Science and Technology Major Project of Henan Province under Grant 201400210900.
* Corresponding author at: School of Geoscience and Technology, Zhengzhou University, Zhengzhou 450001, China.
E-mail address: hexh@zzu.edu.cn (X. He).

https://doi.org/10.1016/j.jag.2021.102544
Received 24 July 2021; Received in revised form 11 September 2021; Accepted 12 September 2021
Available online 30 September 2021
0303-2434/© 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).

(Mnih and Hinton, 2010), since they often contain a large number of parameters. In real-world applications, it is quite easy to obtain a large number of remote sensing images. However, generating training samples is an expensive process that requires considerable time and manpower (Li et al., 2021b).

In recent works, some researchers have tried to learn DCNNs from crowdsourced data for road extraction tasks, which aims to avoid the expensive sample generation process. These works first utilize crowdsourced data to label the road regions in HRSIs and then learn DCNNs based on these automatically generated road samples (Yuan, 2017; Chen and Zipf, 2017; Sun et al., 2019). The idea behind this approach is that the geographical road information contained in crowdsourced data can be explored to learn DCNN-based road extraction methods. As a representative, Mnih (2013) aligned road vectors in OSM with their corresponding HRSIs to train a five-layer convolutional neural network for pixelwise road segmentation. Based on their training dataset, Zhang et al. (2018) designed ResUNet to further improve the accuracy of road extraction from HRSIs. Recently, Kaiser et al. (2017) made the best use of OSM to automatically derive labeled training data for buildings, roads and background. Aside from OSM, the GPS trajectory, which can reflect the road topological geometry (Qiu and Wang, 2016), is also a promising crowdsourced data source for labeling road regions in HRSIs to generate road samples. Zhang et al. (2021) leveraged the crowdsourced GPS trajectories of floating cars to label roads in HRSIs, and this produced a large number of training samples for training LinkNet and D-LinkNet. Crowdsourced data significantly alleviate the data hunger of DCNN-based road extraction methods (Chen et al., 2018). However, the training samples produced by a single crowdsourced data source are often noisy and include erroneous labels (e.g., some ground roads are mislabeled as other objects), because the geographical road information provided by a single crowdsourced data source is often limited in comparison to HRSIs, and thus only part of the roads in HRSIs can be correctly labeled. When label noise is present in road samples, the performance of DCNNs on road extraction tasks can inevitably be harmed, since DCNNs have strong learning capacity and are easily prone to overfitting the label noise (Frenay and Verleysen, 2014; Jiang et al., 2018).

In the literature, pioneers in the remote sensing image processing and computer vision domains have developed many robust learning methods to alleviate the negative effects of label noise. Specifically, the existing methods can be coarsely divided into two major categories: 1) noise-tolerant methods (Menon et al., 2018; Li et al., 2019; Li et al., 2021a) and 2) label correction methods (Jiang et al., 2019; Tu et al., 2019; Dong et al., 2021). However, it is inadequate to address the road extraction problem under label noise by directly applying these noise-robust methods. One of the primary reasons is that for road extraction tasks the problem of label noise is more severe, because most existing methods solely rely on a single crowdsourced data source to generate road samples. A single crowdsourced data source is often biased from area to area and suffers from a general lack of quality assurance in many specific areas. To be more specific, OSM covers the principal road network around the world but misses too many small or new roads, particularly in fast-developing areas. GPS offers a promising way to record the new roads developed in a fast-developing area, but it may also miss small roads. Second, the road topology in practice is quite complicated, and road morphology generally varies across areas. Therefore, a more powerful robust learning algorithm is required to improve the generalization ability of DCNNs under label noise.

To address these issues, we first utilize multiple crowdsourced data, including OSM, Zmap and GPS, to label road regions in HRSIs and generate training samples, which aims to reduce the label noise in the training dataset. Specifically, (1) OSM is used to label the main road regions in HRSIs. (2) Based on the road reference map generated by OSM, the up-to-date road topology information contained in GPS is explored to label the latest construction roads in HRSIs, which are easily ignored by OSM. (3) Finally, Zmap, which is published by the local government and covers most midsize and small roads, is used to further improve the completeness of the road reference map produced by GPS. In this way, the geographical road information included in OSM, Zmap and GPS can achieve a great complementation, and the generated road samples contain less label noise, which is conducive to improving the performance of DCNN-based road extraction methods.

Then, we develop a multi-map integration model (MMIM) to improve the noise robustness of DCNNs for road extraction tasks. The MMIM can explore the true road label information hidden in multiple crowdsourced data and generate high-quality refined labels for learning DCNNs. As a result, the DCNNs have more opportunities to learn the true road distribution, and the overfitting problem can be avoided effectively. Specifically, the label distribution is first obtained from the road reference maps. Then, the multinoulli distribution (Rowland et al., 2018) is used to refine the true road label information from the label distribution for generating high-quality labels. Finally, these high-quality refined labels produced by the MMIM and the corresponding HRSIs are used to train DCNN-based road extraction methods. The efficiency of the proposed method is verified on the Zhengzhou Roads (ZZ-Roads) dataset, which covers approximately 1059 km² and contains the extremely complex road network of Zhengzhou city in China. The contributions lie in the following three aspects.

1. This is among the earliest work that learns DCNNs from multiple crowdsourced data for road extraction tasks. It can significantly improve the performance of road extraction from HRSIs.
2. To further improve the robustness of DCNN-based road extraction methods, we develop an MMIM to refine the road labels from multiple crowdsourced data.
3. In addition, a challenging benchmark dataset, ZZ-Roads, is developed and will be made publicly available for further research; it is an ideal benchmark for evaluating the performance of road extraction methods.

The remainder of this paper is organized as follows. Section 2 describes the study area. Section 3 presents a detailed introduction to the proposed method. Section 4 presents the experimental results. Section 5 discusses both the experimental and comparative results. Finally, the conclusion is drawn in Section 6.

2. Study area

In this study, Zhengzhou, an important city of China, is chosen as the study area. The location of Zhengzhou is shown in Fig. 1. As the capital of Henan province, Zhengzhou is the undisputed leader in terms of economic development, transportation and politics (Wang et al., 2019; Zhang et al., 2020). The incredible success of Zhengzhou is directly bound up with its powerful road system. These roads provide a strong foundation for the development of Zhengzhou. However, these intricate and complex roads bring great difficulties for map companies in updating their road databases. Using DCNNs to extract roads from HRSIs provides a promising way to update the road network of Zhengzhou quickly and precisely. Meanwhile, due to its complex road conditions, Zhengzhou is an ideal case for evaluating the performance of road extraction methods.

3. Methodology

The flowchart of the proposed method is shown in Fig. 3. It can be divided into three main steps. In the first step, the multiple crowdsourced data, including OSM, Zmap and GPS, are transformed into pixelwise road reference maps. Then, the refined road labels are generated by exploring the potential true label information contained in the multiple road reference maps. Finally, the refined labels with the corresponding images are used to train DCNNs for road extraction tasks.

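The complementation argument made above — that several independently noisy road reference maps jointly contain less label noise than any single one — can be illustrated with a small numeric sketch. This is our toy demonstration, not the authors' code; the 20% noise rate and three-map setup are assumptions for illustration only.

```python
import numpy as np

# Toy illustration: three synthetic reference maps each mislabel 20% of
# pixels independently; a majority vote over the three maps mislabels a
# pixel only when at least two maps are wrong simultaneously.
rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=10_000)            # ground-truth road mask
maps = [np.where(rng.random(truth.size) < 0.2,     # 20% independent noise
                 1 - truth, truth) for _ in range(3)]
vote = (np.sum(maps, axis=0) >= 2).astype(int)     # majority vote

single_err = np.mean(maps[0] != truth)             # ≈ 0.20
voted_err = np.mean(vote != truth)                 # ≈ 0.10 (3·0.2²·0.8 + 0.2³)
print(single_err, voted_err)
```

The voted error rate is roughly halved relative to any single map, which is the intuition behind fusing OSM, Zmap and GPS rather than relying on one source.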

Fig. 1. The location of the study area (Zhengzhou) in China.

3.1. Producing pixelwise road reference maps from multiple crowdsourced data

We collect data across Zhengzhou, including 1) HRSIs from Google Earth, 2) road centerlines from OSM and Zmap, and 3) the GPS trajectories of taxis.

The high-resolution remote sensing images of Zhengzhou are obtained from Google Earth. Imagery with a spatial resolution of 0.25 m is used for road extraction of Zhengzhou. The road centerlines of Zhengzhou contained in OSM are first downloaded to generate road reference maps. OSM covers most main roads of Zhengzhou but misses many small or new roads, especially in developing regions. We then collect the road centerlines from Zmap, which contains most centerlines of midsize and small roads in Zhengzhou and is published by the local government. In addition, the GPS trajectories of taxis are collected as the third crowdsourced data source. These trajectories are provided by the Transport and Communication Commission of Zhengzhou. They are generated during taxi driving and include massive spatial location information, which can directly reflect the up-to-date geometry of the road topology. The OSM data were obtained on July 10, 2020, and the GPS data were collected on January 20, 2017. The Zmap was downloaded on August 10, 2017, and the images were downloaded from Google Earth on September 10, 2020. All data used in this study can be downloaded from https://github.com/zzu-egr/zzroads/.

Based on the collected OSM, Zmap, GPS and HRSIs, two rasterization algorithms are described in this study to derive the pixelwise road reference maps efficiently and automatically. The first is the road centerline rasterization algorithm (RCRA), which utilizes road centerlines to generate pixelwise road reference maps. The second is the point density rasterization algorithm (PDRA), which aims to generate pixelwise road reference maps using GPS trajectories.

Fig. 2. One example illustrating the relationship of k, κ(p_x0, p_y0) and δ((p_x0, p_y0), (p_x, p_y)).

3.1.1. Road Centerline Rasterization Algorithm

The crowdsourced road data of OSM and Zmap only contain road centerlines and provide no information about road widths. Therefore, the RCRA is proposed to generate pixelwise road reference maps from road centerlines.

First, the road centerlines contained in OSM or Zmap are rasterized based on the geographic positioning system. Specifically, let point = (lon_p, lat_p) denote any point on the road centerlines; we map the latitude and longitude coordinates of point to the spatial coordinate (p_x, p_y) by the following formula:

p_x = round((lon_p − lon_0) × num_pix)
p_y = round((lat_p − lat_0) × num_pix),    num_pix = d / l_r        (1)

where round is a rounding function, and lon_0 and lat_0 are the minimum longitude and latitude of the study area, respectively. num_pix is the number of pixels per degree, which can be computed from d and l_r: d denotes the number of meters per degree and l_r is the resolution of the HRSIs. In this study, we set d = 85390 according to the geo-position of Zhengzhou, and l_r is set to 0.25. Therefore, we obtain num_pix = 341560. The raster map obtained by mapping the road centerlines directly is a binary image in which the grid values covered by the road centerlines are set to 1 and the others are set to 0. In addition, the lower left corner of the raster map corresponds to the minimum latitude and longitude of the study area.

Second, we enlarge the road centerlines in the raster map to generate the pixelwise road reference map. Considering the efficiency of generating the pixelwise road reference map, we use the road category tags, such as highways or cycling paths, contained in the crowdsourced data to determine the appropriate width for enlarging the road centerlines. Letting W denote the width of a road, we calculate the corresponding number of pixels in the raster map by the following formula:

M_W = round(W × num_pix / d)        (2)

Finally, we generate the pixelwise road reference map by labeling the M_W pixels around the road centerlines as the road class.

3.1.2. Point Density Rasterization Algorithm

For GPS trajectory data, the situation is slightly more complex, since a large number of GPS trajectory points are produced by taxis in Zhengzhou,


Fig. 3. The flowchart of the proposed method for road extraction.
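The coordinate mapping of Eqs. (1) and (2) can be sketched in a few lines under the paper's stated settings (d = 85390 m per degree, l_r = 0.25 m per pixel). The helper names below are ours, not the authors' implementation:

```python
# Sketch of the RCRA coordinate mapping (Eqs. (1) and (2)).
D_PER_DEG = 85390.0          # d: metres per degree at Zhengzhou's latitude
L_R = 0.25                   # l_r: image resolution, metres per pixel
NUM_PIX = D_PER_DEG / L_R    # num_pix: pixels per degree = 341560

def to_pixel(lon_p, lat_p, lon0, lat0):
    """Eq. (1): map a centerline vertex to raster coordinates."""
    px = round((lon_p - lon0) * NUM_PIX)
    py = round((lat_p - lat0) * NUM_PIX)
    return px, py

def width_in_pixels(w_metres):
    """Eq. (2): road width W in metres -> M_W pixels."""
    return round(w_metres * NUM_PIX / D_PER_DEG)

print(to_pixel(113.6250, 34.7466, 113.6, 34.7))
print(width_in_pixels(10.0))   # a 10 m road spans 40 pixels at 0.25 m/px
```

Note that W × num_pix / d reduces to W / l_r, i.e., the width rasterization is simply the road width divided by the ground sampling distance.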

and they are scattered and uneven. Considering these problems, we propose the point density rasterization algorithm (PDRA) to generate the pixelwise road reference maps.

First, the GPS trajectory points are mapped by Eq. (1) to generate a raster map. Then, a density kernel is used to reduce the noise points and enhance the valid points for producing the pixelwise road reference map. The specific description is as follows. Let (p_x0, p_y0) be the target pixel in the raster map; we construct a square kernel κ(p_x0, p_y0) with size k around the target pixel. For ∀(p_x, p_y), we define

δ((p_x0, p_y0), (p_x, p_y)) = 1 if (p_x, p_y) ∈ κ(p_x0, p_y0); 0 otherwise        (3)

To make this clear, we give an example illustrating the relationship of k, κ(p_x0, p_y0) and δ((p_x0, p_y0), (p_x, p_y)) in Fig. 2. For (p_x1, p_y1) inside κ(p_x0, p_y0), δ((p_x0, p_y0), (p_x1, p_y1)) is 1. But for (p_x2, p_y2) outside κ(p_x0, p_y0), δ((p_x0, p_y0), (p_x2, p_y2)) is 0. Then, the kernel density D(p_x0, p_y0) of pixel (p_x0, p_y0) can be written as

D(p_x0, p_y0) = Σ_{(p_x, p_y)} δ((p_x0, p_y0), (p_x, p_y))        (4)

Finally, if D(p_x0, p_y0) > thr_0, we enhance pixel (p_x0, p_y0) by

RasterMap(p_x, p_y) = 1,  (p_x, p_y) ∈ κ(p_x0, p_y0)        (5)

Otherwise, we consider (p_x0, p_y0) as a noise point and filter it out. In the above, k denotes the size of the square kernel and thr_0 denotes the density threshold. We set their values empirically by experiments, and the detailed discussion is presented in Section 4.7.

3.2. Multi-Map Integration Model for Refining Labels

To further improve the road extraction performance of DCNNs, we propose a probability model, the multi-map integration model (MMIM), to refine labels from multiple pixelwise road reference maps. The MMIM generates refined labels by exploring the true road label information contained in the multiple pixelwise road reference maps. The core idea of the MMIM is to model the label distribution and then apply the multinoulli distribution to refine labels from the multiple pixelwise road reference maps. The philosophy behind our method is that the true road information is hidden in the multiple pixelwise road reference maps: a pixel that is assigned a road label by multiple crowdsourced data sources at the same time is most likely to belong to the road class; therefore, we can use the multinoulli distribution to refine labels from the multiple pixelwise road reference maps.

Let S = {(x_i, L_i)}_{i=1}^{N} be the training dataset, where N is the total number of samples in the training dataset and x_i denotes the i-th pixel sample. L_i = {l_ir}_{r=1}^{R} is the label set and R is the number of pixelwise road reference maps, where l_ir represents the label of x_i provided by the r-th (r = 1, 2, …, R) pixelwise road reference map, and it takes its value from the class label set {y_1, y_2, …, y_c}. The MMIM generates the refined label l_i for x_i based on L_i by the following two steps. Note that c is set to 2 in this study, since we consider the pixels in remote sensing images as members of either the road class or the background class.

3.2.1. Computing the Label Distribution

For ∀(x_i, L_i) ∈ S, we assume that l_ir1 and l_ir2 in L_i are mutually independent under the condition r1 ≠ r2. This assumption is reasonable because l_ir1 and l_ir2 (r1 ≠ r2) are obtained from different road reference maps, and these maps are generated by different platforms. Based on the above assumption, for r1 ≠ r2, l_ir1 and l_ir2 are independently and identically distributed. Therefore, L_i can be


transformed into the label distribution P_i = {p_i^{y_1}, p_i^{y_2}, …, p_i^{y_c}} by

p_i^{y_m} = (1/R) Σ_{r=1}^{R} 1(l_ir = y_m)        (6)

where 1 is an indicator function: it outputs 1 when the test condition is satisfied and 0 otherwise. The formal specification is given in Eq. (7). p_i^{y_m} (m = 1, 2, …, c) denotes the probability that x_i is labeled y_m. Obviously, p_i^{y_m} ∈ [0, 1] and Σ_{m=1}^{c} p_i^{y_m} = 1.

1(t) = 1 if t = True;  0 if t = False        (7)

For ∀x_i, P_i can be seen as a prior on the corresponding label, since the element p_i^{y_m} is the probability of x_i belonging to class y_m. From Eq. (6), p_i^{y_m} is calculated based on L_i, which indicates that each sample has its own prior knowledge. Different ground objects in remote sensing images usually differ in terms of their visual and spatial features. Therefore, by injecting different prior knowledge for different samples, we can generate the refined labels effectively and accurately.

3.2.2. Generating the Refined Labels

In the process of generating the refined labels, the multinoulli distribution is a powerful tool (Zhang and Wu, 2021) that can comprehensively depict the probability of a label being the true label, thereby providing fine-grained information. In this paper, a refined label l is modeled by a multinoulli distribution with parameters P = {p^{y_1}, p^{y_2}, …, p^{y_c}}. That is, considering that each sample x_i is independently labeled by R road reference maps, the refined label distribution of x_i can be calculated as follows:

p(k_1, k_2, …, k_c | R, P_i) = (R! / (k_1! k_2! … k_c!)) Π_{v=1}^{c} (p_i^{y_v})^{k_v}        (8)

where (k_1, k_2, …, k_c) is a vector in which the element k_v denotes the number of times that the sample is labeled as y_v. Obviously, the elements of (k_1, k_2, …, k_c) satisfy

Σ_{v=1}^{c} k_v = R,  k_v ∈ {0, 1, …, R}        (9)

Since we assume there are R road reference maps for labeling sample x_i, we further integrate the label information to generate the refined labels. To this end, the majority voting strategy is adopted to decide the refined label probability. The majority voting technique is a simple but efficient method that is popularly used in decision processes (Han et al., 2019; Upadhyay et al., 2021). Here, the number of identical labels must be at least half of R for the assigned probability to be considered valid. Therefore, the distribution of the refined labels of x_i is computed as

p(l_i = y_v) = Σ_{k_v ⩾ ⌈R/2⌉} p(k_1, k_2, …, k_v, …, k_c | R, P_i)        (10)

where p(l_i = y_v) is the probability that the refined label of x_i is y_v, and ⌈∗⌉ is the ceiling operator. For example, ⌈3.5⌉ is 4.

3.3. Learning DCNNs from Refined Labels for Road Extraction

With the above refined label l and the corresponding image sample x, we can train DCNNs for road extraction from HRSIs effectively and accurately. In the following, the road extraction process is described.

Let a DCNN consist of several convolutional layers and a softmax output layer. All convolutional layers can be seen as a feature extractor that we denote as f. First, for ∀x, f is used to obtain multiscale and abstract semantic features: z = f(x; θ), where θ denotes all parameters of f. Then, the classification probability of sample x is computed using the softmax layer based on the features obtained by f:

p(y_k | x) = exp(z_k) / Σ_{j=1}^{K} exp(z_j)        (11)

where z_k represents the k-th component of z and p(y_k | x) is the probability of pixel sample x being predicted as class y_k. Finally, the loss between the predicted results and the refined labels is calculated, and the parameter θ is optimized. Here, we minimize the negative log likelihood over the refined labels and predicted results to obtain the loss at each training iteration. It is defined as

loss = −(1/N) Σ_{i=1}^{N} Σ_{k=1}^{c} p(l_i = y_k) log p(y_k | x_i)        (12)

The loss error between the refined labels and the predicted results is propagated backward through all layers, and stochastic gradient descent (Bottou, 2012) is adopted to adjust the parameter θ for learning DCNNs. During this process, the derivative of Eq. (12) with respect to the parameter θ is calculated. For clarity, we only show the derivative of Eq. (12) with respect to the output of the final layer; those of the other layers can be obtained by the chain rule. The derivative of the loss with respect to the final layer is computed as:

∂loss/∂z_k = −(1/N) Σ_{i=1}^{N} Σ_{k=1}^{c} p(l_i = y_k) (∂/∂z_k) log p(y_k | x_i)
           = −(1/N) Σ_{i=1}^{N} Σ_{k=1}^{c} p(l_i = y_k) (1 − p(y_k | x_i))        (13)

where

(∂/∂z_k) log p(y_k | x_i) = (1 / p(y_k | x_i)) · ∂p(y_k | x_i)/∂z_k
= (Σ_{j=1}^{K} exp(z_j) / exp(z_k)) · (exp(z_k) Σ_{j=1, j≠k}^{K} exp(z_j)) / (Σ_{j=1}^{K} exp(z_j))²
= (Σ_{j=1}^{K} exp(z_j) − exp(z_k)) / Σ_{j=1}^{K} exp(z_j)
= 1 − p(y_k | x_i)        (14)

4. Experiments

4.1. Dataset and Models

We develop a challenging benchmark road dataset, namely ZZ-Roads. A visualization of ZZ-Roads is shown in Fig. 4. The regions marked with cyan boxes are used as the training dataset and the rest is the testing dataset. The total area covered by the training dataset is more than 216 km². The corresponding road reference maps are generated by the method described in Section 3.1. The rest of the Zhengzhou region, covering about 1059 km², is taken as the testing dataset. The testing dataset only contains HRSIs and aims to test the performance of the proposed method on road extraction at large scale. The bottom of Fig. 4 shows some training examples. We can see that most of the roads in Zhengzhou are occluded by trees or cars, and this easily causes

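The label-refinement computation of Eqs. (6)–(10) can be sketched for the binary case c = 2 (road / background) used in this paper, where the multinoulli counts reduce to a binomial tail sum. This is our reading of the equations, with hypothetical function names, not the authors' implementation:

```python
from math import ceil, comb

# Sketch of MMIM label refinement (Eqs. (6)-(10)) for c = 2 classes.
def label_distribution(votes):
    """Eq. (6): votes = [l_1, ..., l_R], each l_r in {0, 1} (1 = road)."""
    R = len(votes)
    p_road = sum(votes) / R
    return p_road, 1.0 - p_road

def refined_road_probability(votes):
    """Eq. (10): P(refined label = road), summing the multinoulli
    (here binomial) terms with k_road >= ceil(R/2)."""
    R = len(votes)
    p_road, _ = label_distribution(votes)
    return sum(comb(R, k) * p_road**k * (1 - p_road)**(R - k)
               for k in range(ceil(R / 2), R + 1))

# A pixel marked as road by 2 of 3 reference maps: p_road = 2/3 and the
# refined probability is C(3,2)(2/3)^2(1/3) + (2/3)^3 = 20/27.
print(refined_road_probability([1, 1, 0]))
```

Unanimous votes yield refined probabilities of exactly 0 or 1, while split votes yield soft probabilities between the two, which is what gives the refined labels their fine-grained character.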

Fig. 4. The (a) shows the division of the training and testing datasets. The regions marked with cyan boxes are taken as the training dataset and the rest is the testing dataset. The bottom shows some training examples. Specifically, (b) shows the original images, and (c)-(e) show the road reference maps generated by OSM, Zmap and GPS.

discontinuous road extraction results. To simplify notation, the road reference maps generated by OSM, Zmap and GPS are represented as S1, S2 and S3, respectively.

Three DCNN models, UNet, SegNet and CasNet, are adopted as basic DCNNs to conduct the road extraction experiments in this study. UNet, proposed by Ronneberger et al. (2015), is often taken as the basic network for road detection because of its high precision and efficiency. SegNet is a deep fully convolutional neural network with an encoder-decoder architecture that achieves great performance in semantic pixel-wise segmentation (Badrinarayanan et al., 2017). CasNet, proposed by Cheng et al. (2017), is a multitask network for road region and centerline detection. Since we focus on road extraction, we only construct and train the road segmentation component of CasNet for performance comparison. These models are widely used for road extraction from remote sensing images, and they achieve great performance with high learning efficiency.

All DCNNs are trained with stochastic gradient descent. The learning rate always starts from 1e-7 and is reduced by a factor of ten every 300 K iterations. Stochastic gradient descent with momentum helps escape local optima, thus leading to faster convergence for DCNNs. However, the training loss will fluctuate when the momentum is set too large, which would affect convergence. We set the momentum to 0.9 in this study, since it brings excellent learning efficiency in our experiments. In addition, due to limited computation resources, such as memory and cores, the mini-batch size is set to 8 to take advantage of the DCU accelerator. All the experiments are conducted on a server with a Sugon DCU accelerator and 16 GB of memory. Since the size of the DCU memory is limited, we take patches of size 128 × 128 as input for UNet and SegNet, and 223 × 223 for CasNet.

4.2. Qualitative evaluation on a large-area road extraction

This section presents the qualitative evaluation of the proposed method on large-area road extraction. The UNet is first trained with the proposed method and then used to extract roads on the large-area testing dataset. Meanwhile, we present experimental results to compare the proposed method with different methods.

Fig. 5a shows the road extraction results of UNet+{S1, S2, S3} on the large-area testing dataset. Here, UNet+{S1, S2, S3} denotes that the UNet is trained with the MMIM using the three crowdsourced data sources. We can see that UNet+{S1, S2, S3} performs well in terms of road extraction from HRSIs. The road networks obtained by UNet+{S1, S2, S3} exhibit connectivity and completeness, even though the topology structures and backgrounds are extremely complex. In addition, for the entirety of Zhengzhou, UNet+{S1, S2, S3} takes approximately 50 min to extract the road network. The total time required from the generation of training samples to model convergence is approximately seven hours. Moreover, we randomly

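The training objective described in Eqs. (11) and (12) — a softmax over the final-layer outputs followed by a negative log likelihood weighted by the refined soft labels — can be sketched as below. The variable and function names are ours, and this is an illustration of the loss only, not the authors' training code:

```python
import numpy as np

# Sketch of the soft-label training loss (Eqs. (11) and (12)).
def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # Eq. (11), stabilized
    return e / e.sum(axis=-1, keepdims=True)

def refined_label_loss(z, p_refined):
    """Eq. (12): z is (N, c) logits, p_refined is (N, c) MMIM soft labels."""
    log_p = np.log(softmax(z))
    return -np.mean(np.sum(p_refined * log_p, axis=1))

z = np.array([[2.0, 0.5], [0.2, 1.4]])             # two pixels, c = 2
p_refined = np.array([[20/27, 7/27], [1/3, 2/3]])  # refined label rows
print(refined_label_loss(z, p_refined))
```

Because the refined labels are full distributions rather than hard 0/1 targets, confidently mislabeled pixels contribute a softer penalty, which is the mechanism by which the MMIM limits overfitting to label noise.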

Fig. 5. Visual extraction results of Zhengzhou. The (a) shows an overview of the extracted road network for Zhengzhou. The (b) and (c) show the close-ups of the yellow rectangles in the first subfigure.

select two regions, A and B, and magnify them to further observe the road extraction effectiveness of UNet+{S1, S2, S3}, as shown in Fig. 5b and c. Region A covers approximately 25 km² and region B covers approximately 20 km². As the close-ups show, although some spurs appear in the road extraction map, UNet+{S1, S2, S3} can detect a relatively smooth and complete road network. This is because UNet+{S1, S2, S3} can produce smooth boundaries and accurately separate road regions from complex backgrounds. All these phenomena demonstrate that the proposed method can produce excellent road extraction results and has the capability of dealing with large-scale road extraction tasks under complex road conditions.

Second, we conduct experiments on two large regions using different methods, including UNet+{S1}, UNet+{S2}, UNet+{S3}, UNet+{S1, S3} and UNet+{S1, S2, S3}, and this is done to compare the performance of the proposed method with those of other road extraction methods. UNet+{S1}, UNet+{S2} and UNet+{S3} represent using only one crowdsourced data source to generate the road reference map for training UNet. The visual results are shown in Figs. 6 and 7.

Fig. 6 presents the visual results for the Gaoxin district, which is a developing area of Zhengzhou. Due to different construction times, most of the roads in the Gaoxin district are built with different materials, such as clay, pitch and cement. Besides, there are many construction fields in the Gaoxin district, and this leads to great confusion when extracting road regions. From Fig. 6, we can see that the proposed method achieves superior road extraction performance compared to the other methods. The road extraction results obtained with UNet+{S1}, UNet+{S2} and UNet+{S3} are discontinuous, with many gaps, because adopting a single crowdsourced data source easily introduces label noise and thus diminishes the performance of DCNNs. Some regions are marked with yellow boxes in Fig. 6a, and their close-ups are shown in the last two rows of Fig. 6 for highly intuitive comparison. Region A is a representative region with complex road structures, such as overpasses. Region B displays roads covered with different building materials. In addition, due to the negative effects of using different construction materials, the spectral features of

7
P. Li et al. International Journal of Applied Earth Observation and Geoinformation 104 (2021) 102544

Fig. 6. Road extraction results for the Gaoxin district. The first two rows illustrate the road extraction results obtained by five different methods. The last two rows
illustrate the close-ups of region A and region B in the original image.

roads in region B are confused with background regions. From the close- UNet+{S1 , S2 , S3 } can achieve more satisfactory and coherent road
up in Fig. 6, we can see that UNet+{S1 , S2 , S3 } can accurately extract extraction results than those of the other comparison methods, even for
road networks under challenging scenarios, including complex road regions with complex road topologies and large object occlusions. Re­
structures, material changes in roads, and roads with confusing gion C in Fig. 7a contains representative regions covered by shadows.
backgrounds. Region D in Fig. 7a displays a more complex scene. Apart from the oc­
Fig. 7 represents a developed area in Zhengzhou, Zhengdong district. clusions emphasized in region C, complex road network topologies and
Compared with that in Gaoxin district, the roads have different shapes overpasses exist. The last two rows in Fig. 7 show close-ups results of
and hierarchies. Moreover, there are many roads covered by large ob­ different methods for region C and region D. From these experimental
jects, such as cars, trees or building shadows, increasing the difficulty of results, we can see that UNet+{S1 , S2 , S3 } is less affected by shadows
road extraction tasks. As we shall see, UNet+{S1 }, UNet+{S2 } and than the other methods and can accurately extract road networks under
UNet+{S3 } are sensitive to the occlusions of large objects. In contrast, complex and challenging scenarios. The experimental results with

8
P. Li et al. International Journal of Applied Earth Observation and Geoinformation 104 (2021) 102544

Fig. 7. Road extraction results for the Zhengdong district. The first two rows illustrate the road extraction results obtained by five different methods. The last two
rows illustrate the close-ups of region C and region D in the original image.

respect to Zhengdong district show that the proposed method has the contained in accurate testing dataset. We adopt the four metrics of
ability to deal with complex road structures, so it has strong potential to precision (Pre), recall (Rec), F1 and intersection over union (IoU) to
extract roads in developed areas with high quality. measure performance of model on accurate testing dataset (Wiedemann
et al., 1998). The quantitative evaluation results are reported in Table 1.
4.3. Quantitative evaluation of proposed method For each metric term, the best values are marked in bold, while the
secondary values are underlined.
In this section, we manually correct 105 road reference maps in From Table 1, it can be seen clearly that the proposed method (rows
large-area testing dataset, which aims to create an accurate testing denoted by {S1 , S3 } and {S1 , S2 , S3 }) achieves the highest performance.
dataset and evaluate proposed method quantitatively. All images in this This indicates that the method combining multiple crowdsourced data
testing dataset are randomly selected from road dense region and cover indeed outperforms method using only one crowdsourced data. On
about 14.763km2 in total. Therefore, the complex road condition is average, the F1 and IoU obtained by UNet+{S1 , S2 , S3 } are 0.7883 and

9
P. Li et al. International Journal of Applied Earth Observation and Geoinformation 104 (2021) 102544

Table 1 method under high label noise conditions, we increase label noise of
Testing performances (Pre, Rec, F1 and IoU) with respect to road extraction. The training dataset manually. Specifically, for each training sample, its road
best values are marked in bold, while the secondary values are underlined. reference map is randomly flipped to another class with a fixed proba­
Pre Rec F1 IoU bility ρ. We conduct experiments under different noise ratios varying
UNet S1 : OSM 0.7212 0.7741 0.7467 0.5738
from 10% to 30%.
S2 : Zmap 0.6764 0.6291 0.6519 0.4590
Quantitative comparisons with UNet, SegNet and CasNet are
exhibited in Tables 2–4. From these results, we can see clearly that the
S3 : GPS 0.7263 0.7388 0.7325 0.5551
proposed method shows significant improvement under all different
{S1 , S3 } 0.7526 0.7517 0.7521 0.5811
label noise levels. Specifically, the gap between models trained with
{S1 , S2 } 0.7534 0.7501 0.7518 0.5805
proposed method and the models using only one crowdsourced data is
{S2 , S3 } 0.7349 0.7466 0.74071 0.5633
nearly 5.84%. The average F1 score of UNet+{S1 , S2 , S3 } under 10%
{S1 , S2 , S3 } 0.7571 0.8222 0.7883 0.6298
noise level is 0.7420, which is higher than results of UNet+{S1 }. For a
SegNet S1 : OSM 0.6827 0.7358 0.7083 0.5280
high label noise level, such as ρ=30% in OSM, the CasNet+{S1 , S2 , S3 }
S2 : Zmap 0.6409 0.6076 0.6238 0.4245
achieves a score of 0.7129 in F1, which is over 10.02% higher than that
S3 : GPS 0.7419 0.6829 0.7112 0.5334
of CasNet+{S1 }, indicating that our proposed method is robust against
{S1 , S3 } 0.7595 0.6880 0.7220 0.5413
high noise level. In addition, for the other three measures, Pre, Rec and
{S1 , S2 } 0.7459 0.6929 0.7184 0.5378
IoU, the proposed method also obtains great improvement. For instance,
{S2 , S3 } 0.7407 0.7037 0.7217 0.5420
the results of CasNet+{S1 , S2 , S3 } are 0.7481, 0.7590 and 0.5806 for Pre,
{S1 , S2 , S3 } 0.7457 0.7154 0.7302 0.5536
Rec and IoU when ρ=10% in OSM, respectively, and they are higher
CasNet S1 : OSM 0.7268 0.6839 0.7047 0.5235
than those of CasNet+{S1 } by 4.12%, 8.38% and 7.20% for the corre­
S2 : Zmap 0.6451 0.7185 0.6798 0.4931
sponding measures. These results demonstrate that the proposed method
S3 : GPS 0.7449 0.7260 0.7353 0.5605
can obtain continuous, coherent and complete road regions from remote
{S1 , S3 } 0.7427 0.7596 0.7511 0.5773
sensing images.
{S1 , S2 } 0.7434 0.7553 0.7493 0.5807
{S2 , S3 } 0.7390 0.7447 0.7419 0.5721
{S1 , S2 , S3 } 0.7400 0.7886 0.7635 0.5960 4.4. Statistical Analysis

In this section, Student’s t-test is conducted to further demonstrate


0.6298, respectively, which are higher than those obtained by using one the improvement of proposed method in road extraction. These exper­
crowdsourced data alone by at least 4.16% and 5.60%. The F1 score of iments are performed on S1 , S2 , S3 and their noise version (ρ = 10%,
SegNet+ {S1 , S2 , S3 } is 0.7302, which is over 2.19%, 10.64%, and 1.9% 20%, 30%). The experimental results are shown in Table 5. For each
better than those obtained by S1 , S2 , and S3 alone, respectively. For experiment, we compute the p-value between the proposed method,
CasNet+{S1 ,S2 ,S3 }, the achieved IoU of 0.5960 is at least 5.13% higher using multiple crowdsourced data trained with MMIM, and compared
than that obtained using only one crowdsourced data. These results methods, using only one crowdsourced data. From Table 5, we can
mean that the training samples generated with multiple crowdsourced observe that the p-values of F1 and IoU are usually less than 0.05, which
data can provide more road label information for DCNNs than an initial indicates that significant improvement is obtained by the proposed
sample with labels from one crowdsourced data. method.
In addition, we can see that the models trained using {S1 } and {S3 }
can show relatively satisfied performance. For example, the UNet+{S3 }
can achieve 0.7325 and 0.5551 in F1 and IoU. And the UNet+{S1 } ob­ 4.5. Robustness Analysis
tains 0.7467 and 0.5738 in F1 and IoU. These results demonstrate the
crowdsourced OSM and GPS have potential to learn DCNNs for road In Section 4.3, we show the performance of proposed method under
extraction tasks. However, we can see that the model can achieve better different label noise levels. The performance curves of F1 score are
performance with the help of {S2 } since the Zmap contains most midsize shown in Fig. 8 to make the comparison more intuitive. We can find that
and small roads in study area. For example, the F1 obtained by the F1 performance curve of DCNNs trained with proposed method is
SegNet+{S2 , S3 } can achieve 0.7217, which is higher the SegNet+{S3 } located at the top. This demonstrate that the proposed method has
about 1.05%. Besides, the performance of CasNet+{S1 , S2 } is increased strong ability to deal with severe label noise in road extraction tasks.
by 4.46% and 5.72% in F1 and IoU compared with that of the Besides, with the increase of label noise level, the performance of DCNNs
CasNet+{S1 }. trained with proposed method decreases more slowly than those of the
To further validate robustness and generalization ability of proposed method using only one crowdsourced data. In great detail, as the noise
ratio increases from 10% to 30% for S1 , the F1 obtained by

Table 2
Quantitative road extraction results of UNet, SegNet and CasNet on OSM with injected noise. The best values are marked in bold.

ρ            UNet                             SegNet                           CasNet
             S1:OSM  {S1,S3}  {S1,S2,S3}      S1:OSM  {S1,S3}  {S1,S2,S3}      S1:OSM  {S1,S3}  {S1,S2,S3}
10%  Pre     0.7056  0.7360   0.7432          0.6941  0.7271   0.7332          0.7069  0.7064   0.7481
     Rec     0.7390  0.7369   0.7409          0.6402  0.6552   0.6664          0.6752  0.7565   0.7590
     F1      0.7219  0.7365   0.7420          0.6661  0.6893   0.6982          0.6907  0.7306   0.7535
     IoU     0.5462  0.5641   0.5698          0.4808  0.5001   0.5105          0.5086  0.5541   0.5806
20%  Pre     0.7041  0.7112   0.7183          0.6244  0.6992   0.6960          0.7165  0.7175   0.7547
     Rec     0.7064  0.7182   0.7251          0.6894  0.6601   0.6825          0.6615  0.7215   0.7417
     F1      0.7052  0.7147   0.7217          0.6553  0.6791   0.6892          0.6879  0.7195   0.7481
     IoU     0.5237  0.5324   0.5421          0.4727  0.4958   0.5030          0.5049  0.5393   0.5736
30%  Pre     0.7043  0.6833   0.6902          0.6844  0.6335   0.7004          0.5592  0.6411   0.7879
     Rec     0.6411  0.6869   0.6984          0.6027  0.7102   0.6621          0.6776  0.7878   0.6510
     F1      0.6712  0.6851   0.6943          0.6410  0.6697   0.6807          0.6127  0.7069   0.7129
     IoU     0.4826  0.5008   0.5112          0.4610  0.4808   0.4892          0.4280  0.5293   0.5282
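The injected label noise evaluated in Tables 2–4 follows the protocol described above: each training sample's road reference map is flipped to the other class with a fixed probability ρ. A minimal sketch (per-sample flipping of binary maps and the array layout are our assumptions, not the paper's exact implementation):

```python
import numpy as np

def inject_label_noise(reference_maps, rho, seed=0):
    """Flip each sample's binary road reference map to the other class with
    probability rho (sketch of the symmetric noise-injection protocol)."""
    rng = np.random.default_rng(seed)
    noisy = reference_maps.copy()
    flip = rng.random(len(noisy)) < rho   # which samples get corrupted
    noisy[flip] = 1 - noisy[flip]         # road <-> background
    return noisy
```

With rho=0.3, roughly 30% of the reference maps are corrupted, matching the highest noise level in the tables.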


Table 3
Quantitative road extraction results of UNet, SegNet and CasNet on Zmap with injected noise. The best values are marked in bold.

ρ            UNet                             SegNet                           CasNet
             S2:Zmap {S2,S3}  {S1,S2,S3}      S2:Zmap {S2,S3}  {S1,S2,S3}      S2:Zmap {S2,S3}  {S1,S2,S3}
10%  Pre     0.6146  0.6880   0.7200          0.6342  0.6693   0.7493          0.6236  0.7501   0.7637
     Rec     0.6279  0.6919   0.6878          0.5738  0.6637   0.7111          0.6314  0.7370   0.7505
     F1      0.6212  0.6899   0.7035          0.6025  0.6665   0.7298          0.6275  0.7435   0.7571
     IoU     0.4282  0.5055   0.5221          0.4135  0.4813   0.5506          0.4375  0.5683   0.5856
20%  Pre     0.5921  0.6683   0.6846          0.5699  0.7344   0.7697          0.5666  0.7186   0.7741
     Rec     0.6177  0.6646   0.6770          0.6019  0.5809   0.6681          0.5885  0.7312   0.7087
     F1      0.6046  0.6664   0.6808          0.5855  0.6487   0.7153          0.5774  0.7248   0.7399
     IoU     0.4137  0.4787   0.4957          0.3962  0.4648   0.5327          0.3858  0.5487   0.5619
30%  Pre     0.5667  0.6494   0.6802          0.5912  0.6880   0.7305          0.5287  0.7323   0.7887
     Rec     0.5898  0.6469   0.6740          0.5591  0.5978   0.6746          0.5256  0.6817   0.6765
     F1      0.5780  0.6481   0.6771          0.5747  0.6398   0.7014          0.5272  0.7061   0.7283
     IoU     0.3818  0.4615   0.4933          0.3840  0.4513   0.5091          0.3377  0.5230   0.5466
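The Pre, Rec, F1 and IoU scores reported in Tables 1–4 can be computed from a predicted mask and a ground-truth mask roughly as below (a minimal pixelwise sketch; the paper evaluates with the protocol of Wiedemann et al. (1998), which may count matches with a spatial buffer, so exact values can differ):

```python
import numpy as np

def road_metrics(pred, truth):
    """Pixelwise precision, recall, F1 and IoU for binary road masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()    # true positives
    fp = np.logical_and(pred, ~truth).sum()   # false positives
    fn = np.logical_and(~pred, truth).sum()   # false negatives
    pre = tp / (tp + fp + 1e-12)
    rec = tp / (tp + fn + 1e-12)
    f1 = 2 * pre * rec / (pre + rec + 1e-12)
    iou = tp / (tp + fp + fn + 1e-12)
    return pre, rec, f1, iou
```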

Table 4
Quantitative road extraction results of UNet, SegNet and CasNet on GPS with injected noise. The best values are marked in bold.

ρ            UNet                             SegNet                           CasNet
             S3:GPS  {S1,S3}  {S1,S2,S3}      S3:GPS  {S1,S3}  {S1,S2,S3}      S3:GPS  {S1,S3}  {S1,S2,S3}
10%  Pre     0.6979  0.7365   0.7419          0.7294  0.7352   0.7468          0.7262  0.7501   0.7637
     Rec     0.7067  0.7567   0.7862          0.6619  0.6701   0.7023          0.7211  0.7370   0.7505
     F1      0.7022  0.7465   0.7634          0.6940  0.7012   0.7239          0.7236  0.7435   0.7570
     IoU     0.5211  0.5739   0.5951          0.5157  0.5140   0.5422          0.5447  0.5683   0.5855
20%  Pre     0.6714  0.7174   0.7473          0.7084  0.7295   0.7375          0.7323  0.7186   0.7008
     Rec     0.7070  0.7212   0.7265          0.6712  0.6582   0.6890          0.6817  0.7312   0.7744
     F1      0.6888  0.7193   0.7368          0.6893  0.6920   0.7125          0.7061  0.7248   0.7358
     IoU     0.5057  0.5407   0.5607          0.5067  0.5033   0.5321          0.5230  0.5487   0.5611
30%  Pre     0.6355  0.6991   0.7154          0.6937  0.7287   0.7394          0.7579  0.7323   0.7751
     Rec     0.6532  0.6724   0.6960          0.6436  0.6573   0.6809          0.6494  0.6817   0.6754
     F1      0.6442  0.6855   0.7056          0.6677  0.6912   0.7089          0.6995  0.7061   0.7218
     IoU     0.4555  0.5004   0.5232          0.4901  0.5023   0.5268          0.5151  0.5230   0.5395
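The p-values reported in Table 5 come from paired Student's t-tests between the proposed method and a single-source baseline; a stdlib-only sketch of the statistic (the per-image F1 scores below are hypothetical, and scipy.stats.ttest_rel would give the p-value directly):

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """Paired Student's t statistic over per-image scores
    (a sketch of the test behind Table 5)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Hypothetical per-image F1 scores for the proposed method and one baseline.
f1_multi = [0.79, 0.81, 0.77, 0.80, 0.78, 0.82]
f1_single = [0.74, 0.76, 0.73, 0.75, 0.72, 0.77]
t = paired_t_statistic(f1_multi, f1_single)
# |t| above the 5% two-tailed critical value (2.571 for 5 degrees of
# freedom) corresponds to p < 0.05, i.e. a significant improvement.
```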

CasNet+{S1, S2, S3} decreases by a small amount, from 0.7535 to 0.7129. Compared with CasNet+{S1}, for which F1 decreases from 0.6907 to 0.6127, CasNet trained with the proposed method has an advantage of approximately 3.74%. Simultaneously, for the other two models, UNet and SegNet, the results obtained by multiple crowdsourced data exhibit slight decreases when the noise level increases from 10% to 30%, indicating that the proposed method is more beneficial for achieving stable robustness to noise than the compared methods.

Table 5
p-values obtained in the paired Student's t-test between the proposed method and the other methods on three datasets.

          F1        IoU
S1: OSM   0.0072    0.012
S2: Zmap  4.91e-08  4.83e-08
S3: GPS   0.0017    0.004

4.6. Effectiveness of MMIM

To further illustrate the effectiveness of MMIM, we show the refined road labels and road reference maps generated from the crowdsourced data in Fig. 9. There are five columns and three rows. The second to fourth columns are the road reference maps generated by OSM, Zmap, GPS and MMIM. The last column is the color bar that shows the relationship between color and probability. Each row represents a different sample.

From the results, we observe that the MMIM can indeed explore the true label information hidden in multiple road reference maps and produce refined labels that are closer to the true road labels. Taking the example of x1 in Fig. 9, all three road reference maps label it as the road class, so the MMIM tends to consider x1 a road class with high probability. As a result, the MMIM can propagate the true road labels to DCNNs and help the DCNNs to learn the true road features. For the example of x2, only one reference map labels it as the road class, while the other two road reference maps label it as the background class. Thus, a small probability is assigned by the MMIM, indicating that x2 is the non-road class with high probability. In this way, the noisy labels in the road reference maps can be filtered as much as possible, and the DCNNs are more likely to avoid the overfitting problem.

4.7. Parameters Analysis

There are two parameters determining the performance of the PDRA in producing road reference maps from GPS trajectories: 1) the parameter k denotes the size of the kernel κ, and 2) the parameter thr0 is a density threshold. In this study, we set their values according to experiments. Fig. 10 shows the road reference maps generated by the PDRA with different k, and the influence of thr0 is analyzed in Fig. 11.

From these experimental results, we can see that if k and thr0 are too small, the road reference maps generated by the PDRA are inappropriate. For a small parameter k, such as k = 5, the geometric features of roads can be characterized well, but the road segments are too sparse. With an increasing value of k, this problem can be alleviated. However, some road segments may be missing, resulting in insufficient road label information. Based on these experimental results, k is set to 30 in this study, since the road reference map generated by the PDRA is clear and accurate.

In addition, the road reference maps would be scattered if thr0 is too large. The reason is that the point density obtained by the kernel κ is easily less than thr0, and these points will be treated as noise and filtered in the subsequent process. Although this will produce a clearer pixelwise road reference map, too much road information included in the GPS trajectories will be missing. At the same time, a small thr0 tends to preserve road information, but the produced pixelwise road reference map has


Fig. 8. Quantitative road extraction results in terms of F1 for UNet, SegNet, and CasNet under different injected label noise levels.

Fig. 9. The effectiveness of the MMIM on refinement labels.
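The refinement behaviour that Fig. 9 illustrates can be sketched as follows. Plain per-pixel averaging of the three binary reference maps is an illustrative assumption; the actual MMIM integrates the maps through a multinoulli model:

```python
import numpy as np

def refine_labels(osm_map, zmap_map, gps_map):
    """Sketch of the label-refinement idea behind MMIM: pixels on which the
    three road reference maps agree get confident labels, while
    disagreements get soft probabilities (averaging is an assumption)."""
    stack = np.stack([osm_map, zmap_map, gps_map]).astype(float)
    return stack.mean(axis=0)   # P(road) per pixel, in [0, 1]
```

A pixel labelled road by all three maps (like x1 in Fig. 9) receives probability 1.0, while a pixel labelled road by only one map (like x2) receives 1/3, so the conflicting label is down-weighted when training the DCNNs.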


Fig. 10. The road reference maps produced by PDRA with different k.

Fig. 11. The road reference maps produced by PDRA with different thr0 .
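The density-thresholding idea behind the two PDRA parameters varied in Figs. 10 and 11 can be sketched as: sum the GPS point counts inside a k × k kernel around each cell and keep cells whose density reaches thr0. This is only a sketch; the full PDRA pipeline involves further trajectory processing, and the gridding of trajectories into per-cell point counts is assumed here:

```python
import numpy as np

def pdra_density_mask(point_counts, k=30, thr0=200):
    """Keep cells whose k x k neighbourhood contains at least thr0 GPS
    points (sketch of the PDRA density-thresholding step)."""
    padded = np.pad(point_counts, k // 2, mode="constant")
    h, w = point_counts.shape
    density = np.zeros((h, w))
    for dy in range(k):          # accumulate the k x k neighbourhood sums
        for dx in range(k):
            density += padded[dy:dy + h, dx:dx + w]
    return density >= thr0
```

A larger k or a smaller thr0 keeps more cells, which matches the trade-off discussed above: more complete roads but noisier reference maps.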

more noise. Therefore, we set thr0 to 200 for better road reference maps in this study.

5. Discussion

As shown in Section 4, the proposed method can perform road extraction better than the other methods. This superiority may be due to the following factors: (1) the multiple crowdsourced data, including OSM, Zmap and GPS, are employed to generate training samples, which can reduce label noise in the training dataset; (2) the MMIM is developed to explore the true road label information hidden in multiple crowdsourced data for robustly learning DCNNs. This can assist the DCNNs in learning the true road distribution and avoiding the label noise overfitting problem.

The existing methods that use a single crowdsourced data source for road extraction suffer from severe label noise, which affects the performance of road extraction. Compared with other methods, our method makes full use of multiple crowdsourced data to reduce the label noise. More specifically, the OSM is used to mark the main road regions in HRSIs. The GPS is utilized to label the latest construction roads in HRSIs. The Zmap, which covers most midsize and small roads, is also used and can further improve the completeness of the road reference map produced by GPS. Therefore, road training samples with less label noise can be generated, and the performance of DCNN-based road extraction methods can be further improved by our method.

With the increase of label noise in the training samples, the performance of existing road extraction methods is far from satisfactory, while the proposed method can still show better road extraction performance. The main reason is that the MMIM can explore the true road label information hidden in multiple crowdsourced data through a multinoulli distribution, and thus it can provide refined labels to learn DCNNs for road extraction tasks. Consequently, the DCNNs have more opportunities to learn the true road distribution, and the robustness against label noise can be improved.

Limitations: Our method can truly improve the performance of DCNN-based road extraction methods. However, we need to collect multiple crowdsourced data for producing road samples, which is time-consuming and will limit the applicability of our method. In future work, we will further explore the labels produced by the developed DCNNs to reduce the cost of collecting multiple crowdsourced data. Due to their strong feature extraction ability, DCNNs tend to first learn the true road label information and then fit the label noise. As a result, the true road label information hidden in the predicted labels of the developed DCNNs can be explored to improve the noise robustness.

6. Conclusion

In this study, a novel method to learn DCNNs from crowdsourced data for road extraction tasks is presented. The multiple crowdsourced data, including OSM, Zmap and GPS, are utilized to extract road reference maps automatically, which can reduce label noise significantly in the training samples. Then, we present a multi-map integration model (MMIM) to refine geographical road information by integrating multiple road reference maps. The MMIM can generate high-quality refined labels for learning DCNN-based road extraction methods. To evaluate the performance of the proposed method, we develop a new and large benchmark road dataset (ZZ-Roads) that covers 1059 km2 with a spatial resolution of 0.25 m. The experimental results based on ZZ-Roads show that the proposed method can extract road regions with high accuracy and completeness. In the future, we will continue to extend the proposed method to other ground objects, such as buildings and land cover, which could potentially improve city management and advance the progress of smart cities.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors gratefully acknowledge the National Supercomputing Center in Zhengzhou for their technical support.

References

Ali, A.L., Falomir, Z., Schmid, F., Freksa, C., 2017. Rule-guided human classification of volunteered geographic information. ISPRS Journal of Photogrammetry and Remote Sensing 127, 3–15.
Wang, W., Yang, N., Zhang, Y., Wang, F., Cao, T., Eklund, P., 2016. A review of road extraction from remote sensing images. Journal of Traffic and Transportation Engineering (English Edition) 3, 271–282.
Badrinarayanan, V., Kendall, A., Cipolla, R., 2017. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 2481–2495.


Bottou, L., 2012. Stochastic gradient descent tricks, in: Neural Networks: Tricks of the Trade. Springer, pp. 421–436.
Chen, J., Zhou, Y., Zipf, A., Fan, H., 2018. Deep learning from multiple crowds: A case study of humanitarian mapping. IEEE Transactions on Geoscience and Remote Sensing 57, 1713–1722.
Chen, J., Zipf, A., 2017. DeepVGI: Deep learning with volunteered geographic information, in: Proceedings of the 26th International Conference on World Wide Web Companion, pp. 771–772.
Cheng, G., Wang, Y., Xu, S., Wang, H., Xiang, S., Pan, C., 2017. Automatic road detection and centerline extraction via cascaded end-to-end convolutional neural network. IEEE Transactions on Geoscience and Remote Sensing 55, 3322–3337.
Ding, L., Bruzzone, L., 2020. DiResNet: Direction-aware residual network for road extraction in VHR remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 1–12.
Dong, R., Fang, W., Fu, H., Gan, L., Wang, J., Gong, P., 2021. High-resolution land cover mapping through learning with noise correction. IEEE Transactions on Geoscience and Remote Sensing, 1–13.
Frenay, B., Verleysen, M., 2014. Classification in the presence of label noise: A survey. IEEE Transactions on Neural Networks 25, 845–869.
Gevaert, C.M., Persello, C., Elberink, S.O., Vosselman, G., Sliuzas, R., 2017. Context-based filtering of noisy labels for automatic basemap updating from UAV data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11, 2731–2741.
Grinias, I., Panagiotakis, C., Tziritas, G., 2016. MRF-based segmentation and unsupervised classification for building and road detection in peri-urban areas of high-resolution satellite images. ISPRS Journal of Photogrammetry and Remote Sensing 122, 145–166.
Han, B., Tsang, I.W., Chen, L., Zhou, J.T., Yu, C.P., 2019. Beyond majority voting: A coarse-to-fine label filtration for heavily noisy labels. IEEE Transactions on Neural Networks and Learning Systems 30, 3774–3787.
Huang, J., Zhang, X., Xin, Q., Sun, Y., Zhang, P., 2019. Automatic building extraction from high-resolution aerial images and lidar data using gated residual refinement network. ISPRS Journal of Photogrammetry and Remote Sensing 151, 91–105.
Jiang, J., Ma, J., Wang, Z., Chen, C., Liu, X., 2019. Hyperspectral image classification in the presence of noisy labels. IEEE Transactions on Geoscience and Remote Sensing 57, 851–865.
Jiang, L., Zhou, Z., Leung, T., Li, L.-J., Fei-Fei, L., 2018. MentorNet: Regularizing very deep neural networks on corrupted labels, ICML.
Kaiser, P., Wegner, J.D., Lucchi, A., Jaggi, M., Hofmann, T., Schindler, K., 2017. Learning aerial image segmentation from online maps. IEEE Transactions on Geoscience and Remote Sensing 55, 6054–6068.
Li, P., He, X., Cheng, X., Gao, X., Li, R., Qiao, M., Li, D., Qiu, F., Li, Z., 2019. Object extraction from very high-resolution images using a convolutional neural network based on a noisy large-scale dataset. IEEE Access 7, 122784–122795.
Li, P., He, X., Qiao, M., Cheng, X., Li, Z., Luo, H., Song, D., Li, D., Hu, S., Li, R., Han, P., Qiu, F., Guo, H., Shang, J., Tian, Z., 2021a. Robust deep neural networks for road extraction from remote sensing images. IEEE Transactions on Geoscience and Remote Sensing 59, 6182–6197.
Li, Y., Guo, L., Rao, J., Xu, L., Jin, S., 2018. Road segmentation based on hybrid convolutional network for high-resolution visible remote sensing image. IEEE Geoscience and Remote Sensing Letters 16, 613–617.
Li, Y., Zhang, Y., Zhu, Z., 2021b. Error-tolerant deep learning for remote sensing image scene classification. IEEE Transactions on Cybernetics 51, 1756–1768.
Liu, Y., Yao, J., Lu, X., Xia, M., Wang, X., Liu, Y., 2019. RoadNet: Learning to comprehensively analyze road networks in complex urban scenes from high-resolution remotely sensed images. IEEE Transactions on Geoscience and Remote Sensing 57, 2043–2056.
Lu, X., Zhong, Y., Zheng, Z., Liu, Y., Zhao, J., Ma, A., Yang, J., 2019. Multi-scale and multi-task deep learning framework for automatic road extraction. IEEE Transactions on Geoscience and Remote Sensing 57, 9362–9377.
Martins, V.S., Kaleita, A.L., Gelder, B.K., da Silveira, H.L., Abe, C.A., 2020. Exploring multiscale object-based convolutional neural network (multi-OCNN) for remote sensing image classification at high spatial resolution. ISPRS Journal of Photogrammetry and Remote Sensing 168, 56–73.
Máttyus, G., Luo, W., Urtasun, R., 2017. DeepRoadMapper: Extracting road topology from aerial images, in: Proceedings of the IEEE International Conference on Computer Vision, pp. 3438–3446.
Menon, A.K., Van Rooyen, B., Natarajan, N., 2018. Learning from binary labels with instance-dependent noise. Machine Learning 107, 1561–1595.
Mnih, V., Hinton, G.E., 2010. Learning to detect roads in high-resolution aerial images, in: Computer Vision - ECCV 2010 - 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part VI.
Mnih, V., Hinton, G.E., 2013. Learning to label aerial images from noisy data, in: International Conference on Machine Learning.
Qiu, J., Wang, R., 2016. Automatic extraction of road networks from GPS traces. Photogrammetric Engineering and Remote Sensing 82, 593–604.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 234–241.
Rowland, M., Bellemare, M., Dabney, W., Munos, R., Teh, Y.W., 2018. An analysis of categorical distributional reinforcement learning, in: International Conference on Artificial Intelligence and Statistics, PMLR, pp. 29–37.
Sun, T., Di, Z., Che, P., Liu, C., Wang, Y., 2019. Leveraging crowdsourced GPS data for road extraction from aerial imagery, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7509–7518.
Tao, C., Qi, J., Li, Y., Wang, H., Li, H., 2019. Spatial information inference net: Road extraction using road-specific contextual information. ISPRS Journal of Photogrammetry and Remote Sensing 158, 155–166.
Tu, B., Zhang, X., Kang, X., Wang, J., Benediktsson, J.A., 2019. Spatial density peak clustering for hyperspectral image classification with noisy labels. IEEE Transactions on Geoscience and Remote Sensing 57, 5085–5097.
Upadhyay, D., Manero, J., Zaman, M., Sampalli, S., 2021. Intrusion detection in SCADA-based power grids: Recursive feature elimination model with majority vote ensemble algorithm. IEEE Transactions on Network Science and Engineering, 1–1.
Wang, E., Gao, Z., Heng, Y., Shi, L., 2019. Chinese consumers' preferences for food quality test/measurement indicators and cues of milk powder: A case of Zhengzhou, China. Food Policy 89, 101791.
Wiedemann, C., Heipke, C., Mayer, H., Jamet, O., 1998. Empirical evaluation of automatically extracted road axes. Empirical Evaluation Techniques in Computer Vision 12, 172–187.
Yuan, J., 2017. Learning building extraction in aerial scenes with convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 2793–2798.
Zang, Y., Wang, C., Cao, L., Yu, Y., Li, J., 2016. Road network extraction via aperiodic directional structure measurement. IEEE Transactions on Geoscience and Remote Sensing 54, 3322–3335.
Zhang, J., Hu, Q., Li, J., Ai, M., 2021. Learning from GPS trajectories of floating car for CNN-based urban road extraction with high-resolution satellite imagery. IEEE Transactions on Geoscience and Remote Sensing 59, 1836–1847.
Zhang, J., Wu, X., 2021. Multi-label truth inference for crowdsourcing using mixture models. IEEE Transactions on Knowledge and Data Engineering 33, 2083–2095.
Zhang, J., Zhang, L., Qin, Y., Wang, X., Zheng, Z., 2020. Influence of the built environment on urban residential low-carbon cognition in Zhengzhou, China. Journal of Cleaner Production 271, 122429.
Zhang, Z., Liu, Q., Wang, Y., 2018. Road extraction by deep residual U-Net. IEEE Geoscience and Remote Sensing Letters 15, 749–753.
Zhou, M., Sui, H., Chen, S., Wang, J., Chen, X., 2020. BT-RoadNet: A boundary and topologically-aware neural network for road extraction from high-resolution remote sensing imagery. ISPRS Journal of Photogrammetry and Remote Sensing 168, 288–306.
Zhu, Q., Zhang, Y., Wang, L., Zhong, Y., Li, D., 2021. A global context-aware and batch-independent network for road extraction from VHR satellite imagery. ISPRS Journal of Photogrammetry and Remote Sensing 175, 353–365.
