Neurocomputing: Fei Yan, Jiawei Wang, Guojian He, Huan Chang, Yan Zhuang
Neurocomputing
journal homepage: www.elsevier.com/locate/neucom
Article history: Received 19 December 2019; Revised 6 February 2020; Accepted 24 February 2020; Available online 27 March 2020. Communicated by Bo Shen.

Keywords: Relocalization; Semantic map; 3D point clouds; Outdoor environment; Unmanned Ground Vehicles

Abstract

In this paper, we propose a sparse semantic map building method and an outdoor relocalization strategy based on this map. Most existing semantic mapping approaches focus on improving the semantic understanding of single frames and retain a large amount of environmental data. Instead, we do not aim to locate the UGV precisely, but use imprecise environmental information to determine the general position of the UGV in a large-scale environment, as human beings do. For this purpose, we divide the environment into environment nodes according to the result of scene understanding. The semantic map of the outdoor environment is obtained by generating topological relations between the environment nodes. In the semantic map, only the information of the nodes is saved, so the storage space remains very small as the size of the environment increases. When the UGV receives a new local semantic map, we evaluate the similarity between the local map and the global map to determine the possible position of the UGV according to the categories of the left and right nodes and the distances between the current position and the nodes. To validate the proposed approach, experiments have been conducted in a large-scale outdoor environment with a real UGV. Depending on the semantic map, the UGV can redefine its position from different starting points.

© 2020 Elsevier B.V. All rights reserved.
1. Introduction

The ability to acquire accurate position information in real time is the premise and basis for UGVs to perform various tasks. Relocalization consists in locating the UGVs on an existing map when they have lost their positions because of feature matching failure or being kidnapped. It is commonly used for navigation, simultaneous localization and mapping (SLAM) and automatic driving [1,2].

Visual sensors and laser sensors are the most important sensors for UGVs to perceive environments. At present, visual relocalization plays a critical role in the field of relocalization. Given a pre-built map or a series of images with exact positions, visual relocalization aims to determine the 6D pose of the current camera view. Traditional visual relocalization methods detect the database image most similar to the current image by using image matching methods [3,4]. To describe a part of the map, several feature descriptors, such as SIFT, SURF and ORB, have been used in visual relocalization. However, each place may contain hundreds of local features, and matching place features directly can be time consuming. The bag-of-words (BoW) technique quantizes local features into a vocabulary and matches places using numerical vectors; it has been adopted to increase the efficiency of relocalization in large environments [5]. Although methods that improve image quality can improve relocalization results [6], the visual sensor remains objectively vulnerable to poor illumination and dynamic changes. Therefore, visual relocalization methods based on deep learning have been proposed, such as the Deep Belief Network [7] and the Convolutional Neural Network [8], which estimate the camera pose by training an end-to-end deep model on images. Although deep models can cope with environmental change, these methods need a long time to learn the different appearances of a place or to train robust features. Furthermore, visual sensors are limited by environmental factors such as brightness and light intensity.

Compared to visual sensors, laser sensors are not affected by such environmental factors and can provide precise depth information, so laser sensors are also considered main sensors for relocalization. Traditional methods usually rely on geometry matching to achieve relocalization [9,10], and some filtering methods are also used for localization, such as unscented Kalman filtering, particle swarm optimization, set-membership filtering and H∞ filtering [11–15]. These methods largely depend on environment maps with dense

✩ This work was supported by the National Natural Science Foundation of China (U1913201) and the Science and Technology Foundation of Liaoning Province of China (20180520031).
∗ Corresponding author. E-mail address: zhuang@dlut.edu.cn (Y. Zhuang).
https://doi.org/10.1016/j.neucom.2020.02.103
0925-2312/© 2020 Elsevier B.V. All rights reserved.
334 F. Yan, J. Wang and G. He et al. / Neurocomputing 400 (2020) 333–342
points or geometrical features. To build these maps, a dense exploration of the environment is needed, which is usually computationally inefficient and requires a lot of storage space. Therefore, relocalization based on geometry matching is infeasible in some cases, such as long-term navigation. On the other hand, deep models are also used in laser relocalization [16,17], but the problem that training a model needs a long time still exists.

In this paper, a sparse semantic map building method and a laser relocalization strategy are proposed. First, in order to obtain robust features, the 3D point clouds are compressed into 2D ODVL images, which are divided into superpixels, and 23-dimensional texture features are extracted for each superpixel. Then, the Gentle-AdaBoost algorithm is used to accomplish feature screening on these texture features and obtain the initial classification results of the 3D point clouds. Moreover, reclassification is carried out to improve the classification accuracy by using the centroid method and considering the height distribution of the point clouds. According to the results of scene understanding, the outdoor environments are divided into scene nodes and road nodes. The semantic map of the outdoor environment is obtained by generating topological relations between the scene nodes and the road nodes. Finally, the map is simplified so that the positioning accuracy can be improved. Different from general concept maps, a map format that stores only the distance information between objects is proposed for the semantic map. According to the scenes observed on the left and right sides at the current position, combined with the environment nodes of the semantic map, the current positions of the UGV can be estimated. Using the triangular relationship, a method for obtaining the possible positions of the UGV by calculating the intersection points of two circles is proposed. The last position information is then maintained and used as the prior condition to calculate the next possible positions of the UGV. In addition, each point has a score indicating the possibility that the UGV is at that position; the point with the highest score is the most likely position of the UGV. In order to validate the proposed approach, experiments have been conducted in a large-scale outdoor environment with a real UGV. Depending on the semantic map, the UGV can redefine its position from different starting points.

This paper is organized as follows: Section II reviews related work on semantic mapping and laser relocalization. Implementation details of scene understanding and semantic map building are provided in Section III, while our relocalization method is described in Section IV. The experiments are conducted and the results are analyzed in Section V, and conclusions are drawn in the last section.

2. Related works

2.1. Semantic map building

To enable UGVs to recognize an environment and act accordingly in the real world, the UGVs need to infer not only the geometry but also the semantic information of the surrounding environment. Semantic mapping is a map building process which fuses a Simultaneous Localization and Mapping (SLAM) algorithm and a scene understanding algorithm [18].

Visual sensors are usually used for semantic mapping. McCormac et al. [19] combined Convolutional Neural Networks (CNNs) and a state-of-the-art dense SLAM system to build indoor semantic maps based on an RGB-D camera. Zhao et al. [20] proposed a deep learning method which performs 3D reconstruction while simultaneously recognizing different types of materials and labeling them at the pixel level. A novel deep learning approach for semantic segmentation of RGB-D images with multi-view context was presented in [21]. These methods are all aimed at narrow indoor environments. Although they can provide abundant environment information, they are not suitable for UGV localization and navigation in a large-scale environment. In order to make semantic maps more practical, semantic maps of the environment were manually built as prior maps to localize robots in [22] and [23]. Furthermore, in [24], a 3D online semantic mapping system is presented, which utilized a CNN model to transfer pixel label distributions from 2D images to a 3D grid space, and the authors proposed a Conditional Random Field (CRF) model with higher-order cliques to enforce semantic consistency among grids.

Considering that many papers convert 2D images into 3D information, some researchers directly use 3D LiDAR data or fused information to build semantic maps. Sun et al. [25] represented the 3D map as an OctoMap and modeled each cell as a recurrent neural network to obtain a Recurrent-OctoMap. In this case, the semantic mapping process can be formulated as a sequence-to-sequence encoding-decoding problem. In [26], a real-time end-to-end semantic segmentation method is proposed for road objects based on spherical images, which are transformed from the 3D LiDAR point clouds and taken as input of a convolutional neural network to predict the point-wise semantic mask. In comparison, Liu et al. [27] transformed 3D laser point clouds into 2D images and obtained the semantic information based on the images. Fusion methods based on the data fusion of 3D LiDAR and camera can obtain the semantic segmentation in the camera frame, which reduces the complexity of semantic segmentation based on laser data, and then use the LiDAR points to localize the segmentation in 3D [28,29]. However, in [28], the semantic maps retain a large number of voxels, which makes storage space and computing efficiency face severe challenges in growing environments. Ma et al. [29] formulated the problem in a Bayesian filtering framework and exploited lanes, traffic signs, as well as vehicle dynamics to localize robustly with respect to a sparse semantic map. Although this map can avoid the problems of storage and computing efficiency, its application environment has great limitations.

2.2. Laser relocalization

Visual relocalization algorithms struggle to cope with the strong appearance changes that commonly occur during long-term applications in outdoor environments, and fail under certain ill-lighted conditions. In contrast, LiDAR sensors are mainly unaffected by appearance change and are used for relocalization. Some works match the current laser data, described with different features, against a prior map or a pre-calculated database of features. Chong et al. [30] projected the 3D point cloud into a 2D plane and extracted environment features from the plane to localize a UGV in a 3D urban environment, but the basic assumption was that many surfaces in the urban environment are rectilinear in the vertical direction. Collier et al. [31] extracted highly descriptive features, called Variable Dimensional Local Shape Descriptors, from LiDAR range data to encode environment features. These features were used to learn a generative model of place appearance and determine whether an observation comes from a new or previously seen place. In [32], when the map-matching method is not feasible in environments with two parallel straight walls, the data from the laser range sensor and the odometry are integrated to localize the robot based on moving horizon estimation [33]. Wolcott and Eustice [34] proposed a Gaussian mixture map, a 2D grid structure where each grid cell maintains a Gaussian mixture model characterizing the distribution over z-height in that cell. Cao et al. [35] converted 3D laser points to 2D bearing angle images and extracted ORB features, then a visual bag-of-words approach was used to improve matching efficiency. In [36], the 3D LiDAR data was projected into
a 2D scan context, which was used to calculate the similarity score of two LiDAR scans.

Instead of relying on geometry-based matching, learning-based approaches have shown amazing performance in the field of laser relocalization. Yin et al. [37] proposed a novel synchronous adversarial feature learning method, a kind of unsupervised learning method for the LiDAR-based loop closure detection task. Furthermore, Yin [38] accomplished the place recognition task based on an end-to-end feature learning framework with LiDAR inputs. In [39], semi-handcrafted range histogram features are used as input to a 2D convolutional neural network, while Uy et al. [40] proposed PointNetVLAD, where they leverage the recent success of deep networks to solve point-cloud-based retrieval for place recognition. Kim et al. [41] recently presented the idea of transforming point clouds into scan context images [36] and feeding them into a CNN to solve the place recognition problem. In [42], a Convolutional Neural Network is trained to extract compact descriptors from single 3D LiDAR scans.

In summary, traditional laser relocalization approaches need to save lots of LiDAR data or features. The complexity tends to grow with the environment scale, and matching the current scan against potential target scans is computationally expensive. Meanwhile, learning-based approaches need a long time to learn the different appearances of an environment or to train robust CNN features. Instead, we present a laser relocalization method based on a sparse semantic map without learning.

3. Scene understanding and semantic map building

The Bearing Angle (BA) image and the range image are the general models that compress 3D point cloud data into 2D images. In terms of describing the details and texture of the images and classifying the objects, the range image does not perform well, and the BA image model performs poorly in distinguishing objects at different depths. The novel optimal depth & vector length (ODVL) image model compensates for the shortcomings of the two methods by calculating the optimal depth value and the length of the BA vector corresponding to the point clouds, and mapping them to gray images with weights. It can be used both for continuous scanning data over a large range and for fixed-point scanning data; there is no special requirement on the scanning mode or scanning position, and it has good adaptability.

3.1. ODVL image model

The original scenes of the point clouds consist of multiple continuous scans of the laser data. The number of laser points obtained in one scan is 381; the 361 laser points in the middle are considered the effective laser data, and 250 scans of the laser data are considered one scene. The points are projected onto the horizontal plane, and the vectors corresponding to each point are obtained. The gray value GO_{i,j} of the ODVL image model, computed from GD_{i,j} and GB_{i,j}, is obtained as follows:

GO_{i,j} = (λ_1 GB_{i,j} + λ_2 GD_{i,j}) / (λ_1 + λ_2)    (1)

where λ_1 and λ_2 are weights. The variance of the gray value matrix is

δ² = Σ_{i=1}^{m} Σ_{j=1}^{n} (GO_{i,j} − \overline{GO})² / (nm)    (2)

where \overline{GO} is the mean gray value. An optimization method is used to obtain the λ_1 and λ_2 that generate the maximum variance.

3.2. Segmentation for ODVL image

In order to classify the objects in the environment, Simple Linear Iterative Clustering (SLIC) is used to segment the ODVL image into many pixel blocks. The SLIC method produces a relatively regular lattice at low computational complexity, which are desirable properties for the subsequent applications. In the SLIC method, local K-means clustering is performed on pixels based on the color distance. Then isolated small clusters are merged with their largest neighbor clusters to obtain the superpixels [1]. Despite its simplicity and superiority over conventional methods, the SLIC method encounters problems in two aspects: first, during the iterative clustering of the K-means algorithm, inaccurate classification of pixels can propagate and produce wrong superpixels; second, small clusters are merged into their largest neighbor without considering their color similarity. When we use the SLIC method to segment the ODVL image, we consider not only the color information but also the depth information of each pixel. In this way, the two problems of SLIC mentioned above can be mitigated.

Firstly, SLIC calculates the size of each pixel block:

S = √(N/C)    (3)

where S is the side length of the square, N is the total number of pixels, and C is the number of pixel blocks to be separated. On the current plane, the cluster center points are selected evenly at intervals of S pixels to produce pixel blocks of the same size. In a 2S × 2S neighborhood, since the interval of each cluster center is S, a pixel point will be searched by many cluster center points. In the search process, the distance from one pixel to one cluster center point is defined as D′.
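Eqs. (1) and (2) and the weight selection can be sketched in a few lines of NumPy. The text only says "the optimization method is used", so the grid search below is an assumption, and all function names are ours:

```python
import numpy as np

def odvl_gray(GB, GD, lam1, lam2):
    """Eq. (1): weighted fusion of BA gray values GB and depth gray values GD."""
    return (lam1 * GB + lam2 * GD) / (lam1 + lam2)

def gray_variance(GO):
    """Eq. (2): variance of the fused gray-value matrix."""
    return float(np.mean((GO - GO.mean()) ** 2))

def best_weights(GB, GD, steps=99):
    """Pick the (lam1, lam2) maximizing Eq. (2). Only the ratio lam1/lam2
    affects Eq. (1), so lam2 = 1 - lam1 is scanned; this grid search is an
    assumption, since the paper does not name its optimization method."""
    best = (0.5, 0.5, -1.0)
    for lam1 in np.linspace(0.01, 0.99, steps):
        lam2 = 1.0 - lam1
        var = gray_variance(odvl_gray(GB, GD, lam1, lam2))
        if var > best[2]:
            best = (lam1, lam2, var)
    return best[0], best[1]
```

Since Eq. (1) normalizes by λ_1 + λ_2, fixing λ_1 + λ_2 = 1 loses no generality and keeps the search one-dimensional.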
Fig. 2. Comparison between the original map (left) and the simplified map (right).
Fig. 3. The intersection points of two circles, which represent the possible positions of the UGV.
Fig. 4. All the previous possible positions are moved according to the motion information. The blue points are the previous possible points, and the red points are the moved points. Δx is the distance the UGV moves in the abscissa direction and Δy is the distance the UGV moves in the ordinate direction.
Fig. 7. "main", "location" and "map" represent the nodes of the program; "map.txt" is the map file.
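The geometry behind Figs. 3 and 4 is elementary and can be sketched directly; the function names are ours, and the candidate shift assumes the odometry supplies the displacement (Δx, Δy):

```python
import math

def circle_intersections(c1, r1, c2, r2):
    """Intersection points of two circles (Fig. 3): each circle is centered on
    an environment node, with radius equal to the measured node distance."""
    (x1, y1), (x2, y2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []  # no intersection (or coincident centers)
    a = (r1**2 - r2**2 + d**2) / (2 * d)          # distance to the chord midpoint
    h = math.sqrt(max(r1**2 - a**2, 0.0))          # half-length of the chord
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    return [(mx + h * (y2 - y1) / d, my - h * (x2 - x1) / d),
            (mx - h * (y2 - y1) / d, my + h * (x2 - x1) / d)]

def shift_candidates(points, dx, dy):
    """Fig. 4: move every previous candidate position by the motion (dx, dy)."""
    return [(x + dx, y + dy) for (x, y) in points]
```

The tangent and disjoint cases return zero points here for simplicity; a fuller implementation would treat the tangent case as a single intersection.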
5.1. Experimental condition

In this experiment, the data used for the semantic map building experiment and the relocalization experiment came from the DLUT database. Fig. 6 shows our UGV with onboard sensors.

The hardware system of the UGV is composed of laser sensors (two LMS511), an ADU5630 attitude and orientation integrated navigation system, and an industrial computer. The LMS511 is a two-dimensional laser sensor that acquires a single column of laser data per scan. With the movement of the UGV, the two-dimensional laser data is accumulated in the direction of travel, thereby obtaining three-dimensional point cloud data. The ADU5630 attitude and orientation integrated navigation system provides position, heading angle and attitude angle accuracy within the allowable range. Its frequency is 100 Hz, which matches the highest scanning frequency of the LMS511 laser sensor. The UGV's computer runs ROS (Robot Operating System) under Linux, and the software system is written in C++; the open source OpenGL and OpenCV program interfaces are used.

A scene in the point cloud data is selected for analysis, as shown in Fig. 8. This scene is transformed into a range image, a Bearing Angle image and an ODVL image, as shown in Fig. 9. Compared with the traditional range image and the BA image, the ODVL image can be used for a wide range of data, adapts to a variety of scanning methods, and improves the distinguishing degree and detail quality of the boundaries in the image. It can also provide effective support for further research on subsequent image segmentation and recognition. The ODVL image is completely independent of the scene viewpoint and does not use angle calculation, which is also the essential difference between the ODVL image and the BA image. The results of segmentation corresponding to each figure are shown in Fig. 10. In the range image, the boundary distinction between objects at similar distances is not obvious, so the range image loses the details of the image, and the segmented texture information is also lost. For example, the car is not distinguished from the ground, and the left part of the building is not distinguished from the tree tops. Furthermore, some windows of the building are lost in the range image.
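The scan accumulation described above, single LMS511 columns stacked along the direction of travel, can be sketched as follows. The data layout and names are simplified assumptions of ours, not the authors' code:

```python
import numpy as np

def accumulate_scans(scans, travel):
    """Stack single-column 2D laser scans into a 3D point cloud as the UGV moves.
    `scans` is a list of (ranges, angles) arrays, one per vertical LMS511 column,
    and `travel` gives the distance the UGV has moved along its direction of
    travel at each scan; both are simplified stand-ins for the real sensor and
    navigation outputs."""
    columns = []
    for (ranges, angles), x in zip(scans, travel):
        y = ranges * np.cos(angles)          # horizontal offset within the scan plane
        z = ranges * np.sin(angles)          # height within the scan plane
        xs = np.full_like(ranges, float(x))  # place the column along the travel axis
        columns.append(np.stack([xs, y, z], axis=1))
    return np.concatenate(columns)
```

In the real system the ADU5630 pose (position, heading and attitude) would replace the scalar travel distance, so each column would be rotated as well as translated.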
Because the BA image can't distinguish between scenes with different depths, many pixels belonging to different scenes which are close to each other in the two-dimensional image space but far away in the three-dimensional space are wrongly divided into one pixel block, such as the trunk and building in the middle of Fig. 10. The ODVL image has relatively good details and texture features, which are beneficial to improving the accuracy of classification. In the experiment, 60 scenes were randomly selected from the database and converted into the range image, the BA image and the ODVL image. Each image is divided into 500 pixel blocks. Define the segmentation error rate as:

Er = Ne / N    (12)

where Ne is the total number of pixel points which are divided incorrectly in the pixel blocks and N is the total number of pixel points. The Er of superpixel segmentation is shown in Table 1.

Table 1. Error rate of SLIC-based segmentation.
Fig. 9. Range image (left), BA image (middle), ODVL image (right).
Fig. 10. SLIC-based segmentation in the range image (left), BA image (middle) and ODVL image (right).
Fig. 11. (a) The result of the initial classification of the Gentle-AdaBoost classifier; (b) the reclassification result. The buildings are brown, the trees are green, the trunks of the trees are blue, the cars are yellow, and the ground is gray.

For testing the Gentle-AdaBoost classifier, 70% of the data are used for training and 30% of the data are used for testing. The result is shown in Table 2. Two understanding results from the test scenes are selected randomly, as shown in Fig. 11. The Gentle-AdaBoost classifier is used to classify the test samples. It is obvious that the point cloud classification based on the ODVL image has a better classification effect on the "top half of the building", "vegetation" and "ground", but the classification effect on "vehicle", "trunk" and "bottom of the building" is not ideal. The results of the initial classification are reclassified and semantically corrected. The results of the classification for all the test samples are shown in Table 3. Through the classification based on the ODVL image, semantic reclassification and point cloud correction, the final classification results achieve better accuracy and recall, with a total accuracy rate of 0.868.

The continuous scene data in Fig. 11 is derived from the database, which was collected in a campus environment. The data format and the acquisition method of this data set are similar to those of this paper. It is used to test the generalization ability of the algorithm. The scene contains about 2,346,500 laser points. The length of the trajectory of the UGV in this database is 1155 m, and the laser points cover an area of 50,419 m². It takes about 300 MB to store all the laser points. There are 13 scene data on the left and right sides. Fig. 12(a) shows the point cloud data after coloring the classified objects and Fig. 12(b) shows the schematic diagram of the semantic description of the scene. The semantic map of the experimental area is established as shown in Fig. 13. This experiment uses the long route with 175 road nodes, which are considered as 175 steps. The size of the map file is 9.82 KB for this experiment, which is much smaller than the storage space of the laser points.

5.3. Relocalization

The UGV starts from different positions in the semantic map. After several steps, when there is only one possible position left, the relocalization is completed. Among the 175 road nodes, we choose one node as a starting point every 5 nodes. The relocalization results are shown in Fig. 14. The abscissa represents the start position of the UGV, and the ordinate represents the number of steps for the UGV to relocate itself successfully after starting.
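A minimal sketch of the pruning idea behind this relocalization loop, under two simplifying assumptions of ours: each step advances the UGV exactly one road node, and matching compares only the left/right scene categories (the paper additionally uses node distances and candidate scores):

```python
def prune_candidates(candidates, road_map, seen_left, seen_right):
    """Keep only road nodes whose stored left/right scene categories match what
    the UGV currently observes. `road_map` maps a node id to its
    (left_category, right_category) pair; `candidates` is the hypothesis set.
    This matching rule is our simplified reading of the paper's strategy."""
    return [n for n in candidates
            if road_map[n] == (seen_left, seen_right)]

def relocalize(road_map, observations):
    """Step through successive (left, right) observations until one node remains."""
    candidates = list(road_map)
    for step, (left, right) in enumerate(observations, start=1):
        candidates = prune_candidates(candidates, road_map, left, right)
        if len(candidates) == 1:
            return candidates[0], step          # unique position found
        # advance every surviving hypothesis one road node forward
        candidates = [n + 1 for n in candidates if n + 1 in road_map]
    return None, len(observations)              # still ambiguous
```

With 175 road nodes, a few steps are usually enough to collapse the hypothesis set, which is consistent with the step counts reported in Fig. 14.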
Table 2
Training set and test set information.
Table 3
Scene understanding result for testing sample.
Fig. 14. The results of relocalization. The abscissa is the starting position. The num-
ber of the blue points represents the number of steps for the successful relocaliza-
tion. In the direction of the ordinate axis, the distance from the blue point to the
yellow line indicates the number of steps successful relocalization and the distance
from the blue point to the red line indicates the number of steps in which the po-
sitioning failed.
Fig. 15. The positions of the relocalization results in the semantic map. The blue
lines and points indicates the trajectory when building the semantic map. The green
points are the environment nodes and the red points are the positons of relocaliza-
Fig. 12. (a) The 3D laser point clouds used for semantic mapping. (b) Large-scale tion.
outdoor semantic map for the same environment. The white lines represent the
constraint relationship between road nodes and environment nodes.
locating the UGV precisely, our focus is to use the sparse semantic
map to determine the general position of UGV in a large-scale
environment. Fig. 15 shows the positions of the relocalization
results in the semantic map, in which the position points are
distributed near the trajectory when building the prior semantic
map. But because there are few environmental nodes on the left
side of the map, the locating is inaccurate, and the robot needs to
move forward for a distance to find its own position.
The processing of building the semantic map is divided into five
parts: BA image generation, super pixel segment, feature extrac-
tion and classification, reclassification and topological map build-
ing. The duration of running the algorithm is shown in Fig. 16. As
Fig. 13. The semantic map of the experimental area.
can be seen from the figure, the feature extraction and classifica-
F. Yan, J. Wang and G. He et al. / Neurocomputing 400 (2020) 333–342 341
[2] J.G. Liu, D.Q. Liu, J. Cheng, Y.Y. Tang, Conditional simultaneous localization
and mapping: a robust visual SLAM system, Neurocomputing 145 (2014)
269–284.
[3] T. Sattler, B. Leibe, L. Kobbelt, Efficient & effective prioritized matching for
large-scale image-based localization, IEEE Trans. Pattern Anal. Mach. Intell. 39
(9) (Sep. 2017) 1744–1756.
[4] Y. Feng, Y. Wu, L. Fan, Real-time slam relocalization with online learning of
binary feature indexing, Mach. Vis. Appl. 28 (8) (Aug. 2017) 953–963.
[5] A. Angeli, D. Filliat, S. Doncieux, J.A. Meyer, Fast and incremental method for
loop-closure detection using bags of visual words, IEEE Trans. Robot. 24 (5)
(Oct. 2008) 1027–1037.
[6] N. Zeng, H. Zhang, Y. Li, J. Liang, A.M. Dobaie, Denoising and deblurring gold
immunochromatographic strip images via gradient projection algorithmsL,
Neurocomputing 247 (2017) 165–172.
[7] N. Zeng, Z. Wang, H. Zhang, K. Kim, Y. Li, X. Liu, An improved particle fil-
ter with a novel hybrid proposal distribution for quantitative analysis of gold
immunochromatographic strips, IEEE Trans. Nanotechnol. 18 (1) (2019) 819–
829.
[8] A. Kendall, R. Cipolla, Modelling uncertainty in deep learning for camera relo-
calization, in: 2016 IEEE International Conference on Robotics and Automation
(ICRA), Stockholm, Sweden, 2016, pp. 4762–4769.
Fig. 16. Time Cost of Scene Understanding and Semantic Mapping. [9] T. Röhling, J. Mack, D. Schulz, A fast histogram-based similarity measure for
detecting loop closures in 3-D lidar data, in: 2015 IEEE/RSJ International Con-
ference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015,
tion take the longest time. For the other process, they spend below pp. 736–741.
[10] B. Steder, G. Grisetti, W. Burgard, Robust place recognition for 3D range data
100 ms. based on point features, in: 2010 IEEE International Conference on Robotics
and Automation, Anchorage, AK, 2010, pp. 1400–1405.
6. Conclusion and future work [11] N. Zeng, H. Zhang, Y. Chen, B. Chen, Path planning for intelligent robot based
on switching local evolutionary PSO algorithm, Assem. Autom. 36 (2) (2016)
120–126.
This paper has focused on how to relocalize UGVs in outdoor [12] V. Pham, J. Juang, Robust and efficient slam via compressed H∞ filtering, Asian
environments based on a sparse semantic map. The laser points J. Control 16 (3) (2014) 878–889.
[13] S. Liu, Z. Wang, G. Wei and M. Li, Distributed set-membership filtering for
are divided into the pixel blocks according to the OVDL image multi-rate systems under the round-robin scheduling over sensor networks,
and classified by Gentle-AdaBoost classifier. The semantic map of IEEE Trans. Cybern., doi:10.1109/TCYB.2018.2885653.
the outdoor environment is built by generating topological rela- [14] Y. Chen, Z. Wang, Y. Yuan, P. Date, Distributed H∞ filtering for switched
stochastic delayed systems over sensor networks with fading measurements,
tions between the environment nodes, which are obtained accord- IEEE Trans. Cybern. 50 (1) (2020) 2–14.
ing to the classification results. Instead of the laser data or the [15] S. Liu, Z. Wang, Y. Chen, G. Wei, Protocol-based unscented Kalman filtering
features, only information of the environment nodes is saved for in the presence of stochastic uncertainties, IEEE Trans. Autom. Control 65 (3)
(2020) 1303–1309.
relocalization, that avoids the problem of storage in large-scale en-
[16] K. Granstrom, J. Callmer, F. Ramos, J. Nieto, Learning to detect loop closure
vironments. As for locating the UGV, a new positioning method is from range data, in: 2009 IEEE International Conference on Robotics and Au-
proposed. After getting the environmental object categories on the tomation, Kobe, Japan, 2009, pp. 15–22.
[17] Y.M. Li, S. Li, Y.J. Ge, A biologically inspired solution to simultaneous localiza-
both sides of the current position, we draw a circle and find the in-
tion and consistent mapping in dynamic environments, Neurocomputing 104
tersection point to get the possible positions of the UGV. Applying (2013) 170–179.
this position as the prior knowledge to the next position estima- [18] C. Galindo, J.A. Fernández-Madrigal, J. González, A. Saffiotti, Robot task plan-
tion process. The related relocalization experiments are carried out ning using semantic maps, Rob Auton. Syst. 56 (11) (Aug. 2008) 955–966.
[19] J. McCormac, A. Handa, A. Davison, S. Leutenegger, SemanticFusion: dense
in the outdoor environment and the results verify the effectiveness 3D semantic mapping with convolutional neural networks, in: 2017 IEEE In-
and robustness of the algorithm. ternational Conference on Robotics and Automation (ICRA), Singapore, 2017,
When the UGV returns to a visited position, if the environment object is not completely observed by the LiDAR, the current environment node is not fully consistent with the corresponding node in the semantic map, and fusing the new node into the semantic map becomes a problem. We will therefore focus on improving the map updating method in future work. How to improve the positioning accuracy in areas with few environment nodes is also left for future work.
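As a hypothetical illustration of the node-fusion problem described above (not the paper's method: the `EnvNode` structure, the category labels, and the `merge_radius` threshold are all assumptions), one naive updating rule would merge a re-observed node with an existing node of the same category within a distance threshold, and otherwise add it as a new node.

```python
from dataclasses import dataclass

@dataclass
class EnvNode:
    category: str          # e.g. "building", "tree" (labels are illustrative)
    position: tuple        # (x, y) in the global frame
    observations: int = 1  # how many times this node has been observed

def fuse_node(nodes, category, position, merge_radius=5.0):
    """Naive fusion rule: merge a re-observed node with an existing node
    of the same category within merge_radius, otherwise append a new node."""
    for n in nodes:
        dx = n.position[0] - position[0]
        dy = n.position[1] - position[1]
        if n.category == category and (dx * dx + dy * dy) ** 0.5 <= merge_radius:
            # Running average of the position over all observations.
            k = n.observations
            n.position = ((n.position[0] * k + position[0]) / (k + 1),
                          (n.position[1] * k + position[1]) / (k + 1))
            n.observations += 1
            return n
    nodes.append(EnvNode(category, position))
    return nodes[-1]
```

A rule this simple would fail exactly in the case the paper identifies, a partially observed object whose apparent position or category differs from the stored node, which is why a more robust updating method is left as future work.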
Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement

Fei Yan: Conceptualization, Methodology, Writing - original draft. Jiawei Wang: Software, Writing - original draft. Guojian He: Data curation, Visualization, Investigation. Huan Chang: Investigation, Validation. Yan Zhuang: Supervision, Resources, Project administration.
342 F. Yan, J. Wang and G. He et al. / Neurocomputing 400 (2020) 333–342
Fei Yan (M'15) received the bachelor's and Ph.D. degrees in Control Theory and Engineering from the Dalian University of Technology, Dalian, China, in 2004 and 2011, respectively. In 2013, he joined the Dalian University of Technology as a post-doctoral researcher and became a Lecturer in 2015, where he is currently an Associate Professor with the School of Control Science and Engineering. His current research interests include 3-D mapping, path planning, and semantic scene understanding.

Jiawei Wang is a graduate student in the School of Control Science and Engineering at Dalian University of Technology, P. R. China. He received a bachelor's degree in Automation from the School of Marine Electrical Engineering of Dalian Maritime University in 2019. His research interests are in robot navigation and semantic scene understanding.

Guojian He received the bachelor's and master's degrees in Control Theory and Engineering from Dalian University of Technology, Dalian, China, in 2011 and 2014, respectively. Currently he is a Ph.D. candidate in the School of Control Science and Engineering at Dalian University of Technology, P. R. China. His research interests are in robotics, 3D LiDAR SLAM, mapping and semantic scene understanding.

Huan Chang received the bachelor's degree in Measurement and Control Technology and Instrumentation from Huazhong University of Science and Technology, China, in 2012, and the master's degree in Control Theory and Engineering from Dalian University of Technology, China, in 2016. His research interests are in semantic scene understanding, 3-D mapping, and path planning.

Yan Zhuang (M'11) obtained the B.Sc. and M.Sc. degrees in Control Theory and Engineering from Northeastern University, P. R. China, in 1997 and 2000 respectively. He received the Ph.D. degree in Control Theory and Engineering from the Dalian University of Technology, P. R. China in 2004. He joined Dalian University of Technology in 2005 as a Lecturer and became an Associate Professor in 2007. Currently he is a Professor in the School of Control Science and Engineering, Dalian University of Technology. His research interests include mobile robot 3D mapping, outdoor scene understanding, 3D-laser-based object recognition, 3D scene recognition and reconstruction.