Recent Developments and Applications of Simultaneous Localization and Mapping in Agriculture

Received: 29 October 2021 | Revised: 25 February 2022 | Accepted: 5 April 2022

DOI: 10.1002/rob.22077

SURVEY ARTICLE

Haizhou Ding1 | Baohua Zhang2 | Jun Zhou3 | Yaxuan Yan1 | Guangzhao Tian3 | Baoxing Gu3

1 Department of Electronic Information, College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, Jiangsu, China
2 Department of Automation, College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, Jiangsu, China
3 Department of Agricultural Engineering, College of Engineering, Nanjing Agricultural University, Nanjing, Jiangsu, China

Correspondence
Baohua Zhang, College of Artificial Intelligence, Nanjing Agricultural University, No. 40, Dianjiangtai Road, Pukou District, Nanjing City, Jiangsu Province 210031, China.
Email: bhzhang@njau.edu.cn

Funding information
National Natural Science Foundation of China, Grant/Award Number: 31471419, 31901415; Jiangsu Agricultural Science and Technology Innovation Fund (JASTIF), Grant/Award Number: CX (21)3146

Abstract
Simultaneous Localization and Mapping (SLAM) is a process that uses multiple sensors to position an unmanned mobile vehicle without prior knowledge of the environment while simultaneously constructing a map of this environment for further applications. Over the past three decades, SLAM has been intensively researched and widely applied in mobile robot control and unmanned vehicle navigation. SLAM technology has demonstrated great potential for autonomously navigating mobile robots and simultaneously reconstructing the three-dimensional (3D) information of the surrounding environment. Driven by advances in sensor technology and 3D reconstruction algorithms, many attempts have been made to propose novel systems and algorithms that combine different sensors to solve the SLAM problem. Notably, SLAM has been extended to various aspects of agriculture involving autonomous navigation, 3D mapping, field monitoring, and intelligent spraying. This paper focuses on the recent developments and applications of SLAM, particularly in complex and unstructured agricultural environments. A detailed summary of the developments of SLAM is given for three fundamental types: light detection and ranging (Lidar) SLAM, Visual SLAM, and Sensor Fusion SLAM, and we also discuss the applications and prospects of SLAM technology in agricultural mapping, agricultural navigation, and precision agriculture. Particular attention is paid to the SLAM sensors, systems, and algorithms applied in agricultural tasks. Additionally, the challenges and future trends of SLAM are reported.

KEYWORDS
agricultural applications, autonomous navigation, precision agriculture, sensors and systems, SLAM

1 | INTRODUCTION

Simultaneous Localization and Mapping (SLAM) is the process that enables a mobile robot to build a globally consistent map of an environment where the Global Positioning System (GPS) signal is weak or denied and, at the same time, to compute its location within this map model (Durrant-Whyte & Bailey, 2006). SLAM is considered to be a key problem in the autonomous navigation and localization of vehicles and robots. A robust solution to the long-term, real-time SLAM problem significantly improves the autonomous ability of mobile robots operating in unknown environments (Jung et al., 2004; Kohlbrecher et al., 2011; Mao-Hai et al., 2006). The SLAM problem can be divided into two essential tasks: Localization and Mapping (Stachniss, 2009). Localization measures, in a firm and precise manner, the current posture of the robot in an environment (Chong et al., 2015; Fuentes-Pacheco et al., 2015). Mapping, done simultaneously with no advance information about the robot's location, combines partial knowledge of the environment into a consistent model (Chong et al., 2015; Fuentes-Pacheco et al., 2015). Autonomous mobile robots endowed with robust SLAM algorithms have played an important role in many application scenes, such as exploration of diverse areas like rough terrain (including urban search and rescue), deep areas (including underground mining and underwater surveillance), aerial space (including planetary exploration), and other places that threaten the very safety of human beings (Dissanayake et al., 2011; Khairuddin et al., 2015; Lu et al., 2009; Erfani et al., 2019).

SLAM technology has gone through more than 30 years of history since the concept of SLAM was put forward by R. Smith et al. (1990) in the 1980s. The algorithms of SLAM have changed from the filter-based approach first raised by H. R. C. Smith and Cheeseman (1986) to the optimization-based approach proposed by Triggs et al. (1999), and the technical framework has evolved from single thread to multithread. The sensors used in SLAM systems also continue to expand, from sonar in the early phase, to two-/three-dimensional (2D/3D) light detection and ranging (Lidar) later, to monocular, stereo, RGB-D, Time of Flight (ToF), and other cameras, as well as integration with sensors such as the Inertial Measurement Unit (IMU). Lidar SLAM, Visual SLAM, and multi-Sensor Fusion SLAM are the three most used SLAM technologies nowadays. In Lidar SLAM, point cloud information of the surrounding environment is collected by Lidar, and the SLAM system then calculates the relative movement distance and attitude change of the Lidar by matching and comparing the point clouds at different moments, so as to complete the localization of the robot. Visual SLAM uses the knowledge of multiframe images and multiview geometry to estimate the camera pose and then calculates depth information by accumulating pose changes; RGB-D cameras, in particular, can obtain depth information directly. However, 2D/3D Lidar and cameras often fail when used alone in complex scenes, such as agricultural environments. Therefore, multi-Sensor Fusion SLAM has become a hot topic in recent years, and Lidar, camera, IMU, wheel odometer, and Global Navigation Satellite Systems (GNSS) are the most used sensors for fusion algorithms that make up for the shortcomings of each individual sensor.

SLAM technology is in demand in many scenarios, such as Virtual/Augmented Reality (VR/AR) equipment (Klein & Murray, 2007; Newcombe et al., 2011), indoor autonomous mobile robots (Faragher et al., 2012; Ouellette & Hirasawa, 2007), unmanned ground vehicles (UGV; Demim et al., 2016), and unmanned aerial vehicles (UAV; Artieda et al., 2009; Gupte et al., 2012). Some are commercially available and some are still in development. In AR, most systems operate with information about the environment in advance. However, with the development of SLAM algorithms and equipment, monocular and Kinect sensors can reconstruct arbitrary indoor scenes in real time without initializing the map (Klein & Murray, 2007; Newcombe et al., 2011). In indoor environments, localization traditionally depends on communal landmarks and signs, since GNSS signals cannot fully penetrate buildings, which increases the difficulty of positioning robots indoors. The combination of SLAM and the IMU gives a solution to this problem (Faragher et al., 2012; Ouellette & Hirasawa, 2007). SLAM is also of great significance for unmanned vehicles (UGV/UAV). In particular, Sensor Fusion SLAM, such as integrating a 3D laser and a camera, is gradually becoming the main method to position unmanned vehicles, build the map simultaneously, and implement autonomous navigation in an unknown environment (Artieda et al., 2009; Demim et al., 2016; Gupte et al., 2012). In general, SLAM has been widely exploited in various fields with good application prospects.

The applications of SLAM have been extended to agricultural areas for mapping, navigation, and precision agriculture (PA) as sensor performance improves and computer vision algorithms mature (Jha et al., 2019). The ability to accurately locate an automated agricultural machine is a critical factor in determining the feasibility of automated agriculture (Libby & Kantor, 2010; Shafaei et al., 2019). The robust implementation of SLAM is gradually becoming a key prerequisite for making agricultural machinery fully autonomous in the complex agricultural environment. With robust SLAM algorithms, the agricultural machine is capable of establishing a map of the agricultural area and positioning itself in the absence of prior environmental information. On that basis, the agricultural machine completes batches of tasks, like spraying, weeding, and seeding, instead of human labor (Libby & Kantor, 2010). Compared with traditional agriculture, agricultural machinery utilizing SLAM technology lightens farmers' workload, completes risky tasks, improves the crop yield, and increases the utilization ratio of human resources (Griepentrog et al., 2009; Habibie et al., 2017).

However, despite the increasingly large demand for agricultural automation, a detailed summary of recent developments and applications of SLAM in agriculture is not available. This paper aims to offer basic help to those researchers just getting started with the SLAM problem, especially in agricultural areas. Specifically, the SLAM technology is introduced from three perspectives, including Lidar SLAM, Visual SLAM, and Sensor Fusion SLAM. The applications of SLAM in agriculture are emphatically reviewed from mapping, navigation, and PA. Finally, the remaining challenges and future trends of SLAM are also discussed (Figure 1).

2 | SLAM TECHNOLOGY

SLAM is an algorithm by which an autonomous robot can incrementally build an integrated map of an unknown environment while continuously collecting credible observations and estimating the egomotion and location of the moving robot using this map (Dissanayake et al., 2001; Durrant-Whyte & Bailey, 2006; Strasdat et al., 2010). Over the past several decades, SLAM has become one of the most compelling research fields in the robotics community, with numerous implementations and advances in SLAM algorithms (Clemente et al., 2007; Strasdat et al., 2010). Many remarkable SLAM systems have been applied to navigation and localization.

F I G U R E 1 The general overview of the recent developments and applications of SLAM in agricultural environments. The SLAM technology is introduced from three essential aspects, including Visual SLAM, Lidar SLAM, and Sensor Fusion SLAM, and the agricultural applications of SLAM are summarized in terms of mapping, navigation, and precision agriculture. SLAM, Simultaneous Localization and Mapping.

Generally, the two most popular sensor modalities used in SLAM are raw range-scan sensors and feature-based sensors, whether or not the features are extracted from images (Chong et al., 2015). Systems based on range-scanning sensors are represented by Lidar SLAM, while feature extraction algorithms use cameras as sensors, including monocular, stereo vision, and RGB-D cameras. It is more challenging to attempt SLAM with standard cameras as the main sensory input, because the essential geometric information of the world is not as accessible as it is in laser data (Clemente et al., 2007). Certainly, systems combining the two types of sensors have also been presented to solve the SLAM problem. In Lidar SLAM, filtering approaches (probabilistic frameworks) have been prevalent, and the vast majority of them rely on the rigorous Bayesian rule. As for Visual SLAM, both filtering approaches and bundle adjustment (BA) approaches (nonprobabilistic frameworks) are widely utilized to fulfill and optimize the mapping and positioning jobs (Santos et al., 2013; Strasdat et al., 2010).

As SLAM technology continues to be used to address practical problems, Lidar SLAM is considered a relatively mature navigation and mapping scheme, while Visual SLAM is a mainstream direction of current research (F. Fang et al., 2005) since it can be further improved at the algorithmic level. Besides, the sensor fusion method is an irresistible trend for future SLAM solutions. Particular attention should also be paid to Deep Learning algorithms utilized in SLAM.

2.1 | Lidar SLAM

The simplest Lidar SLAM system consists of an unmanned vehicle equipped with Lidar sensors, which can basically complete the tasks of mapping and positioning. Lidar sensors employ a noncontact measurement method to scan 2D/3D point cloud data from the surrounding environment. Lidar works by emitting an infrared laser signal towards a target and then receiving the light reflected from the object. The built-in system of the Lidar sensor outputs the distance information by solving the resulting triangle, that is, triangulation ranging. Lidar is similar to radar, except that the Lidar sensor sends out and receives pulses of light instead of radio waves (Schwarz, 2010). According to the number of lines, Lidar used in autonomous robotics can be divided into single-line Lidar and multiline Lidar, and the latter is more sensitive and more accurate in measuring distances than the former. Single-line Lidar can only realize planar scanning, while multiline Lidar can be used for 2.5D or 3D mapping.

Lidar SLAM uses the high update rate and high measurement accuracy of Lidar to provide environmental features and the robot pose in real time (Kohlbrecher et al., 2011; Y. Li & Olson, 2010; Marck et al., 2013). On the basis of the Bayesian probabilistic framework, Extended Kalman Filter-SLAM (EKF-SLAM) and Particle Filter-SLAM (PF-SLAM) are the main algorithmic solutions to the Lidar SLAM problem. Moreover, graph-based methods (Konolige et al., 2010; Vincent et al., 2010) use nodes to represent the robot's poses and the lines between nodes (edges) to represent the constraints between poses. Assuming that errors have a Gaussian distribution, EKF-SLAM approximates the nonlinear motion and observation models by linear models through the first- or second-order expansion of the Taylor series (Bishop & Welch, 2001; Guivant & Nebot, 2001). PF-SLAM is also a nonlinear algorithm, derived from Monte Carlo ideas (Murphy, 2000) and solved by numerical approximation of the posterior probability density. Under the typical probability model, both the particle filter (PF) and the EKF aim at estimating the joint posterior probability (Murphy, 2000)

p(x_{1:t}, m | z_{1:t}, u_{0:t-1}),  (1)

where m is the map and x_{1:t} = x_1, ..., x_t represents the trajectory of the robot. This probability is estimated given the observations z_{1:t} = z_1, ..., z_t and the odometry measurements u_{0:t-1} = u_0, ..., u_{t-1} obtained by the mobile robot (Grisetti et al., 2007).

Lidar SLAM was the first to be put into commercial use, for example in sweeping robots, due to its advantages of high accuracy, mature technology, and well-developed systems. However, sensors like multiline Lidar require a huge amount of computation and carry a high price, which is not conducive to preliminary experimental studies.

2.1.1 | Sensors and systems

Lidar, as a vital component of many SLAM systems, detects geometric information even in outdoor strong-light environments thanks to its own laser source (Tang et al., 2015). The information collected by Lidar presents a large unstructured set of scattered points with accurate angle and distance information, called a Point Cloud. Generally, the SLAM system calculates the relative motion distance and pose change of the Lidar by matching two point clouds acquired at different moments, so as to estimate the state of the robot itself. Estimation here means reconstructing an integrated map of an unknown environment given a sequence of Lidar data. However, precise Lidar measurements are difficult to realize while the Lidar is continually moving, because of motion distortion in the point clouds (Zhang et al., 2015).

Single-line scan Lidar, also known as 2D Lidar, is actually a high-frequency-pulse laser rangefinder using 1D rotational scanning. With a simple structure of only one transmitter and one receiver, single-line Lidar has a high scanning speed and angular resolution, making it reliable enough to meet customers' requirements for high-precision location data. Single-line Lidar can only obtain 2D information. To facilitate the acquisition of 3D information about the target object, multiline Lidar, also known as 3D Lidar, has also been applied in SLAM systems (Dube et al., 2017; Hening et al., 2017; Zhang et al., 2015). Multiline Lidar is a scanning instrument in which multiple laser emitters rotate to form a beam. In contrast with single-line Lidar, multiline Lidar can obtain more information and multidimensional data about the target. Yet it is difficult for multiline Lidar to achieve the same technical indicators in terms of high repetition frequency and high angular resolution. In the field of unmanned driving, multiline Lidar is often used as the sensor of the autonomous navigation vehicle. However, the sensors become more expensive as the number of lines increases, limiting their further commercial use (Figure 2). Both single-line and multiline Lidar have the advantages of high efficiency and high precision, and their output data do not require a lot of calculation (Riisgaard & Blas, 2003).

According to the optimization framework, Lidar SLAM systems can be divided into filter-based and graph-based SLAM systems. In the early stage, R. Smith et al. (1990) pioneered the EKF-SLAM scheme using a maximum likelihood algorithm for data correlation. However, EKF-SLAM built a sparse map based on features, with poor robustness and large computational complexity. Montemerlo et al. (2002) presented an efficient system dubbed FastSLAM that was the first to realize real-time construction of an occupancy grid map (Elfes, 1989). Considering the inherent conditional independences of the SLAM problem, they decomposed SLAM into two subproblems, positioning the robot and estimating the environmental landmarks, and then utilized the landmarks to estimate the pose of the robot with a Rao-Blackwellized particle filter (RBPF; Doucet et al., 2000). On the basis of FastSLAM, Grisetti et al. (2007) proposed an improved solution named Gmapping, optimizing the performance of the RBPF by providing a more accurate proposal distribution and an adaptive resampling method. The improved proposal distribution not only relied on the odometry information, but also took into account the most recent observation, thus making the proposal distribution closer to the real target distribution. The resampling strategy determined whether to resample or not by setting a threshold, effectively alleviating the problem of particle depletion. Generally, Gmapping is considered to be the most widely used Lidar SLAM system in 2D robot navigation (Santos et al., 2013).
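As a minimal sketch of the adaptive resampling idea just described (not Gmapping's full pipeline), the snippet below computes the effective sample size of a weighted particle set and only resamples when it drops below an assumed threshold of half the particle count; a plain multinomial draw stands in for the low-variance sampler used in practice.

```python
import numpy as np

def effective_sample_size(weights):
    """N_eff = 1 / sum(w_i^2) for normalized importance weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize the importance weights
    return 1.0 / np.sum(w ** 2)

def adaptive_resample(particles, weights, threshold_ratio=0.5, rng=None):
    """Resample only when N_eff falls below threshold_ratio * N,
    which mitigates particle depletion between informative updates."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(particles)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    if effective_sample_size(w) >= threshold_ratio * n:
        return particles, w               # keep the current particle set
    idx = rng.choice(n, size=n, p=w)      # draw new particles proportional to weight
    return [particles[i] for i in idx], np.full(n, 1.0 / n)

# toy usage: one dominant weight among 100 particles triggers resampling
particles = list(range(100))
weights = np.ones(100)
weights[0] = 50.0
particles, weights = adaptive_resample(particles, weights)
```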

F I G U R E 2 (a) A multiline Lidar sensor and its profile with a scanning range of 28.6° (Schwarz, 2010). We also illustrate a simple Lidar sensor system (d) consisting of a scanning aperture (c) and a positioning laser (b). The system forms laser light triangles with different shapes over a very short period of time, and all the information necessary to calculate the 3D coordinates can be obtained when the light reflected from an obstacle surface is received (Básaca-Preciado et al., 2014).
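As a rough illustration of the triangulation-ranging principle mentioned above, the range to a reflecting surface can be recovered from the fixed baseline between emitter and detector and the measured angle of the returning ray; the geometry below is a deliberately simplified sketch, not the internal model of any particular sensor.

```python
import math

def triangulation_range(baseline_m, return_angle_rad):
    """Simplified laser triangulation: the laser fires perpendicular to the
    emitter-detector baseline, and the detector measures the angle of the
    reflected ray; the target distance follows from the right triangle."""
    return baseline_m * math.tan(return_angle_rad)

# example: 10 cm baseline, reflected ray observed at 80 degrees -> ~0.57 m
print(triangulation_range(0.10, math.radians(80.0)))
```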

Furthermore, recognizing the sparsity of the matrix, many Lidar SLAM systems using graph-based optimization have been proposed. The first graph-based Lidar SLAM system, namely Karto SLAM, was presented by Konolige et al. (2010). They used a direct sparse linear method, called Sparse Pose Adjustment (SPA), to efficiently compute the sparse matrix, demonstrating a better performance in maintaining the map of large-scale environments than filter-based systems. Google's open-source solution dubbed Cartographer (Hess et al., 2016) generated 2D grid maps with a resolution of r = 5 cm. Cartographer exploited scan matching to insert a submap at the best-estimated position, and a local loop closure detection was carried out. When all submaps were constructed, the global loop closure detection was realized by using branch and bound and precomputed grids. Kohlbrecher et al. (2011) proposed Hector SLAM, which could localize and map robustly in various unstructured environments, for example, finding victims in simulated earthquake scenarios. Therefore, the system had no odometry data, but it strictly required the Lidar sensor to have a high update frequency and low observation noise. Because the nonlinear optimization methods used in graphical approaches have the drawback that convergence to a global minimum of the cost function cannot generally be guaranteed, Carlone et al. (2012) introduced a new approach dubbed Linear Approximation for Graph Optimization (LagoSLAM) that does not require an initial guess in the optimization process. The authors utilized a linear orientation and a linear position estimation to obtain a first-order approximation of the nonlinear system (Santos et al., 2013).
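To make the node/edge formulation of graph-based SLAM concrete, the toy example below (an illustrative sketch, not Karto SLAM's SPA or Cartographer's solver) treats 2D robot positions as nodes, odometry measurements and one loop-closure measurement as edges, and solves the resulting linear least-squares problem directly. Real systems optimize full poses with orientation and rely on sparse nonlinear solvers, but the same constraint structure applies.

```python
import numpy as np

# five 2D positions stacked as a 10-vector [x0, y0, x1, y1, ..., x4, y4]
num_poses = 5
odometry = [np.array([1.0, 0.0]),       # noisy relative-motion measurements
            np.array([1.1, 0.1]),
            np.array([0.9, -0.1]),
            np.array([1.05, 0.0])]
loop_closure = np.array([4.0, 0.0])     # independent measurement of x4 - x0

rows, rhs = [], []

def add_edge(i, j, measurement):
    """Edge encoding x_j - x_i = measurement as two linear equations."""
    for d in range(2):
        r = np.zeros(2 * num_poses)
        r[2 * j + d] = 1.0
        r[2 * i + d] = -1.0
        rows.append(r)
        rhs.append(measurement[d])

# prior anchoring the first pose at the origin (fixes the gauge freedom)
for d in range(2):
    r = np.zeros(2 * num_poses)
    r[d] = 1.0
    rows.append(r)
    rhs.append(0.0)

for i, z in enumerate(odometry):        # odometry edges between consecutive poses
    add_edge(i, i + 1, z)
add_edge(0, num_poses - 1, loop_closure)  # loop-closure edge pulls the chain back

A, b = np.vstack(rows), np.array(rhs)
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x.reshape(num_poses, 2))          # optimized positions spread the residual error
```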

2.1.2 | Mapping methods: Occupancy grid mapping

Robotic mapping means that the robot gathers sensory discrete-time readings and subsequently integrates them into a representation of the environment (Colleens & Colleens, 2007). Robust and accurate models of the robotic map are essential for advanced SLAM systems because they can substantially improve the effectiveness of path planning and the safety of automatic driving (Homm et al., 2010; Meyer-Delius et al., 2012). In general, SLAM maps can be divided into two categories: metric maps and topological maps. The topological map is composed of vertices and edges, considering only the connectivity between vertices (Chang et al., 2007). Since topological maps are not good at describing complex environments and detailed parts of maps are usually missing, metric maps are better suited for automatic navigation and driving (Chang et al., 2007). The metric map can be further subdivided into sparse (feature) maps and dense maps. There are various dense maps, represented by the occupancy grid map (Mitsou & Tzafestas, 2007; Moravec & Elfes, 1985) for 2D Lidar SLAM, and the voxel map (Bosse & Zlot, 2009; Ryde & Hu, 2007) and point cloud map for 3D Lidar SLAM. Many factors need to be considered to construct a robotic map, for example, noise generated by sensor measurement (Mitsou & Tzafestas, 2007) and noise generated by robot movement (Choset & Nagatani, 2001). Occupancy grid mapping, first presented by Moravec and Elfes (1985), is one of the most popular mapping approaches in SLAM systems. Occupancy grid mapping is an efficient mathematical method with a simple basic principle: empty areas are represented by white grid cells and occupied areas are represented by black grid cells (Meyer-Delius et al., 2012; Moravec & Elfes, 1985). Historically, two main methods for constructing grid maps have been proposed: the Bayesian probability method (Homm et al., 2010; Thrun, 2003) and the Dempster–Shafer theory of evidence (Gordon & Shortliffe, 1984; Ran et al., 2012). The occupancy grid map based on the binary Bayesian method is the most commonly used type in 2D Lidar SLAM.

In the beginning, the map is discretized into individual cells. The occupancy grid map is a model with a strict structure and no parameters; it takes up a lot of memory, and the sensors must have access to dense data. Each cell holds a value between 0 and 1 representing the probability of occupancy, and the cells are independent of each other. Most researchers use the gray value to represent the probability (Figure 3): black represents a probability of 1 (p(m_i) = 1, where i is the label of the cell), indicating that the cell is occupied; white represents a probability of 0 (p(m_i) = 0), indicating that the cell is idle; and gray (with a value of 127) represents a probability of 0.5 (p(m_i) = 0.5), indicating that the cell status is unknown. The probability distribution of the entire map is determined by each cell on the map, expressed as

p(m) = \prod_i p(m_i),  (2)

in which p(m) represents the probability distribution of the entire map. According to the properties of conditional probability, Equation (1) can be decomposed as (Thrun et al., 2002)

p(x_{1:t}, m | z_{1:t}, u_{0:t-1}) = p(x_{1:t} | z_{1:t}, u_{0:t-1}) \cdot p(m | x_{1:t}, z_{1:t}, u_{0:t-1}),  (3)

where p(x_{1:t} | z_{1:t}, u_{0:t-1}) represents the posterior probability of the trajectory, given the observation and control information, and p(m | x_{1:t}, z_{1:t}, u_{0:t-1}) represents the probability distribution of the entire map given the trajectory, where u_{0:t-1} can be omitted according to the Markov property. As each cell is independent, the following equation can be obtained by combining Equation (2) (Thrun et al., 2002):

p(m | x_{1:t}, z_{1:t}) = \prod_i p(m_i | x_{1:t}, z_{1:t}).  (4)

Equation (4) means that the probability distribution of the entire map can be obtained by calculating the probability of each cell. A general solution for the probability values of each cell can be found in Thrun et al. (2002). Occupancy grid maps make it easy to implement short-range path planning. However, path planning on a grid map is inefficient and requires accurate robot position estimation (Figure 3).

F I G U R E 3 (a) The basic principles of occupancy grid mapping, taking a prominent boundary as an example, where black represents occupied cells with a probability of 1 and white represents free spaces with a probability of 0. The probability distribution of the whole map is therefore equal to the product of the probability distributions of the individual cells under the Bayesian framework. We also illustrate the construction of an outdoor MultiLevel Surface Map based on the Monte Carlo Localization algorithm (Pfaff et al., 2007), which can effectively reflect the vertical information (c,e) of the map compared with the Elevation Map (d,b).

2.1.3 | Localization methods: Monte Carlo localization (MCL)

Localization is a version of online temporal state estimation (Fox et al., 1999). To locate a robot means to estimate its pose (location and orientation) relative to its environment (Thrun et al., 2001). The MCL algorithm gives a solution to trajectory prediction under the condition of a known map. This problem considers two aspects of localization: global localization and position tracking (Fox et al., 1999). Localization is fundamental to most mobile robot applications (Baggio & Langendoen, 2006; Fox et al., 1999; Wolf et al., 2005) and plays a vital role in many successful automatic robot systems (Thrun et al., 2001).

It is notable that BA approaches (graph optimization) and probability methods are usually exploited to solve the problem of trajectory prediction under known map conditions. Although graph optimization has the advantage of considering all frames at the same time, it may fall into a local minimum, and the optimization time becomes longer as the number of nodes increases. Generally, probabilistic methods are commonly used in Lidar SLAM localization, represented by four types (Fox et al., 1999): Kalman filter (KF)-based techniques, topological Markov localization, grid-based Markov localization, and MCL. This problem can be expressed by the first factor of Equation (3), p(x_{1:t} | z_{1:t}, u_{0:t-1}), which can be rewritten as p(x_t | z_t, u_{t-1}, m_{t-1}, x_{t-1}). The problem is to predict the position of the robot at time t given the map, control, and position information of the robot at time t − 1 and the observation information at time t. KF-based techniques usually use a Gaussian distribution to approximate the position of the robot (Bishop & Welch, 2001; Guivant & Nebot, 2001), but the Gaussian distribution cannot express all the situations that arise in real robot positioning. Topological Markov localization uses a topological map rather than a Gaussian distribution to roughly estimate the position of the robot in the map (Kosecká & Li, 2004). The grid-based method was proposed to achieve higher accuracy, whereas it requires a large amount of computation and an a priori commitment to the grids (Fox et al., 1998, 1999). Compared with the first three methods, MCL is a non-Gaussian method that balances accuracy and computational efficiency.

MCL, first proposed by Handschin (1970), uses a set of weighted samples (also called particles) to represent a target distribution p(x) from which samples are desired but cannot be drawn directly (Jourdan et al., 2005; Wolf et al., 2005). MCL can be divided into four steps. First, all particles are initialized. Second, the particle weights are updated according to the data observed by the sensor. Third, the positions of all particles are updated according to the robot's motion. Finally, the particles are resampled and moved as the robot moves (Baggio & Langendoen, 2008; Thrun et al., 2001). Compared with other localization methods, MCL has the advantage that it can represent complex multimodal distributions and integrate seamlessly with Gaussian distributions. However, MCL performs poorly when the proposal distribution used to generate samples does not place enough samples in the relevant regions (Thrun et al., 2001).
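A compact sketch of the MCL loop just outlined, for a robot moving along one dimension on a known map; the corridor, the single landmark, the Gaussian motion noise, and the range-measurement likelihood are all invented for illustration rather than taken from any cited system.

```python
import numpy as np

rng = np.random.default_rng(0)
landmark = 5.0                        # assumed known map: one landmark position
true_pose = 0.0

# initialization: particles spread uniformly over the corridor (global localization)
particles = rng.uniform(0.0, 10.0, size=500)

for _ in range(8):
    # motion update: propagate every particle with the commanded motion plus noise
    true_pose += 1.0
    particles += 1.0 + rng.normal(0.0, 0.1, size=particles.size)

    # weight update: compare each particle's predicted range with the measured range
    z = abs(landmark - true_pose) + rng.normal(0.0, 0.2)
    expected = np.abs(landmark - particles)
    weights = np.exp(-0.5 * ((z - expected) / 0.2) ** 2)
    weights /= weights.sum()

    # resampling: draw a new particle set proportional to the weights
    idx = rng.choice(particles.size, size=particles.size, p=weights)
    particles = particles[idx]

print(true_pose, particles.mean())    # the particle mean tracks the true pose
```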

2.2 | Visual SLAM

Visual SLAM uses the camera to collect image information in front of the robot and applies optimization algorithms and certain constraints to realize the final positioning and mapping functions. Cameras represented by the monocular camera and the stereovision camera collect the image through the lens, process it with the photosensitive component circuit, and convert it into a digital signal recognized by the computer. Besides, the RGB-D camera radiates infrared laser light and collects the reflected light, exploiting 3D structured light or ToF technology to obtain 3D information about the surroundings. Visual SLAM solutions can be roughly divided into filter methods (Eade & Drummond, 2006; Pupilli & Calway, 2005) and nonfilter methods (Newcombe & Davison, 2010; Pirker et al., 2011; Strasdat et al., 2012) according to the different back-end optimization methods. Visual sensors have no detection range limit and are able to extract semantic information. However, Visual SLAM has the shortcomings of high environmental requirements and large computational complexity.

2.2.1 | Sensors and systems

(1) Camera sensors
In Visual SLAM, a camera is the front-end device used to solve the positioning and mapping problems. The camera is a kind of cheap sensor widely used in robotics and can obtain a large amount of redundant texture information from the environment with a strong scene recognition capacity. In general, cameras have been extended to handling real-time dynamic situations with the increasing capability of computing hardware, even though the information acquired by cameras needs high computing power to process (Davison et al., 2007). Monocular camera sensors have only one light path to extract information. The data of a monocular camera is a series of 2D photos demonstrating the projection of objects or parts of objects (Lepetit & Fua, 2005). Photos reflect the 3D world in two dimensions, losing depth data in the process. Relative depth information is calculated through the motion of the monocular camera and utilized to generate 3D graphical images, but accurate metric depth maps cannot be obtained (Saxena et al., 2005). Scale uncertainty means the scale difference between the estimated map and the actual scene. Although the monocular camera is weak in obtaining depth data, it has the advantages of simple structure and low cost. Recently, some effective methods have been proposed to obtain depth data with a monocular camera, such as Supervised Learning (Saxena et al., 2005), Shape from Shading (R. Zhang et al., 1999), and hierarchical, multiscale Markov Random Fields (MRF; Saxena et al., 2007). The stereo vision camera, consisting of two monocular cameras, has two light path channels. The depth information of each pixel is measured by estimating the spatial position of the object based on the distance between the two cameras, that is, the stereo baseline. Stereo cameras have a long measuring distance with good performance both indoors and outdoors (Achtelik et al., 2009; Goldberg et al., 2002). However, the calibration process of stereo cameras tends to be troublesome due to the computational complexity. RGB-D cameras, also known as depth cameras, are mainstream sensory systems that obtain RGB images along with per-pixel depth information (Henry et al., 2014). RGB-D cameras capture the depth information of objects through the physical method of 3D structured light (A. S. Huang et al., 2017; Murartal et al., 2017) or ToF (Henry et al., 2012; Holz et al., 2011). This physical measurement saves a lot of time compared with the stereo vision camera, which uses algorithms to calculate the distance (Vidas et al., 2013). Most RGB-D cameras are limited by the measurement range and large noise, but the acquisition of the reflected infrared light is not influenced by the illumination of the environment (Z. Fang & Scherer, 2014). These three kinds of cameras all have the advantages of simple structure and low cost compared with Lidar sensors, whereas they are greatly affected by environmental lighting factors.
(2) Visual SLAM systems
The Visual SLAM system is a process of hardware and software collaboration in which the autonomous vehicle, embedded with computer vision algorithms, exerts various kinds of camera sensors to execute specific tasks. Combined with different camera sensors, Visual SLAM has developed many classic systems. Davison et al. (2007) presented the first successful real-time SLAM system with a monocular camera, called MonoSLAM. They used a probabilistic estimation method based on the EKF to track the sparse feature points of the front end. Klein and Murray (2007) presented an approach to estimating camera pose in an unknown scene, as well as tracking handheld cameras in small AR workspaces. Constructively, they proposed to split tracking and mapping into two separate tasks, improving accuracy and robustness. The work of Milford and Wyeth (2008), a biologically inspired SLAM system dubbed RatSLAM, realized SLAM with a single camera in large environments. They used the rodent hippocampus as a computational model and combined it with a lightweight visual system to acquire odometry and appearance information. Strasdat et al. (2010) provided a novel near real-time Visual SLAM system that improved the adaptability to rotation, translation, and scale drift at loop closure via a new pose-graph optimization technique. This group leveraged the monocular camera to achieve performance very similar to that obtained using stereo vision. A semidirect monocular visual odometry method called SVO was presented by Forster et al. (2014). SVO eliminated the necessity of costly feature extraction and robust matching techniques for motion estimation.

However, there are many obvious systematic shortcomings of monocular SLAM. For instance, the map should be initialized at system start-up (Davison et al., 2007) and scale drift occurs in most circumstances. Engel et al. (2015) proposed a novel Large-Scale Direct SLAM algorithm for stereo cameras (Stereo LSD-SLAM) that implemented real-time semidense SLAM on standard CPUs. In addition, they adopted a robust approach to achieve illumination invariance, effectively addressing the brightness variation between frames. Paz et al. (2008) used a stereo pair moving with six degrees of freedom (DOF) as the only sensor to realize SLAM in large indoor and outdoor environments, accommodating both monocular and stereo information. The similar work of Mei et al. (2011), dubbed RSLAM, achieved reliable loop closure with lower scale drift by using a binocular camera as the sole sensor. Furthermore, there are a large number of other stereo SLAM systems, utilizing the RBPF to robustly close large loops (Elinas et al., 2006), combining an edge-point Iterative Closest Points (ICP) algorithm (Tomono, 2009) to deal with nontextured environments, as well as separating the time-constrained pose estimation from map construction and refinement tasks with stereo cameras (Pire et al., 2015).

RGB-D cameras, with the advantages of low cost and light weight (Endres et al., 2013), can easily obtain the depth information and complete the SLAM task robustly. Using the Microsoft Kinect camera as the only sensor is becoming a mainstream scheme. Meanwhile, data sets collected by RGB-D cameras have been released and applied in later research (Lai et al., 2011; Sturm et al., 2012). One of the most famous works, done by Henry et al. (2014), built an RGB-D mapping system that exploited a novel joint optimization algorithm combining visual features and shape-based alignment. Endres et al. (2012) evaluated the accuracy, robustness, and processing time of three different feature descriptors (Scale-Invariant Feature Transform [SIFT], Speeded Up Robust Features [SURF], and Oriented FAST and Rotated BRIEF [ORB]) with the Kinect.

A feasible approach to the 3D SLAM problem was proposed by Engelhard et al. (2011), where features were extracted and matched by SURF. Recent work by Murartal et al. (2017) presented a complete SLAM system for monocular, stereo, and RGB-D cameras dubbed ORB-SLAM2. This system works in real time on standard CPUs and is suitable for various environments. In recent years, numerous complete and universal Visual SLAM systems (Mur-Artal et al., 2015; Gomez-Ojeda et al., 2019; Ma et al., 2016) have been proposed, and most of them are open-source solutions providing algorithmic support for the further application of SLAM technology.

2.2.2 | The classic framework for Visual SLAM

Autonomous navigation of mobile robots can be separated into three main problems: localization (Where am I?), mapping (What does the world look like?), and path planning (How can I reach a given location?) (Fuentes-Pacheco et al., 2015). To solve these problems, a classical framework of Visual SLAM was proposed and is roughly divided into four modules: Visual Odometry (VO), Back-end Optimization, Loop Closure, and Mapping (Cadena et al., 2016). Aimed at building a globally consistent representation of the environment, Visual SLAM has evolved into a relatively mature and unified system over the past 10 years. Visual SLAM adapts well to general unknown environments, whether indoor or outdoor scenes (Ben-Afia et al., 2014). However, it is necessary to improve and optimize this framework to cope with more complex and specific environments.

(1) Visual Odometry
VO refers to the process of estimating the camera's pose and motion based on the acquired image information. Without prior knowledge of the scene and the motion (Nistér et al., 2004), VO estimates the motion of the camera between adjacent frames by incremental computation, but this also means that VO constantly accumulates interframe errors, leading to the drift phenomenon. The numerous VO algorithms are roughly represented by two main categories: the feature point method and the direct method.
Feature points can be understood as distinct patches in an image. It has been found that corners and edges are more distinctive features in an image than blocks of pixels, so in early research, corners (Harris & Stephens, 1988) were often used as features in image matching. However, considering the problems of rotation and scale, general corners struggled to meet the needs of estimating complex motion. With the development of some famous image processing methods, interframe matching can be robustly implemented through the correct extraction of feature points between two consecutive images. D. G. Lowe (2004) presented a method dubbed SIFT for extracting distinctive invariant features from images, with accurate matching ability in different scenes. However, SIFT was difficult to run for real-time mapping and localization without the use of a graphics processing unit (GPU) due to the huge amount of computation. To make computing faster without sacrificing performance, Bay et al. (2006) proposed a novel scale- and rotation-invariant interest point detector and descriptor called SURF; SURF was based on the Hessian matrix (Mikolajczyk & Schmid, 2001) due to its good performance in computation time and accuracy. Furthermore, a high-speed corner detection method, Features from Accelerated Segment Test (FAST), also had a strong computation speed (Rosten & Drummond, 2006), but lost the orientation information. On this basis, ORB features (Rublee et al., 2011) combined FAST and Binary Robust Independent Elementary Features (BRIEF), and maintained the rotational invariance of the features utilizing the Intensity Centroid. At the same time, the test results of ORB showed significant improvements in matching speed compared with SURF and SIFT, and ORB-SLAM was considered to be an optimal system for real-time SLAM.
The matching of feature points is the first step to realize VO. Once accurate matching is implemented and combined with depth information obtained indirectly (triangulation; Hartley & Sturm, 1997) or directly (RGB-D camera), the system can estimate the motion of the camera from the perspective of vision alone. Solution methods are mainly divided into linear methods, such as the Epipolar Constraint (Z. Zhang et al., 1995) and Efficient Perspective-n-Point (Lepetit et al., 2009), and nonlinear methods, such as BA (Triggs et al., 1999).
If the camera moves to a place where features are missing, the feature point matching method is not conducive to the realization of VO. In addition, feature matching only makes use of the feature information in an image, while discarding most of the potentially useful image information. Considering these shortcomings, direct methods that estimate camera motion based on pixel gray information were proposed. According to the number of pixels used, the direct method can be divided into sparse, dense, and semidense variants. The optical flow method (Horn & Schunck, 1981) is a good example; in particular, Lucas-Kanade optical flow (Bouguet, 2001) has a better tracking effect on the motion of corners. Moreover, some classic SLAM systems mentioned above use direct methods to implement VO, such as SVO (Forster et al., 2014) and LSD-SLAM (Engel et al., 2014, 2015).
Generally, VO is used as the front end of the SLAM system. VO is more concerned with the relationship between frames, whereas a complete SLAM system requires a globally consistent map and camera pose. Obviously, in most cases, VO is unable to accomplish the final mission of SLAM, except in cases where the global path is not of concern. Besides, many methods of Structure from Motion (SFM) are also used in the VO module, such as feature point tracking and BA. SFM and SLAM have different priorities though they have some similarities in approach: SLAM is more focused on real-time, fast, and accurate localization, while SFM pays close attention to reconstructing 3D scenes beautifully and runs offline.
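The feature-based two-view front end described above can be prototyped in a few lines with OpenCV: ORB keypoints are matched by Hamming distance, the epipolar constraint is enforced through a RANSAC estimate of the essential matrix, and the relative rotation and (up-to-scale) translation are recovered from it. The file names and intrinsic matrix below are placeholders; treat this as a sketch of the feature point method for VO, not the pipeline of any specific system cited here.

```python
import cv2
import numpy as np

# placeholder inputs: two consecutive grayscale frames and an assumed intrinsic matrix K
img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

# 1. detect ORB keypoints and descriptors (FAST corners + rotated BRIEF)
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# 2. match binary descriptors by Hamming distance with cross-checking
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 3. epipolar constraint: RANSAC essential-matrix estimation rejects outlier matches
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)

# 4. recover the relative camera motion (R, t); the translation is up to scale
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
print("rotation:\n", R, "\nunit translation:\n", t.ravel())
```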

(2) Back-end Optimization
Since VO only considers motion estimation between adjacent frames, long-term computation is bound to cause cumulative errors, resulting in very inaccurate mapping. Therefore, some groups have sought to optimize the graph data collected over a long period of time to avoid serious error accumulation. Back-end optimization, as another module in SLAM, ensures that the motion estimate remains optimal over a long period of time. The mainstream optimization methods include filter methods, represented by the EKF, and nonlinear optimization methods, dominated by BA (Triggs et al., 1999). Each method is well adapted to its specific environment.
The KF essentially addresses a probabilistic problem of state estimation. The KF estimates a SLAM process by using a form of feedback control: the filter estimates the process state at some time and then obtains feedback in the form of (noisy) measurements (Welch & Bishop, 1995). The filter methods represented by the EKF all assume the Markov property, which means the state at moment k is only related to the state at moment k − 1 and has nothing to do with other moments. Among various filter models, the KF is the optimal unbiased estimator for the linear Gaussian system. However, the equations of motion and the equations of observation in SLAM are almost always nonlinear, so the EKF, as the nonlinear version of the KF (Julier & Uhlmann, 1997), was developed for robot navigation. Furthermore, multiple modified filter algorithms have been applied in SLAM, such as the Unscented Kalman Filter (UKF; Wan & Van Der Merwe, 2000), the Square Root Unscented Kalman Filter (Holmes et al., 2008), the RBPF (Gil et al., 2010; Grisetti et al., 2007), the PF combined with the UKF (G. Li et al., 2007), and so forth. In the early stage, researchers favored filter methods to achieve back-end optimization, while now the nonlinear optimization method is clearly superior to the filter method under the same amount of calculation (Strasdat et al., 2012). However, the EKF is still an optimal method in some simple scenes.
In the VO module, BA is used to solve the camera pose with nonlinear optimization by constructing the problem of minimizing the reprojection error. As an important nonlinear optimization method in 3D visual reconstruction, BA has also been widely used in the back end of SLAM. Since the 21st century, researchers have gradually realized the sparsity of BA at the back end, greatly decreasing the calculation amount of BA. Schur elimination (Sibley et al., 2010) is a common method to solve the incremental equation in BA, utilizing the sparsity of the matrix to find the inverse. In actual operation, BA is generally solved by the General Graph Optimization (Kümmerle et al., 2011) or Ceres (Agarwal & Mierle, 2012) graph optimization libraries. Thanks to the sparsity of the matrix, BA can precisely optimize each camera pose and feature point. However, the computation becomes vast in large scenes, resulting in the loss of real-time performance. To simplify the problem, some groups suggested ignoring the optimization of many feature points (landmarks) and focusing only on the trajectory of the keyframes (the edges between poses), that is, the Pose Graph (Dubbelman & Browning, 2015; Sunderhauf & Protzel, 2012).
(3) Loop Closure Detection
Loop closure detection is a global optimization method that further utilizes the information obtained by comparing the current frame with each previous frame in the process of SLAM. The task of loop closure is to detect whether the robot has returned to a previous position after an excursion of arbitrary length (Newman & Ho, 2005). Some solutions are given in the back end to eliminate the inevitable cumulative errors generated in the VO, but the effect is very limited and prone to drift when only interframe information is used to complete the optimization. The available loop closure provides more useful constraints for the back-end optimization part, thus eliminating drift. Taking the Pose Graph as an example (Latif et al., 2013), a correct closing loop pulls the edges with accumulated errors together in the Pose Graph. Moreover, loop closure can also realize relocation (Ortin et al., 2004) when the tracking algorithm is lost, since loop detection provides a correlation between current data and historical data.
The main method of loop closure detection is to determine the loop by analyzing the similarity between two images. Unconcerned with the front-end feature extraction and back-end estimation, loop closure is a relatively independent module. Certainly, loop detection may take advantage of the feature point data from the front end (Valgren & Lilienthal, 2010). The Bag-of-Words (BoW) model is one of the most popular representation methods for object categorization (Nister & Stewenius, 2006; Y. Zhang et al., 2010) and is applied to solve the loop closure problem. BoW describes the image by some specific features (Words) that reflect the similarity of two images; whether two images show the same location is determined by the presence or absence of similar image features. Numerous Words (not single features, but collections of the same features) make up the dictionary, and the process is similar to a clustering problem. Some clustering algorithms, such as K-means (Lloyd, 1982), K-means++ (Arthur & Vassilvitskii, 2007), and the K-ary Tree (Gálvez-López & Tardos, 2012), help structure the dictionary taking into account both scale and search efficiency. A robust loop closure requires verification of the detection results, since a wrong detection may lead to disastrous consequences for mapping and positioning. A method of computing a similarity transformation used Random Sample Consensus (RANSAC) to find enough reliable correspondences and provided geometric verification (Mur-Artal & Tardós, 2014) that determined whether the loop candidate was correct or not.
It should be noted that the process of building a dictionary is actually a process of training. On the basis of the clustering algorithm model, it is necessary to train the dictionary to classify and identify all the features in the environment, and then compare them to give a similarity score. In fact, loop closure detection is closely related to Machine Learning (Granstrom et al., 2009), and meanwhile some Deep Learning methods like Convolutional Neural Networks (CNN; Gao & Zhang, 2015; Hou et al., 2015) have been employed in the implementation of loop closure.
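A toy illustration of the bag-of-words scoring idea described above: descriptors are quantized against a small "dictionary" of cluster centers (here produced by a plain k-means on random data, standing in for an offline-trained vocabulary such as the one built with a K-ary tree), each image becomes a normalized word histogram, and candidate loops are ranked by the similarity of those histograms. All data here are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(data, k, iters=20):
    """Plain k-means stand-in for offline vocabulary training."""
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((data[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([data[labels == i].mean(0) if np.any(labels == i)
                            else centers[i] for i in range(k)])
    return centers

def bow_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word, then L1-normalize."""
    words = np.argmin(((descriptors[:, None] - vocabulary[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

def similarity(h1, h2):
    """Cosine similarity between word histograms; a high score flags a loop candidate."""
    return float(h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2) + 1e-12))

# fake 32-dimensional descriptors standing in for ORB/SIFT features
training = rng.normal(size=(500, 32))
vocab = kmeans(training, k=20)
img_a = rng.normal(size=(120, 32))
img_b = img_a + rng.normal(scale=0.05, size=img_a.shape)   # revisited place
img_c = rng.normal(size=(120, 32))                          # unrelated place
ha, hb, hc = (bow_histogram(d, vocab) for d in (img_a, img_b, img_c))
print(similarity(ha, hb), similarity(ha, hc))  # the revisited place scores higher
```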

and execute further tasks. Localization and mapping are two main tasks in SLAM. They are actually complementary, since mapping serves localization and localization contributes to establishing the basic map. A kind of sparse map is generated by accumulating the 3D landmarks extracted in previous work (Eade & Drummond, 2006). Utilizing the sparse map is a good way to describe exactly where the camera is. A Voronoi-based thinning topological map (Ko et al., 2004) can be used for path planning if precise positioning is not required, where points represent features of interest such as corners, doors, and staircases, and edges represent connections between points.

However, the purpose of mapping is not limited to solving the SLAM problem itself; practical applications must also be considered. Maps can be used to implement multiple functions, such as navigation, obstacle avoidance, reconstruction, and interaction. Consequently, SLAM systems prefer a dense map, or even a semantic map, to capture more sensory information about the environment.

Dense reconstruction requires the distance information of most pixels. Monocular and binocular cameras obtain depth indirectly through geometric relations (Pizzoli et al., 2014), while RGB-D cameras can directly measure depth by computing the time to receive reflected infrared light. RGB-D is thought to be the mainstream solution due to its accuracy and lower computational cost, although stereo vision works better in large scenes, such as outdoor environments. A simple approach is to convert RGB-D data into a point cloud (Rusu & Cousins, 2011) and splice it into a point cloud map composed of discrete points (Fioraio & Konolige, 2011). Alternatively, Visual SLAM can also build an occupancy grid map (Konolige, 1997) of the kind often seen in Lidar SLAM. However, these algorithms are still limited to the estimation and optimization of geometric information, such as points and lines. With the rapid development of pattern recognition, the addition of semantic content to a geometric map (Civera et al., 2011; Tateno et al., 2017) has been proposed by some researchers. A semantic map can lead to better human–computer interaction and help robots perform more intelligent behaviors by recognizing object instances and registering them into the estimated map, for example, for autonomous fruit picking in the orchard.
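To make the point cloud splicing concrete, the following minimal sketch back-projects a depth image into camera-frame 3D points and appends them to a global map using the camera pose estimated by SLAM. It assumes pin-hole intrinsics (fx, fy, cx, cy), a depth image already expressed in meters, and a 4 x 4 camera-to-world pose T_wc; the function names and parameters are illustrative only and do not correspond to any of the cited implementations.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into camera-frame 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                      # drop pixels with no valid depth

def splice_into_map(global_map, pts_cam, T_wc):
    """Transform camera-frame points into the world frame and append them to the map."""
    pts_h = np.hstack([pts_cam, np.ones((len(pts_cam), 1))])
    pts_world = (T_wc @ pts_h.T).T[:, :3]
    return np.vstack([global_map, pts_world]) if global_map.size else pts_world
```

In practice, the accumulated cloud would be voxel-downsampled, or converted into an occupancy grid, to keep the map size bounded as new frames are spliced in.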
2.3 | Sensor Fusion SLAM

Sensor Fusion SLAM means that the complementary advantages of multiple sensors are fully utilized to complete the SLAM mission robustly. At present, the sensors commonly used for mobile robot localization include the camera, Lidar, IMU, wheel odometry, and GNSS. Among these sensors, the camera and Lidar are used to sense the external environment of the robot and to estimate the relative movement and position of the robot in the environment through observation and matching. The wheel odometer and IMU are used to sense the relative motion of the robot itself and realize positioning through motion tracking. These sensors can be used independently in localization algorithms, but owing to the different characteristics of each sensor, the corresponding localization algorithms are usually only applicable to specific scenes and have certain limitations. Specifically, the Lidar is used to generate point cloud information consisting of range data of the environment (Kusevic et al., 2010), and the camera can obtain photographic image information to describe the robot's surroundings. The Lidar provides high-precision range information with wide fields of view (L. Huang & Barth, 2009), while the camera can hardly obtain high-resolution range data. Furthermore, the camera allows for better recognition, but the Lidar is limited in object identification (Z. Zhang, Maeta, et al., 2014). The wheel odometer obtains the wheel speed through the encoder installed on the wheel motor and works out the current pose estimate by combining it with a dead reckoning algorithm. The estimates from wheel odometry relate only to the ambient ground, not to other changes in the environment, and are susceptible to wheel skidding. The IMU can provide high-frequency acceleration and angular velocity, but its measurements contain noise and tend to accumulate errors when the pose estimate is obtained by integration. In addition, the IMU suffers from temperature drift and zero drift. GNSS can obtain the absolute position of the robot, but its signal is easily blocked, especially in agricultural environments full of crop occlusion. Therefore, methods based on multisensor fusion have become a hot topic in the field of SLAM, since the combination can make up for the shortcomings of each sensor working alone.

The Sensor Fusion SLAM system requires a variety of sensors to capture various physical attributes of the environment simultaneously, whereas the data collected from each sensor stream vary widely in many ways, such as temporal and spatial resolution, data format, and geometric alignment (De Silva et al., 2017). The key to robust sensor fusion lies in automatic data calibration and fusion between the sensors. Data calibration includes intrinsic and extrinsic calibration. Most extrinsic calibration approaches between camera and Lidar are fundamentally based on identifying and matching features acquired from the Lidar and camera sensors, and then constructing the coordinate transformation coefficient matrix equation by using the geometric constraint relation with a given calibration plate, such as a planar checkerboard plate (Q. Zhang & Pless, 2004), a right-angled triangle plate (G. Li et al., 2007), or a trihedral plate (Z. Hu et al., 2016), so as to determine the transformation relation between the camera coordinate system and the Lidar coordinate system (Castorena et al., 2016; Q. Zhang & Pless, 2004). Those methods require the existence of known targets in the scene and fail without enough available feature matches. An extrinsic calibration method for Lidar-vision sensors without a given target was proposed to eliminate this limitation. Under the assumption that the Lidar has been intrinsically calibrated, Castorena et al. (2016) presented a new method for joint automatic extrinsic calibration and sensor fusion for a multimodal sensor system comprising a Lidar and an optical camera. They exploited the natural alignment of depth and intensity edges when the intrinsic calibration was correct. Pandey et al. (2015) presented an algorithm for automatic, targetless, extrinsic calibration of a Lidar and optical camera system based on a Mutual Information framework. Errors are inevitable in feature
matching and coordinate transformation. Therefore, some nonlinear optimization methods are utilized in the intrinsic and extrinsic calibration of sensors, similar to the back-end optimization in Visual SLAM. Iterative minimization of nonlinear cost functions (Stamos et al., 2008) is one of the existing approaches to optimize calibration parameters and fusion results; however, it hinges on a precise initial estimate. Mirzaei et al. (2012) divided the problem into two least-squares subproblems and analytically solved each one to determine a precise initial estimate for the unknown parameters.
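As an illustration of how the estimated extrinsics are used, the sketch below transforms Lidar points into the camera frame with a rotation R and translation t and projects them with the intrinsic matrix K; when the calibration is correct, depth discontinuities in the projected points line up with intensity edges in the image, which is the cue exploited by the targetless methods above. This is a schematic NumPy example, not code from the cited works.

```python
import numpy as np

def project_lidar_to_image(pts_lidar, R, t, K):
    """Map Lidar points into the camera frame with the extrinsics (R, t),
    then project them with the intrinsic matrix K.
    Returns pixel coordinates and the corresponding depths."""
    pts_cam = (R @ pts_lidar.T + t.reshape(3, 1)).T   # X_cam = R * X_lidar + t
    in_front = pts_cam[:, 2] > 0.1                    # keep points ahead of the camera
    pts_cam = pts_cam[in_front]
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                     # perspective division
    return uv, pts_cam[:, 2]
```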
Additionally, it is a pivotal problem for multi-Sensor Fusion SLAM to integrate the data output of multiple sensors. For Visual-Inertial fusion SLAM, Mourikis and Roumeliotis (2007) first presented an EKF-based algorithm, the multi-state constraint Kalman filter, for real-time vision-aided inertial navigation. The proposed algorithm is capable of high-precision pose estimation in large-scale real-world environments. Mur-Artal and Tardós (2017a, 2017b) proposed a novel tightly coupled visual-inertial SLAM system based on the famous ORB-SLAM that is able to close loops and reuse its map to achieve zero-drift localization in already mapped areas (VI-ORB-SLAM). T. Qin et al. (2018) presented a tightly coupled, nonlinear optimization-based monocular visual-inertial state estimator (VINS-Mono). The method achieves highly accurate visual-inertial odometry (VIO) by fusing preintegrated IMU measurements and feature observations, and enables relocalization with minimum computation in the loop detection module in combination with the tightly coupled formulation. Campos et al. (2021) proposed ORB-SLAM3, the first system able to perform visual, visual-inertial, and multimap SLAM with monocular, stereo, and RGB-D cameras, using pin-hole and fisheye lens models. For Lidar-Inertial fusion SLAM, Ye et al. (2019) introduced a tightly coupled Lidar-IMU fusion method. To obtain more reliable estimations of the Lidar poses, a rotation-constrained refinement algorithm (Lidar-inertial odometry [LIO] mapping) is proposed in the method to further align the Lidar poses with the global map. C. Qin et al. (2020) presented LINS, a lightweight Lidar-inertial state estimator, for real-time egomotion estimation. In the algorithm, an iterated error-state KF is designed to correct the estimated state recursively by generating new feature correspondences in each iteration and to keep the system computationally tractable. Shan et al. (2020) proposed a framework for tightly coupled LIO via smoothing and mapping, LIO-SAM, that achieves highly accurate, real-time mobile robot trajectory estimation and map-building. For Visual-Lidar fusion SLAM, Sun et al. (2010) fused the corner features from a laser sensor and the vertical lines from a monocular camera into the same corner feature. G. Zhao et al. (2014) used a fuzzy logic inference framework to fuse the Velodyne data and image data, and obtained the parsing result of each frame. Graeter et al. (2018) introduced LIMO, a Lidar-monocular visual odometry. In this study, they proposed a depth extraction algorithm from Lidar measurements for camera feature tracks and estimated motion by robustified keyframe-based BA. For Lidar-Visual-Inertial fusion SLAM, J. Zhang and Singh (2018) presented a data processing pipeline to estimate egomotion online and build a map of the traversed environment, leveraging data from a 3D laser scanner, a camera, and an IMU. The proposed method employs a sequential, multilayer processing pipeline, solving for motion from coarse to fine, and is capable of handling sensor degradation by automatic reconfiguration bypassing failure modules. Shan et al. (2021) proposed a framework for tightly coupled Lidar-VIO via smoothing and mapping, LVI-SAM, that achieves real-time state estimation and map-building with high accuracy and robustness. LVI-SAM is built atop a factor graph and is composed of two subsystems: a visual-inertial system and a Lidar-inertial system. Lin et al. (2021) proposed a robust, real-time tightly coupled multisensor fusion framework, which fuses measurements from Lidar, inertial sensor, and visual camera to achieve robust and accurate state estimation (R2Live). The proposed framework is composed of two parts: filter-based odometry and factor graph optimization. To guarantee real-time performance, the state is estimated within the framework of an error-state iterated KF, and the overall precision is further improved with the factor graph optimization. Moreover, wheel odometry is often used in multi-Sensor Fusion SLAM. Wu et al. (2017) presented a vision-aided inertial navigation system (VINS) for localizing wheeled robots (VINS on wheels). The proposed system extended VINS to incorporate low-frequency wheel-encoder data and showed that the scale becomes observable. Detailed information about sensors used in SLAM was summarized by B. Huang et al. (2019) in the paper A Survey of Simultaneous Localization and Mapping.
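The filtering idea behind many of these estimators can be summarized with a deliberately simplified, loosely coupled Kalman filter along one axis: IMU acceleration drives the prediction, and an absolute position fix (e.g., from GNSS or Lidar odometry) corrects it. The sketch below is only a schematic illustration of the predict–update cycle; the tightly coupled, iterated, or factor-graph formulations used by VINS-Mono, LINS, LIO-SAM, and R2Live are considerably more involved.

```python
import numpy as np

# State x = [position, velocity] along one axis; P is its covariance.

def predict(x, P, acc, dt, q_acc=0.5):
    """Propagate the state with an IMU acceleration sample over dt seconds."""
    F = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([0.5 * dt**2, dt])
    x = F @ x + B * acc
    Q = q_acc * np.outer(B, B)                 # process noise from accelerometer noise
    P = F @ P @ F.T + Q
    return x, P

def update_position(x, P, z_pos, r_pos=0.2):
    """Correct the state with an absolute position measurement z_pos."""
    H = np.array([[1.0, 0.0]])
    S = H @ P @ H.T + r_pos
    K = (P @ H.T) / S                          # Kalman gain
    x = x + (K * (z_pos - H @ x)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

A full error-state formulation would additionally track orientation and sensor biases and iterate the update, which is what the cited tightly coupled systems do.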
3 | APPLICATIONS IN AGRICULTURE

SLAM has been extended to various aspects of agriculture involving autonomous navigation, 3D mapping, field monitoring, and intelligent spraying. The rapid development of intelligent agricultural machinery and agricultural robotics motivates the applications of SLAM in agricultural environments. Currently, most of the automatic agricultural vehicles used for weed detection, agrochemical dispersal, terrain leveling, irrigation, and so forth, are still manned (Cheein & Carelli, 2013). However, with the popularization of PA, more and more intelligent unmanned vehicles have been utilized in agricultural production, truly realizing a great liberation of the labor force. Considering that the execution of the above agricultural tasks heavily depends on the accuracy of the system, robust SLAM algorithms minimize the estimation errors in both the localization and the mapping processes, so that SLAM holds broad application prospects in the uneven and vast agricultural environment.

3.1 | Applications of SLAM for mapping

The development of agricultural robotics is inseparable from the acquisition and utilization of spatial information, for example, the position of the robot and a rough description of its surroundings (Rovira-Más et al., 2008). To build a globally consistent 3D terrain
map, the robot initially needs to be aware of its position and surroundings and then register other useful environmental information. Localization and mapping are interdependent processes. SLAM, as a popular algorithm in the robotics community, offers a novel solution to the autonomous and real-time positioning and mapping problem in an agricultural environment.

However, it is noted that agricultural autonomous equipment encounters three main challenges while constructing 3D terrain maps. The agricultural environment differs from urban or industrial environments because of its seasonal variation. Some systems performing well in mapping and localization in urban environments have struggled to reconstruct the 3D structure of agricultural environments composed of dynamically growing crops. Contrary to those methods designed only for static scenes, some groups have focused on 4D reconstruction and mapping, which means the output is not only a set of 3D entities (points, meshes, etc.), but also a particular time or range of times (Dong et al., 2017). Besides, a globally consistent map is necessary for accurate localization, since any mobile robot's self-localization suffers from imprecision (Nüchter et al., 2007). Therefore, another problem is the consistent alignment of multiple 3D submaps when generating a global 3D map. Creating a terrain panorama of an agricultural environment needs constant observation due to the large area of fields and orchards, and ultimately the integrated map is pieced together from the collected submaps. In particular, data association is an important part of 3D mapping. The loop closure detection method (Se et al., 2002) was often exerted to solve the alignment problem. However, one limiting aspect of the close-the-loop constraint is that the final point has to be the same as the initial point (Rovira-Más et al., 2008). Apart from this, Wang et al. (2005) proposed to obtain a consistent terrain map by placing stationary landmarks in the environment, so that the seasonal changes of the field no longer affect the mapping efficiency and loop closure is no longer a necessary option. Special attention should be paid to whether the target is stable and whether it causes obstacles to the automatic vehicle, which determines the feasibility and labor costs of the landmarks. In the same fashion, the choice of range sensors or visual sensors, as well as of the autonomous mobile platform, should also be taken into account depending on the features of different scenes. Qadri and Kantor (2021) presented an object-level feature association algorithm that enables the robust creation of 3D reconstructions by taking advantage of the structure in robotic navigation in agricultural fields. The proposed object-level SLAM system utilizes recent advances in Deep Learning-based object detection and segmentation algorithms to detect and segment semantic objects in the environment that are used as landmarks for SLAM. In addition, due to the complex characteristics of large-scale unstructured agricultural environments, it takes a lot of memory and time to build the 3D maps. Therefore, some groups put 3D mapping tasks in the cloud. W. Zhao et al. (2020) proposed a ground-level mapping and navigating system based on computer vision technology (the Mesh Simultaneous Localization and Mapping algorithm, Mesh-SLAM) and the Internet of Things to generate a 3D farm map on both the edge side and the cloud. In the system, high efficiency and speed of the mapping stage are enabled by making the robot vehicles directly stream continuous frames to their corresponding edge node.

To some extent, SLAM can effectively deal with the above problems. Automatic environment sensing and mapping is a fundamental scientific issue in PA, since the presence of terrain maps, especially 3D maps, is essential for many agricultural tasks. Generally, three DOF (x, y, θz) are used to describe an indoor robot's pose, that is, autonomous movement in a 2D plane. Given that the natural outdoor environment is complex and unstructured, Nüchter et al. (2007) proposed six dimensions, namely, the x, y, and z coordinates and the roll, yaw, and pitch angles (x, y, z, θx, θy, θz), to cope with robot motion outdoors. Digitizing the agricultural environment also requires solving this 6D SLAM problem. The laser scanner is deemed an available sensor for outdoor SLAM. Some groups used two 2D laser scanners to obtain 3D data and built a 3D volumetric map (Thrun et al., 2000). Another way to create a globally consistent volumetric map is to use 3D laser scanners directly. Then, all the scans are merged into one coordinate system by the ICP algorithm (Besl & McKay, 1992), which is also an ideal solution to the alignment of multiple 3D submaps, that is, point cloud registration. When a sufficient registration initial guess is given, the optimal matching parameters (R, t) are obtained by minimizing the cost function. In addition, loop closure is a frequently used scheme for mapping in Visual SLAM, whereas it has limitations and high computational costs when applied to agricultural environments.
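The submap alignment mentioned above reduces, for paired points, to the closed-form optimal (R, t) of Besl and McKay (1992), iterated with re-matching in ICP. A minimal NumPy sketch of this registration step is given below; production pipelines add k-d tree search, outlier rejection, and robust cost functions.

```python
import numpy as np

def best_rigid_transform(P, Q):
    """Closed-form (R, t) minimising sum ||R p_i + t - q_i||^2 for paired points."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                      # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = cq - R @ cp
    return R, t

def icp(src, dst, iters=30):
    """Align submap 'src' to 'dst', assuming the initial guess is good enough."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # nearest-neighbour data association (brute force for clarity)
        d = np.linalg.norm(cur[:, None, :] - dst[None, :, :], axis=2)
        matched = dst[d.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```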
Many researchers have done a lot of work on 3D terrain mapping in autonomous agriculture with SLAM technology. Rovira-Más et al. (2008) used a stereo camera installed on a mobile vehicle to build 3D terrain maps, and meanwhile combined a localization sensor and an IMU. The system reached a good result by merging the subscans captured from stereoscopic vision, and also protected the vehicle from colliding with obstacles by distinguishing various scenes with different colors. A similar platform was proposed by Dong et al. (2017) to realize 4D field reconstruction for crop monitoring. Shu et al. (2021) introduced a system capable of combining a sparse, indirect, monocular Visual SLAM with both offline and real-time MultiView Stereo reconstruction algorithms. The combination of the above system overcomes many obstacles encountered by autonomous vehicles or robots employed in agricultural environments, such as overly repetitive patterns, the need for very detailed reconstructions, and abrupt movements caused by uneven roads. However, the accuracy of 3D mapping using visual sensors is generally considered to be inferior to that of range sensors, such as laser scanners and Lidar, although the former cost less than the latter. In general, better results can be achieved by using range laser sensors. A new algorithm that performed well in error rate compared with the Hector mapping algorithm was proposed by Lepej and Rakun (2016) for 2D Lidar mapping in a field environment. The work of Le et al. (2019) presented a novel method improved from Lidar Odometry and Mapping to adapt to online mapping in the agricultural environment. The modified system was equipped with a 3D Lidar and an IMU and proved to be effective in tests on simulated and real datasets, whereas processing in dynamic environments needs
further study. Gimenez et al. (2015) realized structured fruit grove mapping with only front laser measurements and the knowledge of the exact positions of tree corners. The whole system was well tested in a real orchard environment based on an efficient data filtering method able to comply with the observation feature matching. On that basis, Gimenez et al. (2018) studied probabilistic mapping of out-of-structure objects (weeds, works, and machines) in the orchard with laser measurements. They utilized a probabilistic map instead of an occupancy grid map to achieve online updating, high resolution, and straightforward adaptability to dynamic environments. Moreover, the fusion of visual sensors and range sensors is becoming the mainstream approach as data fusion algorithms mature. In this line of research, Shalal et al. (2015) presented a novel method for 2D orchard mapping based on tree trunk detection utilizing a data fusion algorithm of camera and laser. Similarly, the work of Pierzchała et al. (2018) exploited multiple sensors, including Lidar and a stereo camera, to map the forest environment, and the final global map was optimized by the graph-SLAM algorithm. In particular, UAVs combined with the SLAM algorithm have gradually been applied to build agricultural maps due to the limitations of ground mobile platforms. Potena et al. (2019) creatively proposed an aerial–ground collaborative 3D mapping method by alignment of the data from a UAV's and a UGV's camera, respectively. Chen et al. (2021) proposed a new form of orchard mapping by integrating eye-in-hand stereovision and SLAM to provide a detailed global map supporting long-term, flexible, and large-scale orchard picking. Compared with the existing studies, this study pays more attention to the structural details of the orchard and provides theoretical and technical references for future research on more stable, accurate, and practical mobile fruit-picking robots (Figure 4).

3.2 | Applications of SLAM for navigation

Navigation, also specifically understood as localizing accurately, reconstructing the world correctly, avoiding obstacles, and planning paths, is a necessary capacity for autonomous robots to find an optimal path from the starting point to the goal position without any collision occurring (Shalal et al., 2013; Plessen, 2019; Stachniss, 2009). In the agricultural area, autonomous navigation is also in high demand for less human labor and greater safety. A robot capable of successfully mapping its surroundings with no prior knowledge is the first step towards the application of autonomous navigation platforms in agriculture. Once the terrain map is constructed, the mobile robot can implement path planning more easily. SLAM technology has been employed to realize precise autonomous navigation and optimal path planning in agricultural environments for more than 10 years.
FIGURE 4 The figure illustrates 3D field mapping with aerial–ground collaboration. The map was built by means of an affine transformation that registers the UGV submap (red rectangular area) into the UAV aerial map (blue rectangular area), where SLAM was mainly used to control and navigate the unmanned vehicle and drones autonomously (Potena et al., 2019). 3D, three-dimensional; SLAM, Simultaneous Localization and Mapping; UAV, unmanned aerial vehicle; UGV, unmanned ground vehicle.
Generally, there still exist some limitations on navigating in the agricultural environment due to seasonal changes of terrain and vegetation. The large field area, the uneven ground, and bad weather conditions also pose challenges for agricultural navigation. Nevertheless, most crops and trees of the same kind are systematically planted in straight rows and the intervals between the rows are almost equal, which is a useful characteristic for the front-end feature extraction of SLAM, especially for the laser scanner (Hansen et al., 2011). Besides, GNSS has been applied to civilian navigation for over 20 years, including for the determination of agricultural robots' poses. However, GPS signals are lost when the mobile robot operates in environments where the leaves of vegetation are so dense that the signal cannot penetrate. SLAM algorithms that are capable of handling GPS-denied environments attract increasing attention for the implementation of robust navigation systems in agriculture. Comprehensive knowledge of the limitations and advantages of agricultural navigation is the basis for further research (Figure 5).

The realization of autonomous navigation needs to meet two requirements. First, the mobile robot is required to know its position and orientation at all times so as to decide the next movement and reach the destination. Second, the robot must have an efficient perception of its surroundings to judge whether there are obstacles on the path and how to avoid them (Harik & Korsaeth, 2018). Pose estimation and environmental perception are two fundamental factors for unmanned vehicles to navigate in the field. Some of the classic SLAM systems are designed for the robot's pose estimation and sensing of the surroundings, which makes SLAM technology desirable for agricultural navigation.

Shalal et al. (2013) concluded that the three key elements of robust navigation systems are sensors, computational methods, and navigation control strategies. This common navigation system bears great similarity to the classic framework of SLAM, whereas the former puts particular emphasis on intelligent control algorithms, such as Neural Networks and Genetic Algorithms, for vehicle steering, obstacle avoidance, and other motion decisions. Considering the complexity of agricultural environments, well-adapted intelligent control strategies are urgently needed for the robot to move along the established path autonomously and be aware of any possible collision in the field. Approaches combined with control strategies (Tang et al., 2011; Youquan et al., 2008) have demonstrated significant improvements in agricultural autonomous navigation compared with some dynamic or kinematic methods.

In addition, all kinds of sensors able to acquire raw environmental information are very basic components of the navigation system. On the basis of the data from sensors, various software algorithms can process and compute the robot's pose relative to the surrounding environment in real time. Currently, a majority of commercial navigation systems employ high-precision GPS as the main sensor for robots' accurate localization due to the complete performance of GNSS. Nevertheless, GPS sensors have the evident limitation that they lose efficacy when the signals are blocked by the tree canopy, as well as in some agricultural indoor places, such as greenhouses and warehouses. Sensors commonly used to solve the SLAM problem are widely seen in many navigation systems. Vision-based mobile robots equipped with a stereo camera are considered as an alternative, but the varied lighting conditions outdoors are not conducive to visual navigation. Stefas et al. (2016) presented two vision-based methods that, respectively, used monocular and binocular cameras on a UAV to navigate in orchards. Therefore, the laser range finder gains popularity and gradually becomes the predominant sensor for autonomous localization and navigation. X. Hu et al. (2018) designed an autonomous quadrotor with autonomous guidance and navigation ability. The Karto SLAM (2D Lidar SLAM) algorithm for the forestry quadrotor is utilized to accomplish the mission of situation awareness, in which the SPA method and the LM algorithm are used to make the map constructed in practical application with good accuracy and closed-loop characteristics. The Rapid-exploration Random Tree* algorithm is further used to acquire a suboptimal path. Shi et al. (2020) designed a navigation and positioning system for greenhouse robots based on an odometer and Lidar. The greenhouse robot collects encoder data and obtains mileage information through track deduction and, combined with Lidar data and the Gmapping algorithm, a 2D environmental map is established. This system employs the A* algorithm to plan the cruise path and uses adaptive Monte Carlo localization to estimate the position and pose of the robot. At the same time, some supplementary sensors are also utilized to assist with the motion control of robots, such as the IMU, odometer, gyroscope, and so forth. In general, robust navigation systems are almost always sensor-fused. Most mobile platforms equipped with a high-precision GPS also exploit Lidar sensors and a monocular camera to execute periodic navigation tasks in various environments.
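To illustrate the kind of cruise-path planning referred to above (e.g., the A* planner in the greenhouse system of Shi et al., 2020), the sketch below runs A* over a 2D occupancy grid such as the one produced by Gmapping. The grid values, 4-connectivity, and Manhattan heuristic are simplifying assumptions rather than details of the cited system.

```python
import heapq
from itertools import count

def astar(grid, start, goal):
    """A* over a 2D occupancy grid (0 = free, 1 = occupied); 4-connected moves
    and a Manhattan-distance heuristic. Returns the list of cells on the path."""
    h = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
    tie = count()                                    # tie-breaker so the heap never compares cells
    open_set = [(h(start, goal), next(tie), 0, start, None)]
    came_from, g_cost = {}, {start: 0}
    while open_set:
        _, _, g, cur, parent = heapq.heappop(open_set)
        if cur in came_from:                         # already expanded with a better cost
            continue
        came_from[cur] = parent
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < g_cost.get(nxt, float("inf"))):
                g_cost[nxt] = g + 1
                heapq.heappush(open_set, (g + 1 + h(nxt, goal), next(tie), g + 1, nxt, cur))
    return None                                      # goal unreachable

# Example: plan between two corners of a toy grid with one obstacle row.
grid = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))
```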
At the algorithmic level, navigation methods can be roughly divided into image processing algorithms based on line detection and the probabilistic algorithms widely used in SLAM. The line-detection methods estimate a row of trees by fitting a line through the observed laser scan points (Blok et al., 2019), represented by the Hough transform (Åstrand & Baerveldt, 2005), RANSAC (Zhang et al., 2014), Image Segmentation, and so forth. Barawid et al. (2007) utilized the Hough transform to recognize the tree row with an agricultural tractor on which a 2D laser scanner was mounted. Su et al. (2021) proposed a Deep Neural Network (DNN) that exploits the geometric location of ryegrass for the real-time segmentation of interrow ryegrass weeds in a wheat field for agricultural robots. The proposed method introduces two subnets, which treat interrow and intrarow pixels differently and provide corrections to the preliminary segmentation results of a conventional encoder–decoder style DNN to improve segmentation accuracy. Doha et al. (2021) developed a practical real-life crop row detection system which takes the output of semantic segmentation using U-net and then applies a clustering-based probabilistic temporal calibration which can adapt to different fields and crops without the need for retraining the network. Aghi et al. (2020) presented a low-cost, power-efficient local motion planner for autonomous navigation in vineyards based only on an RGB-D
FIGURE 5 The figure illustrates the application of SLAM in obstacle avoidance (Freitas et al., 2012) and path planning (Blok et al., 2019; Marden & Whitty, 2014) in agricultural environments. SLAM, Simultaneous Localization and Mapping; APM, autonomous prime mover.

camera, low-range hardware, and a dual-layer control algorithm. The first algorithm makes use of the disparity map and its depth representation to generate a proportional control for the robotic platform, and a second back-up algorithm, based on representation learning and resilient to illumination variations, can take control of the machine in case of a momentary failure of the first block generating high-level motion primitives. The two algorithms are in strict synergy after initial training of the Deep Learning model with an initial data set. Liu et al. (2022) proposed an integrated system that can perform large-scale autonomous flights and real-time semantic mapping in challenging under-canopy environments. They detect and model tree trunks and ground planes from Lidar data, which are associated across scans and used to constrain robot poses as well as tree trunk models. The autonomous navigation module utilizes a multilevel planning and mapping framework and computes dynamically feasible trajectories that lead the UAV to build a semantic map of the user-defined region of interest in a computationally and storage-efficient manner. However, an evident shortcoming of navigation systems using only line-detection methods is that these algorithms can be negatively influenced by dynamic and unpredictable environments. As such, the development of SLAM provided a kind of promising methods based on Bayesian theory, dubbed probabilistic algorithms, to cope with the dynamic situations in agriculture. The KF, using a Gaussian distribution to estimate the robot's position, and the PF, using multiple particles (random sampling) to represent the probability of its pose, are predominant in agricultural navigation, as well as some improved approaches based on the two filter methods. Particularly, Blok et al. (2019) found that a PF with a laser beam model outperformed a line-based KF in navigation accuracy and was more robust in handling missing trees than the KF through a contrast experiment of the two algorithms in orchards.
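A minimal example of the line-detection family contrasted in this passage is a RANSAC fit of a single row line to 2D laser points, sketched below with NumPy. The inlier threshold and single-line assumption are illustrative; published row-following systems typically fit both sides of the row and track the lines over time.

```python
import numpy as np

def ransac_row_line(points, iters=200, tol=0.05, rng=np.random.default_rng(0)):
    """Fit one crop/tree row line to 2D laser points with RANSAC.
    Returns (point on line, unit direction) and the inlier mask."""
    best_inliers, best_model = None, None
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        p0, p1 = points[i], points[j]
        d = p1 - p0
        n = np.linalg.norm(d)
        if n < 1e-6:
            continue
        d = d / n
        # perpendicular distance of every point to the candidate line
        diff = points - p0
        dist = np.abs(diff[:, 0] * d[1] - diff[:, 1] * d[0])
        inliers = dist < tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (p0, d)
    return best_model, best_inliers
```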
3.3 | Applications of SLAM for PA

Generally, precision autonomous farming is the operation, guidance, and control of autonomous machines or unmanned vehicles to carry out agricultural tasks (Habibie et al., 2017; O'Grady et al., 2019), including field monitoring, crop harvesting, weed detection, spraying, and so on. PA is the inevitable trend of agricultural production
development, partly due to the increasing labor shortage and the rapid progress of science and technology (Syed et al., 2019). In addition to fieldwork, agricultural robots are also expected to carry out autonomous fruit picking in orchards and transport agricultural products from fields to the warehouse without humans. The larger agricultural area and longer autonomous operations rely heavily on the stringent accuracy of SLAM algorithms and stable hardware equipment well adapted to the complex ground surface. PA requires that the agricultural equipment be able to execute all kinds of agricultural tasks automatically or semiautomatically to maximize the replacement or reduction of labor. SLAM technology has demonstrated a great potential in the realization of PA, driven by the rapid development of intelligent algorithms and high-precision hardware.

It is noted that SLAM algorithms are mainly used to solve the problem of localization and mapping for mobile robots. SLAM-equipped platforms can possess a good sense of their positions and surroundings, so that the SLAM robot is recognized as an adaptable component for PA. In the mapping respect, however, PA systems pay particular attention to constructing more precise and detailed maps, frequently based on the detection of specific crops or vegetation, as distinguished from those common mapping systems capable of generating a globally consistent large-scale map. Cheein et al. (2011) used the EIF-SLAM algorithm to precisely map the environment of olive groves based on stem information, and the SVM method was implemented on a monocular visual system and a range laser finder for olive stem detection. More importantly, mapping results acquired from the PA system tend to be fully utilized to enable the PA robot to execute other agricultural tasks. PA emphasizes the automation of some specific agricultural tasks, whereas most of these tasks at the present stage merely rely on the farmer's work or the right natural conditions. Fully autonomous agricultural production still needs further research and development, as well as a combination of various novel techniques. SLAM is competent for many agricultural tasks in PA due to its apparent performance in mapping and autonomous navigation, whereas SLAM often plays only a part in a concrete PA system.

The research on SLAM applied in PA has made great progress. In the very beginning, some groups attempted to simulate an agricultural environment (Emmi et al., 2013) for research on PA. Habibie et al. (2017) realized the monitoring and mapping of ripe fruit using a fine-tuned SLAM-Gmapping algorithm and the fusion of a visual sensor and a 2D laser scan sensor. However, the experiment of this study was conducted in a computer-simulated environment over the Gazebo simulator and the Robot Operating System (ROS). Besides, Jensen et al. (2014) presented a common open software platform called FroboMind to simulate field robots executing PA tasks. The research on simulated agricultural environments overcomes the high cost of autonomous agricultural robots and the difficulty of maintaining and controlling them in the actual situation. A number of results achieved on simulated platforms lay a foundation for further practice and research in a real environment.

In the meantime, there is also some research focusing on the design of complete PA systems that conduct a lot of experiments on real agricultural terrain, varying from crop harvest and health monitoring to automatic pollination, and so on. Most of the successful PA systems tend to be composed of several self-contained modules. Thus, some systems are primarily well tested on simulated platforms to ensure all the submodules are compatible with each other for good running (Figure 6).

In most cases, each submodule of the PA system is implemented as ROS nodes, facilitating the replacement and extension of new function modules. The SLAM module is employed as the main technical support for the navigation and localization of the system platform, that is, the autonomous motion of the PA vehicle. An eye-in-hand sensing and motion control framework was designed with a camera attached to the end-effector of a robot for the autonomous harvest of dense crops (Barth et al., 2016). Five functionalities demonstrated in the framework were differentiated into image acquisition, fruit detection, application control, visual servo control, and robot control, where the LSD-SLAM algorithm was added to provide camera pose estimation and 3D scene reconstruction. Although part of the modules employed open-source libraries with no modifications, the preliminary study on the dense sweet-pepper crop showed effective results for solving sensing and motion control. Another example of a mobile robotic system endowed with a robotic arm was used for vineyard protection and monitoring (i.e., the plague control tasks; Roure et al., 2017). In their study, SLAM was utilized to make the mobile ground robot navigate autonomously in vineyards with different ground surfaces. The project innovatively classified the terrain into three kinds of ground slope and morphology: flat crops, hillside, and mountainside, and Gmapping was found to be the best approach to cope with irregular terrain and slopes. Additionally, Ohi et al. (2018) presented a wheeled ground vehicle carrying a robotic arm, named BrambleBee, to pollinate clusters of flowers autonomously and precisely in the greenhouse, and SLAM realized the geometric mapping and the robot's pose estimation. Autonomous SLAM capabilities are widely held to be one of the key features of outdoor mobile robots (Moravec & Elfes, 1985). In most PA systems, SLAM algorithms are superior to other mapping and navigation methods like GPS due to the complex characteristics of the agricultural environment.
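The modular ROS design described above can be pictured with a minimal node that consumes the pose published by a SLAM module and emits a velocity command; swapping the SLAM or control submodule then only changes which node publishes or subscribes to a topic. The topic names and the toy behaviour below are assumptions for illustration, not details of any cited system.

```python
#!/usr/bin/env python
import rospy
from nav_msgs.msg import Odometry
from geometry_msgs.msg import Twist

# A minimal "module": listen to the pose estimated by a SLAM node and publish
# a simple velocity command; other submodules expose similarly small interfaces.

def on_pose(msg):
    x = msg.pose.pose.position.x
    cmd = Twist()
    cmd.linear.x = 0.2 if x < 10.0 else 0.0   # toy behaviour: stop after 10 m
    cmd_pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("row_follower_demo")
    cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
    rospy.Subscriber("/slam/odometry", Odometry, on_pose)   # hypothetical topic name
    rospy.spin()
```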
Either the SLAM algorithm wrapped in ROS or a modified SLAM algorithm is efficiently exploited to locate and navigate the robot in different scenes, such as greenhouses, sloping fields (dos Santos et al., 2015), and forestry (Billingsley et al., 2008), and cooperates well with the other functional units in the system. Regarding orchard automation, Roa-Garzón et al. (2019) proposed a fruit-picking and distribution system driven by a navigation module consisting of SVO, visual-inertial sensor fusion, and SLAM. The initial experiment of apple picking and distribution was successfully conducted indoors. Similar work by Nellithimaru and Kantor (2019) presented a method to count fruits and estimate yield, and used SLAM combined with Deep Learning techniques to accurately reconstruct the features of grapes by fitting them with spheres. T. Lowe et al. (2021) proposed a novel canopy density estimation solution using a 3D ray cloud representation for perennial horticultural crops
FIGURE 6 The illustration shows three representative applications of SLAM in precision agriculture. Pictures (a) and (b) demonstrate an eye-in-hand sensing and servo control framework for harvesting robotics in dense vegetation (Barth et al., 2016), where SLAM is mainly used for robotic pose estimation and 3D scene reconstruction. Pictures (c–f) demonstrate a grape counting robot using robust object-level SLAM (Nellithimaru & Kantor, 2019). Pictures (g–i) show the application of drones in crop height estimation (Anthony et al., 2014) and field monitoring. 3D, three-dimensional; SLAM, Simultaneous Localization and Mapping.

at the field scale. In the proposed method, the AgScan3D (a spinning Lidar payload) data are processed through a Continuous-Time SLAM algorithm into a globally registered 3D ray cloud. They compare vineyard snapshots over multiple times within a season and across seasons; the vineyard rows are automatically extracted from the ray cloud and a novel density calculation is performed to estimate the maximum likelihood canopy densities of the vineyard. Karpyshev et al. (2021) presented an autonomous system for apple orchard inspection and early-stage disease detection. The proposed system uses 2D Lidars and real-time kinematic (RTK) GNSS receivers for localization and obstacle detection, and various sensors including hyperspectral, multispectral, and visible range scanners are used for disease detection.

Compared with ground vehicles, drone images fused with ground sensor data are expected to play a crucial role in PA, providing wide room for scientific research and development (Daponte et al., 2019). The UAV is often small, exquisite, and fast-moving, which determines its high efficiency in the execution of PA tasks, including pesticide spraying (Dai et al., 2017), field monitoring and yield estimation (Duggal et al., 2016), and so forth. Generally, UAVs are equipped with some simple sensors, such as stereo cameras, laser range finders, and an IMU, due to the small volume and load
requirements, and the SLAM algorithms are used to accurately position the drones. Furthermore, some groups have also built aerial–ground robotics systems, using a UAV for large-area operation and transferring the raw information to a UGV for long-time data processing (Pretto et al., 2020). Similarly, Vasudevan et al. (2016) exploited a UAV attached to the UGV by means of a strong electromagnet to inspect the crops periodically. Vu et al. (2021) developed a UAV used in crop data collection for PA. The safety control problem of the UAV is solved by a VIO algorithm using a stereo camera synchronized with an IMU. Besides that, the UAV is equipped with a high-resolution RGB camera for data sampling (Table 1).

4 | CHALLENGES AND FUTURE TRENDS

The problem of SLAM is essentially a 3D reconstruction problem, but SLAM pays more attention to simultaneity. With the continuing development of 3D geometric reconstruction, Visual SLAM, VIO-SLAM, Lidar SLAM, and multi-Sensor Fusion SLAM have all stepped forward to a certain degree of maturity and possess their own well-developed systems. Many research groups have been devoted to the algorithms of SLAM and have achieved remarkable results during the past 20 years. Nevertheless, the study of SLAM theory that cannot be applied in practice is meaningless. At the application level, the SLAM problem has been successfully solved in some specific scenes, such as floor-mopping robots, food delivery robots, and UAV navigation in agriculture, whereas the answer to whether the SLAM problem and its application in agriculture have been completely solved might be no.

There are still some open problems when leveraging the SLAM method in the real world. For instance, how to implement the relocalization of the SLAM system over a long period of time, and how to represent the maps so as to store all the maps within the limited computing resources, are not well addressed yet (Cadena et al., 2016). The relocalization of the SLAM system is a common problem, since most of the feature extraction methods, such as FAST and ORB, are susceptible to the illumination condition, and the robot is likely to lose tracking when entering a scene lacking sufficient texture features. At the same time, methods like SIFT take a long computing time and cannot meet real-time requirements without GPU acceleration. The visual feature point map is relatively large-scale, and it mainly contains keyframes, 2D feature points with descriptor information, and 3D landmark information. This type of map is applied on the robot or autonomous driving side, occupying major resources. Currently, it is more reasonable to process the visual map on the cloud side, which is influenced by the environmental characteristic structure, including illumination, season, and changes in space objects. Besides, relocalization using visual maps requires feature point matching, which consumes a lot of computing resources and takes a long time. Feature point matching is also easily affected by environmental factors, such as intense illumination and changing crops in agricultural environments.
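The fragility of feature matching discussed above is easy to observe with a few lines of OpenCV: ORB keypoints extracted from two frames are matched with a Hamming-distance brute-force matcher, and in low-texture or strongly re-lit crop scenes the number of surviving matches collapses. The file names below are placeholders, and this is only a small diagnostic sketch, not part of any cited system.

```python
import cv2

# ORB keypoints and descriptors for two frames, matched with cross-checked
# brute-force Hamming matching; a sharp drop in matches signals likely
# tracking loss or failed relocalization.
img1 = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file names
img2 = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} cross-checked ORB matches")
```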
In addition, a globally consistent map is required to realize large-scale SLAM, whereas it is a tough problem to merge the raw submaps from multirobot collaboration into the global map. In outdoor scenes, especially in complex agricultural environments, Visual SLAM is evidently affected by the dynamic variation of seasons and illumination, as well as by the occlusion of plants. A dramatic change of illumination will lead to the loss or even failure of feature matching and tracking, and the occlusion of plants easily leads to inaccurate map creation, which affects the results of robot navigation. The accuracy of camera sensors is not high compared with precise 3D Lidar sensors and differential GPS, whereas those high-accuracy sensors are too expensive for commercial use. SLAM systems based on 2D Lidar sensors are not suitable for large-scale agricultural scenarios due to the limitations of the information detected by a 2D Lidar. In the same way, the kidnapping problem in Lidar SLAM requires an initial value of the robot pose and is frequently solved by the relocalization capability of Visual SLAM. Therefore, multi-Sensor Fusion SLAM is considered a robust solution to make up for the weaknesses of the individual sensors.

Deep Learning algorithms applied in semantic SLAM are gradually becoming an innovative research direction for the future. On the one hand, semantic SLAM utilizing Deep Learning methods, such as CNNs, clustering algorithms, and BoW, is in line with the current development mainstream in the robotics community. On the other hand, there are other places besides semantic SLAM where Deep Learning can be exploited to solve SLAM problems. Depth map estimation, feature point detection, and IMU integration can use Deep Learning instead of traditional methods to improve algorithmic accuracy and robustness. For example, some groups proposed to employ classified semantic places from each robot pose as landmarks instead of traditional corner detection. Also, the application of Deep Learning technology to SLAM in agricultural environments can better solve the problems of obstacle detection, crop recognition, 3D path planning, and so on. In addition to the combination with new algorithms, the next breakthrough in SLAM technology might be triggered by the availability of novel sensors, such as depth, light-field, and event-based cameras.

5 | CONCLUSIONS

After more than three decades of development, SLAM technology has evolved into a relatively mature system. This paper selected the most representative recent developments and applications of SLAM in agriculture. In the beginning, we summarized the development process of SLAM algorithms from three aspects: Lidar SLAM, Visual SLAM, and Sensor Fusion SLAM. From the literature reviewed, various common sensors, represented by Lidar and the monocular camera, were widely exerted to solve the SLAM problem. Particularly, the fusion of multiple sensors exhibited good effects in both indoor and outdoor scenes. At the same time, many typical SLAM systems were proposed to deal with positioning and mapping in different environments with the driving of sensor technology and computer vision
TABLE 1 Summary of SLAM applied for mapping, navigation, and precision agriculture.

Application | Sensors | Platforms | Agricultural tasks | Environment | References
Mapping | Stereo camera, IMU, and GPS | Utility vehicle | 3D terrain map | Fields | Rovira-Más et al. (2008)
Mapping | Camera and laser range finder | Mobile robot | Occupancy grid map | Apple orchard and vineyard | Lepej and Rakun (2016)
Mapping | Range laser | Mobile robot | Probabilistic map | Fruit grove | Gimenez et al. (2015)
Mapping | Range laser | Mobile robot | – | Olive grove | Gimenez et al. (2018)
Mapping | Camera, laser scanner, IMU, and GPS | Mobile robot | Tree trunk detection and mapping | Orchard | Shalal et al. (2015)
Mapping | Lidar, stereo camera, IMU, and GPS | Ground vehicle | 3D mapping | Forest | Pierzchała et al. (2018)
Mapping | Camera, IMU, and GPS | UAV and UGV | 2D mapping | Field | Potena et al. (2019)
Mapping | Camera | Robot vehicles | 3D mapping | Farms | W. Zhao et al. (2020)
Mapping | Stereo camera | Mobile field robot | 3D mapping | Data set (Sorghum field) | Qadri and Kantor (2021)
Mapping | Camera | Autonomous vehicles or robots | 3D mapping | Rosario data set (farmland) | Shu et al. (2021)
Mapping | Stereo camera | Mobile fruit-picking robots | 3D mapping | Orchard | Chen et al. (2021)
Navigation | Lidar | Wheeled mobile robot | Waypoint following and obstacle avoidance | Greenhouse | Harik and Korsaeth (2018)
Navigation | 2D Lidar, camera, IMU, and GPS | Wheeled mobile robot | Yield forecasting | Vineyard | Marden and Whitty (2014)
Navigation | 2D Lidar and IMU | Wheeled mobile robot | – | Orchard | Blok et al. (2019)
Navigation | 2D laser and RGB-D camera | Wheeled mobile robot | – | Field | Galati et al. (2017)
Navigation | Camera and IMU | UAV | – | Orchard | Stefas et al. (2016)
Navigation | 2D Lidar | Quadrotor UAVs | – | Forest | X. Hu et al. (2018)
Navigation | 2D Lidar and wheel encoders | Wheeled mobile robot | – | Greenhouse | Shi et al. (2020)
Navigation | RGB-D camera | Unmanned ground vehicle | – | Vineyard | Aghi et al. (2020)
Navigation | Camera | Agricultural robot | Autonomous precision weeding | Data set (wheat farm) | Su et al. (2021)
Navigation | Camera | – | Crop row detection | Cropland | Doha et al. (2021)
Navigation | 3D Lidar, IMU, and stereo cameras | UAV | – | Under-canopy environments | Liu et al. (2021)
Precision agriculture | 3D Lidar, cameras, and IMU | UGV with a robotic arm and a pollination end-effector | Autonomous pollination | Greenhouse | Ohi et al. (2018)
Precision agriculture | Cameras | Robot with an end-effector | Crop harvesting | Artificial dense vegetation | Barth et al. (2016)
Precision agriculture | 2D and 3D laser, stereo camera, IMU, and GPS | UGV with an end-effector | Plague control | Vineyard | Roure et al. (2017)
Precision agriculture | Range laser and a monocular camera | UGV | Mapping based on stem detection | Olive grove | Cheein et al. (2011)
Precision agriculture | Camera | Ground vehicle | Yield estimation | Vineyard | Nellithimaru and Kantor (2019)
Precision agriculture | Laser range finder, RGB-D camera, IMU, and GPS | Mountain robot | Crop monitoring | Mountain vineyard | dos Santos et al. (2015)
Precision agriculture | Stereo camera and IMU | Mobile robot | Object picking and distribution | Indoor laboratory | Roa-Garzón et al. (2019)
Precision agriculture | Laser scanner, IMU, and GPS | UAV | Crop height estimation | Cornfield | Anthony et al. (2014)
Precision agriculture | Monocular camera, IMU, and GPS | Autonomous quadcopter | Plantation monitoring and yield estimation | Pomegranate plantation | Duggal et al. (2016)
Precision agriculture | Lidar, camera, and IMU | UAV | Crop monitoring | Agricultural field | Chebrolu et al. (2018)
Precision agriculture | 3D Lidar | Kubota farm vehicle | Canopy density estimation | Agricultural field | T. Lowe et al. (2021)
Precision agriculture | 2D Lidars, RTK GNSS, and visible range scanners | Mobile robot | Apple orchard inspection and disease detection | Orchard | Karpyshev et al. (2021)
Precision agriculture | Stereo camera, RGB camera, and IMU | UAV | Crop data collection | VIODE data set and AirSim simulation environment | Vu et al. (2021)

Abbreviations: 3D, three-dimensional; GNSS, Global Navigation Satellite Systems; GPS, Global Positioning System; IMU, Inertial Measurement Unit; RTK, real-time kinematic; SLAM, Simultaneous Localization and Mapping; UAV, unmanned aerial vehicle; UGV, unmanned ground vehicle.
algorithms. On this basis, some groups studied improvements to the proposed systems and leveraged the modified systems to address the SLAM problem with high efficiency and accuracy. In Lidar SLAM, we also highlighted the occupancy grid mapping method and the Monte Carlo localization method; then, in Visual SLAM, a classical framework was intuitively introduced from four fundamental parts combining multiple 3D reconstruction algorithms and intelligent optimization algorithms, and the data calibration problem was also discussed in Sensor Fusion SLAM.

Notably, we surveyed the state of the art in SLAM technology utilized in the agricultural environment and mainly addressed three typical applications of SLAM in agriculture: mapping, navigation, and PA. The literature to date indicates that SLAM algorithms play a crucial role in agricultural mapping and navigation, whereas SLAM tends to be a small part of many PA systems and is employed to control robotic platforms autonomously in real time. At the end of this paper, we also briefly analyzed the challenges and future trends of SLAM technology. Certainly, semantic SLAM demonstrates great potential to become the next hot research topic as Deep Learning algorithms develop rapidly. However, semantic SLAM is seldom applied in agriculture now. Therefore, our paper focuses more on laser SLAM, Visual SLAM, multi-Sensor Fusion SLAM, and their applications in agricultural environments, and less on the development and application of semantic SLAM. With the rapid development of semantic SLAM, it is believed that semantic SLAM will be applied more in agricultural environments. In future work, we will also consider writing a review paper on semantic SLAM and its application in agricultural environments. Hopefully, this paper has provided some significant information for interested readers to further improve the field of SLAM technology, especially its applications in agriculture (Erfani et al., 2019; Jha et al., 2019; Ji & Singh; Mao-Hai et al., 2006; O'Grady et al., 2019; Pillai & Leonard; Plessen, 2019; Shafaei et al., 2019; Stachniss, 2009; Syed et al., 2019; Tateno et al., 2017).

ACKNOWLEDGMENTS
This study was supported by the National Natural Science Foundation of China (Project No. 31471419, 31901415) and the Jiangsu Agricultural Science and Technology Innovation Fund (JASTIF) (Grant No. CX (21)3146).

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

REFERENCES
Achtelik, M., Bachrach, A., He, R., Prentice, S. & Roy, N. (2009) Stereo vision and laser odometry for autonomous helicopters in GPS-denied indoor environments. In: Unmanned Systems Technology XI. Vol. 7332. International Society for Optics and Photonics, p. 733219.
Agarwal, S. & Mierle, K. (2012) Ceres solver: tutorial & reference. Google Inc, 2(72), 8.
Aghi, D., Mazzia, V. & Chiaberge, M. (2020) Local motion planner for autonomous navigation in vineyards with a RGB-D camera-based algorithm and deep learning synergy. Machines, 8(2), 27.
Anthony, D., Elbaum, S., Lorenz, A. & Detweiler, C. (2014) On crop height estimation with UAVs. In: IEEE/RSJ international conference on intelligent robots & systems, Chicago, IL.
Arthur, D. & Vassilvitskii, S. (2007) k-Means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp. 1027–1035.
Artieda, J., Sebastian, J.M., Campoy, P., Correa, J.F., Mondragón, I.F., Martínez, C. et al. (2009) Visual 3-D SLAM from UAVs. Journal of Intelligent and Robotic Systems, 55, 299–321.
Åstrand, B. & Baerveldt, A.-J. (2005) A vision based row-following system for agricultural field machinery. Mechatronics, 15, 251–269.
Baggio, A. & Langendoen, K. (2006) Monte-Carlo localization for mobile wireless sensor networks. In: International conference on mobile ad-hoc and sensor networks. Berlin, Heidelberg: Springer, pp. 317–328.
Baggio, A. & Langendoen, K. (2008) Monte Carlo localization for mobile wireless sensor networks. Ad Hoc Networks, 6(5), 718–733.
Barawid, O.C., Mizushima, A., Ishii, K. & Noguchi, N. (2007) Development of an autonomous navigation system using a two-dimensional laser scanner in an orchard application. Biosystems Engineering, 96, 139–149.
Barth, R., Hemming, J. & van Henten, E.J. (2016) Design of an eye-in-hand sensing and servo control framework for harvesting robotics in dense vegetation. Biosystems Engineering, 146, 71–84.
Básaca-Preciado, L.C., Sergiyenko, O.Y., Rodríguez-Quinonez, J.C., García, X., Tyrsa, V.V., Rivas-Lopez, M. et al. (2014) Optical 3D laser measurement system for navigation of autonomous mobile robot. Optics and Lasers in Engineering, 54, 159–169.
Bay, H., Tuytelaars, T. & Van Gool, L. (2006) SURF: speeded up robust features. In: European conference on computer vision. Berlin, Heidelberg: Springer, pp. 404–417.
Ben-Afia, A., Deambrogio, L., Salós, D., Escher, A.-C., Macabiau, C., Soulier, L. et al. (2014) Review and classification of vision-based localisation techniques in unknown environments. IET Radar, Sonar & Navigation, 8, 1059–1072.
Besl, P.J. & McKay, N.D. (1992) Method for registration of 3-D shapes. In: Sensor fusion IV: control paradigms and data structures. Vol. 1611. International Society for Optics and Photonics, pp. 586–606.
Billingsley, J., Visala, A. & Dunn, M. (2008) Robotics in agriculture and forestry.
Bishop, G. & Welch, G. (2001) An introduction to the Kalman filter. In: Proceedings of the SIGGRAPH, course. Vol. 8, p. 41.
Blok, P.M., van Boheemen, K., van Evert, F.K., IJsselmuiden, J. & Kim, G.-H. (2019) Robot navigation in orchards with localization based on particle filter and Kalman filter. Computers and Electronics in Agriculture, 157, 261–269.
Bosse, M. & Zlot, R. (2009) Continuous 3D scan-matching with a spinning 2D laser. In: 2009 IEEE international conference on robotics and automation, Kobe, Japan. IEEE, pp. 4312–4319.
Bouguet, J.-Y. (2001) Pyramidal implementation of the affine Lucas Kanade feature tracker description of the algorithm. Vol. 5. Intel Corporation, p. 4.
Cadena, C., Carlone, L., Carrillo, H., Latif, Y., Scaramuzza, D., Neira, J. et al. (2016) Past, present, and future of simultaneous localization and mapping: toward the robust-perception age. IEEE Transactions on Robotics, 32, 1309–1332.
Campos, C., Elvira, R., Rodríguez, J.J.G., Montiel, J.M. & Tardós, J.D. (2021) ORB-SLAM3: an accurate open-source library for visual, visual-inertial, and multimap SLAM. IEEE Transactions on Robotics, 37(6), 1874–1890.
Carlone, L., Aragues, R., Castellanos, J.A. & Bona, B. (2012) A linear approximation for graph‐based simultaneous localization and mapping. Robotics: Science and Systems, VII, 41–48.
Castorena, J., Kamilov, U.S. & Boufounos, P.T. (2016) Autocalibration of Lidar and optical cameras via edge alignment. In: IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp. 2862–2866.
Chang, H.J., Lee, C.G., Hu, Y.C. & Lu, Y.‐H. (2007) Multi‐robot SLAM with topological/metric maps. In: 2007 IEEE/RSJ international conference on intelligent robots and systems, San Diego, USA. IEEE, pp. 1467–1472.
Chebrolu, N., Labe, T. & Stachniss, C. (2018) Robust long‐term registration of UAV images of crop fields for precision agriculture. IEEE Robotics and Automation Letters, 3(4), 3097–3104.
Cheein, F.A., Steiner, G., Paina, G.P. & Carelli, R. (2011) Optimized EIF‐SLAM algorithm for precision agriculture mapping based on stems detection. Computers and Electronics in Agriculture, 78, 195–207.
Cheein, F.A.A. & Carelli, R. (2013) Agricultural robotics: unmanned robotic service units in agricultural tasks. IEEE Industrial Electronics Magazine, 7, 48–58.
Chen, M., Tang, Y., Zou, X., Huang, Z., Zhou, H. & Chen, S. (2021) 3D global mapping of large‐scale unstructured orchard integrating eye‐in‐hand stereo vision and SLAM. Computers and Electronics in Agriculture, 187, 106237.
Chong, T., Tang, X., Leng, C., Yogeswaran, M., Ng, O. & Chong, Y. (2015) Sensor technologies and simultaneous localization and mapping (SLAM). Procedia Computer Science, 76, 174–179.
Choset, H. & Nagatani, K. (2001) Topological simultaneous localization and mapping (SLAM): toward exact localization without explicit localization. IEEE Transactions on Robotics and Automation, 17, 125–137.
Civera, J., Gálvez‐López, D., Riazuelo, L., Tardós, J.D. & Montiel, J. (2011) Towards semantic SLAM using a monocular camera. In: 2011 IEEE/RSJ international conference on intelligent robots and systems, San Francisco, U.S. IEEE, pp. 1277–1284.
Clemente, L.A., Davison, A.J., Reid, I.D., Neira, J. & Tardós, J.D. (2007) Mapping large loops with a single hand‐held camera. In: Robotics: science and systems. (Vol. 2, No. 2).
Colleens, T. & Colleens, J. (2007) Occupancy grid mapping: an empirical evaluation. In: 2007 Mediterranean conference on control & automation, Athens, Greece. IEEE, pp. 1–6.
Dai, B., He, Y., Gu, F., Yang, L., Han, J. & Xu, W. (2017) A vision‐based autonomous aerial spray system for precision agriculture. In: 2017 IEEE international conference on robotics and biomimetics (ROBIO), Macau, China: IEEE, pp. 507–513.
Daponte, P., De Vito, L., Glielmo, L., Iannelli, L., Liuzza, D., Picariello, F. et al. (2019) A review on the use of drones for precision agriculture. In: IOP conference series: earth and environmental science. Vol. 275, No. 1, IOP Publishing, p. 012022.
Davison, A.J., Reid, I.D., Molton, N.D. & Stasse, O. (2007) MonoSLAM: real‐time single camera SLAM. IEEE Transactions on Pattern Analysis & Machine Intelligence, 29, 1052–1067.
Demim, F., Nemra, A. & Louadj, K. (2016) Robust SVSF‐SLAM for unmanned vehicle in unknown environment. IFAC‐PapersOnLine, 49, 386–394.
Dissanayake, G., Huang, S., Wang, Z. & Ranasinghe, R. (2011) A review of recent developments in simultaneous localization and mapping. In: 2011 6th international conference on industrial and information systems, Kandy, Sri Lanka. IEEE, pp. 477–482.
Dissanayake, M.G., Newman, P., Clark, S., Durrant‐Whyte, H.F. & Csorba, M. (2001) A solution to the simultaneous localization and map building (SLAM) problem. IEEE Transactions on Robotics and Automation, 17(3), 229–241.
Doha, R., Al Hasan, M., Anwar, S. & Rajendran, V. (2021) Deep learning based crop row detection with online domain adaptation. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pp. 2773–2781.
Dong, J., Burnham, J.G., Boots, B., Rains, G. & Dellaert, F. (2017) 4D crop monitoring: spatio‐temporal reconstruction for agriculture. In: 2017 IEEE international conference on robotics and automation (ICRA), Singapore: IEEE, pp. 3878–3885.
Doucet, A., De Freitas, N., Murphy, K. & Russell, S. (2000) Rao‐Blackwellised particle filtering for dynamic Bayesian networks. In: Proceedings of the sixteenth conference on uncertainty in artificial intelligence. San Francisco, CA: Morgan Kaufmann Publishers Inc., pp. 176–183.
Dubbelman, G. & Browning, B. (2015) COP‐SLAM: closed‐form online pose‐chain optimization for Visual SLAM. IEEE Transactions on Robotics, 31, 1194–1213.
Dube, R., Gawel, A., Sommer, H., Nieto, J. & Cadena, C. (2017) An online multi‐robot SLAM system for 3D LiDARs. In: 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), Vancouver, Canada.
Duggal, V., Sukhwani, M., Bipin, K., Reddy, G.S. & Krishna, K.M. (2016) Plantation monitoring and yield estimation using autonomous quadcopter for precision agriculture. In: 2016 IEEE international conference on robotics and automation (ICRA), Stockholm, Sweden: IEEE, pp. 5121–5127.
Durrant‐Whyte, H. & Bailey, T. (2006) Simultaneous localization and mapping: part I. IEEE Robotics & Automation Magazine, 13, 99–110.
Eade, E. & Drummond, T. (2006) Scalable monocular SLAM. In: Proceedings of the 2006 IEEE computer society conference on computer vision and pattern recognition. Vol. 1, New York, USA: IEEE Computer Society, pp. 469–476.
Elfes, A. (1989) Using occupancy grids for mobile robot perception and navigation. Computer, 22, 46–57.
Elinas, P., Sim, R. & Little, J.J. (2006) σSLAM: stereo vision SLAM using the Rao‐Blackwellised particle filter and a novel mixture proposal distribution. In: Proceedings of the 2006 IEEE international conference on robotics and automation (ICRA 2006), Orlando, USA: IEEE, pp. 1564–1570.
Emmi, L., Paredes‐Madrid, L., Ribeiro, A., Pajares, G. & Gonzalez‐de‐Santos, P. (2013) Fleets of robots for precision agriculture: a simulation environment. Industrial Robot: An International Journal, 40, 41–58.
Endres, F., Hess, J., Engelhard, N., Sturm, J., Cremers, D. & Burgard, W. (2012) An evaluation of the RGB‐D SLAM system. In: 2012 IEEE international conference on robotics and automation. Vol. 3. St Paul, USA: IEEE, pp. 1691–1696.
Endres, F., Hess, J., Sturm, J., Cremers, D. & Burgard, W. (2013) 3‐D mapping with an RGB‐D camera. IEEE Transactions on Robotics, 30, 177–187.
Engel, J., Schöps, T. & Cremers, D. (2014) LSD‐SLAM: large‐scale direct monocular SLAM. In: European conference on computer vision, Cham: Springer, pp. 834–849.
Engel, J., Stückler, J. & Cremers, D. (2015) Large‐scale direct SLAM with stereo cameras. In: 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS), Hamburg, Germany. IEEE, pp. 1935–1942.
Engelhard, N., Endres, F., Hess, J., Sturm, J. & Burgard, W. (2011) Real‐time 3D Visual SLAM with a hand‐held RGB‐D camera. In: Proceedings of the RGB‐D workshop on 3D perception in robotics at the European robotics forum, Vasteras, Sweden. Vol. 180, pp. 1–15.
Erfani, S., Jafari, A. & Hajiahmad, A. (2019) Comparison of two data fusion methods for localization of wheeled mobile robot in farm conditions. Artificial Intelligence in Agriculture, 1, 48–55.
Fang, F., Ma, X. & Dai, X. (2005) A multi‐sensor fusion SLAM approach for mobile robots. In: IEEE international conference on mechatronics and automation, 2005. Vol. 4. Niagara Falls, Canada: IEEE, pp. 1837–1841.
Fang, Z. & Scherer, S. (2014) Experimental study of odometry estimation methods using RGB‐D cameras. In: 2014 IEEE/RSJ international conference on intelligent robots and systems, Chicago, USA: IEEE, pp. 680–687.
Faragher, R., Sarno, C. & Newman, M. (2012) Opportunistic radio SLAM for indoor navigation using smartphone sensors. In: Proceedings of the 2012 IEEE/ION position, location and navigation symposium. IEEE, pp. 120–128.
Fioraio, N. & Konolige, K. (2011) Realtime visual and point cloud SLAM. In: Proceedings of the RGB‐D workshop on advanced reasoning with depth cameras at robotics: science and systems conference (RSS). Vol. 27, California, United States.
Forster, C., Pizzoli, M. & Scaramuzza, D. (2014) SVO: fast semi‐direct monocular visual odometry. In: 2014 IEEE international conference on robotics and automation (ICRA), Hong Kong, China: IEEE, pp. 15–22.
Fox, D., Burgard, W., Dellaert, F. & Thrun, S. (1999) Monte Carlo localization: efficient position estimation for mobile robots. AAAI/IAAI, pp. 343–349.
Fox, D., Burgard, W. & Thrun, S. (1998) Active Markov localization for mobile robots. Robotics and Autonomous Systems, 25, 195–207.
Freitas, G., Hamner, B., Bergerman, M. & Singh, S. (2012) A practical obstacle detection system for autonomous orchard vehicles. In: 2012 IEEE/RSJ international conference on intelligent robots and systems, Vilamoura‐Algarve, Portugal: IEEE, pp. 3391–3398.
Fuentes‐Pacheco, J., Ruiz‐Ascencio, J. & Rendón‐Mancha, J.M. (2015) Visual simultaneous localization and mapping: a survey. Artificial Intelligence Review, 43, 55–81.
Galati, R., Reina, G., Messina, A. & Gentile, A. (2017) Survey and navigation in agricultural environments using robotic technologies. In: IEEE international conference on advanced video & signal based surveillance (AVSS), IEEE, pp. 1–6.
Gálvez‐López, D. & Tardos, J.D. (2012) Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics, 28, 1188–1197.
Gao, X. & Zhang, T. (2015) Loop closure detection for Visual SLAM systems using deep neural networks. In: 2015 34th Chinese control conference (CCC), Hangzhou, China: IEEE, pp. 5851–5856.
Gil, A., Reinoso, Ó., Ballesta, M. & Juliá, M. (2010) Multi‐robot Visual SLAM using a Rao‐Blackwellized particle filter. Robotics & Autonomous Systems, 58, 68–80.
Gimenez, J., Herrera, D., Tosetti, S. & Carelli, R. (2015) Optimization methodology to fruit grove mapping in precision agriculture. Computers and Electronics in Agriculture, 116, 88–100.
Gimenez, J., Tosetti, S., Salinas, L. & Carelli, R. (2018) Bounded memory probabilistic mapping of out‐of‐structure objects in fruit crops environments. Computers and Electronics in Agriculture, 151, 11–20.
Goldberg, S.B., Maimone, M.W. & Matthies, L. (2002) Stereo vision and rover navigation software for planetary exploration. In: IEEE aerospace conference, Vol. 5, IEEE, p. 5.
Gomez‐Ojeda, R., Moreno, F.A., Zuniga‐Noël, D., Scaramuzza, D. & Gonzalez‐Jimenez, J. (2019) PL‐SLAM: a stereo SLAM system through the combination of points and line segments. IEEE Transactions on Robotics, 35(3), 734–746.
Gordon, J. & Shortliffe, E.H. (1984) The Dempster–Shafer theory of evidence. In: Rule‐based expert systems: the MYCIN experiments of the Stanford heuristic programming project. Vol. 3, pp. 832–838.
Graeter, J., Wilczynski, A. & Lauer, M. (2018) LIMO: Lidar‐monocular visual odometry. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), Madrid, Spain: IEEE, pp. 7872–7879.
Granstrom, K., Callmer, J., Ramos, F. & Nieto, J. (2009) Learning to detect loop closure from range data. In: 2009 IEEE international conference on robotics and automation, Kobe, Japan: IEEE, pp. 15–22.
Griepentrog, H.W., Andersen, N.A., Andersen, J.C., Blanke, M., Heinemann, O., Madsen, T.E. et al. (2009) Safe and reliable: further development of a field robot. Precision Agriculture, 9, 857–866.
Grisetti, G., Tipaldi, G.D., Stachniss, C., Burgard, W. & Nardi, D. (2007) Fast and accurate SLAM with Rao‐Blackwellized particle filters. Robotics and Autonomous Systems, 55, 30–38.
Guivant, J.E. & Nebot, E.M. (2001) Optimization of the simultaneous localization and map‐building algorithm for real‐time implementation. IEEE Transactions on Robotics and Automation, 17, 242–257.
Gupte, S., Mohandas, P.I.T. & Conrad, J.M. (2012) A survey of quadrotor unmanned aerial vehicles. In: 2012 Proceedings of the IEEE Southeastcon. IEEE, pp. 1–6.
Habibie, N., Nugraha, A.M., Anshori, A.Z., Ma'sum, M.A. & Jatmiko, W. (2017) Fruit mapping mobile robot on simulated agricultural area in Gazebo simulator using simultaneous localization and mapping (SLAM). In: 2017 international symposium on micro‐nanomechatronics and human science (MHS), IEEE, pp. 1–7.
Handschin, J. (1970) Monte Carlo techniques for prediction and filtering of non‐linear stochastic processes. Automatica, 6, 555–563.
Hansen, S., Bayramoglu, E., Andersen, J.C., Ravn, O., Andersen, N. & Poulsen, N.K. (2011) Orchard navigation using derivative free Kalman filtering. In: Proceedings of the 2011 American control conference, San Francisco, CA: IEEE, pp. 4679–4684.
Harik, E.H.C. & Korsaeth, A. (2018) Combining hector SLAM and artificial potential field for autonomous navigation inside a greenhouse. Robotics, 7, 22.
Harris, C.G. & Stephens, M. (1988) A combined corner and edge detector. In: Alvey vision conference. Vol. 15, No. 50, pp. 10–5244.
Hartley, R.I. & Sturm, P. (1997) Triangulation. Computer Vision and Image Understanding, 68, 146–157.
Hening, S., Ippolito, C.A., Krishnakumar, K.S., Stepanyan, V. & Teodorescu, M. (2017) 3D LiDAR SLAM integration with GPS/INS for UAVs in urban GPS‐degraded environments. In: AIAA information systems—AIAA infotech @ aerospace, p. 0448.
Henry, P., Krainin, M., Herbst, E., Ren, X. & Fox, D. (2012) RGB‐D mapping: using kinect‐style depth cameras for dense 3D modeling of indoor environments. The International Journal of Robotics Research, 31, 647–663.
Henry, P., Krainin, M., Herbst, E., Ren, X. & Fox, D. (2014) RGB‐D mapping: using depth cameras for dense 3D modeling of indoor environments, Berlin, Heidelberg: Springer, pp. 477–491.
Hess, W., Kohler, D., Rapp, H. & Andor, D. (2016) Real‐time loop closure in 2D Lidar SLAM. In: 2016 IEEE international conference on robotics and automation (ICRA), Stockholm, Sweden: IEEE, pp. 1271–1278.
Holmes, S., Klein, G. & Murray, D.W. (2008) A square root unscented Kalman filter for visual monoSLAM. In: 2008 IEEE international conference on robotics and automation, Pasadena, USA: IEEE, pp. 3710–3716.
Holz, D., Holzer, S., Rusu, R.B. & Behnke, S. (2011) Real‐time plane segmentation using RGB‐D cameras. In: Robot soccer world cup, Berlin, Heidelberg: Springer, pp. 306–317.
Homm, F., Kaempchen, N., Ota, J. & Burschka, D. (2010) Efficient occupancy grid computation on the GPU with Lidar and radar for road boundary detection. In: 2010 IEEE intelligent vehicles symposium, La Jolla, USA: IEEE, pp. 1006–1013.
Horn, B.K. & Schunck, B.G. (1981) Determining optical flow. Artificial Intelligence, 17, 185–203.
Hou, Y., Zhang, H. & Zhou, S. (2015) Convolutional neural network‐based image representation for visual loop closure detection. In: 2015 IEEE international conference on information and automation, Lijiang, China: IEEE, pp. 2238–2245.
Hu, X., Wang, M., Qian, C., Huang, C., Xia, Y. & Song, M. (2018) Lidar‐based SLAM and autonomous navigation for forestry quadrotors. In: 2018 CSAA guidance, navigation and control conference (CGNCC). IEEE, pp. 1–6.
Hu, Z., Li, Y., Na, L. & Zhao, B. (2016) Extrinsic calibration of 2‐D laser rangefinder and camera from single shot based on minimal solution. IEEE Transactions on Instrumentation & Measurement, 65, 1–15.
Huang, A.S., Bachrach, A., Henry, P., Krainin, M., Maturana, D., Fox, D. et al. (2017) Visual odometry and mapping for autonomous flight using an RGB‐D camera. In: Robotics research. Cham: Springer, pp. 235–252.
Huang, B., Zhao, J. & Liu, J. (2019) A survey of simultaneous localization and mapping. arXiv preprint arXiv, 1909, 05214.
Huang, L. & Barth, M. (2009) Tightly‐coupled Lidar and computer vision integration for vehicle detection. In: 2009 IEEE intelligent vehicles symposium. IEEE, pp. 604–609.
Jensen, K., Larsen, M., Nielsen, S.H., Larsen, L.B., Olsen, K.S. & Jørgensen, R.N. (2014) Towards an open software platform for field robots in precision agriculture. Robotics, 3, 207–234.
Jha, K., Doshi, A., Patel, P. & Shah, M. (2019) A comprehensive review on automation in agriculture using artificial intelligence. Artificial Intelligence in Agriculture, 2, 1–12.
Jourdan, D.B., Deyst, J.J., Win, M.Z. & Roy, N. (2005) Monte Carlo localization in dense multipath environments using UWB ranging. In: 2005 IEEE international conference on ultra‐wideband, IEEE, pp. 314–319.
Julier, S.J. & Uhlmann, J.K. (1997) New extension of the Kalman filter to nonlinear systems. In: Signal processing, sensor fusion, and target recognition VI. Vol. 3068. International Society for Optics and Photonics, pp. 182–193.
Jung, M.J., Myung, H., Hong, S.G., Park, D.R., Lee, H.K. & Bang, S.W. (2004) Structured light 2D range finder for simultaneous localization and map‐building (SLAM) in home environments. In: International symposium on micro‐nanomechatronics & human science, 2004 and The Fourth Symposium Micro‐Nanomechatronics for Information‐Based Society. IEEE, pp. 371–376.
Karpyshev, P., Ilin, V., Kalinov, I., Petrovsky, A. & Tsetserukou, D. (2021) Autonomous mobile robot for apple plant disease detection based on CNN and multi‐spectral vision system. In: 2021 IEEE/SICE international symposium on system integration (SII), Iwaki, Japan: IEEE, pp. 157–162.
Khairuddin, A.R., Talib, M.S. & Haron, H. (2015) Review on simultaneous localization and mapping (SLAM). In: 2015 IEEE international conference on control system, computing and engineering (ICCSCE), Penang, Malaysia: IEEE, pp. 85–90.
Klein, G. & Murray, D. (2007) Parallel tracking and mapping for small AR workspaces. In: 2007 6th IEEE and ACM international symposium on mixed and augmented reality. IEEE, pp. 225–234.
Ko, B.Y., Song, J.B. & Lee, S. (2004) Real‐time building of a thinning‐based topological map with metric features. In: 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS) (IEEE Cat. No. 04CH37566), Vol. 2. Sendai, Japan: IEEE, pp. 1524–1529.
Kohlbrecher, S., Von Stryk, O., Meyer, J. & Klingauf, U. (2011) A flexible and scalable SLAM system with full 3D motion estimation. In: 2011 IEEE international symposium on safety, security, and rescue robotics, Kyoto, Japan: IEEE, pp. 155–160.
Konolige, K. (1997) Improved occupancy grids for map building. Autonomous Robots, 4, 351–367.
Konolige, K., Grisetti, G., Kümmerle, R., Burgard, W., Limketkai, B. & Vincent, R. (2010) Efficient sparse pose adjustment for 2D mapping. In: 2010 IEEE/RSJ international conference on intelligent robots and systems. Taipei, China: IEEE, pp. 22–29.
Kosecká, J. & Li, F. (2004) Vision based topological Markov localization. In: IEEE international conference on robotics and automation, 2004. Proceedings. ICRA'04, Vol. 2. IEEE, pp. 1481–1486.
Kusevic, K., Mrstik, P. & Glennie, C.L. (2010) Method and system for aligning a line scan camera with a Lidar scanner for real time data fusion in three dimensions. Google Patents.
Lai, K., Bo, L., Ren, X. & Fox, D. (2011) A large‐scale hierarchical multi‐view RGB‐D object dataset. In: 2011 IEEE international conference on robotics and automation, Shanghai, China: IEEE, pp. 1817–1824.
Latif, Y., Cadena, C. & Neira, J. (2013) Robust loop closing over time for pose graph SLAM. The International Journal of Robotics Research, 32, 1611–1626.
Le, T., Gjevestad, J.G.O. & From, P.J. (2019) Online 3D mapping and localization system for agricultural robots. IFAC‐PapersOnLine, 52, 167–172.
Lepej, P. & Rakun, J. (2016) Simultaneous localisation and mapping in a complex field environment. Biosystems Engineering, 150, 160–169.
Lepetit, V. & Fua, P. (2005) Monocular model‐based 3D tracking of rigid objects. Now Publishers Inc.
Lepetit, V., Moreno‐Noguer, F. & Fua, P. (2009) EPnP: an accurate O(n) solution to the PnP problem. International Journal of Computer Vision, 81, 155–166.
Li, G., Liu, Y., Li, D., Cai, X. & Zhou, D. (2007) An algorithm for extrinsic parameters calibration of a camera and a laser range finder using line features. In: IEEE/RSJ international conference on intelligent robots & systems, San Diego, USA: IEEE, pp. 3854–3859.
Li, Y. & Olson, E.B. (2010) Extracting general‐purpose features from Lidar data. In: 2010 IEEE international conference on robotics and automation, IEEE, pp. 1388–1393.
Libby, J. & Kantor, G. (2010) Accurate GPS‐free positioning of utility vehicles for specialty agriculture. 2010 Pittsburgh, Pennsylvania, June 20–June 23, 2010. American Society of Agricultural and Biological Engineers, p. 1.
Lin, J., Zheng, C., Xu, W. & Zhang, F. (2021) R2 LIVE: a robust, real‐time, LiDAR‐inertial‐visual tightly‐coupled state estimator and mapping. IEEE Robotics and Automation Letters, 6(4), 7469–7476.
Liu, X., Nardari, G.V., Ojeda, F.C., Tao, Y., Zhou, A., Donnelly, T. et al. (2021) Large‐scale autonomous flight with real‐time semantic SLAM under dense forest canopy. IEEE Robotics and Automation Letters, 7(2), 5512–5519.
Lloyd, S. (1982) Least squares quantization in PCM. IEEE Transactions on Information Theory, 28, 129–137.
Lowe, D.G. (2004) Distinctive image features from scale‐invariant keypoints. International Journal of Computer Vision, 60, 91–110.
Lowe, T., Moghadam, P., Edwards, E. & Williams, J. (2021) Canopy density estimation in perennial horticulture crops using 3D spinning Lidar SLAM. Journal of Field Robotics, 38(4), 598–618.
Lu, Z., Hu, Z. & Uchimura, K. (2009) SLAM estimation in dynamic outdoor environments: a review. In: International conference on intelligent robotics and applications. Berlin, Heidelberg: Springer, pp. 255–267.
Ma, L., Kerl, C., Stuckler, J. & Cremers, D. (2016) CPA‐SLAM: consistent plane‐model alignment for direct RGB‐D SLAM. In: IEEE international conference on robotics & automation (ICRA), Sweden: IEEE, pp. 1285–1291.
Mao‐Hai, L.I., Hong, B.R., Luo, R.H. & Wei, Z.H. (2006) A novel method for mobile robot simultaneous localization and mapping. Journal of Zhejiang University—Science A (Applied Physics & Engineering), 7, 937–944.
Marck, J.W., Mohamoud, A., vd Houwen, E. & van Heijster, R. (2013) Indoor radar SLAM: a radar application for vision and GPS denied environments. In: 2013 European radar conference, Nürnberg Convention Center: IEEE, pp. 471–474.
Marden, S. & Whitty, M. (2014) GPS‐free localisation and navigation of an unmanned ground vehicle for yield forecasting in a vineyard. In: Recent advances in agricultural robotics, international workshop collocated with the 13th international conference on intelligent autonomous systems (IAS‐13), Padua, Italy.
Mei, C., Sibley, G., Cummins, M., Newman, P. & Reid, I. (2011) RSLAM: a system for large‐scale mapping in constant‐time using stereo. International Journal of Computer Vision, 94, 198–214.
Meyer‐Delius, D., Beinhofer, M. & Burgard, W. (2012) Occupancy grid models for robot mapping in changing environments. In: Twenty‐sixth AAAI conference on artificial intelligence, Toronto, Canada.
Mikolajczyk, K. & Schmid, C. (2001) Indexing based on scale invariant interest points.
Milford, M. & Wyeth, G. (2008) Mapping a suburb with a single camera using a biologically inspired SLAM system. IEEE Transactions on Robotics, 24, 1038–1053.
Mirzaei, F.M., Kottas, D.G. & Roumeliotis, S.I. (2012) 3D Lidar–camera intrinsic and extrinsic calibration: identifiability and analytical least‐squares‐based initialization. The International Journal of Robotics Research, 31, 452–467.
Mitsou, N.C. & Tzafestas, C.S. (2007) Temporal occupancy grid for mobile robot dynamic environment mapping. In: 2007 Mediterranean conference on control & automation, Athens, Greece. IEEE, pp. 1–8.
Montemerlo, M., Thrun, S., Koller, D. & Wegbreit, B. (2002) FastSLAM: a factored solution to the simultaneous localization and mapping problem. AAAI/IAAI, p. 593598.
Moravec, H. & Elfes, A. (1985) High resolution maps from wide angle sonar. In: Proceedings of the 1985 IEEE international conference on robotics and automation, Vol. 2. St. Louis, USA: IEEE, pp. 116–121.
Mourikis, A.I. & Roumeliotis, S.I. (2007) A multi‐state constraint Kalman filter for vision‐aided inertial navigation. In: 2007 IEEE international conference on robotics and automation, IEEE, p. 6.
Mur‐Artal, R., Montiel, J.M.M. & Tardós, J.D. (2015) ORB‐SLAM: a versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31, 1147–1163.
Mur‐Artal, R. & Tardós, J.D. (2014) Fast relocalisation and loop closing in keyframe‐based SLAM. In: 2014 IEEE international conference on robotics and automation (ICRA), Hong Kong, China: IEEE, pp. 846–853.
Mur‐Artal, R. & Tardós, J.D. (2017a) Visual‐inertial monocular SLAM with map reuse. IEEE Robotics and Automation Letters, 2(2), 796–803.
Mur‐Artal, R. & Tardós, J.D. (2017b) ORB‐SLAM2: an open‐source SLAM system for monocular, stereo, and RGB‐D cameras. IEEE Transactions on Robotics, 33, 1255–1262.
Murphy, K.P. (2000) Bayesian map learning in dynamic environments. In: Advances in neural information processing systems, 12, pp. 1015–1021.
Nellithimaru, A.K. & Kantor, G.A. (2019) ROLS: robust object‐level SLAM for grape counting. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, Long Beach, USA.
Newcombe, R.A. & Davison, A.J. (2010) Live dense reconstruction with a single moving camera. In: 2010 IEEE computer society conference on computer vision and pattern recognition, IEEE, pp. 1498–1505.
Newcombe, R.A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D. & Davison, A.J. et al. (2011) KinectFusion: real‐time dense surface mapping and tracking. In: 2011 10th IEEE international symposium on mixed and augmented reality, IEEE, pp. 127–136.
Newman, P. & Ho, K. (2005) SLAM‐loop closing with visually salient features. In: Proceedings of the 2005 IEEE international conference on robotics and automation, Barcelona, Spain: IEEE, pp. 635–642.
Nistér, D., Naroditsky, O. & Bergen, J. (2004) Visual odometry. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition (CVPR 2004), Vol. 1. Washington, USA: IEEE, p. I‐I.
Nister, D. & Stewenius, H. (2006) Scalable recognition with a vocabulary tree. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR'06), Vol. 2. New York, NY, USA: IEEE, pp. 2161–2168.
Nüchter, A., Lingemann, K., Hertzberg, J. & Surmann, H. (2007) 6D SLAM‐3D mapping outdoor environments. Journal of Field Robotics, 24, 699–722.
O'Grady, M., Langton, D. & O'Hare, G. (2019) Edge computing: a tractable model for smart agriculture? Artificial Intelligence in Agriculture, 3, 42–51.
Ohi, N., Lassak, K., Watson, R., Strader, J., Du, Y. & Yang, C. et al. (2018) Design of an autonomous precision pollination robot. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), Madrid, Spain: IEEE, pp. 7711–7718.
Ortin, D., Neira, J. & Montiel, J. (2004) Relocation using laser and vision. In: Proceedings of the IEEE international conference on robotics and automation, 2004 (ICRA'04), Vol. 2. Washington, USA: IEEE, pp. 1505–1510.
Ouellette, R. & Hirasawa, K. (2007) A comparison of SLAM implementations for indoor mobile robots. In: 2007 IEEE/RSJ international conference on intelligent robots and systems, San Diego, USA: IEEE, pp. 1479–1484.
Pandey, G., McBride, J.R., Savarese, S. & Eustice, R.M. (2015) Automatic extrinsic calibration of vision and Lidar by maximizing mutual information. Journal of Field Robotics, 32, 696–722.
Paz, L.M., Piniés, P., Tardós, J.D. & Neira, J. (2008) Large‐scale 6‐DOF SLAM with stereo‐in‐hand. IEEE Transactions on Robotics, 24, 946–957.
Pfaff, P., Kümmerle, R., Joho, D., Stachniss, C., Triebel, R. & Burgard, W. (2007) Navigation in combined outdoor and indoor environments using multi‐level surface maps. In: WS on safe navigation in open and dynamic environments, IROS, Vol. 7, San Diego, USA.
Pierzchała, M., Giguère, P. & Astrup, R. (2018) Mapping forests using an unmanned ground vehicle with 3D LiDAR and graph‐SLAM. Computers and Electronics in Agriculture, 145, 217–225.
Pire, T., Fischer, T., Civera, J., De Cristóforis, P. & Berlles, J.J. (2015) Stereo parallel tracking and mapping for robot localization. In: 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS). Hamburg, Germany: IEEE, pp. 1373–1378.
Pirker, K., Rüther, M. & Bischof, H. (2011) CD SLAM—continuous localization and mapping in a dynamic world. In: 2011 IEEE/RSJ international conference on intelligent robots and systems, IROS 2011, San Francisco, CA, IEEE, pp. 3990–3997.
Pizzoli, M., Forster, C. & Scaramuzza, D. (2014) REMODE: probabilistic, monocular dense reconstruction in real time. In: 2014 IEEE international conference on robotics and automation (ICRA), Hong Kong, China: IEEE, pp. 2609–2616.
Plessen, M.G. (2019) Coupling of crop assignment and vehicle routing for harvest planning in agriculture. Artificial Intelligence in Agriculture, 2, 99–109.
Potena, C., Khanna, R., Nieto, J., Siegwart, R., Nardi, D. & Pretto, A. (2019) AgriColMap: aerial–ground collaborative 3D mapping for precision farming. IEEE Robotics and Automation Letters, 4, 1085–1092.
Pretto, A., Aravecchia, S., Burgard, W., Chebrolu, N., Dornhege, C., Falck, T. et al. (2020) Building an aerial–ground robotics system for precision farming: an adaptable solution. IEEE Robotics & Automation Magazine, 28(3), 29–49.
Pupilli, M.L. & Calway, A.D. (2005) Real‐time camera tracking using a particle filter. In: British machine vision conference 2005, Oxford, UK.
Qadri, M. & Kantor, G. (2021) Semantic feature matching for robust mapping in agriculture. arXiv preprint arXiv, 2107, 04178.
Qin, C., Ye, H., Pranata, C.E., Han, J., Zhang, S. & Liu, M. (2020) LINS: a Lidar‐inertial state estimator for robust and efficient navigation. In: 2020 IEEE international conference on robotics and automation (ICRA), Singapore: IEEE, pp. 8899–8906.
Qin, T., Li, P. & Shen, S. (2018) VINS‐mono: a robust and versatile monocular visual‐inertial state estimator. IEEE Transactions on Robotics, 34(4), 1004–1020.
Ran, Y., Li, X., Lu, L. & Li, Z. (2012) Large‐scale land cover mapping with the integration of multi‐source information based on the Dempster–Shafer theory. International Journal of Geographical Information Science, 26, 169–191.
Riisgaard, S. & Blas, M.R. (2003) SLAM for dummies. A Tutorial Approach to Simultaneous Localization and Mapping, 22, 126.
Roa‐Garzón, M.A., Gambaro, E.F., Florek‐Jasinska, M., Endres, F., Ruess, F., Schaller, R. et al. (2019) Vision‐based solutions for robotic manipulation and navigation applied to object picking and distribution. KI‐Künstliche Intelligenz, 33, 171–180.
Rosten, E. & Drummond, T. (2006) Machine learning for high‐speed corner detection. In: European conference on computer vision, New York, NY: Springer, pp. 167–193.
Roure, F., Moreno, G., Soler, M., Faconti, D., Serrano, D. & Astolfi, P. et al. (2017) GRAPE: ground robot for vineyard monitoring and protection. In: Iberian robotics conference. Springer, pp. 249–260.
Rovira‐Más, F., Zhang, Q. & Reid, J.F. (2008) Stereo vision three‐dimensional terrain maps for precision agriculture. Computers and Electronics in Agriculture, 60, 133–143.
Rublee, E., Rabaud, V., Konolige, K. & Bradski, G.R. (2011) ORB: an efficient alternative to SIFT or SURF. In: ICCV. Vol. 11. Citeseer, p. 2‐2571.
Rusu, R.B. & Cousins, S. (2011) 3D is here: point cloud library (PCL). In: 2011 IEEE international conference on robotics and automation. Shanghai, China: IEEE, pp. 1–4.
Ryde, J. & Hu, H. (2007) Mobile robot 3D perception and mapping with multi‐resolution occupancy lists. In: Proceedings of the IEEE international conference on mechatronics and automation (ICMA 2007), Harbin, Heilongjiang, China. Citeseer.
Santos, J.M., Portugal, D. & Rocha, R.P. (2013) An evaluation of 2D SLAM techniques available in robot operating system. In: 2013 IEEE international symposium on safety, security, and rescue robotics (SSRR), IEEE, pp. 1–6.
dos Santos, F.B.N., Sobreira, H.M.P., Campos, D.F.B., dos Santos, R.M.P.M., Moreira, A.P.G.M. & Contente, O.M.S. (2015) Towards a reliable monitoring robot for mountain vineyards. In: 2015 IEEE international conference on autonomous robot systems and competitions, Vila Real, Portugal: IEEE, pp. 37–43.
Saxena, A., Chung, S.H. & Ng, A.Y. (2005) Learning depth from single monocular images. In: Advances in neural information processing systems, Vol. 18, pp. 1161–1168.
Saxena, A., Schulte, J. & Ng, A.Y. (2007) Depth estimation using monocular and stereo cues. In: IJCAI. Vol. 7, pp. 2197–2203.
Schwarz, B. (2010) Mapping the world in 3D. Nature Photonics, 4, 429–430.
Se, S., Lowe, D. & Little, J. (2002) Vision‐based mapping with backward correction. In: IEEE/RSJ international conference on intelligent robots and systems, EPFL, Switzerland: IEEE, Vol. 1, pp. 153–158.
Shafaei, S., Loghavi, M. & Kamgar, S. (2019) Reliable execution of a robust soft computing workplace found on multiple neuro‐fuzzy inference systems coupled with multiple nonlinear equations for exhaustive perception of tractor‐implement performance in plowing process. Artificial Intelligence in Agriculture, 2, 38–84.
Shalal, N., Low, T., McCarthy, C. & Hancock, N. (2013) A review of autonomous navigation systems in agricultural environments.
Shalal, N., Low, T., McCarthy, C. & Hancock, N. (2015) Orchard mapping and mobile robot localisation using on‐board camera and laser scanner data fusion–Part B: Mapping and localisation. Computers and Electronics in Agriculture, 119, 267–278.
Shan, T., Englot, B., Meyers, D., Wang, W., Ratti, C. & Rus, D. (2020) LIO‐SAM: tightly‐coupled Lidar inertial odometry via smoothing and mapping. In: 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS), Madrid, Spain: IEEE, pp. 5135–5142.
Shan, T., Englot, B., Ratti, C. & Rus, D. (2021) LVI‐SAM: tightly‐coupled Lidar–visual‐inertial odometry via smoothing and mapping. In: 2021 IEEE international conference on robotics and automation (ICRA), Xi'an, China. IEEE, pp. 5692–5698.
Shi, Y., Wang, H., Yang, T., Liu, L. & Cui, Y. (2020) Integrated navigation by a greenhouse robot based on an odometer/Lidar. Instrumentation, Mesures, Métrologies, 19(2), 91–101.
Shu, F., Lesur, P., Xie, Y., Pagani, A. & Stricker, D. (2021) SLAM in the field: an evaluation of monocular mapping and localization on challenging dynamic agricultural environment. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp. 1761–1771.
Sibley, G., Matthies, L. & Sukhatme, G. (2010) Sliding window filter with application to planetary landing. Journal of Field Robotics, 27, 587–608.
De Silva, V., Roche, J. & Kondoz, A. (2017) Fusion of LiDAR and camera sensor data for environment sensing in driverless vehicles.
Smith, R., Self, M. & Cheeseman, P. (1990) Estimating uncertain spatial relationships in robotics. In: Autonomous robot vehicles, New York, NY: Springer, pp. 167–193.
Smith, R.C. & Cheeseman, P. (1986) On the representation and estimation of spatial uncertainty. The International Journal of Robotics Research, 5(4), 56–68.
Stachniss, C. (2009) Robotic mapping and exploration. Vol. 55. Berlin, Heidelberg: Springer.
Stamos, I., Liu, L., Chen, C., Wolberg, G., Yu, G. & Zokai, S. (2008) Integrating automated range registration with multiview geometry for the photorealistic modeling of large‐scale scenes. International Journal of Computer Vision, 78, 237–260.
Stefas, N., Bayram, H. & Isler, V. (2016) Vision‐based UAV navigation in orchards. IFAC‐PapersOnLine, 49, 10–15.
Strasdat, H., Montiel, J. & Davison, A.J. (2010) Scale drift‐aware large scale monocular SLAM. Robotics: Science and Systems VI, 2, 7.
Strasdat, H., Montiel, J.M. & Davison, A.J. (2012) Visual SLAM: why filter? Image and Vision Computing, 30, 65–77.
Sturm, J., Engelhard, N., Endres, F., Burgard, W. & Cremers, D. (2012) A benchmark for the evaluation of RGB‐D SLAM systems. In: 2012 IEEE/RSJ international conference on intelligent robots and systems, Algarve, Portugal. IEEE, pp. 573–580.
Su, D., Qiao, Y., Kong, H. & Sukkarieh, S. (2021) Real time detection of inter‐row ryegrass in wheat farms using deep learning. Biosystems Engineering, 204, 198–211.
Sun, F., Zhou, Y., Li, C. & Huang, Y. (2010) Research on active SLAM with fusion of monocular vision and laser range data. In: 2010 8th World congress on intelligent control and automation, Jinan, China: IEEE, pp. 6550–6554.
Sunderhauf, N. & Protzel, P. (2012) Towards a robust back‐end for pose graph SLAM. In: IEEE international conference on robotics & automation, St Paul, USA: IEEE, pp. 1254–1261.
Syed, T.N., Jizhan, L., Xin, Z., Shengyi, Z., Yan, Y., Mohamed, S.H.A. et al. (2019) Seedling‐lump integrated non‐destructive monitoring for automatic transplanting with Intel RealSense depth camera. Artificial Intelligence in Agriculture, 3, 18–32.
Tang, J., Chen, Y., Niu, X., Wang, L., Chen, L., Liu, J. et al. (2015) LiDAR scan matching aided inertial navigation system in GNSS‐denied environments. Sensors, 15, 16710–16728.
Tang, J., Jing, X., He, D. & David, F. (2011) Visual navigation control for agricultural robot using serial BP neural network. Transactions of the Chinese Society of Agricultural Engineering, 27, 194–198.
Tateno, K., Tombari, F., Laina, I. & Navab, N. (2017) CNN‐SLAM: real‐time dense monocular SLAM with learned depth prediction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Hawaii, USA, pp. 6243–6252.
Thrun, S. (2002) Probabilistic robotics. Communications of the ACM, 45(3), 52–57.
Thrun, S. (2003) Learning occupancy grid maps with forward sensor models. Autonomous Robots, 15, 111–127.
Thrun, S., Burgard, W. & Fox, D. (2000) A real‐time algorithm for mobile robot mapping with applications to multi‐robot and 3D mapping. In: Proceedings of the 2000 ICRA. Millennium conference. IEEE international conference on robotics and automation. Symposia proceedings (Cat. No. 00CH37065). Vol. 1. San Francisco, CA, USA: IEEE, pp. 321–328.
Thrun, S., Fox, D., Burgard, W. & Dellaert, F. (2001) Robust Monte Carlo localization for mobile robots. Artificial Intelligence, 128, 99–141.
Tomono, M. (2009) Robust 3D SLAM with a stereo camera based on an edge‐point ICP algorithm. In: 2009 IEEE international conference on robotics and automation. IEEE, pp. 4306–4311.
Triggs, B., Mclauchlan, P.F., Hartley, R.I. & Fitzgibbon, A.W. (1999) Bundle adjustment—a modern synthesis. In: International workshop on vision algorithms, Berlin, Heidelberg: Springer, pp. 298–372.
Valgren, C. & Lilienthal, A.J. (2010) SIFT, SURF & seasons: appearance‐based long‐term localization in outdoor environments. Robotics & Autonomous Systems, 58, 149–156.
Vasudevan, A., Kumar, D.A. & Bhuvaneswari, N. (2016) Precision farming using unmanned aerial and ground vehicles. In: 2016 IEEE technological innovations in ICT for agriculture and rural development (TIAR). IEEE, pp. 146–150.
Vidas, S., Moghadam, P. & Bosse, M. (2013) 3D thermal mapping of building interiors using an RGB‐D and thermal camera. In: 2013 IEEE international conference on robotics and automation, Karlsruhe, Germany: IEEE, pp. 2311–2318.
Vincent, R., Limketkai, B. & Eriksen, M. (2010) Comparison of indoor robot localization techniques in the absence of GPS. In: Detection and sensing of mines, explosive objects, and obscured targets XV. Vol. 7664. International Society for Optics and Photonics, p. 76641Z.
Vu, M.T., Nguyen, T.S., Quach, C.H. & Pham, M.T. (2021) Design and implement UAV for low‐altitude data collection in precision agriculture.
Wan, E.A. & Van Der Merwe, R. (2000) The unscented Kalman filter for nonlinear estimation. In: Proceedings of the IEEE 2000 adaptive systems for signal processing, communications, and control symposium (Cat. No. 00EX373). IEEE, pp. 153–158.
Wang, L.K., Hsieh, S.‐C., Hsueh, E.‐W., Hsaio, F.‐B. & Huang, K.‐Y. (2005) Complete pose determination for low altitude unmanned aerial vehicle using stereo vision. In: 2005 IEEE/RSJ international conference on intelligent robots and systems, Edmonton, Canada: IEEE, pp. 108–113.
Welch, G. & Bishop, G. (1995) An introduction to the Kalman filter.
Wolf, J., Burgard, W. & Burkhardt, H. (2005) Robust vision‐based localization by combining an image‐retrieval system with Monte Carlo localization. IEEE Transactions on Robotics, 21, 208–216.
Wu, K.J., Guo, C.X., Georgiou, G. & Roumeliotis, S.I. (2017) VINS on wheels. In: 2017 IEEE international conference on robotics and automation (ICRA), Singapore: IEEE, pp. 5155–5162.
Wang, Y., Zhou, J., Ji, C. & An, Q. (2008) Design of agricultural wheeled mobile robot based on autonomous navigation and omnidirectional steering. Transactions of the Chinese Society of Agricultural Engineering, 24(7), 110–113.
Ye, H., Chen, Y. & Liu, M. (2019) Tightly coupled 3D Lidar inertial odometry and mapping. In: 2019 International conference on robotics and automation (ICRA), IEEE, pp. 3144–3150.
Zhang, F., Clarke, D. & Knoll, A. (2014) Vehicle detection based on LiDAR and camera fusion. In: 17th International IEEE conference on intelligent transportation systems (ITSC), Special Session on Advanced Vehicle Active Safety Systems. IEEE, pp. 1620–1625.
Zhang, J., Maeta, S., Bergerman, M. & Singh, S. (2014) Mapping orchards for autonomous navigation. 2014 Montreal, Quebec, Canada, July 13–July 16, 2014, 1. American Society of Agricultural and Biological Engineers.
Zhang, J. & Singh, S. (2015) Visual‐lidar odometry and mapping: low‐drift, robust, and fast. In: 2015 IEEE international conference on robotics and automation (ICRA). Seattle, WA, USA: IEEE, pp. 2174–2181.
Zhang, J. & Singh, S. (2018) Laser‐visual‐inertial odometry and mapping with high robustness and low drift. Journal of Field Robotics, 35(8), 1242–1264.
Zhang, Q. & Pless, R. (2004) Extrinsic calibration of a camera and laser range finder (improves camera calibration). In: 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS) (IEEE Cat. No. 04CH37566), Vol. 3. Sendai, Japan: IEEE, pp. 2301–2306.
Zhang, R., Tsai, P.‐S., Cryer, J.E. & Shah, M. (1999) Shape‐from‐shading: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 690–706.
Zhang, Y., Jin, R. & Zhou, Z.‐H. (2010) Understanding bag‐of‐words model: a statistical framework. International Journal of Machine Learning and Cybernetics, 1, 43–52.
Zhang, Z., Deriche, R., Faugeras, O. & Luong, Q.‐T. (1995) A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence, 78, 87–119.
Zhao, G., Xiao, X., Yuan, J. & Ng, G.W. (2014) Fusion of 3D‐Lidar and camera data for scene parsing. Journal of Visual Communication & Image Representation, 25, 165–183.
Zhao, W., Wang, X., Qi, B. & Runge, T. (2020) Ground‐level mapping and navigating for agriculture based on IoT and computer vision. IEEE Access, 8, 221975–221985.

How to cite this article: Ding, H., Zhang, B., Zhou, J., Yan, Y., Tian, G. & Gu, B. (2022) Recent developments and applications of simultaneous localization and mapping in agriculture. Journal of Field Robotics, 39, 956–983. https://doi.org/10.1002/rob.22077