A SLAM Overview From A User's Perspective

Udo Frese
Universität Bremen
A prominent example of this approach is the robotic museum tour guide Minerva [35], where a grid-map (Sec. 4.1) is acquired with offline-SLAM during the setup in the museum, and then used for localization and navigation.

Similarly, the shopping companion robot TOOMAS [14] acquires a grid-map with a laser rangefinder. In practice, it was essential to align that map with a floor plan to link shop items to their respective shelves [9]. Therefore, the robot was driven to marker points with known coordinates in the floor plan. Graph-based SLAM (Sec. 4.2) can integrate both types of information. However, in practice, the pose estimate was already precise enough with odometry and marker points alone. This is a very minimalistic SLAM algorithm, where at every marker point the accumulated error is linearly distributed over the trajectory since the last marker point. The lessons learned were: floor plans are not only helpful but often necessary for a useful map, and a simple algorithm can do the job with some additional information.

Another example is [21], where 3-D SLAM creates a map of a multi-story parking garage that is later used for autonomously driving a car through the garage.

Overall, mapping prior to operation has advantages, although (or because) it is scientifically less ambitious:

– All other modules of the system can use coordinates with respect to the map as usual, without taking care of map changes during operation.
– Slow but reliable offline SLAM algorithms can be used. Indeed, many algorithms are efficient approximations to non-linear least-squares. If time is available, a textbook least-squares algorithm such as the Levenberg-Marquardt algorithm [28, §15.7] is simpler and better.
– The human operator can check the map visually after it is computed.
– The operator can help building the map interactively by providing a few additional links for closing loops and for aligning with a floor plan. These can be precise marker points or rough links where the metric information is obtained from sensor data. Closing loops manually will hardly take a minute, but simplifies the problem greatly for the SLAM algorithm. This procedure works well with graph-based SLAM (Sec. 4.2).

3.2 Online-SLAM for Localization

A more complex mode of operation is using SLAM as a black-box localization system. SLAM performs both localization and mapping online during operation, but the application uses only the pose information; the map is just for the SLAM algorithm itself. Sometimes, a part of the map is given a-priori so the localization operates in an application-specific coordinate system and error accumulation is limited. Unlike with incremental sensors, e.g., odometry or inertial sensing, the error does not grow as long as the robot operates in a known part of the environment. It only grows when extending the map.

In particular, when the robot returns to a known place, the system knows this fact with the precision of the environment sensor. This allows the robot, for instance, to travel back or repeat a path taken before with great precision. There is one caveat, however: all estimates can change, e.g., because of loop-closure. Then a different pose estimate is reported at a place visited before. So trajectories cannot simply be stored as lists of coordinates. This problem is negligible if the error is small, or it can be solved by a more elaborate handling of stored coordinates (Sec. 3.4).

An example of this approach is the MALTA project, where a forklift truck navigates autonomously in a paper reel storage depot [1] (Sec. 4.3). The positions of some pillars are given a-priori to anchor the map and limit error accumulation. SLAM estimates the vehicle pose and the positions of paper reels. Since these are moved continuously into and out of the depot, this map changes during operation and cannot be acquired a-priori. The SLAM pose estimate is directly fed into the motion controller driving the forklift truck, which is only possible since the error is limited by giving pillar positions in advance.

An alternative is "sensor odometry", e.g., visual or laser odometry. This means estimating the motion from sensor data but without building a map. It is often much more precise than an incremental motion sensor such as odometry or inertial sensing. Unlike in real SLAM, the error accumulates even when moving through the same environment. Sensor odometry can be implemented by a SLAM algorithm that forgets the parts of the map it has moved past, avoiding efficiency problems.

Konolige et al. [20] realize visual odometry, i.e., estimating motion from the images of a stereo-camera. Visually distinct points, so-called center-surround extrema, are detected in the images and fed into a visual SLAM algorithm (Sec. 4.4). The motion during a sequence of images can be obtained by estimating the end-pose relative to the start-pose, thus providing visual odometry.

Visual odometry has also been used on the Mars Exploration Rovers Spirit and Opportunity [7] with a simplified approach matching successive images. Mathematically this is SLAM with only one unknown pose, and hence it did run on the rover's 20 MHz CPU.
Going beyond odometry, these local relative poses can be used as links in graph-based SLAM (Sec. 4.2) if not only successive poses but all spatially close poses are matched. The resulting two-stage SLAM algorithm can handle very large maps, because it avoids estimating millions of distinct points in a single computation.

A different application is augmented reality [18], where 3-D graphics is overlaid on a real camera image. Therefore, the camera's pose is needed, which can be provided by visual SLAM. Again, the map is only used by SLAM and (mostly) not by the application itself. If a separate sensor were used for tracking the camera, this sensor would have to be calibrated and synchronized with the camera very precisely, since humans are very sensitive to erroneous relative motion between real and virtual objects. By using the same images for SLAM and for overlaying, these problems are avoided. The videos in [18] show such a system in operation, which at the moment is limited to a desktop-size workspace.

3.3 Online-SLAM for a Continuously Updated Map

The most complex way of using SLAM, and also the main motivation for SLAM research, is to continuously build a growing map, localize in that map, and use both results for robot control and the overall application. The challenge here is that the map changes during operation, and in case of loop-closure it changes even abruptly.

Tele-operating mobile robots, for instance, is difficult because the operator only sees a small fraction of the environment in the robot's cameras and thus has to memorize and grasp the overall spatial structure of the environment. This is most notable in the area of Safety, Security, and Rescue Robotics (SSRR), where the environment is often complex and difficult to grasp. Birk et al. [4] as well as Kleiner and Dornhege [19] have shown 2-D SLAM in harsh, realistic field experiments and reported that the maps were helpful to the operator.

It is tempting to propose that a robot should autonomously explore its environment instead of being manually controlled as in Section 3.1, in particular in service robotics. Indeed, Stachniss et al. have shown autonomous exploration even with a team of robots [32] in a lab setting. However, we would argue against this idea, unless there is a compelling reason for it in a specific application. Autonomous exploration gives rise to a set of very difficult problems beyond the SLAM and exploration algorithm itself: obstacles or hazards invisible to the sensor; forbidden areas for the robot; how to tell the robot where to stop exploring; and generally the increased chance of failure. In our opinion, all these problems more than outweigh the convenience of not having to manually steer the robot. Indeed, this is one example where, for an application, a clever combination of human and robot capabilities is a better design paradigm than the ultimate challenge of full autonomy.

In contrast to that, abandoned mine mapping is an application where autonomy is essential. The underground environment is too dangerous for people and also prevents wireless teleoperation. One of the most impressive systems that uses SLAM in this way is the Groundhog robot by Thrun et al. [36]. In an autonomous experiment, it went 308 m into an abandoned mine, where it detected that the way was blocked by a broken beam and went back. On the way back, software problems required manual intervention, but still this is an impressive real-world exploration experiment.

3.4 Handling Changing Map Estimates in Applications

In the discussion above, we have mentioned repeatedly that changing estimates are difficult for the application to use. We will now discuss approaches to solve these problems.

Usually, the robot stores local obstacles, intermediate way-points, or planned trajectories as part of its motion controller for some meters of travelling. Now, imagine the pose estimate obtained from SLAM changes significantly: these stored coordinates are suddenly far away from the robot and motion control fails.

For such temporary spatial data, i.e., coordinates only kept for a few meters, this problem can be solved by storing them in odometry coordinates. This is the coordinate system where the robot pose is defined by the uncorrected odometry. Its origin slowly drifts due to odometry error but never jumps. So, all local control modules can operate in odometry coordinates. Spatial data from the map should be converted into odometry coordinates at controlled points in time, keeping in mind that there might have been a jump since the last conversion.

Long-term spatial data, e.g., global way-points, object locations, or application-specific places, however, must be represented in a way that they consistently change with the SLAM map estimate. Formally, this is clear, as these locations are usually obtained relative to a robot pose during teach-in. Hence, the most elegant way is to represent them in the SLAM algorithm as part of the map, where they are updated when the map changes. Alternatively, they can be stored relative to the respective robot pose used for defining them, and SLAM updates these old robot poses. This is convenient for graph-based SLAM, which updates all past poses anyway.

On the downside, this approach is often incompatible with an existing software architecture, since it requires every module to refer to the SLAM module for storing long-term spatial data. So, in summary, it is possible to represent spatial data compatible with the SLAM philosophy of a changing map estimate, but it requires considerable effort. This, again, underlines the practical advantage of a prior mapping phase, where during operation global coordinates can be used as usual.

4 Different Variants of SLAM

In this section, we will briefly discuss the most common variants of SLAM regarding the sensor input and the map representation. Following a user's perspective, we will stress the assumptions made on the environment and point to implementations available on the Internet. Unless otherwise noted, implementations are in C/C++.

4.1 2-D Evidence Grid-based SLAM

In this SLAM variant a laser rangefinder is used to obtain a bitmap-like map where white corresponds to free space, black to obstacles, and gray to unknown space. Odometry is (usually) needed as a second input. A particle-filter-based solution called GMapping has been developed by Grisetti et al [12] and evaluated on a broad range of datasets. Recently, it has been integrated in the Robot Operating System (ROS) [29], where it is complemented by a module based on the Adaptive Monte Carlo Localization (AMCL) algorithm [34, §8] that localizes on the maps generated by GMapping.

The interface in ROS is clear and well documented. While GMapping itself is research code, we know of several groups that have used it successfully, and we believe it is the most practical approach for indoor service robot navigation or similar applications. The laser rangefinder input also serves for obstacle avoidance, and the obtained map models free space, i.e., it can be used for navigation, path-planning, or exploration. For indoor navigation, the fact that a laser rangefinder only sees a horizontal slice of the environment usually causes only small problems. Sometimes, however, overhanging obstacles, e.g., tables and chairs, need to be marked, either physically (e.g., with a string) or in the map.

4.2 2-D Graph-based SLAM

This SLAM variant maintains a map consisting of a graph of 2-D poses, usually robot poses at regular intervals. The input are links with uncertain spatial relations between these poses. While links between successive poses are directly obtained from odometry, most links are generated by scan-matching, i.e., by an algorithm that aligns two laserscans, thereby computing their relative pose [6]. Additionally, a policy must be implemented that controls which scans to match, because matching all overlapping pairs of scans is too slow. If feasible (Sec. 3.1), additional links can be manually added by the user. Beyond SLAM, scan-matching can correct odometry and localize in a map represented by scans.

Graph-based maps are slightly more difficult to use than grid maps: in offline SLAM (Sec. 3.1), one can easily compute a grid map by overlaying all laserscans according to the estimated poses. When operating online, the poses can be used as reference frames as described in Section 3.4. Further, they constitute a navigation graph for coarse path planning. However, free-space information for metrical path planning is only available in the original laserscans, not in the graph-based map.

To our knowledge, no full graph-based SLAM implementation is available, but the essential components are. Censi [6] has implemented several scan matchers, and Grisetti et al [13] provide a SLAM algorithm, TORO, operating on a graph of pose relations. Both are well documented and easy to use. The above-mentioned control policy needs to be implemented by the user. To increase precision, scans can be aligned globally by matching all points from all scans simultaneously [27].

Overall, graph-based SLAM with scan-matching is a slightly more complicated alternative to grid-based SLAM. It handles larger maps and allows the user to incorporate manual loop-closing information easily. While a laser rangefinder is the most popular sensor, in principle graph-based approaches can use any sensor data that can be matched, i.e., where a relative pose can be obtained from comparing the sensor data at two points in time. In particular, cameras can be used in 3-D.

4.3 2-D Feature-based SLAM

By far the largest fraction of the SLAM literature considers a variant of SLAM where the map consists of 2-D point features which are observed relative to the robot (Fig. 1). For an application with feature-based SLAM, the key question is what type of features to use and how to detect and identify them reliably. Options include fiducials (Fig. 1), trees, which make good outdoor features [15], the paper reels reported in [1], and walls or corners in buildings. Observations of a wall without its end-corners need special treatment [30].

The most popular example of feature-based SLAM is mapping the Victoria Park in Sydney [15], where trees are detected with a laser rangefinder. Feature-based
SLAM is the preferred variant for research on the core algorithm. Hence most articles use this dataset and simulations to evaluate their algorithms.

For every feature observation, the identity of the observed feature needs to be obtained, which is called data-association. The simplest solution is nearest-neighbor association, where, within a given threshold, an observation is associated with the nearest neighbor already inside the map. This is simple but works only if the map uncertainty is smaller than the separation between features. Some algorithms compare a group of features with the map taking map uncertainty into account [26]; however, this sometimes fails due to consistency problems [17]. Some algorithms even require the application to provide data-association. In our opinion, unless the map uncertainty is low enough for nearest-neighbor, data-association is a major hurdle that requires a deeper understanding of SLAM to master.

Another issue is that a set of point features is useful for localization but does not give any free-space or connectivity information. It is advisable to retain old robot poses at regular intervals in the map so a navigation graph can be obtained from the trajectory used during mapping. This is possible with most algorithms, but also surprising, as many algorithms take quite an effort to be able to forget (marginalize out) old robot poses.

As an example of a feature-based SLAM algorithm, a well-documented implementation of the treemap algorithm is available [11]. The implementation clearly separates the core computational engine from the front-end that defines different quantities to be estimated and different measurement equations. This simplifies incorporating various kinds of geometric objects, such as corners, walls, and poses. However, it provides no data-association, leaving the user with nearest-neighbor. A MATLAB implementation of the I-SLSJF provided by Huang et al [17] is also well documented but also only uses nearest-neighbor data-association.

Overall, to non-experts, we can recommend feature-based 2-D SLAM only if reliable features can be obtained in the target application and the uncertainty is small enough for nearest-neighbor data-association.

4.4 3-D Visual SLAM

Visual SLAM is similar to feature-based SLAM in that the map is a set of visually distinct points and the observations are image positions of these points. The offline version of this problem has been studied in photogrammetry as "bundle adjustment" [37] and in computer vision as "structure from motion" [16, §18, App. 6.3] for a long time and can be considered a mature technology. The canonical procedure consists of the following steps:

– Detect interesting feature points, e.g., with the so-called SIFT [25] or SURF [3] detector.
– Compute a descriptor vector for each feature point describing its visual appearance, e.g., with the SIFT or SURF descriptor.
– Match features in each pair of images by finding nearest neighbors in the descriptor space.
– Use Random Sample Consensus (RANSAC) [16] with the 5/7-point algorithm on the pairs found: RANSAC repeatedly draws 5 or 7 pairs from the feature matching result, computes a relative camera pose from them using the so-called 5/7-point algorithm [23, 16], and counts how many pairs are compatible with that pose. It then chooses the repetition with the highest count. If it is above a threshold, these pairs are accepted as observations of the same feature.
– Run least-squares estimation (bundle adjustment) on all observations of all images accepted by RANSAC.

All building blocks for visual SLAM are available: the SURF detector and descriptor, RANSAC with the 7-point algorithm, and nearest neighbor computation are included in the well documented OpenCV computer vision library [5]. Lourakis and Argyros [24] provide sba, a well established and documented implementation of bundle adjustment using sparse matrix methods. The package bundler [31] implements the full procedure.

There are some caveats for applications. First, with a single camera the overall scale of the scene is undefined. Even if one distance is known, e.g., by looking at a known object, the scale can drift over time. A stereo-camera solves this problem and simplifies data-association. When all features are first matched between left and right image, only a 3-point algorithm is needed later. Second, cameras have a smaller field of view (45–100°) than laser rangefinders, making it difficult to have enough overlapping features between images, in particular when viewing the same place from reverse directions. One or two sideways-looking cameras alleviate this problem, but this is often undesirable.

The sba package needs some seconds of computation time for 50 images and 5000 features, which is enough for offline SLAM but not for processing every frame of a video sequence online. Klein and Murray [18] address this challenge with their Parallel Tracking and Mapping framework (PTAM), where one thread tracks the camera motion relative to a fixed map and another thread updates the map at larger intervals with sparse bundle adjustment. The implementation is available, but should be considered research code.

If the application requires free motion in 3-D, as with aerial vehicles or in rough-terrain navigation, an inertial sensor can complement computer vision.
Overall, while using images to compute a feature map prior to operation is an established technique, online visual SLAM is still an area of active research.

4.5 3-D Graph-based SLAM

This variant is the 3-D analogue of Section 4.2. 3-D point clouds are obtained at certain robot poses and pairwise aligned with a scan-matching algorithm. The results form a graph of uncertain pose relations from which the most likely poses are computed. As in 2-D, finally all scans can be matched globally to increase precision.

A 3-D scan is usually obtained by tilting or rotating a conventional 2-D laser rangefinder while the robot is at a temporary halt. Taking the scans while moving would require motion tracking, e.g., by an inertial sensor, which is a precision challenge. While stop-and-go during operation could be problematic, we believe it is feasible to map the environment this way before operation (Sec. 3.1). Localization with scans in motion is much easier than full SLAM in motion.

Nüchter [27] provides SLAM6D, a well documented implementation of 3-D scan matching and 3-D graph-based SLAM. Considerable effort went into making it efficient enough for practical applications, because the amount of data is much higher in 3-D.

A point cloud representation is not very convenient for most tasks, in particular motion planning, so one can convert it into a 3-D voxel representation. Wurm et al [38] implemented octomap, a library for maintaining and updating a 3-D voxel map in a memory-efficient octree representation.

Overall, 3-D SLAM with tilted laser rangefinders is feasible whenever a 3-D map is needed, e.g., often outdoors. We recommend being more careful than with 2-D SLAM when planning an application, because the environments within the scope of 3-D SLAM are more diverse and involved than 2-D office environments.

5 Conclusions

In this article, we have discussed different ways of using SLAM in an application and different variants of SLAM, pointing to available implementations where possible. To conclude with an overall recommendation: indoor navigation is a mature technology. We recommend the use of a laser rangefinder as the primary sensor, acquiring a map with the robot prior to operation, and 2-D evidence-grid based SLAM. If interactive mapping is possible, 2-D graph-based SLAM will be more reliable but needs more custom implementation work.

In 3-D, the situation is more challenging. If stop-and-go is feasible, 3-D graph-based SLAM with a tilted laser rangefinder is a good choice. For fast motion, as often seen in aerial applications, we recommend computer vision with a stereo camera as the method of choice (perhaps combined with inertial sensing). However, concerning visual SLAM, only visual odometry can be considered mature enough to be used without a deep understanding of SLAM itself.

References

1. Andreasson H (2010) An application of SLAM to localize AGVs. Tech. Rep. urn:nbn:se:oru:diva-10393, Örebro University, project web page: aass.oru.se/Research/Learning/malta
2. Bailey T, Durrant-Whyte H (2006) Simultaneous localisation and mapping (SLAM): Part II state of the art. Robotics and Automation Magazine 13(3):108–117
3. Bay H, Tuytelaars T, Gool LV (2006) SURF: Speeded up robust features. In: Ninth European Conference on Computer Vision, software: included in the OpenCV library
4. Birk A, Schwertfeger S, Pathak K (2009) A networking framework for teleoperation in safety, security, and rescue robotics (SSRR). IEEE Wireless Communications, Special Issue on Wireless Communications in Networked Robotics 6(13):6–13
5. Bradski G, Kaehler A (2008) Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, software: opencv.willowgarage.com
6. Censi A (2008) An ICP variant using a point-to-line metric. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Pasadena, software: purl.org/censi/2007/csm
7. Cheng Y, Maimone MW, Matthies L (2006) Visual odometry on the Mars exploration rovers. IEEE Robotics and Automation Magazine pp 54–62
8. Durrant-Whyte H, Bailey T (2006) Simultaneous localisation and mapping (SLAM): Part I. Robotics and Automation Magazine 13(2):99–110
9. Einhorn E (2009) Personal communication
10. Frese U (2006) A discussion of simultaneous localization and mapping. Autonomous Robots 20(1):25–42
11. Frese U, Schröder L (2006) Closing a million-landmarks loop. In: Proceedings of the IEEE/RSJ International Conf. on Intelligent Robots and Systems, Beijing, pp 5032–5039, software: www.openslam.org/treemap.html
12. Grisetti G, Stachniss C, Burgard W (2007) Improved techniques for grid mapping with rao-