
Robotics and Autonomous Systems 58 (2010) 68–80


Multi-robot visual SLAM using a Rao-Blackwellized particle filter


Arturo Gil ∗ , Óscar Reinoso, Mónica Ballesta, Miguel Juliá
Universidad Miguel Hernández, Systems Engineering Department, Avda. de la Universidad s/n, 03202 Elche (Alicante), Spain

Article history: Received 8 May 2008; Received in revised form 21 July 2009; Accepted 30 July 2009; Available online 4 August 2009.

Keywords: SLAM; Cooperative robots; Particle filter; Visual landmarks.

Abstract: This paper describes an approach to solving the Simultaneous Localization and Mapping (SLAM) problem with a team of cooperative autonomous vehicles. We consider that each robot is equipped with a stereo camera and is able to observe visual landmarks in the environment. The SLAM approach presented here is feature-based: the map is represented by a set of 3D landmarks, each one defined by a global position in space and a visual descriptor. The robots move independently along different trajectories and make relative measurements to landmarks in the environment in order to jointly build a common map using a Rao-Blackwellized particle filter. We show results obtained in a simulated environment that validate the SLAM approach. The process of observing a visual landmark is simulated in the following way: first, the relative measurement obtained by the robot is corrupted with Gaussian noise, using a noise model for a standard stereo camera. Second, the visual description of the landmark is altered by noise, simulating the changes in the descriptor which may occur when the robot observes the same landmark at different scales and from different viewpoints. In addition, the noise in the odometry of the robots also takes values obtained from real robots. We propose an approach to manage data associations in the context of visual features. Different experiments have been performed, with variations in the paths followed by the robots and the parameters of the particle filter. Finally, the results obtained in simulation demonstrate that the approach is suitable for small robot teams.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Acquiring maps of the environment is an essential ability for autonomous mobile robots, since maps are needed to perform higher-level tasks. In consequence, in the last two decades the problem of simultaneous localization and mapping (SLAM) has received significant attention. The SLAM problem considers the situation in which an autonomous mobile robot moves through an unknown space and incrementally builds a map of this environment while simultaneously using this map to compute its absolute location. This problem is considered inherently difficult, since noise introduced in the estimate of the robot pose leads to noise in the estimate of the map and vice versa.

To date, typical SLAM approaches have used laser range sensors to build maps in two and three dimensions (e.g., [1,2]). Recently, interest in using cameras as sensors in SLAM has increased and some authors have concentrated on building 3D maps using visual information obtained from cameras. These approaches are usually denoted as visual SLAM. The reasons for this interest stem from:

(i) Stereo vision systems are typically less expensive than laser range systems.
(ii) Most laser ranging systems provide 2D information about the environment, whereas stereo vision systems are able to provide a more complete 3D representation of the space.
(iii) Vision systems provide a great quantity of information and allow us to integrate other techniques in the robot, such as face or object recognition.

However, stereo systems are usually less precise than laser sensors, and the information from the cameras normally needs to be processed in order to extract salient features. In common configurations, the camera is installed at a fixed height and orientation [3,4] with respect to the robot reference system and the movement of the camera is restricted to a plane. However, recently some researchers have constructed visual maps allowing a free 6 DOF movement of the camera [5,6].

Most approaches to visual SLAM are feature-based. In this case, a set of significant points in the environment is used as landmarks. Two main steps must be distinguished in the selection of visual landmarks. The first step involves the detection of interest points in the images that can be used as reliable landmarks. The points should be detectable at different distances and viewing angles, since they will be observed by the robot from different poses in the environment. In a second step, the interest points are described by a feature vector which is computed

∗ Corresponding author. Tel.: +34 966658620; fax: +34 966658979.
E-mail address: arturo.gil@umh.es (A. Gil).
URL: http://arvc.umh.es (A. Gil).

using local image information. This descriptor is used in the data association problem, that is, when the robot has to decide whether the current observation corresponds to one of the landmarks in the map or to a new one. When the robot observes a visual landmark in the environment, it obtains a distance measurement and computes a visual descriptor. Next, the descriptor and the measurement are used to find the landmark in the map that generated the observation. Generally, when the robot traverses unexplored areas, the observations correspond to new landmarks and are thus initialized in the map. However, when the robot traverses previously explored places, it re-observes landmarks. In this case, if the current observations are correctly associated with the visual landmarks, the robot will be able to find its location with respect to these landmarks, thus reducing the error in its pose. If the observations cannot be correctly associated with the landmarks in the map, the map will become inconsistent. To sum up, the data association is a fundamental part of the SLAM process, since wrong data associations will produce incorrect maps.

An important subfield within mobile robotics that requires accurate maps is the performance of collaborative tasks by multiple vehicles. Multiple vehicles can frequently perform tasks more quickly and robustly than a single one [7]. However, little effort has been devoted until now to the field of multi-robot visual SLAM, which considers the case where several robots move through the environment and build a map. In this paper we concentrate on this problem and propose a solution that allows us to build a map using a set of visual observations obtained by a team of mobile robots. It is worth noting that SLAM algorithms focus on the incremental construction of a map, given a set of movements carried out by the robots and the set of observations obtained from different locations. However, SLAM algorithms do not consider the computation of the movements that need to be performed by the robots, since this is generally considered a different problem, denoted as exploration. Classical exploration strategies often try to cover unknown terrain as fast as possible and avoid repeatedly visiting known areas. This strategy, however, is suboptimal in the context of the SLAM problem because the robot typically needs to re-visit places in order to localize itself and reduce the uncertainty [1]. If the exploration has to be accomplished using a team of robots, the problem becomes harder, since the exploration algorithm needs to prevent different robots from exploring the same areas in the map. In this paper we concentrate on the visual SLAM problem and assume that the robots are able to explore the environment in an efficient way.

The major contribution of this paper is twofold. First, we propose an approach to the multi-robot SLAM problem using a Rao-Blackwellized Particle Filter (RBPF). To the best of our knowledge, this is the first work that uses visual measurements provided by several robots to build a common 3D map of the environment. Second, we simulate the process of visual SLAM using multiple robots. We consider that the robots perform observations of visual landmarks using stereo cameras and, in addition, each observation is associated with a visual descriptor that allows us to partially differentiate between landmarks. In addition, we propose a solution to the data association problem in the context of visual SLAM and compare the results when the noise in the descriptor varies, which simulates the variation in the descriptor when a landmark is observed from different viewpoints. Introducing noise in the descriptor makes the data association harder and negatively affects the results of the SLAM algorithm. We consider that the robots are able to communicate with a central agent in the system that is in charge of building a common map of the environment. When a robot obtains an observation, it transmits this information to the central agent, along with the information about its odometry. Finally, the relative initial positions of the robots must be known approximately. We present simulation results obtained under different circumstances. The results demonstrate that the approach is suitable for building visual maps using small teams of robots.

The main reason that motivated us to carry out a series of experiments using simulated data is that this enables us to assess the validity of the presented algorithm under different conditions. In simulation, the map and the true path of the robots are known in advance, and this enables us to evaluate the precision of the approach when varying some of the parameters in the Rao-Blackwellized filter. Furthermore, we compare the results when different numbers of robots are used and evaluate the computational cost introduced by more robots. On the contrary, evaluating the results of visual SLAM using real data is complex. On the one hand, the true path followed by the robots is not known. On the other hand, the map cannot be known precisely, since the exact location of the landmarks cannot be established in advance.

The remainder of the paper is structured as follows. First, Section 2 discusses related work. Section 3 presents the approach to multi-robot SLAM; our map representation is also detailed there. Section 4 deals with visual landmarks and their utility in SLAM. Next, Section 5 exposes our approach to the data association problem in the context of visual features. The most relevant features of the simulated environment are explained in Section 7. Next, in Section 8 we present the results obtained. Finally, Section 10 summarizes the most important conclusions and proposes future extensions of the work.

2. Related work

Up until now, approaches to multi-robot SLAM can be grouped into one of the two following kinds of solutions:

(i) Approaches in which each robot estimates its own individual map using its observations. At a later stage, a common map is formed by fusing the individual maps of the robot team.
(ii) Approaches where the estimation of all the trajectories and the map is made jointly. A single map is computed simultaneously using the observations of all the robots.

The work exposed in [8] can be classified in the first group. The idea here is that each robot builds its own map and, at the same time, continuously attempts to localize itself in the maps built by other robots using particle filters. The approach can cope with the situation where the initial locations of the robots are unknown, solving the data association problem with a rendez-vous technique. However, the fusion of the individual maps is computationally expensive, since K² particle filters must be maintained for a team of K robots.

The approach presented in [9] can be classified in the second group. It uses an extended Kalman filter (EKF) to estimate a state vector formed by the poses of all the vehicles and a set of 2D landmarks. With this extension, the robots obtain observations and construct a single unified map using the update equations of the classical EKF [10]. The initial positions of the vehicles must be known in advance and the data association is assumed to be known. In addition, the gain in performance when using multiple vehicles is proved theoretically within this context. In this case, the main drawback stems from the fact that a single hypothesis over the robot pose is maintained. If false data associations are made, the whole EKF may diverge [11].

In [12] stereo vision is used to extract 3D visual landmarks from the environment. During exploration, the robot extracts SIFT (Scale Invariant Feature Transform) features from stereo images and calculates relative measurements to them. Landmarks are then integrated in the map, with a Kalman filter associated to each one. However, this approach does not manage the uncertainty associated with robot motion, and only one hypothesis over the pose of

the robot is maintained. Consequently, it may fail in the presence of large odometric errors (e.g. while closing a loop). In [3] a Rao-Blackwellized particle filter is used to estimate simultaneously the map and the path of a single robot exploring the environment. In the mentioned work, SIFT features are also used as landmarks in the environment and are extracted using a stereo pair of cameras. A different usage of SIFT features can be found in [6]. In this case SLAM is performed using a low-cost wide-angle stereo camera, with visual landmarks extracted using the Shi and Tomasi operator. The approach divides the global map into local submaps identified by a SIFT fingerprint, composed of a number of SIFT landmarks. The SIFT fingerprint is used from time to time for loop-closing tasks.

Recently, an approach to visual SLAM was presented in [5]. Harris points are used as landmarks in the environment and serve to build locally independent maps using EKF–SLAM. The local maps are joined into a full map using a divide and conquer algorithm. It is capable of mapping large environments using a stereo pair moving with 6 DOF. It is worth noting that the algorithm uses range and bearing measurements and, at the same time, is able to integrate bearing-only measurements in the map.

3. Multi-robot visual 3D SLAM

In this section we describe our approach to the SLAM problem in the case where a team of robots explores the environment simultaneously. SLAM is considered to be a complex task due to the mutual dependency between the map of the environment and the pose of the robot: an error in the estimation of the pose induces an error in the estimation of the map and vice versa. The problem becomes harder when more robots participate in the construction of the map, since the dimension of the state increases.

In the approach presented here, the robots share the observations performed over the landmarks and create a common map of the environment. It is based, in essence, on a Rao-Blackwellized particle filter (RBPF), proposed initially by Murphy [13] and commonly referred to as FastSLAM in the SLAM community [14]. Basically, a Rao-Blackwellized particle filter combines a representation of the pose by means of particles with a closed-form estimation of some variables.

In order to build the map, we assume that the robots are equipped with a stereo camera system, which enables them to obtain relative measurements of landmarks detected in the environment. In addition, we assume that a descriptor associated to each landmark can be obtained, based on its visual appearance. We also consider that the robots are able to communicate among themselves and with a central agent in the system. In addition, we consider that the relative starting positions of the robots are approximately known in advance.

For simplicity, we first start our explanation assuming that a single robot explores the environment while it observes visual landmarks. At time t the robot obtains an observation z_t = (v_t, d_t), where v_t = (X_c, Y_c, Z_c) is a 3D vector relative to the left camera reference frame and d_t is the visual descriptor associated to the landmark. The map L is represented by a collection of N landmarks L = {l_1, l_2, ..., l_N}. Each landmark is described as l_k = {µ_k, Σ_k, d_k}, where µ_k = (X_k, Y_k, Z_k) is a vector describing the position of the landmark referred to a global reference frame, with associated covariance matrix Σ_k. In addition, each landmark l_k is associated with a descriptor d_k that partially differentiates it from others. This map representation is compact and has been used to effectively localize a robot in unmodified environments [15].
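To make this representation concrete, the following is a minimal Python sketch of the landmark and map structures just described (the class and field names are ours, not the paper's):

import numpy as np
from dataclasses import dataclass, field

@dataclass
class Landmark:
    """One landmark l_k = {mu_k, Sigma_k, d_k} of the map L."""
    mu: np.ndarray      # 3D position (X_k, Y_k, Z_k) in the global frame
    sigma: np.ndarray   # 3x3 covariance matrix of the position estimate
    d: np.ndarray       # visual descriptor (e.g. 128-dimensional for SIFT)

@dataclass
class Map:
    """The map L = {l_1, ..., l_N}."""
    landmarks: list = field(default_factory=list)

    def add(self, mu, sigma, d):
        self.landmarks.append(
            Landmark(np.asarray(mu), np.asarray(sigma), np.asarray(d)))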
Following the usual nomenclature in Rao-Blackwellized SLAM, we denote the robot pose at time t as x_t and the movement of the robot at time t as u_t. The robot path until time t is referred to as x^t = {x_1, x_2, ..., x_t}, the set of observations made by the robot until time t is designated z^t = {z_1, z_2, ..., z_t}, and the set of actions is u^t = {u_1, u_2, ..., u_t}. We formulate the SLAM problem as that of determining the location of all landmarks in the map L and the robot poses x^t from a set of measurements z^t and robot actions u^t. Thus it can be stated as the estimation of the posterior:

p(x^t, L | z^t, u^t, c^t)    (1)

where c^t designates the set of data associations performed until time t, c^t = {c_1, c_2, ..., c_t}. While exploring a particular environment, the robot has to determine whether a particular observation z_t = (v_t, d_t) corresponds to a previously mapped landmark or to a new one. Given that at time t the map is formed by N landmarks, this correspondence is represented by c_t, where c_t ∈ [1...N], meaning that the observation z_t corresponds to the landmark c_t in the map. When no correspondence is found we denote it as c_t = N + 1, indicating that it is a new landmark. For the moment, we consider this correspondence as known.

The SLAM problem can be separated into two parts:

(i) The estimation of the trajectory of the robot.
(ii) The estimation of the map by means of a series of measurements.

If somehow we could know the trajectory of the robot in the environment, then the estimation of the map would be trivial. This conditional independence property of the SLAM problem implies that the posterior (1) can be factored as [14]:

p(x^t, L | z^t, u^t, c^t) = p(x^t | z^t, u^t, c^t) ∏_{k=1}^{N} p(l_k | x^t, z^t, u^t, c^t).    (2)

This equation states that the full SLAM posterior is decomposed into two parts: one estimator over robot paths, and N independent estimators over landmark positions, each conditioned on the path estimate. We approximate p(x^t | z^t, u^t, c^t) using a set of M particles, each particle having N independent landmark estimators (implemented as EKFs), one for each landmark in the map. Each particle is thus defined as:

S_t^{[m]} = {x^{t,[m]}, µ_{t,1}^{[m]}, Σ_{t,1}^{[m]}, d_1^{[m]}, ..., µ_{t,N}^{[m]}, Σ_{t,N}^{[m]}, d_N^{[m]}},    (3)

where µ_{t,k}^{[m]} is the best estimation at time t of the position of landmark l_k based on the path of particle m, and Σ_{t,k}^{[m]} is the associated covariance matrix. The visual descriptor associated to landmark j is represented by d_j^{[m]}. The particle set S_t = {S_t^{[1]}, S_t^{[2]}, ..., S_t^{[M]}} is calculated incrementally from the set S_{t−1} at time t − 1 and the robot control u_t. Thus, each particle is sampled from a proposal distribution x_t^{[m]} ∼ p(x_t | x_{t−1}, u_t) that models the noise in the odometry of the robots. Next, and following the approach of [14], each particle is assigned a weight according to:

ω_t^{[m]} = |2π Z_{c_t}|^{−1/2} exp{−½ (v_t − v̂_{t,c_t})^T [Z_{c_t}]^{−1} (v_t − v̂_{t,c_t})}    (4)

where v_t is the actual measurement and v̂_{t,c_t} is the predicted measurement for the landmark c_t based on the pose x_t^{[m]}. The matrix Z_{c_t} is the covariance matrix associated with the innovation (v_t − v̂_{t,c_t}). Note that we implicitly assume that each measurement v_t has been associated to the landmark c_t of the map. This problem is, in general, hard to solve, since similar-looking landmarks may exist. In Section 5 we describe our approach to this problem. In the case that B observations of different landmarks exist at a time t, that is z_t = {z_{t,1}, z_{t,2}, ..., z_{t,B}}, we calculate the total weight assigned to the particle as ω_t^{[m]} = ∏_{i=1}^{B} ω_{t,i}^{[m]}, where ω_{t,i}^{[m]} is the weight associated to the observation z_{t,i}, computed using Eq. (4).
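For illustration, Eq. (4) and the product over B observations can be transcribed directly into Python as follows (a sketch that assumes the innovation covariance Z_ct has already been computed by the landmark EKF):

import numpy as np

def particle_weight(v, v_pred, Z):
    """Weight of Eq. (4): Gaussian likelihood of the innovation v - v_pred.

    v, v_pred: 3D measurement and predicted measurement for landmark c_t.
    Z: 3x3 innovation covariance Z_ct.
    """
    innov = v - v_pred
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * Z))
    return np.exp(-0.5 * innov @ np.linalg.inv(Z) @ innov) / norm

def total_weight(observations, predictions, covariances):
    """Total particle weight for B observations: product of individual weights."""
    w = 1.0
    for v, v_pred, Z in zip(observations, predictions, covariances):
        w *= particle_weight(v, v_pred, Z)
    return w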

In the following, we assume that a team of K robots explores the environment. Our objective here is to jointly estimate the paths followed by the robots and the map. At a time step t the robot ⟨i⟩ is at pose x_{t,⟨i⟩} and performs a single observation z_{t,⟨i⟩} = {v_{t,⟨i⟩}, d_{t,⟨i⟩}}. We denote the path of the robot ⟨i⟩ until time t as x^t_{⟨i⟩} = {x_{1,⟨i⟩}, x_{2,⟨i⟩}, ..., x_{t,⟨i⟩}}. For simplicity, we refer to x^t_{⟨1:K⟩} = {x^t_{⟨1⟩}, x^t_{⟨2⟩}, ..., x^t_{⟨K⟩}} as the set of robot paths in the team until time t. Analogously, we denote by u^t_{⟨1:K⟩} = {u^t_{⟨1⟩}, u^t_{⟨2⟩}, ..., u^t_{⟨K⟩}} the set of actions performed by the robots, and z^t_{⟨1:K⟩} = {z^t_{⟨1⟩}, z^t_{⟨2⟩}, ..., z^t_{⟨K⟩}} refers to the set of observations performed by the robot team until time t. Similarly to the case of a single robot, the data association variable is denoted c_t, which indicates that the observation z_{t,⟨i⟩} is associated to the landmark c_t in the map. It is worth noting here that we are estimating a map common to all the robots; in consequence, an observation is associated to a landmark in the map independently of the robot that observed it. The data association until time t is referred to as c^t = {c_1, c_2, ..., c_t}. In consequence, the multi-robot SLAM problem can be stated as the estimation of the following probability function:

p(x^t_{⟨1:K⟩}, L | z^t_{⟨1:K⟩}, u^t_{⟨1:K⟩}, c^t) = p(x^t_{⟨1:K⟩} | z^t_{⟨1:K⟩}, u^t_{⟨1:K⟩}, c^t) ∏_{k=1}^{N} p(l_k | x^t_{⟨1:K⟩}, z^t_{⟨1:K⟩}, u^t_{⟨1:K⟩}, c^t).    (5)

This equation proposes a manner of estimating a group of K paths x^t_{⟨1:K⟩} and a map L conditioned on the robots having performed a number of movements u^t_{⟨1:K⟩} and a series of observations z^t_{⟨1:K⟩} associated to landmarks in the map through c^t. In consequence, analogously to Eq. (2), Eq. (5) expresses that we can separate the estimation of the map and the estimation of K different paths into two parts: the function p(x^t_{⟨1:K⟩} | z^t_{⟨1:K⟩}, u^t_{⟨1:K⟩}, c^t) is estimated using a particle filter, while the map is estimated using N independent estimations conditioned on the paths x^t_{⟨1:K⟩}. As a result, we are decomposing the SLAM problem into two different parts: a localization problem of K robots in an environment, and a series of individual landmark estimations conditioned on the robot paths x^t_{⟨1:K⟩}. In order to achieve this, each one of the M particles in the filter is accompanied by N independent estimators, one for each one of the landmarks, implemented as EKFs. In our case, each of the Kalman filters is conditioned on the K paths of the robot team. In consequence, each particle is represented as:

S_t^{[m]} = {x^{t,[m]}_{⟨1:K⟩}, µ_{1,t}^{[m]}, Σ_{1,t}^{[m]}, d_1^{[m]}, ..., µ_{N,t}^{[m]}, Σ_{N,t}^{[m]}, d_N^{[m]}}.    (6)

The fundamental difference compared to the particle defined in (3) is that, in this case, the state that we would like to estimate is composed of the poses (x, y, θ) of K robots, thus x_{t,⟨1:K⟩} = {x_{t,⟨1⟩}, x_{t,⟨2⟩}, ..., x_{t,⟨K⟩}}. As a result, we propose a joint estimation over a path state of dimension 3K. According to [16], the number of particles needed to obtain a good estimation increases exponentially with the dimension of the state. However, the results that we present here show that the approach works well for robot teams of 2–3 members using a reasonable number of particles. In the case presented, the same map is shared by all the robots, which means that an observation performed by a particular robot affects the map of the whole robot team. In consequence, one member of the team may observe a landmark previously mapped by a different robot and update its estimate. In addition, this means that a robot does not need to explicitly close a loop in order to reduce the uncertainty in its pose (i.e. return to a previously mapped area): a robot can reduce the uncertainty in its pose when it observes landmarks previously mapped by other robots. The structure of the particle defined in (6) is shown in Table 1. To sum up, we propose a method based on a Rao-Blackwellized particle filter for the case where a robot team cooperates to build a map of a given environment. The algorithm can be decomposed into 4 basic steps:

• Generate a new particle set based on the prior set.
• Update the estimation of each landmark based on the observations.
• Calculate a weight for each particle.
• Perform a resampling based on the weight of each particle.

Table 1
Particle set S_t. Each particle is accompanied by N Kalman filters.

Particle 1    {(x, y, θ)_{⟨1⟩}, ..., (x, y, θ)_{⟨K⟩}}^{[1]}    µ_1^{[1]} Σ_1^{[1]} d_1^{[1]}  ...  µ_N^{[1]} Σ_N^{[1]} d_N^{[1]}
...
Particle M    {(x, y, θ)_{⟨1⟩}, ..., (x, y, θ)_{⟨K⟩}}^{[M]}    µ_1^{[M]} Σ_1^{[M]} d_1^{[M]}  ...  µ_N^{[M]} Σ_N^{[M]} d_N^{[M]}

In order to clarify the ideas presented here we describe the whole process in Algorithm 1, which considers the case in which 3 robots explore the environment simultaneously.

3.1. Generating a new particle set

The first step is to generate a new set of hypotheses S_t based on the set S_{t−1}. That means that we obtain a new particle set over the robot poses x_{t,⟨1:K⟩}. That is, we obtain a new pose x_{t,⟨i⟩}^{[m]} for each of the robots by sampling from a motion model p(x_t | x_{t−1}, u_t):

x_{t,⟨i⟩}^{[m]} ∼ p(x_{t,⟨i⟩} | x_{t−1,⟨i⟩}, u_{t,⟨i⟩}).    (7)

The function p(x_{t,⟨i⟩} | x_{t−1,⟨i⟩}, u_{t,⟨i⟩}) defines the movement model for the mobile agent. We apply this movement model to each of the K poses of the particle separately, based on the movement performed by robot ⟨i⟩. Fig. 1 shows the application of Eq. (7). The solid lines represent the true movements performed by the robots while the dotted lines correspond to the odometry readings. In the beginning of the movements, all the particles concentrate at the origin. After applying the movement model we obtain a set of particle clouds that represent the probability over the poses of the robots at each time step. In this way, non-linear motion models can be easily represented, as opposed to the linearization required by EKF–SLAM approaches [10].

Fig. 1. The figure shows a new set of particles generated by sampling from the motion model.
of particles needed to obtain a good estimation increases exponen- The solid lines represent the true movements performed by the
tially with the dimension of the state. However, the results that robots while the dotted lines correspond to the odometry readings.
we present here show that the approach works perfectly for robot In the beginning of the movements, all the particles concentrate on
teams of 2–3 members using a reasonable number of particles. In the origin. After applying the movement model we obtain a set of
the case presented, the same map is shared by all the robots, which particle clouds that represent the probability over the poses of the
means that an observation performed by a particular robot affects robots at each time step. In this way, non-linear motion models can
the map of the whole robot team. In consequence, one member of be easily represented, as opposed to the linearization required by
the team may observe a landmark previously mapped by a differ- EKF–SLAM approaches [10].
ent robot and update its estimate. In addition, this means that a
robot does not need to explicitly close a loop in order to reduce the
uncertainty in its pose (i.e. return to a previously mapped area). 3.2. Landmark estimation
A robot can reduce the uncertainty in its pose when it observes
landmarks previously mapped by other robots. The structure of the Updating each landmark estimate is performed based on the
particle defined in (6) is clearly explained in Table 1. To sum up, we pose of the hii robot that performed the observation zt ,hii =
propose a method based on a Rao-Blackwellized particle filter for {vt ,hii , dt ,hii } with data association ct . Updating each landmark lk
the case where a robot team cooperates to build a map of a given is performed independently, using the standard EKF equations
environment. The algorithm can be decomposed into 4 basic steps: detailed below:
72 A. Gil et al. / Robotics and Autonomous Systems 58 (2010) 68–80

v̂_{t,⟨i⟩} = g(x_{t,⟨i⟩}^{[m]}, µ_{c_t,t−1}^{[m]})    (8)

G_{l_{c_t}} = ∇_{l_{c_t}} g(x_t, l_{c_t}) |_{x_t = x_{t,⟨i⟩}^{[m]}; l_{c_t} = µ_{c_t,t−1}^{[m]}}    (9)

Z_{c_t,t} = G_{l_{c_t}} Σ_{c_t,t−1}^{[m]} G_{l_{c_t}}^T + R_t    (10)

K_t = Σ_{c_t,t−1}^{[m]} G_{l_{c_t}}^T Z_{c_t,t}^{−1}    (11)

µ_{c_t,t}^{[m]} = µ_{c_t,t−1}^{[m]} + K_t (v_{t,⟨i⟩} − v̂_{t,⟨i⟩})    (12)

Σ_{c_t,t}^{[m]} = (I − K_t G_{l_{c_t}}) Σ_{c_t,t−1}^{[m]}    (13)

where v̂_{t,⟨i⟩} is the prediction of the current measurement v_{t,⟨i⟩} assuming that it has been associated with landmark c_t in the map. The observation model g(x_t, l_{c_t}) is linearly approximated by the Jacobian matrix G_{l_{c_t}}. It is assumed here that the noise in the observation is Gaussian and can be modeled with the covariance matrix R_t. Eq. (12) represents the update of the estimation of the landmark c_t, µ_{c_t,t−1}, in terms of the innovation v = (v_{t,⟨i⟩} − v̂_{t,⟨i⟩}). Finally, Eq. (13) updates the covariance matrix Σ_{c_t,t}^{[m]} associated to particle m and landmark c_t. Updating each EKF requires a constant time per landmark, since the dimensions of each one are 3 × 3. Note that we implicitly assume that the observation z_{t,⟨i⟩} corresponds to the landmark l_{c_t} in the map. For now we assume this correspondence to be known; later, in Section 5, we deal with this problem in more detail. The update equations are represented in Algorithm 1, in lines 33–37. It is important to note that the update is performed taking into account the innovation of the robot ⟨i⟩.
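In code, the per-landmark EKF update of Eqs. (8)–(13) could be sketched as follows (the observation model g, with the robot pose already bound, and its Jacobian G are assumed to be supplied by the caller):

import numpy as np

def ekf_update_landmark(mu, sigma, v, g, G, R):
    """EKF update of one landmark, following Eqs. (8)-(13).

    mu, sigma: landmark estimate mu_{ct,t-1} and covariance Sigma_{ct,t-1}.
    v: actual 3D measurement v_{t,<i>}.
    g: observation model; maps the landmark position to a predicted
       measurement for the current robot pose.
    G: 3x3 Jacobian of g with respect to the landmark, at the estimate.
    R: 3x3 measurement noise covariance R_t.
    """
    v_pred = g(mu)                              # Eq. (8): predicted measurement
    Z = G @ sigma @ G.T + R                     # Eq. (10): innovation covariance
    K = sigma @ G.T @ np.linalg.inv(Z)          # Eq. (11): Kalman gain
    mu_new = mu + K @ (v - v_pred)              # Eq. (12): mean update
    sigma_new = (np.eye(3) - K @ G) @ sigma     # Eq. (13): covariance update
    return mu_new, sigma_new, Z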
The matrix R_t associated with the noise in the observation is modeled using the equations of a standard stereo pair of cameras (assuming pin-hole cameras and parallel optical axes of both cameras). The 3D coordinates of a point in space relative to the left camera reference system can be calculated as:

X_r = I(c − C_x)/d,   Y_r = I(C_y − r)/d,   Z_r = f·I/d    (14)

where (r, c) are the row and column of the projected 3D point in the left image. Correspondingly, the same point is projected at (r, c + d) in the right image, d being the disparity associated to that point. The parameter I is named the baseline, and corresponds to the horizontal separation of both cameras. The parameters C_x and C_y refer to the intersection of the optical axis with the image plane in both cameras, and f is the focal distance of the cameras. We can calculate the covariances associated to a relative measurement (X_r, Y_r, Z_r) supposing a linear error propagation [17]:

σ²_{X_r} = I²σ²_c/d² + I²(c − C_x)²σ²_d/d⁴,
σ²_{Y_r} = I²σ²_r/d² + I²(C_y − r)²σ²_d/d⁴,
σ²_{Z_r} = f²I²σ²_d/d⁴.    (15)

The noise matrix R_t is then computed as R_t = diag(σ²_{X_r}, σ²_{Y_r}, σ²_{Z_r}). This noise model has previously been used in a visual SLAM context in [12,18]. In the experiments we have used σ_r = σ_c = 10 pixels and σ_d = 0.5, which produce errors typical of common stereo cameras. In real stereo vision systems the measurements X_r, Y_r and Z_r are not truly independent, and a more complex noise model would need to be derived. Instead, the only assumption that can be made is the independence of the noise in the projections (r, c) in the left and right images.
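The following sketch implements the error model of Eqs. (14) and (15); the calibration values are generic placeholders, not those of a particular camera:

import numpy as np

def stereo_measurement_covariance(r, c, d, I=0.12, f=500.0, Cx=320.0, Cy=240.0,
                                  sigma_r=1.0, sigma_c=1.0, sigma_d=0.5):
    """R_t = diag(var_X, var_Y, var_Z) from Eq. (15), by linear error propagation.

    (r, c): row/column of the point in the left image; d: disparity (pixels).
    I: baseline (m), f: focal length (pixels), (Cx, Cy): principal point.
    All calibration values here are placeholders.
    """
    var_x = (I**2 * sigma_c**2) / d**2 + (I**2 * (c - Cx)**2 * sigma_d**2) / d**4
    var_y = (I**2 * sigma_r**2) / d**2 + (I**2 * (Cy - r)**2 * sigma_d**2) / d**4
    var_z = (f**2 * I**2 * sigma_d**2) / d**4
    return np.diag([var_x, var_y, var_z])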
case of visual landmarks, we aim at detecting some points in the
3.3. Assigning a weight to each particle

The set of particles generated by the movement model is distributed according to p(x^t_{⟨1:K⟩} | z^{t−1}_{⟨1:K⟩}, u^t_{⟨1:K⟩}, c^{t−1}_{⟨1:K⟩}), since the last observations z_{t,⟨1:K⟩} obtained by the robots have not been included in the filter. This distribution is commonly known as the proposal distribution. However, we would like to estimate the posterior p(x^t_{⟨1:K⟩}, L | z^t_{⟨1:K⟩}, u^t_{⟨1:K⟩}, c^t_{⟨1:K⟩}), which includes all the information from odometry and sensors until time t, generally called the target distribution. This difference is corrected with a process generally called sampling importance resampling (SIR). Basically, a weight is assigned to each particle depending on how well the current observation matches the map. Following this, a new set of particles S_t is created by sampling from S_{t−1}. Each particle is included in the new set with probability proportional to its weight. During the resampling process, particles with low weight are normally replaced by others with a higher weight. Assume that a team of K robots exists and each one performs a single measurement z_{t,⟨i⟩} with data association c_t. We compute the weight ω_{t,⟨i⟩}^{[m]} associated with the particle m and the robot ⟨i⟩ as:

ω_{t,⟨i⟩}^{[m]} = |2π Z_{c_t}|^{−1/2} exp{−½ (v_{t,⟨i⟩} − v̂_{t,c_t})^T [Z_{c_t}]^{−1} (v_{t,⟨i⟩} − v̂_{t,c_t})}.    (16)

In consequence, for each particle S_t^{[m]} defined in Eq. (6) a set of K weights ω_{t,⟨i⟩} = {ω_{t,⟨1⟩}, ..., ω_{t,⟨K⟩}} is calculated. Since we are estimating the joint probability over the robot paths, the total weight associated to the particle S_t^{[m]} is calculated by multiplying the weights associated to the K robots: ω_t^{[m]} = ∏_{i=1}^{K} ω_{t,⟨i⟩}^{[m]}. Next, the weights are normalized so that they approximate a probability density function, ∑_{i=1}^{M} ω_t^{[i]} = 1. The same procedure can be extended to the case where each of the vehicles obtains a set of B observations z_{t,⟨i⟩} = {z_{t,⟨i⟩,1}, z_{t,⟨i⟩,2}, ..., z_{t,⟨i⟩,B}}. This can be achieved by calculating a weight for each of the observations using Eq. (16) and then multiplying the B results to calculate ω_{t,⟨i⟩}.
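The resampling step could be sketched as follows; we show systematic resampling, one common variant, since the paper does not specify which resampling scheme is used:

import copy
import numpy as np

def importance_resampling(particles, weights, rng=None):
    """Resample the particle set: each particle survives with probability
    proportional to its weight (systematic resampling variant)."""
    rng = rng or np.random.default_rng()
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                           # normalize the weights
    M = len(particles)
    positions = (rng.random() + np.arange(M)) / M
    cumulative = np.cumsum(w)
    new_set, i = [], 0
    for p in positions:
        while i < M - 1 and cumulative[i] < p:
            i += 1
        # Deep-copy so that duplicated survivors can evolve independently.
        new_set.append(copy.deepcopy(particles[i]))
    return new_set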
3.4. Path and map estimation

The algorithm we have just described maintains a set of particles that represent a set of plausible paths that the robot team may follow. Conditioned on these paths, a set of 3D landmarks is estimated, each one represented by a Kalman filter. Finally, we need to choose the path and the map that best represent the true trajectories and map. In order to do this, we maintain a logarithmic sum over the global weights of the particles. In consequence, we choose the best particle as the one that maximizes the sum:

m̂ = argmax_m ∑_{t=1}^{A} log(ω_t^{[m]}).    (17)
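In code, this selection reduces to accumulating the log-weights per particle over the whole run and taking the argmax (a sketch with an assumed data layout):

import numpy as np

# log_weights[m] holds the sequence of log(w_t^{[m]}) for particle m.
def best_particle_index(log_weights):
    """Eq. (17): index of the particle maximizing the accumulated log-weight."""
    return int(np.argmax([np.sum(lw) for lw in log_weights]))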

Algorithm 1 Summary of the proposed algorithm.

1:  S = ∅
2:  [z_{t,⟨1⟩}, z_{t,⟨2⟩}, z_{t,⟨3⟩}] = ObtainObservations()
3:  InitialiseMap(S, x_{0,⟨1:3⟩}, z_{t,⟨1:3⟩})
4:  for t = 1 to numMovements do
5:      [z_{t,⟨1⟩}, z_{t,⟨2⟩}, z_{t,⟨3⟩}] = ObtainObservations()
6:      [S, ω_{t,⟨1⟩}] = FastSLAMMR(S, z_{t,⟨1⟩}, R_t, u_{t,⟨1⟩})
7:      [S, ω_{t,⟨2⟩}] = FastSLAMMR(S, z_{t,⟨2⟩}, R_t, u_{t,⟨2⟩})
8:      [S, ω_{t,⟨3⟩}] = FastSLAMMR(S, z_{t,⟨3⟩}, R_t, u_{t,⟨3⟩})
9:      ω_t = ω_{t,⟨1⟩} · ω_{t,⟨2⟩} · ω_{t,⟨3⟩}
10:     S = ImportanceResampling(S, ω_t)   // sample randomly from S according to ω_t^{[m]}
11: end for

function [S_t] = FastSLAMMR(S_{t−1}, z_{t,⟨i⟩}, R_t, u_{t,⟨i⟩})

12: S_t = ∅
13: for m = 1 to M do   // for every particle
14:     x_{t,⟨i⟩}^{[m]} ∼ p(x_{t,⟨i⟩} | x_{t−1,⟨i⟩}, u_{t,⟨i⟩})
15:     for n = 1 to N_{t−1}^{[m]} do   // loop over all possible data associations
16:         v̂_{t,⟨i⟩} = g(x_{t,⟨i⟩}^{[m]}, µ_{n,t−1}^{[m]})
17:         G_{l_n} = ∇_{l_n} g(x_t, l_n), evaluated at x_{t,⟨i⟩} = x_{t,⟨i⟩}^{[m]}, l_n = µ_{n,t−1}^{[m]}
18:         Z_{n,t} = G_{l_n} Σ_{n,t−1} G_{l_n}^T + R_t
19:         D(n) = (v_{t,⟨i⟩} − v̂_{t,⟨i⟩})^T [Z_{n,t}]^{−1} (v_{t,⟨i⟩} − v̂_{t,⟨i⟩})
20:         E(n) = (d_{t,⟨i⟩} − d_n)^T (d_{t,⟨i⟩} − d_n)
21:     end for
22:     D(N_{t−1}^{[m]} + 1) = D_0
23:     j = find(D ≤ D_0)   // find candidates below D_0
24:     c_t = argmin_{n ∈ j} E(n)   // find the minimum among the candidates
25:     if E(c_t) > E_0 then   // create a new landmark?
26:         c_t = N_{t−1}^{[m]} + 1
27:     end if
28:     if c_t = N_{t−1}^{[m]} + 1 then   // new landmark
29:         N_t^{[m]} = N_{t−1}^{[m]} + 1
30:         µ_{c_t,t}^{[m]} = g^{−1}(x_{t,⟨i⟩}^{[m]}, z_{t,⟨i⟩})
31:         Σ_{c_t,t}^{[m]} = G_{l_{c_t}}^T R_t^{−1} G_{l_{c_t}}
32:         ω_t^{[m]} = p_0
33:     else   // previously mapped landmark
34:         N_t^{[m]} = N_{t−1}^{[m]}
35:         K_t = Σ_{c_t,t−1}^{[m]} G_{l_{c_t}}^T Z_{c_t,t}^{−1}
36:         µ_{c_t,t}^{[m]} = µ_{c_t,t−1}^{[m]} + K_t (v_{t,⟨i⟩} − v̂_{t,⟨i⟩})
37:         Σ_{c_t,t}^{[m]} = (I − K_t G_{l_{c_t}}) Σ_{c_t,t−1}^{[m]}
38:     end if
39:     ω_{t,⟨i⟩}^{[m]} = |2π Z_{c_t}|^{−1/2} exp{−½ (v_{t,⟨i⟩} − v̂_{t,c_t})^T [Z_{c_t}]^{−1} (v_{t,⟨i⟩} − v̂_{t,c_t})}
40:     add {x_{t,⟨i⟩}^{[m]}, N_t^{[m]}, µ_{1,t}^{[m]}, Σ_{1,t}^{[m]}, d_{1,t}^{[m]}, ..., µ_{N_t,t}^{[m]}, Σ_{N_t,t}^{[m]}, d_{N_t,t}^{[m]}, ω_t^{[m]}} to S_t
41: end for
42: return S_t

4. Visual landmarks

We define a visual landmark as a point in space that can be easily detected from different distances and viewing angles by means of a vision sensor. Corners in images are typically good candidates to be employed as landmarks for SLAM tasks, since they are stable and can typically be detected from different viewpoints.

Solving the SLAM problem with a landmark-oriented approach involves two main processes: the detection, which enables the landmark to be detected from a set of poses in the environment, and the description of the landmark, which aims at representing the landmark based on its visual appearance. The descriptor enables the robot to recognize a particular landmark in the map. In the case of visual landmarks, we aim at detecting points in the images that are highly salient and can easily be detected from different distances and viewing angles. The description of the visual landmarks is based on a sub-image centered at the detected point. The case of visual SLAM is particularly difficult, since:

(i) The landmarks cannot always be detected from different viewpoints. In consequence, it is difficult to re-detect previously mapped landmarks.
(ii) The description of the points must be invariant to changes in viewing distance (scale) and viewing angle. Achieving an invariant description is particularly complex in the case of visual landmarks, since the appearance of a point in space varies greatly with viewpoint changes. In consequence, the data association problem becomes hard to solve.

To date, different detectors and descriptors have been used for mapping and localization using monocular or stereo vision, such as SIFT [12,18,3], the Harris corner detector [19], Harris–Laplace [20], SURF [21] or a combination of attention regions with Harris corners [22]. SIFT and SURF combine a detection and a description method. In particular, SIFT (Scale Invariant Feature Transform) features were developed for image feature generation and were used initially in object recognition applications (see for example [23,24]). SIFT features combine a method to extract stable locations in images and a description that enables those points to be identified. First, SIFT features are located at maxima and minima of a difference-of-Gaussian function applied in scale space. Next, each SIFT location is associated to a descriptor, computed using the image information at each location and scale. The descriptor provides invariance to image translation, scaling and rotation, and is partially invariant to illumination changes and affine projection. The resulting descriptor is of dimension 128 and enables the same points in space to be recognized from different viewpoints, which may occur while the robot moves around its workplace. Lately, SIFT features have been used in robotic applications, showing their suitability for localization and SLAM tasks [12,4,15,3]. In prior work we evaluated different detection [25] and description [26] methods and their suitability for visual SLAM.
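For reference, detecting SIFT keypoints and computing their 128-dimensional descriptors can be done today with OpenCV; this is our illustration of the detection/description pipeline, not the implementation used by the authors:

import cv2

def extract_sift(image_path):
    """Detect SIFT keypoints and compute their 128-D descriptors."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(img, None)
    # Each keypoint carries an image location and scale; each descriptor
    # row is the 128-dimensional vector later used for data association.
    return keypoints, descriptors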
5. Data association

While a robot explores the environment it must decide whether the observation z_{t,⟨i⟩} = (v_{t,⟨i⟩}, d_{t,⟨i⟩}) corresponds to a previously mapped landmark or to a different one. This constitutes the data association problem, illustrated in Fig. 2. In the figure, the position of two visual landmarks is marked with stars, with an associated uncertainty depicted by an ellipse. The figure presents an observation obtained by the robot, consisting of a measurement v_{t,⟨i⟩} and a visual descriptor of the landmark d_{t,⟨i⟩}. In the figure, v_{t,⟨i⟩} is shown with a dashed line, with an uncertainty denoted by its associated ellipse. In addition, and since we are using a vision sensor, a descriptor d_{t,⟨i⟩} describes the visual appearance of the point in space. In the figure each landmark is represented by a visual descriptor (depicted as a sub-image of the scene). In the case presented in the figure, the robot must decide whether the measurement corresponds to landmark θ_1, to landmark θ_2, or to a new landmark, using the geometric compatibility of the landmark along with the visual appearance of the observation.

Fig. 2. The figure describes the data association problem in the context of visual SLAM. The position of two visual landmarks is denoted with stars, with uncertainty depicted with an ellipse. Each landmark has a different visual appearance encoded with its visual descriptor. The observation z_{t,⟨i⟩} = {v_{t,⟨i⟩}, d_{t,⟨i⟩}} consists of a 3D measurement v_{t,⟨i⟩} (dashed line) and a visual descriptor d_{t,⟨i⟩} that defines the appearance of the observed point.

In [14] a Mahalanobis distance function is computed to associate each measurement to a landmark in the map. Thus, the distance D is computed for all the landmarks in the map:

D = (v_{t,⟨i⟩} − v̂_{t,c_t})^T [Z_{c_t}]^{−1} (v_{t,⟨i⟩} − v̂_{t,c_t}).    (18)

The measurement v_{t,⟨i⟩} is associated with the landmark c_t in the map that minimizes D. If the minimum value surpasses a pre-defined threshold, a new landmark is created.

To work properly, this solution needs the landmarks to be placed sufficiently far apart from each other, in order to avoid false data associations. However, this is normally not the case in visual SLAM, since the landmarks may be placed close together. As stated in [11], the solution is not optimal, since wrong data associations will be

made. As depicted in Fig. 2, we propose here to improve the data association process by incorporating the visual description of the landmark. In our case, we employ a SIFT descriptor associated to each landmark in the map.

We propose the following mechanism for the data association, which is computed in two consecutive steps (a code sketch is given at the end of this section):

• First, we compute the Mahalanobis distance D for all the landmarks in the map and select only those that are below a threshold D_0 as candidates. If no candidates are found in this step, then we decide to create a new landmark in the map, as explained in Section 6.
• Second, we use the visual appearance of the observation to select the data association within the candidates of the previous step. Given a SIFT descriptor d_{t,⟨i⟩} as observed by the robot and a SIFT descriptor d_j associated to a landmark in the map, we compute a squared Euclidean distance function within the group of candidates that comply with the Mahalanobis distance:

E = (d_{t,⟨i⟩} − d_j)(d_{t,⟨i⟩} − d_j)^T.    (19)

We select the landmark in the map that minimizes E. If the minimum distance E is below a threshold E_0, the observation is associated with the landmark j. On the other hand, a new landmark is created whenever the distance E exceeds this pre-defined threshold (selected experimentally), since then the current observation cannot be correctly explained by any of the landmarks in the map up to that point.

The data association is performed independently for each particle, and this means that different particles may make different data associations. In addition, the method presented is simple, which matters since it needs to be computed for each particle. In previous work [3], we used this approach in the case of visual SLAM with a single robot. The experimental results showed that the high specificity of the SIFT descriptors allowed a precise map of the environment to be built.

This mechanism for the data association is suitable for the multi-robot case explained here, since each of the robots computes the data association against a map common to the robot team. In consequence, a robot may associate its current observation with a landmark initialized by a different robot, update the estimation of that landmark and localize with respect to it. In this sense, a robot does not need to return to previously explored places to reduce uncertainty, since it may proceed to well-known areas previously explored by a different robot.

Each robot includes its observations in the filter in a pre-defined order. For example, in Algorithm 1 the robot ⟨1⟩ adds its observations first, followed by robots ⟨2⟩ and ⟨3⟩. Since the measurements are independent and Gaussian, this order does not affect the final estimation of the landmarks.
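The two-step mechanism above can be sketched in Python as follows (the helper predict, which returns the predicted measurement and innovation covariance from the landmark EKF, is an assumed interface):

import numpy as np

def associate(v, d, landmarks, predict, D0, E0):
    """Two-step data association: Mahalanobis gating, then descriptor distance.

    v, d: 3D measurement and SIFT descriptor of the current observation.
    landmarks: list of map landmarks with fields mu, sigma, d.
    predict: function returning (v_pred, Z) for a landmark, from the EKF.
    Returns the index of the associated landmark, or None for a new one.
    """
    candidates = []
    for j, lm in enumerate(landmarks):
        v_pred, Z = predict(lm)
        innov = v - v_pred
        D = innov @ np.linalg.inv(Z) @ innov     # Eq. (18)
        if D <= D0:                              # first step: geometric gate
            E = np.sum((d - lm.d) ** 2)          # Eq. (19): squared Euclidean
            candidates.append((E, j))
    if not candidates:
        return None                              # no gate passed: new landmark
    E_min, j_min = min(candidates)
    return j_min if E_min <= E0 else None        # second step: appearance test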
6. Landmark initialization

In the previous section the data association mechanism was explained. Two tests are performed consecutively in order to find the data association: the first based on geometrical constraints and the second based on visual appearance. When an observation cannot be explained by any of the landmarks in the map, a new landmark is created. In this case, a new EKF is initialized with the following values:

c_t = N + 1    (20)

µ_{c_t,t}^{[m]} = g^{−1}(x_t^{[m]}, z_t)    (21)

Σ_{c_t,t}^{[m]} = G_{θ_{c_t}}^T R_t^{−1} G_{θ_{c_t}}    (22)

d_{c_t}^{[m]} = d_{t,⟨i⟩}    (23)

where the function g^{−1}(x_t^{[m]}, z_t) computes the global position of the landmark given the relative measurement z_{t,⟨i⟩} obtained by robot ⟨i⟩. The visual descriptor associated with the landmark is initialized using the one associated with the current observation, d_{t,⟨i⟩}.

7. Simulation environment

The main reason that motivated us to perform a series of experiments using simulated data is that this enables us to assess the validity of the presented algorithm. In this case, the map and the true path of the robots are known in advance, and this enables us to evaluate the precision of the approach when varying some of the parameters in the Rao-Blackwellized filter. In this way, we can evaluate the algorithm with different numbers of particles and analyze the effect of adding more robots on the precision of the map and the estimated paths.

We simulate an indoor environment and a group of robots that navigate through it. The simulations were carried out using Matlab. We assume that the mobile agents move around the space simultaneously; they find 3D visual landmarks and obtain relative observations of them, consisting of a 3D relative measurement and a visual descriptor for each landmark. Fig. 3 shows a 3D view of the simulated environment. In Fig. 3(a) the position of each landmark is indicated with an asterisk. In order to simulate a typical indoor environment, the landmarks have been placed over planar regions that simulate walls, which restrict the visibility of the points (i.e. a robot cannot observe a landmark through a wall). This restriction can be appreciated in Fig. 3(b), where landmarks behind the wall indicated with a yellow line cannot be observed. We consider that there is a SIFT descriptor associated to each landmark in the map. To achieve this, we previously captured a set of images from an office environment and obtained a set of SIFT descriptors, which were stored in a database. Next, we randomly associated each landmark with one of the SIFT descriptors. The size of the environment is 30 × 30 × 2 m, and it contains two loops.

The process of observing a visual landmark is simulated in the following way: we consider that a robot is able to observe a landmark in the environment when it lies in the field of view of the stereo cameras and the relative distance is below a threshold d_max. As previously said, an observation is constituted by z_{t,⟨i⟩} = (v_{t,⟨i⟩}, d_{t,⟨i⟩}), where v_{t,⟨i⟩} = (X_r, Y_r, Z_r) is a relative measurement and d_{t,⟨i⟩} is a SIFT descriptor associated to the point. When a robot observes a landmark, we compute v_{t,⟨i⟩} using Eq. (14) and add Gaussian noise to the measurement as computed by Eq. (15). To simulate the process of observing a landmark from different viewpoints, we add Gaussian noise to the descriptor d_{t,⟨i⟩} and study the effects of these variations on the results.

Given an observation z_{t,⟨i⟩} = (v_{t,⟨i⟩}, d_{t,⟨i⟩}) obtained by one of the robots, the data association is solved as explained in Section 5, thus assigning the observation to a landmark θ_j in the map, or creating a new landmark.

Once the simulation is finished, the estimated path is compared with the true path and an RMS error is computed from the differences along the path. Analogously, the true map is compared with the map of the most probable particle, as determined by Eq. (17), and an RMS error is computed based on the Euclidean distance between the true and the estimated landmark positions. The generation of a new set of particles creates a random set of new poses by sampling from the motion model stated in Eq. (7). In consequence, different results may be obtained when running the simulation repeatedly with the same data. Therefore, the simulations are repeated a number of times, computing a mean value and 2σ intervals.
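A sketch of this observation-corruption step, assuming numpy and placeholder noise parameters:

import numpy as np

rng = np.random.default_rng()

def simulate_observation(v_true, d_true, R, sigma_dt):
    """Corrupt a true relative measurement and its SIFT descriptor.

    v_true: true 3D relative measurement; R: covariance from Eq. (15).
    d_true: 128-D descriptor of the landmark; sigma_dt: descriptor noise std.
    """
    v = rng.multivariate_normal(v_true, R)            # Gaussian measurement noise
    d = d_true + rng.normal(0.0, sigma_dt, size=128)  # uncorrelated descriptor noise
    return v, d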


Fig. 3. Figure (a) shows the simulated environment. Figure (b) shows the details of the simulated observations. Walls are represented by a line. As can be seen, the walls
restrict the visibility of the visual landmarks placed behind. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of
this article.)

8. Results

SLAM solutions aim at building a map, given a set of movements performed and observations obtained by the robots. However, SLAM approaches do not consider the problem of computing the trajectory of the robots, e.g. in order to minimize the time needed to explore the environment. This is normally considered a different problem, denoted as exploration. In the simulated experiments we consider that the robots are able to explore the environment in an efficient manner. That means that, if a robot needs T movements to explore the whole environment, we consider that each one of the K robots requires T/K movements to achieve the exploration of the environment. In other words, the total distance traveled by a single robot is divided among the group so that every robot traverses the same distance. The paths followed by the robots were set by hand, considering this requirement. In addition, different paths were used in the experiments, with variations in the initial position of the robots. During the simulation, the only information that the robots can use to derive their position is the noisy odometry and the measurements performed over the landmarks in the map.

In Fig. 4 we show a particular simulation conducted with a team of 3 robots. Fig. 4 shows the information about the estimation of the robot paths at a particular step. In the figure, the true path followed by the robots is shown with a solid line. Using Eq. (7) a noisy odometry is simulated, which is shown as a dashed line. The noise parameters used in the odometry have values that resemble the noise of real robots from our laboratory. In the same figure, we present the best estimation over the paths followed by the robots, denoted with squares. In addition, the uncertainty over the pose of each one of the robots is presented with a set of particles. In Fig. 5(a) we show the difference between the estimated and the true poses during all the movements performed by the robots. In Fig. 5(b) we show the error between the odometry and the true poses for all the robots. We can clearly see that the error in odometry grows without bound and cannot be directly used to estimate the map. It can also be clearly seen that the error in the estimated path is reduced significantly when compared with the odometry.

Fig. 4. The figure shows the true path followed by the robots (continuous line), the odometry (dashed line) and the estimated paths (marked with squares).
can be clearly seen that the error in the estimated path is reduced results for each M. Different trajectories were performed by the
significantly when compared with the odometry. robots, considering different starting positions in the map. When
We conducted a series of experiments and evaluated the results varying M the rest of parameters remain constant, with the values
when varying the parameters used by the SLAM algorithm to indicated in Table 2. The values that appear in the table are
estimate the map and the paths. The most important parameters reasonable and correspond to parameters used when building a
that were studied are gathered at Table 2. The same simulations visual map with a single robot using real data [3]. Fig. 6(a) shows
were performed using different number of robots. the RMS error in the estimated paths with error bars indicating
First, we performed a series of simulations with different 2σ intervals, whereas Fig. 6(b) presents the error in the map.
number M of particles. We consider that each robot obtains B Both, Fig. 6(a) and (b) present the results when 1, 2 or 3 robots


Fig. 5. Fig. (a) shows the error between the estimated and the true pose (x, y, θ) for each of the movements. Fig. (b) shows the error between the odometry and the true
pose. In both figures, the first robot is indicated with a continuous line, the second with a dashed line, and the third is indicated with a dash-dotted line.


Fig. 6. Fig. (a) shows the root mean square (RMS) error in the estimated path of the robots when varying the number of particles M = {1, 10, 50, 100, 200, 300, 400, 500, 1000, 1500, 2000}. Fig. (b) shows the RMS error in the estimated map when varying the number of particles M. The error bars are defined as two times the standard deviation. We show simultaneously results with one robot (○), two robots (□) and three robots (△).

are used and follow a similar trend. In general, for any number of robots deployed, we can clearly observe that the error in the estimation decreases when the number of particles increases. This result was expected, since the function p(x^t_{⟨1:K⟩}, L | z^t_{⟨1:K⟩}, u^t_{⟨1:K⟩}, c^t) is approximated with more precision when the number of particles M is large. The state to estimate is of dimension 3K, K being the number of robots. According to [16], in the general case the number of particles needed to obtain an accurate estimation grows exponentially with the dimension of the state. As a result, we should expect the RMS error when using 3 robots to always be greater than the error obtained with one robot, since more particles are needed to accurately estimate 3 robot paths. However, the results demonstrate that for the same number of particles, more accurate results are obtained when using more robots. For example, Fig. 6(b) demonstrates that the map is more accurately estimated when the number of robots increases. This result can be explained in the following manner: when one robot explores the environment on its own, it must return to previously explored places in order to reduce the uncertainty over its pose. In a typical environment, such as the one simulated, the robot needs to travel a long distance in order to re-visit previously explored places (i.e. close a loop) and accumulates a high uncertainty. This normally induces big errors in the estimation of the path and requires a large number of particles for an accurate estimation. In the same environment, when more robots are used, it frequently occurs that one of the robots observes landmarks seen previously by other robots. In the algorithm proposed, this has the same effect as if the robot observed landmarks it had detected before, and the uncertainty is kept low. As a consequence, since the uncertainty of each robot is low, it can be represented using fewer particles. To sum up, for a given number of particles, the map can be estimated with comparable precision using 1, 2 or 3 robots. The time needed to explore the environment is obviously reduced when more robots are used.

During the simulations, we assumed that the robot could detect a landmark when it was placed at a distance below d_max and inside the field of view of the cameras. We performed different simulations changing the maximum detection distance d_max while keeping the rest of the parameters shown in Table 2 constant. The results are presented in Fig. 7(a) and (b). It can be observed that the localization error decreases when the maximum distance increases. This result can be explained in the following way: when the robot observes landmarks only at a close distance, it continuously explores new space, and the uncertainty over its pose increases rapidly. On the contrary, when the observation distance is large, the robot maintains its localization inside its local map, and its uncertainty remains low. When using more robots, a larger observation distance d_max implies that a robot is more likely to observe landmarks detected by other robots, thus improving the localization and the map, as previously justified.


Fig. 7. Figure (a) shows the root mean square (RMS) error computed over the estimated path for different values of d_max. Figure (b) shows the RMS error computed between the real and the estimated map when the distance d_max is changed. The error bars are defined as two times the standard deviation. We show simultaneously results with one robot (○), two robots (□) and three robots (△).


Fig. 8. Figure (a) shows the RMS error computed over the estimated path when noise is added to the visual descriptor (σ_desc). Figure (b) shows the RMS error computed between the real and the estimated map. We show simultaneously results with one robot (○), two robots (□) and three robots (△).

We have also studied the effects on the map construction produced by adding noise to the visual descriptor. As explained in Section 7, each landmark in the map is associated with a SIFT descriptor. When the robot observes a landmark, it obtains an observation z_t^{i} = (v_t^{i}, d_t^{i}). The relative measurement is corrupted with Gaussian noise defined by N(0, R_t) using Eq. (15), whereas the descriptor d_t^{i} is generated as d_t^{i} = d_j + N(0, σ_desc^2), where d_j is the SIFT descriptor of the landmark θ_j in the map and N(0, σ_desc^2) represents a 128-D vector of uncorrelated Gaussian noise with variance σ_desc^2. As explained in Section 5, the data association makes use of the descriptor associated with each landmark. In consequence, adding more noise to the observed descriptor implies that more false data associations will be produced, which negatively affects the quality of the results. We performed a series of simulations increasing the noise added to the descriptor while keeping the rest of the parameters constant (with the values shown in Table 2). The results are presented in Fig. 8. It can be noticed that, as the noise in the descriptor increases, false data associations become more likely and the results degrade. When using more robots, the number of observations integrated is greater, and this makes the process more robust to noise.
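As a sketch of how such a corrupted observation could be generated, the following fragment mirrors the procedure described above; it is our illustration, assuming a 3-vector relative measurement and a 128-D descriptor, and the function name simulate_observation is hypothetical:

    import numpy as np

    rng = np.random.default_rng()

    def simulate_observation(v_true, R_t, d_true, sigma_desc):
        """Corrupt a ground-truth observation of a landmark.

        v_true:     true relative measurement to the landmark (3-vector).
        R_t:        covariance of the stereo measurement noise N(0, R_t).
        d_true:     128-D SIFT descriptor stored for the landmark.
        sigma_desc: standard deviation of the descriptor noise.
        """
        # Relative measurement corrupted with Gaussian noise N(0, R_t).
        v_noisy = rng.multivariate_normal(v_true, R_t)
        # A 128-D vector of uncorrelated Gaussian noise simulates the change
        # of the descriptor under different viewpoints and scales.
        d_noisy = d_true + rng.normal(0.0, sigma_desc, size=d_true.shape)
        return v_noisy, d_noisy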
Next, we present the influence on the results when a different number of observations is integrated at each iteration of the algorithm. In each simulation we used the parameters shown in Table 2 and obtained different results when changing B. The results are presented in Fig. 9: Fig. 9(a) presents the RMS error in position computed for the paths, and Fig. 9(b) shows the RMS error in the map. It can be noticed that the estimation of both the robot paths and the map improves when more observations are integrated at each time step. In addition, the results exhibit the same trend as observed in the experiments previously presented.

9. Computational complexity

In this section we analyze the computational complexity of the multi-robot SLAM algorithm. We also compare the time needed by the algorithm when changing some parameters in the filter. The FastSLAM algorithm in the case of one robot requires time O(MN), where M is the number of particles in the filter and N the number of landmarks in the map. The complexity in M arises from the fact that M particles need to be processed, whereas, for each particle, the data association needs to iterate over the N landmarks in the map. This is the basic SLAM filter presented in Algorithm 1, where it is important to note that the for loop in line 15 iterates over the N landmarks in the map. However, a more efficient implementation can be achieved. In particular, in the implementation used in the simulations, the landmarks are stored in a kd-tree structure. In this case, searching for the nearest neighbour in the kd-tree can be done in O(log N) time, which accelerates the computation of the data association. This leads to a complexity of O(M log N). Finally, in the complete multi-robot algorithm, if we consider that the team is formed by K robots, then the algorithm requires time O(KM log N).

The algorithm presented here computes the generation of a new hypothesis, the data association and the update of the estimation separately for each particle. The generation, update and estimation times are constant. For example, updating the estimation of a landmark requires the computation of Eq. (8), which comprises matrices of 3 × 3 elements. The time needed by the algorithm at each iteration depends on a number of factors, such as the number of landmarks that exist in the map at the current step and the number of observations found by the robots. In addition, the computation time is different if the robots need to initialize new landmarks or they perform an update of existing landmarks.
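To illustrate the O(M log N) variant, the sketch below shows a per-particle association step using a kd-tree. It is our simplification: the particle layout and the single descriptor threshold desc_thresh are assumptions, whereas the actual filter applies the association criterion of Section 5.

    import numpy as np
    from scipy.spatial import cKDTree

    def associate(landmark_pos, landmark_desc, obs_pos, obs_desc, desc_thresh):
        """Data association for a single particle using a kd-tree.

        landmark_pos:  (N, 3) estimated landmark positions of this particle.
        landmark_desc: (N, 128) SIFT descriptors of the N landmarks.
        obs_pos:       observed positions, already transformed to the global
                       frame using the particle's pose hypothesis.
        obs_desc:      descriptors of the current observations.
        """
        tree = cKDTree(landmark_pos)  # built once per particle and reused
        matches = []
        for p, d in zip(obs_pos, obs_desc):
            _, j = tree.query(p)  # O(log N) nearest-neighbour search
            # Confirm the spatial candidate by comparing SIFT descriptors.
            if np.linalg.norm(d - landmark_desc[j]) < desc_thresh:
                matches.append(j)     # the observation updates landmark j
            else:
                matches.append(None)  # no match: initialize a new landmark
        return matches

Repeating this step for the M particles of each of the K robots yields the O(KM log N) behaviour discussed above.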

Fig. 9. Figure (a) shows the root mean square (RMS) error in the estimated path of the robots when varying the number of observations integrated at each iteration of the algorithm, B = {0, 1, 5, 10, 15, 20, 25}. Figure (b) shows the RMS error in the estimated map when varying the number of particles M. The error bars are defined as two times the standard deviation. Results with one, two and three robots are shown together, distinguished by marker symbols.

Fig. 10. Time spent at each iteration of the SLAM algorithm during the creation of the map using two robots. The total time spent by the algorithm is presented as a continuous line and the time used in the prediction step is marked with stars; separate markers show the time spent in the data association, the landmark initialization and the landmark update.

In Fig. 10 we present the time needed by the algorithm during a particular experiment using 2 robots and M = 500 particles. The robots integrate a maximum of B = 10 observations at each iteration. Fig. 10 presents the total time used at each iteration, which comprises the update of the common map by the two robots. In addition, the figure shows the time spent in the prediction step, the data association phase, the update and the initialization of new landmarks.

Next, we compare the processing time needed at each iteration of the algorithm when using a different number of robots. To do this, we performed the exploration with a different number of robots and measured the time needed at each iteration. This time depends on a series of parameters, basically the number of particles M used and the number of observations B integrated at each time step. However, it is not constant and increases with the number of landmarks that exist in the map: with more landmarks in the map, finding the correct data association becomes computationally expensive. Fig. 11(a) presents the mean time needed to process the observations at each iteration of the algorithm, using a different number of particles in the filter and considering that each of the robots integrates B measurements. It can be seen that, as the number of particles M increases, the time needed to process the observations grows exponentially, and is significantly greater when using 2 or 3 robots. In a real exploration situation, when the robots navigate through the environment, the observations must be processed within time constraints. First, the observations must be integrated in the map. Second, based on this map, the exploration algorithm plans the next movement for each robot. If the processing time is too high, the robots would need to stop their movements and wait until the new observations have been integrated in the filter. In order to decrease the computational cost, we can reduce the number of observations that each robot introduces at each iteration of the algorithm. We consider that each robot obtains B/K observations. That is, when using one robot we integrate B = 15 observations, and when using three robots each robot integrates 5 observations. In Fig. 11(b) we present the mean processing time when the robots perform B/K observations at each iteration of the algorithm. As can be seen in the figure, the processing time in the case of 1, 2 or 3 robots is similar for any number of particles; in this case, the total number of observations integrated in the filter is the same for 1, 2 or 3 robots. With this restriction, we evaluated the error in the map, which is shown in Fig. 11(c). It can be observed that the results are similar to those presented in Fig. 6(b), since the estimation of the map and the paths improves when more robots are used. As a consequence, we can reduce the computational cost of the algorithm by integrating fewer observations per robot and still obtain comparably good results.
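The B/K budget can be written in a few lines. The helper below is hypothetical, since the text does not state how the subset of observations is selected, but it reproduces the accounting used in the experiments: with B = 15, a single robot integrates 15 observations per iteration, while each of three robots integrates 5.

    def split_budget(observations_per_robot, B):
        """Distribute a total budget of B observations among K robots.

        observations_per_robot: list of K observation lists, one per robot.
        Returns truncated lists, so that at most B observations in total are
        integrated in the filter regardless of the team size.
        """
        K = len(observations_per_robot)
        per_robot = B // K
        return [obs[:per_robot] for obs in observations_per_robot]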

Fig. 11. Figures (a) and (b) show the mean computation time at each iteration of the algorithm for different numbers of particles, M = {1, 10, 50, 100, 200, 300, 400, 500, 1000, 1500, 2000}. Figure (c) shows the RMS error in the map when B/K observations are used.

10. Conclusion

We have presented an approach to visual SLAM that builds a 3D map of the environment using a team of cooperative robots. We consider that the robots are equipped with stereo heads that allow them to perform observations of visual landmarks, each one consisting of a relative measurement and a visual descriptor. The approach builds a unique map of the environment that is shared by all the robots in the team, using a Rao-Blackwellized particle filter to estimate both the map and the trajectories of the robots. The data association problem is solved using the relative measurements obtained by the robots and comparing the visual descriptors. With this strategy we have found that the number of false correspondences is low, thus allowing good estimations of the map and the paths.

We have tested the validity of the approach by means of simulations. For this purpose, an indoor environment has been simulated. We used a noise model to represent the relative measurements obtained by the stereo cameras and used typical values for the noise in the odometry model, as estimated on real robots in our laboratory. In the experiments we assumed that the robots were able to explore the environment efficiently; thus we compare the case in which the total distance traveled by one robot to explore the environment is divided uniformly among the team members. A great number of simulations have been performed that closely resemble real exploration situations. The simulations allowed us to test the approach under different conditions, by changing the parameters of the Rao-Blackwellized particle filter. We could also evaluate the algorithm when the descriptor associated with a visual landmark was corrupted with noise, and evaluated its effects on the results. With this procedure we could emulate the effects of observing a landmark from different viewpoints. The results presented show that the algorithm is robust to false data associations, and is able to produce a good solution with a wide range of parameter settings.

In the presented approach, the dimension of the state to estimate grows with the number of robots. In principle, the number of particles needed to obtain a good estimation grows exponentially with the dimension of the state to estimate. Under typical conditions we show that, using the same number of particles M, the precision of the map estimated using a single robot is similar to the precision obtained using 2 or 3 robots. In addition, the results show that, when more robots are used, a precise map can be obtained without increasing the computational cost of the method. As a consequence, when more robots exist in the environment, the time needed to explore it will be reduced significantly, whereas the computational cost is not increased.

In conclusion, under the assumptions considered, we have demonstrated that the proposed SLAM algorithm is suitable for small groups of robots exploring a given environment and obtaining observations of visual landmarks. We show that precise results can be obtained without increasing the computational cost considerably.

In this paper we considered that the movement of the robots is restricted to a plane, whereas the map is considered to be 3D. In the future, we plan to extend the work to a more general 6 DOF movement of the robots. This would be the case of an outdoor exploration carried out by a group of robots. However, in order to estimate a more general movement of the robots, the size of the state vector needs to be increased, thus adding computational complexity to the problem. In this paper we have concentrated on the joint estimation of the map by a group of robots.

As future work, we plan to validate the simulations using experimental data obtained by real robots and natural landmarks occurring in the environment. We further plan to use the mutual observations of the team, that is, when a robot observes the relative position of another member of the team. We expect that this information will lead to further improvement of the results.

Acknowledgment

This work has been supported by the Spanish government by means of the project 'Sistemas de Percepción Visual Móvil y Cooperativo como Soporte para la Realización de Tareas con Redes de Robots'. Project reference: CICYT DPI2007-61107, Ministerio de Educación y Ciencia.

References

[1] C. Stachniss, G. Grisetti, D. Hähnel, W. Burgard, Improved Rao-Blackwellized mapping by adaptive sampling and active loop-closure, in: Proc. of the Workshop on Self-Organization of AdaptiVE behavior, SOAVE, 2004, pp. 1–15.
[2] R. Triebel, W. Burgard, Improving simultaneous mapping and localization in 3D using global constraints, in: Proc. of the National Conference on Artificial Intelligence, AAAI, 2005.
[3] A. Gil, O. Reinoso, O. Martínez-Mozos, C. Stachniss, W. Burgard, Improving data association in vision-based SLAM, in: Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, IROS, China, 2006.
[4] R. Sim, P. Elinas, M. Griffin, J.J. Little, Vision-based SLAM using the Rao-Blackwellised particle filter, in: IJCAI Workshop on Reasoning with Uncertainty in Robotics, 2005.
[5] L.M. Paz, P. Piniés, J.D. Tardós, J. Neira, Large-scale 6-DOF SLAM with stereo-in-hand, IEEE Transactions on Robotics 24 (5) (2008) 946–957.
[6] D. Schleicher, L.M. Bergasa, R. Barea, E. López, M. Ocaña, J. Nuevo, Real-time wide-angle stereo visual SLAM on large environments using SIFT features correction, in: Proceedings IEEE/RSJ Int. Conf. on Robotics and Automation, ICRA, 2007, pp. 3878–3883.
[7] L. Parker, Current state of the art in distributed autonomous mobile robotics, Distributed Autonomous Robotic Systems 14 (6) (2000) 3–12.
[8] B. Stewart, J. Ko, D. Fox, K. Konolige, A hierarchical Bayesian approach to mobile robot map structure estimation, in: Proceedings of the Conference on Uncertainty in AI, UAI, 2003.
[9] J.W. Fenwick, P.M. Newman, J.J. Leonard, Cooperative concurrent mapping and localization, in: IEEE Int. Conf. on Robotics and Automation, ICRA, 2002, pp. 1810–1817.
[10] M.W.M. Gamini Dissanayake, P. Newman, S. Clark, H. Durrant-Whyte, M. Csorba, A solution to the simultaneous localization and map building (SLAM) problem, IEEE Transactions on Robotics and Automation 17 (2001).
[11] J. Neira, J.D. Tardós, Data association in stochastic mapping using the joint compatibility test, IEEE Transactions on Robotics and Automation 17 (6) (2001) 890–897.
[12] S. Se, D. Lowe, J. Little, Vision-based mobile robot localization and mapping using scale-invariant features, in: IEEE Int. Conf. on Robotics and Automation, ICRA, 2001, pp. 2051–2058.
[13] K. Murphy, Bayesian map learning in dynamic environments, in: Neural Information Processing Systems, NIPS, 1999.
[14] M. Montemerlo, S. Thrun, Simultaneous localization and mapping with unknown data association using FastSLAM, in: IEEE Int. Conf. on Robotics and Automation, ICRA, 2003.
[15] A. Gil, O. Reinoso, M.A. Vicente, C. Fernández, L. Payá, Monte Carlo localization using SIFT features, Lecture Notes in Computer Science 3523 (2005) 623–630.
[16] S. Thrun, W. Burgard, D. Fox, Probabilistic Robotics, The MIT Press, ISBN: 0-262-20162-3, 2005.
[17] P.R. Bevington, D.K. Robinson, Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill, 1992.
[18] S. Se, D. Lowe, J. Little, Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks, International Journal of Robotics Research 21 (8) (2002) 735–758.
[19] A.J. Davison, Real-time simultaneous localisation and mapping with a single camera, in: Proc. of the Int. Conf. on Computer Vision, ICCV, 2003.
[20] P. Jensfelt, D. Kragic, J. Folkesson, M. Björkman, A framework for vision based bearing only 3D SLAM, in: IEEE Proc. of the Int. Conf. on Robotics & Automation, 2006.
[21] A.C. Murillo, J.J. Guerrero, C. Sagüés, SURF features for efficient robot localization with omnidirectional images, in: IEEE Int. Conf. on Robotics and Automation, 2007.
[22] S. Frintrop, P. Jensfelt, Active gaze control for attentional visual SLAM, in: IEEE Int. Conf. on Robotics and Automation, ICRA, 2008, pp. 3690–3697.
[23] D. Lowe, Object recognition from local scale-invariant features, in: Proc. of the Int. Conf. on Computer Vision, ICCV, 1999, pp. 1150–1157.
[24] D. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 2 (60) (2004) 91–110.
[25] O. Martínez-Mozos, A. Gil, M. Ballesta, O. Reinoso, Interest point detectors for visual SLAM, in: Lecture Notes in Artificial Intelligence (LNAI), vol. 4788, 2007.
[26] M. Ballesta, O. Martínez-Mozos, A. Gil, O. Reinoso, A comparison of local descriptors for visual SLAM, in: Proceedings of the Workshop on Robotics and Mathematics, RoboMat 2007, 2007.

Arturo Gil received the M. Eng. degree in Industrial Engineering from Miguel Hernández University (UMH), Elche, Spain, in 2002, also receiving the best student academic award in Industrial Engineering from the UMH. Since 2003 he has worked as a lecturer and researcher at the UMH, teaching subjects related to control and computer vision. His research interests are focused on mobile robotics, visual SLAM and cooperative robotics. He is currently working on techniques to build visual maps using teams of mobile robots.

Óscar Reinoso received the M. Eng. degree from the Polytechnical University of Madrid (UPM), Madrid, Spain, in 1991, and the Ph.D. degree in 1996. He worked at the Protos Desarrollo S.A. company in the development and research of artificial vision systems from 1994 to 1997. Since 1997 he has worked as a professor at Miguel Hernández University (UMH), teaching subjects related to control, robotics and computer vision. His research interests are in mobile robotics, climbing robots and visual inspection systems. He is a member of CEA-IFAC and IEEE.

Mónica Ballesta received the M. Eng. degree in Industrial Engineering from Miguel Hernández University (UMH), Elche, Spain, in 2005, also receiving the academic award in Industrial Engineering from the UMH. Since 2007 she has held a research position as a scholarship holder in the area of Systems Engineering and Automation of the UMH, with a pre-doctoral grant (FPI) from the Valencian Government (Generalitat Valenciana). Her research interests are focused on mobile robots, visual feature extraction and visual SLAM.

Miguel Juliá received the M. Eng. degree in Telecommunications Engineering from Miguel Hernández University (UMH), Elche, Spain. During 2006 he worked as a researcher at the Computer Science and Communications department at Murcia University. Since 2007 he has been involved in research projects at the Systems Engineering department at the UMH. His research interests currently focus on mobile robotics, exploration and computer vision.
