Professional Documents
Culture Documents
gil2010
gil2010
1. Introduction (i) Stereo vision systems are typically less expensive than laser
range systems.
Acquiring maps of the environment is an essential ability for (ii) Most laser ranging systems provide 2D information from
autonomous mobile robots, since the maps are needed to perform the environment whereas stereo vision systems are able to
higher level tasks. In consequence, in the last two decades the provide a more complete 3D representation of the space.
problem of simultaneous localization and mapping (SLAM) has (iii) Vision systems provide a great quantity of information and
received significant attention. The SLAM problem considers the allow us to integrate in the robot other techniques, such as
situation in which an autonomous mobile robot moves through face or object recognition.
an unknown space and incrementally builds a map of this However, stereo systems are usually less precise than laser
environment while simultaneously uses this map to compute its sensors and the information from the cameras needs normally
absolute location. This problem is considered inherently difficult, to be processed in order to extract salient features. In common
since noise introduced in the estimate of the robot pose leads to configurations, the camera is installed at a fixed height and
noise in the estimate of the map and vice versa. orientation [3,4] with respect to the robot reference system and the
To date, typical SLAM approaches have been using laser range movement of the camera is restricted to a plane. However, in recent
sensors to build maps in two and three dimensions (e.g., [1,2]). times some researchers have constructed visual maps allowing a
Recently, the interest on using cameras as sensors in SLAM has free 6 DOF movement of the camera [5,6].
increased and some authors have been concentrating on building Most approaches to visual SLAM are feature-based. In this
3D maps using visual information obtained from cameras. These case, a set of significant points in the environment are used
approaches are usually denoted as visual SLAM. The reasons for this as landmarks. Mainly, two steps must be distinguished in the
interest stem from: selection of visual landmarks. The first step involves the detection
of interest points in the images that can be used as reliable
landmarks. The points should be detected at different distances
∗ Corresponding author. Tel.: +34 966658620; fax: +34 966658979. and viewing angles, since they will be observed by the robot
E-mail address: arturo.gil@umh.es (A. Gil). from different poses in the environment. At a second step the
URL: http://arvc.umh.es (A. Gil). interest points are described by a feature vector which is computed
0921-8890/$ – see front matter © 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.robot.2009.07.026
A. Gil et al. / Robotics and Autonomous Systems 58 (2010) 68–80 69
using local image information. This descriptor is used in the data obtained under different circumstances. The results demonstrate
association problem, that is, when the robot has to decide whether that the approach is suitable to build visual maps using small teams
the current observation corresponds to one of the landmarks in of robots.
the map or to a new one. When the robot observes a visual The main reason that motivated to carry out a series of
landmark in the environment, it obtains a distance measurement experiments using simulated data is that this enables to assess
and computes a visual descriptor. Next, the descriptor and the the validity of the presented algorithm under different conditions.
measurement are used to find the landmark in the map that In simulation, the map and the true path of the robots is
generated the observation. Generally, when the robot traverses known in advance, and this enables us to evaluate the precision
unexplored areas, the observations correspond to new landmarks of the approach when varying some of the parameters in the
and are thus initialized in the map. However, when the robot Rao-Blackwellized filter. Furthermore, we compare the results
traverses previously explored places, it re-observes landmarks. In when different number of robots are used and evaluate the
this case, if current observations are correctly associated with the computational cost introduced by more robots. On the contrary,
visual landmarks, the robot will be able to find its location with evaluating the results of visual SLAM using real data is complex.
respect to these landmarks, thus reducing the error in its pose. If On the one hand the true path followed by the robots is not known.
the observations cannot be correctly associated to the landmarks On the other hand, the map cannot be known precisely, since the
in the map, the map will become inconsistent. To sum up, the data exact location of the landmarks cannot be established in advance.
association is a fundamental part of the SLAM process, since wrong The remainder of the paper is structured as follows. First,
data associations will produce incorrect maps. Section 2 discusses related work. Section 3 presents the approach
An important subfield within mobile robotics that requires to multi-robot SLAM, our map representation is also detailed.
accurate maps is the performance of collaborative tasks by Section 4 deals with visual landmarks and their utility in SLAM.
multiple vehicles. Multiple vehicles can frequently perform tasks Next, Section 5 exposes our approach to the data association
more quickly and robustly than a single one [7]. However, little problem in the context of visual features. The most relevant
effort has been done until now in the field of multi-robot visual features of the simulated environment are explained in Section 7.
SLAM, which considers the case where several robots move along Next, in Section 8 we present the results obtained. Finally,
the environment and build a map. In this paper we concentrate Section 10 summarizes the most important conclusions and
on this problem and propose a solution that allows us to build proposes future extensions of the work.
a map using a set of visual observations obtained by a team of
mobile robots. It is worth noting that SLAM algorithms focus on 2. Related work
the incremental construction of a map, given a set of movements
carried out by the robots and the set of observations obtained from Up until now, approaches to multi-robot SLAM can be grouped
different locations. However, SLAM algorithms do not consider in one of the two following solutions:
the computation of the movements that need to be performed
(i) Approaches in which each robot estimates its own individual
by the robots, since this is generally considered as a different
map using its observations. At a later stage, a common map is
problem, denoted as exploration. Classical exploration strategies
formed by fusing the individual maps of the robot team.
often try to cover unknown terrain as fast as possible and avoid to
(ii) Approaches where the estimation of all the trajectories and the
repeatedly visit known areas. This strategy, however, is suboptimal
map is made jointly. A single map is computed simultaneously
in the context of the SLAM problem because the robot typically
using the observations of all the robots.
needs to re-visit places in order to localize itself and reduce the
uncertainty [1]. If the exploration has to be accomplished using a The work exposed in [8] can be classified in the first group.
team of robots, the problem becomes harder, since the exploration The idea here is that each robot builds its own map, and, at the
algorithm needs to avoid that different robots explore the same same time continuously attempts to localize in the maps built by
areas in the map. In this paper we concentrate on the visual other robots using particle filters. The approach can cope with the
SLAM problem and assume that the robots are able to explore the situation where the initial locations of the robots are unknown,
environment in an efficient way. solving the data association problem with a rendez-vous technique.
The major contribution of this paper is twofold. First we propose However, the fusion of the individual maps is computationally
an approach to the multi-robot SLAM problem using a Rao- expensive, since K 2 particle filters must be maintained for a team
Blackwellized Particle Filter (RBPF). To the best of our knowledge, of K robots.
this is the first work that uses visual measurements provided by The approach presented in [9] can be classified in the second
several robots to build a common 3D map of the environment. group. It uses an extended Kalman filter (EKF) to estimate a state
Second, we simulate the process of visual SLAM using multiple vector formed by the poses of all the vehicles and a set of 2D
robots. We consider that the robots perform observations on landmarks. With this extension, the robots obtain observations
visual landmarks using stereo cameras, and, in addition, each and construct a single unified map using the update equations of
observation is associated with a visual descriptor, that allows the classical EKF [10]. The initial positions of the vehicles must
to partially differentiate between landmarks. In addition, we be known in advance and the data association is assumed to be
propose a solution to the data association in the context of visual known. In addition, the gain in performance when using multiple
SLAM and compare the results when the noise in the descriptor vehicles is proved theoretically within this context. In this case, the
varies, which simulates the variation in the descriptor when a main drawback stems from the fact that a single hypothesis over
landmark is observed from different viewpoints. Introducing noise the robot pose is maintained. If false data associations are made the
in the descriptor makes the data association harder and affects whole EKF may diverge [11].
negatively the results of the SLAM algorithm. We consider that In [12] stereo vision is used to extract 3D visual landmarks from
the robots are able to communicate with a central agent in the environment. During exploration, the robot extracts SIFT (Scale
the system that is in charge of building a common map of the Invariant Feature Transform) features from stereo images and cal-
environment. When a robot obtains an observation, it transmits culates relative measurements to them. Landmarks are then in-
this information to the central agent, along with the information tegrated in the map with a Kalman Filter associated to each one.
about its odometry. Finally, the relative initial position of the robots However, this approach does not manage the uncertainty associ-
must be known approximately. We present simulative results ated with robot motion, and only one hypothesis over the pose of
70 A. Gil et al. / Robotics and Autonomous Systems 58 (2010) 68–80
the robot is maintained. Consequently it may fail in the presence is referred as xt = {x1 , x2 , . . . , xt }, the set of observations made by
of large odometric errors (e.g. while closing a loop). In [3] a Rao- the robot until time t will be designated z t = {z1 , z2 , . . . , zt } and
Blackwellized particle filter is used to estimate simultaneously the the set of actions ut = {u1 , u2 , . . . , ut }. We formulate the SLAM
map and the path of a single robot exploring the environment. In problem as that of determining the location of all landmarks in the
the mentioned work, SIFT features are used too as landmarks in the map L and robot poses xt from a set of measurements z t and robot
environment and extracted using a stereo pair of cameras. A dif- actions ut . Thus it can be stated as the estimation of the posterior:
ferent usage of SIFT features can be found in [6]. In this case SLAM
is performed using a low cost wide-angle stereo camera and vi- p(xt , L|z t , ut , c t ) (1)
sual landmarks extracted using the Shi and Tomasi operator. The t
where c designates the set of data associations performed until
approach divides the global map into local submaps identified by
time t, c t = {c1 , c2 , . . . , ct }. While exploring a particular environ-
a SIFT fingerprint, composed by a number of SIFT landmarks. The
ment, the robot has to determine whether a particular observation
SIFT fingerprint is used from time to time for loop-closing tasks.
zt = (vt , dt ) corresponds to a previously mapped landmark or to a
Recently, an approach to visual SLAM was presented in [5].
new one. Given that, at time t the map is formed by N landmarks,
Harris points are used as landmarks in the environment and used
this correspondence is represented by ct , where ct ∈ [1 . . . N ],
to build locally independent maps using EKF–SLAM. The local maps
meaning that the observation zt corresponds to the landmark ct
are joined into a full map using a divide and conquer algorithm. It is
in the map. When no correspondence is found we denote it as
capable of mapping large environments using a stereo pair moving
ct = N + 1, indicating that it is a new landmark. For the moment,
with 6 DOF. It is worth noting that the algorithm uses range and
we consider this correspondence as known.
bearing measurements and, at the same time, is able to integrate
The SLAM problem can be separated into two parts:
bearing-only measurements in the map.
(i) The estimation of the trajectory of the robot.
3. Multi-robot visual 3D SLAM (ii) The estimation of the map by means of a series of measure-
ments.
In this section we describe our approach to the SLAM problem However, the two facets can be separated and, if somehow we
in the case where a team of robots explore simultaneously the could know the trajectory of the robot in the environment, then
environment. SLAM is considered to be a complex task due to the
the estimation of the map would be trivial. This conditional
mutual dependency between the map of the environment and the
independence property of the SLAM problem implies that the
pose of the robot. That means that, if we make an error in the
posterior (1) can be factored as [14]:
estimation of the pose, this induces an error in the estimation
of the map and vice versa. The problem becomes harder when N
Y
more robots participate in the construction of the map, since the p(xt , L|z t , ut , c t ) = p(xt |z t , ut , c t ) p(lk |xt , z t , ut , c t ). (2)
dimension of the state increases. k=1
In the approach presented here, the robots share the observa-
This equation states that the full SLAM posterior is decomposed
tions performed over the landmarks and create a common map of
into two parts: one estimation over robot paths, and N indepen-
the environment. It is based, in essence, on a Rao-Blackwellized
dent estimators over landmark positions, each conditioned to the
particle filter (RBPF), proposed initially by Murphy [13] and com-
path estimate. We approximate p(xt |z t , ut , c t ) using a set of M par-
monly referred as FastSLAM in the SLAM community [14]. Basi-
ticles, each particle having N independent landmark estimators
cally, a Rao-Blackwellized particle filter combines a representation
(implemented as EKFs), one for each landmark in the map. Each
of the pose by means of particles with a closed estimation of some
particle is thus defined as:
variables.
In order to build the map, we assume that the robots are St
[m]
= {xt ,[m] , µt[m,1] , Σt[,m1] , d[1m] , . . . , µt[m,N] , Σt[,mN] , d[Nm] }, (3)
equipped with a stereo camera system, which enables them to ob-
tain relative measurements from landmarks that are detected in where µt[m,k] is the best estimation at time t for the position of
the environment. In addition, we assume that a descriptor associ- [m]
landmark lk based on the path of the particle m and Σt ,k is
ated to the landmark can be obtained, based on its visual appear-
the associated covariance matrix. The visual descriptor associated
ance. We also consider that the robots are able to communicate [m]
among themselves and with a central agent in the system. In addi- to the landmark j is represented by dj . The particle set St =
tion, we consider that the relative starting position of the robots is {St[1] , St[2] , . . . , St[M ] } is calculated incrementally from the set St −1 at
approximately known in advance. time t − 1 and the robot control ut . Thus, each particle is sampled
For simplicity, we first start our explanation assuming that from a proposal distribution xt
[m]
∼ p(xt |xt −1 , ut ) that models
a single robot explores the environment while it observes the noise in the odometry of the robots. Next, and following the
visual landmarks. At time t the robot obtains an observation zt , approach of [14], each particle is then assigned a weight according
constituted by zt = (vt , dt ), where vt = (Xc , Yc , Zc ) is a 3D vector to:
relative to the left camera reference frame and dt is the visual
1 1
e{− 2 (vt −v̂t ,ct ) [Zct ] (vt −v̂t ,ct )}
T −1
descriptor associated to the landmark. The map L is represented ωt[m] = p (4)
by a collection of N landmarks L = {l1 , l2 , . . . , lN }. Each landmark |2π Zct |
is described as: lk = {µk , Σk , dk }, where µk = (Xk , Yk , Zk ) is a
vector describing the position of the landmark referred to a global where vt is the actual measurement and v̂t ,ct is the predicted
[m]
reference frame, with associated covariance matrix Σk . In addition, measurement for the landmark ct based on the pose xt . The
each landmark lk is associated with a descriptor dk that partially matrix Zct is the covariance matrix associated with the innovation
differentiates it from others. This map representation is compact (vt − v̂t ,ct ). Note that we implicitly assume that each measurement
and has been used to effectively localize a robot in unmodified vt has been associated to the landmark ct of the map. This problem
environments [15]. is, in general, hard to solve, since similar-looking landmarks may
Following the usual nomenclature in Rao-Blackwellized SLAM, exist. In Section 5 we describe our approach to this problem. In the
we denote the robot pose at time t as xt and the movement of the case that B observations from different landmarks exist at a time
robot at time t as ut . On the other hand, the robot path until time t t, that is zt = {zt ,1 , zt ,2 , . . . , zt ,B }, we calculate the total weight
A. Gil et al. / Robotics and Autonomous Systems 58 (2010) 68–80 71
This equation proposes a manner to estimate a group of K paths • Generate a new particle set based on the prior set.
xth1:K i and a map L conditioned to the case that the robots have per-
• Update the estimation of each landmark based on the
formed a number of movements uth1:K i and a series of observations observations.
zht1:K i associated to landmarks in the map c t . In consequence, anal- • Calculate a weight for each particle.
ogously to Eqs. (2) and (5) expresses that we can separate the es- • Perform a resampling based on the weight of each particle.
timation of the map and the estimation of K different paths into
two parts: The function p(xth1:K i |zht1:K i , uth1:K i , c t ) is estimated us- In order to clarify the ideas presented here we describe the
ing a particle filter, while the map is estimated using N indepen- whole process in Algorithm 1, which considers the case in which 3
dent estimations conditioned to the paths xth1:K i . As a result, we are robots explore simultaneously the environment.
decomposing the SLAM problem into two different parts: A local-
ization problem of K robots in an environment and a series of in-
3.1. Generating a new particle set
dividual landmark estimations conditioned to robot paths xth1:K i . In
order to achieve this, each one of the M particles in the filter is
accompanied with N independent estimators for each one of the The first step is to generate a new set of hypothesis St based on
landmarks, implemented as EKFs. In our case, each of the Kalman the set St −1 . That means that we obtain a new particle set over the
[m]
Filters will be conditioned to the K paths of the robot team. In con- robot poses xt ,h1:K i . That is, we obtain a new pose xt ,hii for each of
sequence, each particle is represented as: the robots by sampling from a motion model p(xt |xt −1 , ut ):
[m]
St = {xth1,[:mK i] , µ[1m,t] , Σ1[m,t ] , d[1m] , . . . , µ[Nm,t] , ΣN[m,t] , d[Nm] }. (6) xt ,hii ∼ p(xt ,hii |xt −1,hii , ut ,hii ).
[m]
(7)
The fundamental difference compared to the particle defined in (3)
is that, in this case, the state that we would like to estimate The function p(xt ,hii |xt −1,hii , ut ) defines the movement model for
is composed by the pose (x, y, θ ) of K robots, thus xt ,h1:K i = the mobile agent. We apply this movement model to each of
{xt ,h1i , xt ,h2i , . . . , xt ,hK i }. As a result, we propose a joint estimation the K poses of the particle separately, based on the movement
over a path state of dimension 3K . According to [16], the number performed by the hii robot. Fig. 1 shows the application of Eq. (7).
of particles needed to obtain a good estimation increases exponen- The solid lines represent the true movements performed by the
tially with the dimension of the state. However, the results that robots while the dotted lines correspond to the odometry readings.
we present here show that the approach works perfectly for robot In the beginning of the movements, all the particles concentrate on
teams of 2–3 members using a reasonable number of particles. In the origin. After applying the movement model we obtain a set of
the case presented, the same map is shared by all the robots, which particle clouds that represent the probability over the poses of the
means that an observation performed by a particular robot affects robots at each time step. In this way, non-linear motion models can
the map of the whole robot team. In consequence, one member of be easily represented, as opposed to the linearization required by
the team may observe a landmark previously mapped by a differ- EKF–SLAM approaches [10].
ent robot and update its estimate. In addition, this means that a
robot does not need to explicitly close a loop in order to reduce the
uncertainty in its pose (i.e. return to a previously mapped area). 3.2. Landmark estimation
A robot can reduce the uncertainty in its pose when it observes
landmarks previously mapped by other robots. The structure of the Updating each landmark estimate is performed based on the
particle defined in (6) is clearly explained in Table 1. To sum up, we pose of the hii robot that performed the observation zt ,hii =
propose a method based on a Rao-Blackwellized particle filter for {vt ,hii , dt ,hii } with data association ct . Updating each landmark lk
the case where a robot team cooperates to build a map of a given is performed independently, using the standard EKF equations
environment. The algorithm can be decomposed into 4 basic steps: detailed below:
72 A. Gil et al. / Robotics and Autonomous Systems 58 (2010) 68–80
v̂t ,hii = g (x[tm,hi]i , µ[cmt ,]t −1 ) (8) observations zt ,h1:K i obtained by the robots have not been included
in the filter. This distribution is commonly known as proposal
Glct = ∇lct g (xt , lct )x =x[m] ;l =µ[m] (9) distribution. However, we would like to estimate the posterior
t t ,hii ct ct ,t −1
p(xth1:K i , L|zht1:K i , uth1:K i , cht1:K i ) which includes all the information
[m]
Zct ,t = Glct Σct ,t −1 GTlc + Rt (10) from odometry and sensors until time t, generally stated as the
t
I 2 σc2 I 2 (c − Cx )2 σd2
σX2r = + ,
d2 d4 4. Visual landmarks
(15)
I 2 σr2 I 2 (Cy − r )2 σd2 f 2 I 2 σd2
σY2r = + , σZ2r = . We define visual landmark as a point in space that can be easily
d2 d4 d4
detected from different distances and viewing angles by means of a
The noise matrix Rt is then computed as Rt = diag(σ , σ , 2
Xr
2
Yr vision sensor. Corners in images are typically good candidates to be
σZ2r ). This noise model has previously used at a visual SLAM context employed as landmarks for SLAM tasks, since they are stable and
in [12,18]. In the experiments we have used σr = σc = 10 pixels can typically be detected from different viewpoints.
and σd = 0.5, which produce typical errors in a common stereo Solving the SLAM problem with a landmark oriented approach
cameras. In real stereo vision systems the independence between involves two main processes: The detection, that enables the
the measurements X r , Y r and Z r is not true, and a more complex landmark to be detected from a set of poses in the environment,
noise model needs to be derived. Instead, the only assumption that and the description of the landmark, that aims at representing the
can be made is the independence of the noise in the projections landmark based on its visual appearance. The descriptor enables
(r , c ) in the left and right images. the robot to recognize a particular landmark in the map. In the
case of visual landmarks, we aim at detecting some points in the
3.3. Assigning a weight to each particle images that are highly salient, and can easily be detected from
different distances and viewing angles. The description of the
The set of particles generated by the movement model are visual landmarks is based on a sub-image centered at the detected
distributed according to p(xth1:K i |zht1−:K1 i , uth1:K i , cht1−:K1 i ), since the lasts point. The case of visual SLAM is particularly difficult, since:
A. Gil et al. / Robotics and Autonomous Systems 58 (2010) 68–80 73
a b
15
10
5 5
0 0
-5
Fig. 3. Figure (a) shows the simulated environment. Figure (b) shows the details of the simulated observations. Walls are represented by a line. As can be seen, the walls
restrict the visibility of the visual landmarks placed behind. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of
this article.)
8. Results
a b
Fig. 5. Fig. (a) shows the error between the estimated and the true pose (x, y, θ) for each of the movements. Fig. (b) shows the error between the odometry and the true
pose. In both figures, the first robot is indicated with a continuous line, the second with a dashed line, and the third is indicated with a dash-dotted line.
a b
Fig. 6. Fig. (a) shows the root mean square (RMS) error in the estimated path of the robots when varying the number of particles M = {1 10 50 100 200 300 400 500 1000 1500
2000}. Fig. (b) shows the RMS error in the estimated map when varying the number of particles M. The error bars are defined as two times the standard deviation. We show
simultaneously results with one robot ( ), two robots () and three robots (4).
are used and follow a similar trend. In general, for any number one of the robots observes landmarks seen previously by other
of robots deployed, we can clearly observe that the error in the robots. In the algorithm proposed, this has the same effect as if the
estimation decreases when the number of particles increases. This robot observed landmarks detected before and the uncertainty is
result was expected, since the function p(xth1:K i , L|zht1:K i , uth1:K i , c t ) is maintained low. As a consequence, since the uncertainty of each
approximated with more precision when the number of particles robot is low, it can be represented using less particles. To sum up,
M is large. The state to estimate is of dimension 3K , being K for a given number of particles, the map can be estimated with
the number of robots. According to [16] in a general case, the comparable precision using 1, 2 or 3 robots. The time needed to
number of particles needed to obtain an accurate estimation grows explore the environment is obviously reduced when more robots
exponentially with the dimension of the state. As a result, we are used.
should expect that the RMS error when using 3 robots would During the simulations, we assumed that the robot could detect
a landmark when it was placed at a distance below dmax and
be always greater than the error obtained with one robot, since
inside the field of view of the cameras. We performed different
more particles are needed to accurately estimate 3 robot paths.
simulations changing the maximum detection distance dmax while
However, the results demonstrate that for the same number of
maintaining constant the rest of the parameters shown in Table 2.
particles, more accurate results are obtained when using more
The results are presented in Fig. 7(a) and (b). It can be observed
robots. For example, Fig. 6(b) demonstrates that the map is more that the localization error decreases when the maximum distance
accurately estimated when the number of robots increases. This increases. This result can be explained in the following way:
result can be explained in the following manner: when one robot when the robot observes landmarks only at a close distance, it
explores the environment on its own, it must return to previously continuously explores new space, and the uncertainty over its
explored places in order to reduce the uncertainty over its pose. pose increases rapidly. On the contrary, when the observation
In a typical environment, as the one simulated, the robot needs to distance is large, the robot maintains its localization inside its
travel a long distance in order to re-visit previously explored places local map, and its uncertainty remains low. When using more
(i.e. close a loop) and accumulates a high uncertainty. This normally robots, a bigger observation distance dmax implies that with more
induces big errors in the estimation of the path and requires a probability a robot may be able to observe landmarks detected
big number of particles for an accurate estimation. In the same by other robots, thus improving the localization and the map, as
environment, when more robots are used, it occurs frequently that previously justified.
A. Gil et al. / Robotics and Autonomous Systems 58 (2010) 68–80 77
a b
Fig. 7. Figure (a) shows the root mean square error (RMS) computed over the estimated path computed for different values of dmax . Figure (b) shows the RMS error computed
between the real and the estimated map when the distance dmax is changed. The error bars are defined as two times the standard deviation. We show simultaneously results
with one robot ( ), two robots () and three robots (4).
a b
Fig. 8. Figure (a) shows the RMS error computed over the estimated path when noise is added to the visual descriptor σdesc . Figure (b) shows the RMS error computed
between the real and the estimated map. We show simultaneously results with one robot ( ), two robots () and three robots (4).
We have also studied the effects on the map construction that 9. Computational complexity
are produced by adding noise to the visual descriptor. As explained
in Section 7 each landmark in the map is associated with a SIFT In this section we analyze the computational complexity of the
descriptor. Once the robot observes a landmark, it obtains an multi-robot SLAM algorithm. We also compare the time needed by
observation zt ,hii = (vt ,hii , dt ,hii ). The relative measurement is the algorithm when changing some parameters in the filter. The
corrupted with a Gaussian noise defined by N (0, Rt ) using Eq. complexity of the FastSLAM algorithm in the case of one robot
(15), whereas a descriptor dt ,hii is associated considering dt ,hii = requires time O(MN ), where M is the number of particles in the
dj + N (0, σd2t ), where dj is the SIFT descriptor of the landmark θj in filter and N the number of landmarks in the map. The complexity
the map and N (0, σd2t ) represents a 128D vector of uncorrelated in M arises from the fact that M particles need to be processed,
Gaussian noise with variance σd2t . As explained in Section 5 the whereas, for each particle the data association needs to iterate over
data association makes use of the descriptor associated to each the N landmarks in the map. This is the basic SLAM filter presented
landmark. In consequence, adding more noise to the observed in Algorithm 1, where it is important to note that the for loop
descriptor implies that more false data associations will be in line 15 iterates over the N landmarks in the map. However, a
produced, and this affects negatively the quality of the results. We more efficient implementation can be achieved. In particular, in the
performed a series of simulations increasing the noise added to implementation used in the simulations, the landmarks are stored
the descriptor while maintaining the rest of parameters constant using a kd-tree structure. In this case, searching for the nearest
(with the values shown in Table 2). The results are presented in neighbour in the kd-tree can be done using O(log N ) time, which
Fig. 8. It can be noticed that, as the noise in the descriptor increases, accelerates the computation of the data association. This leads to
false data associations are likely to occur, and the results get worse. a complexity of O(M log N ). Finally, in the complete multi-robot
When using more robots, the number of observations integrated is algorithm, if we consider that the team is formed by K robots, then
greater, and this fact makes the process more robust to noise. the algorithm requires time O(KM log N ).
Next, we present the influence in the results when a different The algorithm presented here computes the generation of a
number of observations are integrated at each iteration of the new hypothesis, data association and update of the estimation
algorithm. At each simulation we used the parameters shown in separately for each particle. The generation, update and estimation
Table 2 and obtained different results when changing B. The results time are constant. For example, updating the estimation of a
are presented in Fig. 9. Fig. 9(a) present the RMS error in position landmark requires the computation of Eq. (8), that comprises
computed for the paths and Fig. 9(b) shows the RMS error in the matrices of 3 × 3 elements. The time needed by the algorithm
map. It can be noticed that the results in the estimation of the at each iteration depends on a number of factors, as the number
robot paths and the map improve when more observations are of landmarks that exist in the map landmarks at the current step,
integrated at each time step. In addition, the results exhibit the the number of observations found by the robots. In addition, the
same trend as observed in the experiments previously presented. computation time is different if the robots need to initialize new
78 A. Gil et al. / Robotics and Autonomous Systems 58 (2010) 68–80
a b
Fig. 9. Fig. (a) shows the root mean square (RMS) error in the estimated path of the robots when varying the number of observations integrated at each iteration of the
algorithm B = {0 1 5 10 15 20 25}. Fig. (b) shows the RMS error in the estimated map when varying the number of particles M. The error bars are defined as two times the
standard deviation. We show simultaneously results with one robot ( ), two robots () and three robots (4).
Fig. 10. The figure presents the time spent at each iteration of the SLAM algorithm during the creation of the map using two robots. The total time spent by the algorithm
is presented in continuous line. The time used in the prediction step is marked with stars. Using ( ), we present the time spent in the data association. Marked with (4) we
show the time spent in the landmark initialization and, finally, with () we present the time used in the landmark update.
landmarks or they perform an update of existing landmarks. In when using 2 or 3 robots. In a real exploration situation, when
Fig. 10 we present the time needed by the algorithm during a the robots navigate through the environment, the observations
particular experiment using 2 robots and M = 500 particles. must be processed within time constrains. First, the observations
The robots integrate a maximum of B = 10 observations at each must be integrated in the map. Second, based on this map, the
iteration. Fig. 10 presents the total time used at each iteration that exploration algorithm plans the next movement for each robot.
comprises the update of the common map by the two robots. In If the processing time is too high, then the robots would need to
addition, the figure shows the time spent in the prediction step, stop their movements and wait until the new observations have
the data association phase, the update and the initialization of new been integrated in the filter. In order to decrease the computational
landmarks. cost, we can reduce the number of observations that each robot
Next, we compare the processing time needed at each iteration introduces at each iteration of the algorithm. We consider that
of the algorithm when using a different number of robots. To each robot obtains B/K observations. That is, when using one robot,
do this, we performed the exploration using a different number we integrate B = 15 observations and when using three robots,
of robots and measured the time needed at each iteration. The we integrate B = 5 observations. In Fig. 11(b) we present the
time needed at each iteration of the algorithm depends on a mean processing time if the robots perform B/K observations at
series of parameters, basically the number of particles M used, the each iteration of the algorithm. As can be seen in the figure, the
number of observations B integrated at each time step. However, processing time in the case of 1, 2 or 3 robots is similar, for any
this time is not constant and increases with the number of number of particles. In this case, the total number of observations
landmarks that exist in the map: with more landmarks in the integrated in the filter is the same for the case of 1, 2 or 3 robots.
map, finding the correct data association becomes computationally With this restriction, we evaluated the error in the map, which is
expensive. Fig. 11(a) presents the mean time needed to process the shown in Fig. 11(c). It can be observed that the results are similar to
observations at each iteration of the algorithm, using a different the presented in Fig. 6(b), since the estimation of the map and the
number of particles used in the filter and considering that each paths improves when more robots are used. As a consequence, we
of the robots integrates B measurements. It can be seen that, as can reduce the computational cost of the algorithm by integrating
the number of particles M increases, the time needed to process less observations by each robot and still obtain comparable good
the observations grows exponentially, and is significantly greater results.
A. Gil et al. / Robotics and Autonomous Systems 58 (2010) 68–80 79
a b
Fig. 11. Figure (a) and (b) show the mean computation time at each iteration of the algorithm for different number of particles M = {1 10 50 100 200 300 400 500 1000 1500
2000}. Figure (c) shows the RMS error in the map when B/K observations are used.
10. Conclusion robust to false data associations, and is able to produce a good
solution with a wide range of parameter settings.
We have presented an approach to visual SLAM that builds a In the presented approach, the dimension of the state to
3D map of the environment using a team of cooperative robots. estimate grows with the number of robots. In principle, the
We consider that the robots are equipped with stereo heads that number of particles needed to obtain a good estimation grows
allow us to perform observations on visual landmarks, each one exponentially with the dimension of the state to estimate. Under
consisting of a relative measurement and a visual descriptor. typical conditions we show that using the same number of particles
The approach builds a unique map of the environment that is M, the precision of the map estimated using a single robot is similar
shared by all the robots in the team, using a Rao-Blackwellized to the precision obtained using 2 or 3 robots. In addition, the
particle filter to estimate both the map and the trajectories of the results show that, when more robots are used, a precise map can be
robots. The data association problem is solved using the relative obtained without increasing the computational cost of the method.
measurements obtained by the robots and comparing the visual As a consequence, since more robots exist in the environment,
descriptor. With this strategy we have found that the number the time needed to explore the environment will be reduced
of false correspondences is low, thus allowing to obtain good significantly, whereas the computational cost is not increased.
estimations of the map and the path. As a conclusion, under the assumptions considered, we have
We have tested the validness of the approach by means of demonstrated that the proposed SLAM algorithm is suitable
simulations. For this purpose, an indoor environment has been for small groups of robots exploring a given environment
simulated. We used a noise model to represent the relative and obtaining observations over visual landmarks. We show
measurements obtained by the stereo cameras and used typical that precise results can be obtained, without increasing the
values for the noise in the odometry model, as estimated in real computational cost considerably.
robots in our laboratory. In the experiments we assumed that the In this paper we considered that the movement of the robots is
robots were able to explore the environment efficiently, thus we restricted to a plane, whereas the map is considered to be 3D. In
compare the case in which the total distance traveled by one robot the future, we plan to extend the work to a more general 6 DOF
to explore the environment is divided uniformly along the team movement of the robots. This would be the case of an outdoor
members. A great number of simulations have been performed, exploration carried out by a group of robots. However, in order
that precisely resemble real exploring situation. The simulations to estimate a more general movement of the robots, the size of
allowed us to test the approach under different conditions, the state vector needs to be increased, thus adding computational
by changing the parameters of the Rao-Blackwellized particle complexity to the problem. In this paper we have concentrated on
filter. We could also evaluate the algorithm when the descriptor the joint estimation of the map by a group of robots.
associated to a visual landmarks was corrupted with noise, and As a future work, we plan to validate the simulations by using
evaluated its effects on the results. With this procedure we experimental data obtained by real robots and natural landmarks
could emulate the effects of observing a landmark from different occurring in the environment. We further plan to use the mutual
viewpoints. The results presented show that the algorithm is observations of the team, that is, when a robot observes the relative
80 A. Gil et al. / Robotics and Autonomous Systems 58 (2010) 68–80
position of another member in the team. We expect that this [20] P. Jensfelt, D. Kragic, J. Folkesson, M. Björkman, A framework for vision based
information will lead to further improvement of the results. bearing only 3D SLAM, in: IEEE Proc. of the Int. Conf. on Robotics & Automation,
2006.
[21] A.C. Murillo, J.J. Guerrero, C. Sagüés, SURF features for efficient robot
Acknowledgment localization with omnidirectional images, in: IEEE Int. Conf. on Robotics and
Automation, 2007.
[22] S. Frintrop, P. Jensfelt, Active gaze control for attentional visual SLAM, in: IEEE
This work has been supported by the Spanish government Int. Conf. on Robotics and Automation, ICRA, 2008, pp. 3690–3697.
by means of project: ‘Sistemas de Percepción Visual Móvil y [23] D. Lowe, Object recognition from local scale-invariant features, in: Proc. of the
Int. Conf. on Computer Vision, ICCV, 1999, pp. 1150–1157.
Cooperativo como Soporte para la Realización de Tareas con Redes
[24] D. Lowe, Distinctive image features from scale-invariant keypoints, Interna-
de Robots’. Project reference: CICYT DPI2007-61107, Ministerio de tional Journal of Computer Vision 2 (60) (2004) 91–110.
Educación y Ciencia. [25] O. Martínez-Mozos, A. Gil, M. Ballesta, O. Reinoso, Interest Point Detectors
for Visual SLAM, in: Lectures Notes in Artificial Intelligence (LNAI), vol. 4788,
2007.
References [26] M. Ballesta, O. Martínez-Mozos, A. Gil, O. Reinoso, A Comparison of Local
Descriptors for Visual SLAM, in: Proceedings of the Workshop on Robotics and
[1] C. Stachniss, G. Grisetti, D. Hähnel, W. Burgard, Improved Rao-Blackwellized Mathematics, RoboMat 2007, 2007.
mapping by adaptive sampling and active loop-closure, in: Proc. of the
Workshop on Self-Organization of AdaptiVE behavior, SOAVE, 2004, pp. 1–15.
[2] R. Triebel, W. Burgard, Improving simultaneous mapping and localization in Arturo Gil received the M. Eng. degree in Industrial
3D using global constraints, in: Proc. of the National Conference on Artificial Engineering from Miguel Hernández University (UMH),
Intelligence, AAAI, 2005. Elche, Spain, in 2002, receiving also the best student
[3] A. Gil, O. Reinoso, O. Martínez-Mozos, C. Stachniss, W. Burgard, Improving academic award in Industrial Engineering by the UMH.
data association in vision-based SLAM, in: Proc. of the IEEE/RSJ Int. Conf. on Since 2003, he works as a lecturer and researcher at the
Intelligent Robots and Systems, IROS, China, 2006. UMH, teaching subjects related to Control and Computer
[4] R. Sim, P. Elinas, M. Griffin, J.J. Little, Vision-based SLAM using the Vision. His research interests are focussed on mobile
Rao-Blackwellised particle filter, in: IJCAI Workshop on Reasoning with robotics, visual SLAM and cooperative robotics. He is
Uncertainty in Robotics, 2005. currently working on techniques to build visual maps
[5] L.M. Paz, P. Pinis, J.D. Tardós, J. Neira, Large-scale 6-DOF SLAM with stereo-in- using teams of mobile robots.
hand, IEEE Transactions on Robotics 24 (5) (2008) 946–957.
[6] D. Schleicher, L.M. Bergasa, R. Barea, E. López, M. Ocaña, J. Nuevo, Real-time
wide-angle stereo visual slam on large environments using SIFT features
Óscar Reinoso received the M. Eng. degree from the
correction, in: Proceedings IEEE/RSJ Int. Conf. on Robotics and Automation,
Polytechnical University of Madrid (UPM), Madrid, Spain
ICRA, 2007, pp. 3878–3883.
in 1991. Later, he obtained de Ph.D. degree in 1996.
[7] L. Parker, Current state of the art in distributed autonomous mobile robotics,
He worked at Protos Desarrollo S.A. company in the
Distributed Autonomous Robotic Systems 14 (6) (2000) 3–12.
development and research of artificial vision systems from
[8] B. Stewart, J. Ko, D. Fox, K. Konolige, A hierarchical bayesian approach to
1994 to 1997. Since 1997, he works as a professor at Miguel
mobile robot map structure estimation, in: Proceedings of the Conference on
Hernández University (UMH), teaching subjects related
Uncertainty in AI, UAI, 2003.
to Control, Robotics and Computer Vision. His research
[9] J.W. Fenwick, P.M. Newman, J.J. Leonard, Cooperative concurrent mapping and
interests are in mobile robotics, climbing robots and visual
localization, in: IEEE Int. Conf. on Robotics and Automation, ICRA, 2002, pp.
inspection systems. He is member of CEA-IFAC and IEEE.
1810–1817.
[10] M.W.M. Gamini Dissanayake, P. Newman, S. Clark, H. Durrant-Whyte, M.
Csorba, A solution to the simultaneous localization and map building (SLAM)
problem, IEEE Transactions on Robotics and Automation (ICRA) 17 (2001). Mónica Ballesta received the M. Eng. degree in Indus-
[11] J. Neira, J.D. Tardós, Data association in stochastic mapping using the joint trial Engineering from the Miguel Hernández University
compatibility test, Transactions on Robotics and Automation 17 (6) (2001) (UMH), Elche, Spain in 2005, receiving also the academic
890–897. award in Industrial Engineering by the UMH. Since 2007,
[12] S. Se, D. Lowe, J. Little, Vision-based mobile robot localization and mapping she has a research position as scholarship holder in the
using scale-invariant features, in: IEEE Int. Conf. on Robotics and Automation, area of Systems Engineering and Automation of the UMH,
ICRA, 2001, pp. 2051–2058. receiving a pre-doctoral grant (FPI) by the Valencian Gov-
[13] K. Murphy, Bayesian map learning in dynamic environments, in:Neural ernment (Generalitat Valenciana). Her research interests
Information Processing Systems (NIPS), 1999. are focussed on mobile robots, visual feature extraction
[14] M. Montemerlo, S. Thrun, Simultaneous localization and mapping with and visual SLAM.
unknown data association using FastSLAM, in: IEEE Int. Conf. on Robotics and
Automation, ICRA, 2003.
[15] A. Gil, O. Reinoso, M.A. Vicente, C. Fernández, L. Payá, Monte carlo localization Miguel Juliá Miguel Juliá received a M. Eng. degree in
using SIFT features, Lecture Notes in Computer Science 3523 (2005) 623–630. Telecommunications Engineering at Miguel Hernández
[16] S. Thrun, W. Burgard, D. Fox, Probabilistic Robotics, The MIT Press, ISBN: 0- University (UMH), Elche, Spain. During 2006, he worked as
262-20162-3, 2005. a researcher at the Computer Science and Communications
[17] P.R. Bevington, D.K. Robinson, Data Reduction and Error Analysis for the department, at Murcia University. Since 2007 he is
Physical Sciences, McGraw-Hill, 1992. involved in research projects at the Systems Engineering
[18] S. Se, D. Lowe, J. Little, Mobile robot localization and mapping with uncertainty department at the UMH. Currently, his research interests
using scale-invariant visual landmarks, International Journal of Robotics focus on mobile robotics, exploration and computer vision.
Research 21 (8) (2002) 735–758.
[19] A.J. Davison, Real-time simultaneous localisation and mapping with a single
camera, in: Proc. of the Int. Conf. on Computer Vision, ICCV, 2003.