
Learning Hierarchical Object Maps Of
Non-Stationary Environments With Mobile Robots

Dragomir Anguelov    Rahul Biswas    Daphne Koller
Benson Limketkai    Scott Sanner    Sebastian Thrun

Computer Science Department        School of Computer Science
Stanford University                Carnegie Mellon University
Stanford, CA 94305                 Pittsburgh, PA 15213

Abstract

Building models, or maps, of robot environments is a highly active research area; however, most existing techniques construct unstructured maps and assume static environments. In this paper, we present an algorithm for learning object models of non-stationary objects found in office-type environments. Our algorithm exploits the fact that many objects found in office environments look alike (e.g., chairs, trash bins). It does so through a two-level hierarchical representation, which links individual objects with generic shape templates of object classes. We derive an approximate EM algorithm for learning shape parameters at both levels of the hierarchy, using local occupancy grid maps for representing shape. Additionally, we develop a Bayesian model selection algorithm that enables the robot to estimate the total number of objects and object templates in the environment. Experimental results using a real robot indicate that our approach performs well at learning object-based maps of simple office environments. The approach outperforms a previously developed non-hierarchical algorithm that models objects but lacks class templates.

1 Introduction

Building environmental maps with mobile robots is a key prerequisite of truly autonomous robots [16]. State-of-the-art algorithms focus predominantly on building maps in static environments [17]. Common map representations range from lists of landmarks [3, 7, 18] and fine-grained grids of numerical occupancy values [5, 12] to collections of point obstacles [8] and sets of polygons [9]. Decades of research have produced ample experimental evidence that these representations are appropriate for mobile robot navigation in static environments.

Real environments, however, consist of objects. For example, office environments possess chairs, doors, garbage cans, etc. Many of these objects are non-stationary; that is, their locations may change over time. This observation motivates research on a new generation of mapping algorithms, which represent environments as collections of objects. At a minimum, such object models would enable a robot to track changes in the environment. For example, a cleaning robot entering an office at night might realize that a garbage can has moved from one location to another. It might do so without the need to learn a model of this garbage can from scratch, as would be necessary with existing robot mapping techniques [17].

Object representations offer a second, important advantage, which is due to the fact that many office environments possess large collections of objects of the same type. For example, most office chairs are instances of the same generic chair and therefore look alike, as do most doors, garbage cans, and so on. As these examples suggest, attributes of objects are shared by entire classes of objects, and understanding the nature of object classes is of significant interest to mobile robotics. In particular, algorithms that learn properties of object classes would be able to transfer learned parameters (e.g., appearance, motion parameters) from one object to another in the same class. This would have a profound impact on the accuracy of object models and on the speed at which such models can be acquired. If, for example, a cleaning robot enters a room it has never visited before, it might realize that a specific object in the room possesses the same visual appearance as other objects seen in other rooms (e.g., chairs). The robot would then be able to acquire a map of this object much faster. It would also enable the robot to predict properties of this newly seen object, such as the fact that a chair is non-stationary—without ever seeing this specific object move.

In previous work, we developed an algorithm that has successfully been demonstrated to learn shape models of non-stationary objects [2]. This approach works by comparing occupancy grid maps acquired at different points in time. A straightforward segmentation algorithm was developed that extracts object footprints from occupancy grid maps. It uses these footprints to learn shape models of objects in the environment, represented by occupancy grid maps.

This paper goes one step further. It proposes an algorithm that identifies classes of objects, in addition to learning plain object models. In particular, our approach learns shape models of individual object classes from multiple occurrences of objects of the same type. By learning shape models of object types—in addition to shape models of individual objects—our approach is able to generalize across different object models, as long as they model objects of the same type.
Figure 1: (a) Generative hierarchical model of environments with non-stationary objects, with class templates ϕ at the top level, object models θ at the middle level, and per-map object snapshots at the bottom. (b) Representation as a graphical model.

This approach follows the hierarchical Bayesian framework (see [1, 6, 10]). As our experimental results demonstrate, this approach leads to significantly more accurate models in environments with multiple objects of the same type.

The specific learning algorithm proposed here is an instance of the popular EM algorithm [11]. We develop a closed-form solution for learning at both levels of the hierarchy, which simultaneously identifies object models and shape templates for entire object classes. On top of this, we propose a Bayesian procedure for determining the appropriate number of object models and object class models. Experimental results, carried out using a physical robot, suggest that our approach succeeds in learning accurate shape and class models. A systematic comparison with our previous, non-hierarchical approach [2] illustrates that the use of class models yields significantly better results, both in terms of predictive power (as measured by the log-likelihood over testing data) and in terms of convergence properties (measured by the number of times each algorithm is trapped in a local maximum of poor quality).

2 The Generative Hierarchical Model

We begin with a description of the hierarchical model. The object level generalizes the approach of [2] to maps with continuous occupancy rather than binary values. The central innovation is the introduction of a template level.

2.1 The Object Hierarchy

Our object hierarchy (Figure 1a) is composed of two levels, the object template level at the top, and the physical object level at the bottom. The object template level consists of a set of shape templates, denoted ϕ = ϕ_1, ..., ϕ_M. In our estimation procedure, each template ϕ_j will be represented by an occupancy grid map [5, 12, 17], that is, an array of values in [0, 1] that represent the occupancy of a grid cell.

The object level contains shape models of concrete objects in the world, denoted θ = θ_1, ..., θ_K. Here K is the total number of objects (with M ≤ K). Each object model θ_k is represented by an occupancy grid map, just like at the template level. The key difference between object models and templates is that each θ_k corresponds to exactly one object in the world, whereas a template ϕ_j may correspond to more than one object. If, for example, all non-stationary objects were to look alike, θ would contain multiple models (one for each object), whereas ϕ would contain only a single shape template.

To learn a hierarchy, we assume that the robot maps its environments at T different points in time, between which the configuration of the environment may have changed. Each map is represented as a (static) occupancy grid map, and will be denoted m_1, ..., m_T. Objects may or may not be present at any time t, and they may be located anywhere in the free space of the environment. The number of object snapshots present in the map m_t is denoted n_t. The set of object snapshots extracted from the map m_t is denoted δ_{1,t}, ..., δ_{n_t,t}. Each object snapshot δ_{i,t} is—once again—represented by an occupancy grid map, constructed from robot sensor measurements [5, 12, 17]. The exact routines for extraction of object snapshots from maps are described in [2] and will be reviewed briefly below.

Finally, we notice that objects may be observed in any orientation. Since aligning object snapshots with objects in the model is an important step in the learning procedure, we will make the alignment parameters explicit. In particular, we will use α_{i,t} to denote the alignment of snapshot δ_{i,t} relative to the generative model. In our implementation, each α_{i,t} consists of two translational and one rotational parameter.

2.2 Probabilistic Model

To devise a sound algorithm for inferring an object hierarchy from data, we have to specify probabilistic models of how snapshots are generated from objects and how objects are generated from object templates. An overview graphical model for our probabilistic model is shown in Figure 1b.

Let θ_k be a concrete object, and δ_{i,t} be a single snapshot of this object. Recall that each grid cell θ_k[x] in θ_k is a real number in the interval [0, 1]. We interpret each occupancy value as a probability that a robot sensor would detect an occupied grid cell. However, when mapping an environment, the robot typically takes multiple scans of the same object, each resulting in a binomial outcome. By aggregating the individual binary variables into a single aggregate real value, we can approximate this fairly cumbersome model by a much cleaner Gaussian distribution of a single real-valued observation. Thus, the probability of observing a concrete snapshot δ_{i,t} of object θ_k is given by

  p(δ_{i,t} | θ_k, α_{i,t}) = ∏_x (2πσ²)^{-1/2} exp{ -( g(δ_{i,t}, α_{i,t})[x] - θ_k[x] )² / (2σ²) }    (1)

The function g(δ_{i,t}, α_{i,t}) denotes the aligned snapshot δ_{i,t}, and g(δ_{i,t}, α_{i,t})[x] denotes its x-th grid cell. The parameter σ² is the variance of the noise.
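The snapshot likelihood (1) is straightforward to evaluate on a discretized grid. The following sketch is illustrative only: the 4x4 grids, the variance value, and the pure-translation alignment are assumptions made for brevity (the paper's alignments also include a rotational component):

```python
import math

# Illustrative 4x4 grids: THETA is an object model; SNAPSHOT is the same
# shape observed one row higher, so the true alignment is the shift (1, 0).
THETA = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
SNAPSHOT = [[0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]

def aligned_cell(snapshot, alignment, x, y):
    """g(delta, alpha)[x]: the snapshot cell at (x, y) after applying a
    pure-translation alignment (dx, dy); cells shifted in from outside
    the snapshot read as free space (0.0)."""
    dx, dy = alignment
    sx, sy = x - dx, y - dy
    if 0 <= sx < len(snapshot) and 0 <= sy < len(snapshot[0]):
        return snapshot[sx][sy]
    return 0.0

def log_likelihood(snapshot, theta, alignment, sigma2=0.01):
    """log p(delta | theta, alpha) as in Eq. (1): one independent Gaussian
    per grid cell, comparing the aligned snapshot with object model theta."""
    ll = 0.0
    for x in range(len(theta)):
        for y in range(len(theta[0])):
            diff = aligned_cell(snapshot, alignment, x, y) - theta[x][y]
            ll -= 0.5 * math.log(2.0 * math.pi * sigma2) + diff * diff / (2.0 * sigma2)
    return ll

# The correct alignment scores strictly higher than a misalignment:
assert log_likelihood(SNAPSHOT, THETA, (1, 0)) > log_likelihood(SNAPSHOT, THETA, (0, 0))
```

Searching this score over translations and rotations by hill climbing is how the alignment M-step described later can be realized.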
It will prove useful to make explicit the correspondence between objects θ_k and object snapshots δ_{i,t}. This will be established by the following correspondence variables

  μ = μ_1, ..., μ_T    (2)

Since each m_t is an entire set of snapshots, each μ_t is in fact a function:

  μ_t : {1, ..., n_t} → {1, ..., K}    (3)

A similar model governs the relationship between templates and individual objects. Let θ_k be a concrete object generated according to object template ϕ_j, for some k and j. The probability that a grid cell θ_k[x] takes on a value is a function of the corresponding grid cell ϕ_j[x]. We assume that the probability of a grid cell value is normally distributed with variance τ²:

  p(θ_k[x] | ϕ_j[x]) = (2πτ²)^{-1/2} exp{ -(θ_k[x] - ϕ_j[x])² / (2τ²) }    (4)

Equation (4) defines a probabilistic model for individual grid cells, which is easily extended to entire maps:

  p(θ_k | ϕ_j) = ∏_x p(θ_k[x] | ϕ_j[x]) = ∏_x (2πτ²)^{-1/2} exp{ -(θ_k[x] - ϕ_j[x])² / (2τ²) }    (5)

Again, we introduce explicit variables for the correspondence between objects θ and templates ϕ:

  β = β_1, ..., β_K    (6)

with β_k ∈ {1, ..., M}. The statement β_k = j means that object θ_k is an instantiation of the template ϕ_j. The correspondences are unknown in the hierarchical learning problem, which is a key complicating factor in our attempt to learn hierarchical object models.

There is an important distinction between the correspondence variables μ's and β's, arising from the fact that each object can only be observed once when acquiring a local map m_t. This induces a mutual exclusivity constraint on the set of valid correspondences at the object level:

  ∀ i ≠ j :  μ_t(i) ≠ μ_t(j)    (7)

Thus, we see that objects in the object level are physical objects that can only be observed at most once, whereas objects at the class level are templates of objects—which might be instantiated more than once. For example, an object at the class level might be a prototypical chair, which might be mapped to multiple concrete chairs at the object level—and usually multiple observations over time of any of those concrete chairs at the snapshot level.

3 Hierarchical EM

Our goal in this paper is to learn the model Θ = (θ, ϕ, α) given the data m. Unlike many EM implementations, however, we do not simply want to maximize the probability p(m) of the data given the model: We also want to take into consideration the probabilistic relationships between the two levels of the hierarchy. Thus, we want to maximize the joint probability over the data m and the model Θ:

  argmax_Θ p(Θ, m) = argmax_{θ, ϕ, α} p(θ, ϕ, α, m)    (8)

Note that we treat the latent alignment parameters α as model parameters, which we maximize during learning.

EM is an iterative procedure that can be used to maximize a likelihood function. Starting with some initial model, EM generates a sequence of models of non-decreasing likelihood:

  Θ^[0] = (θ^[0], ϕ^[0], α^[0]),  Θ^[1] = (θ^[1], ϕ^[1], α^[1]),  ...    (9)

Let Θ^[l] = (θ^[l], ϕ^[l], α^[l]) be the l-th such model. Our desire is to find an (l+1)-st model Θ^[l+1] for which

  p(Θ^[l+1], m) ≥ p(Θ^[l], m)    (10)

As is common in the EM literature [11], this goal is achieved by maximizing an expected log likelihood of the form

  Θ^[l+1] = argmax_Θ E_{μ,β}[ log p(μ, β, Θ, m) | Θ^[l], m ]    (11)

Here E_{μ,β} is the mathematical expectation over the latent correspondence variables μ and β, relative to the distribution p(μ, β | Θ^[l], m).

As our framework is not the fully standard variant of EM, we derive the correctness of (11). As in the standard derivation, we lower bound the difference between the log-likelihood of the new and the old model:

  log p(Θ^[l+1], m) - log p(Θ^[l], m)
    = log [ p(Θ^[l+1], m) / p(Θ^[l], m) ]
    = log Σ_{μ,β} p(μ, β, Θ^[l+1], m) / p(Θ^[l], m)
    = log Σ_{μ,β} p(μ, β | Θ^[l], m) · p(μ, β, Θ^[l+1], m) / [ p(μ, β | Θ^[l], m) p(Θ^[l], m) ]
    = log E_{μ,β}[ p(μ, β, Θ^[l+1], m) / p(μ, β, Θ^[l], m) | Θ^[l], m ]    (12)

This is bounded from below using Jensen's inequality:

    ≥ E_{μ,β}[ log p(μ, β, Θ^[l+1], m) | Θ^[l], m ] - E_{μ,β}[ log p(μ, β, Θ^[l], m) | Θ^[l], m ]    (13)

The second expectation in (13) does not depend on the new model; hence, choosing Θ^[l+1] to maximize the first expectation makes this lower bound non-negative. Thus, by optimizing (11) we also maximize the desired log likelihood log p(Θ, m).
Returning to the problem of calculating (11), we note that the probability inside the logarithm factors into two terms, one for each level of the hierarchy (multiplied by a constant):

  p(μ, β, Θ, m) = p(μ, β, θ, ϕ, α, m)    (14)

Exploiting the independences shown in Figure 1b, and the uniform priors over ϕ, β, and μ, we obtain:

  p(μ, β, θ, ϕ, α, m) = p(ϕ) p(β) p(θ | ϕ, β) p(μ) p(α) p(m | θ, μ, α)
                      ∝ p(θ | ϕ, β) p(m | θ, μ, α)    (15)

The probability p(θ | ϕ, β) of the objects θ given the object templates ϕ and the correspondences β is essentially defined via (5). Here we recast it using a notation that makes the conditioning on β explicit:

  p(θ | ϕ, β) = ∏_k ∏_j [ ∏_x (2πτ²)^{-1/2} exp{ -(θ_k[x] - ϕ_j[x])² / (2τ²) } ]^{I(β_k = j)}    (16)

where I(·) is an indicator function, which is 1 if its argument is true and 0 otherwise. Similarly, the probability p(m | θ, μ, α) is based on (1) and conveniently written as:

  p(m | θ, μ, α) = ∏_t ∏_i ∏_k [ ∏_x (2πσ²)^{-1/2} exp{ -(g(δ_{i,t}, α_{i,t})[x] - θ_k[x])² / (2σ²) } ]^{I(μ_t(i) = k)}    (17)

Substituting the product (15) with (16) and (17) into the expected log likelihood (11) gives us:

  E_{μ,β}[ log p(μ, β, Θ, m) | Θ^[l], m ]
    = const - Σ_k Σ_j ψ_{k,j}^[l] Σ_x (θ_k[x] - ϕ_j[x])² / (2τ²)
            - Σ_t Σ_i Σ_k ξ_{i,t,k}^[l] Σ_x (g(δ_{i,t}, α_{i,t})[x] - θ_k[x])² / (2σ²)    (18)

where ψ_{k,j}^[l] = E[ I(β_k = j) | Θ^[l], m ] and ξ_{i,t,k}^[l] = E[ I(μ_t(i) = k) | Θ^[l], m ].

In deriving this expression, we exploit the linearity of the expectation, which allows us to replace the indicator variables through probabilities (expectations). It is easy to see that the expected log-likelihood in (18) consists of two main terms. The first enforces consistency between the template and the object level, and the second between the object and the data level.

4 The Implementation of the EM Algorithm

Maximizing (18) is achieved in a straightforward manner via the EM algorithm, which generates a sequence of hierarchical models (θ^[0], ϕ^[0]), (θ^[1], ϕ^[1]), ... and a sequence of alignment parameters α^[0], α^[1], ... of increasing likelihood. To do so, EM starts with a random model and random alignment parameters. It then alternates an E-step, in which the expectations of the correspondences are calculated given the l-th model and alignment, and two M-steps: one that generates a new hierarchical model (θ^[l+1], ϕ^[l+1]), and one for finding new alignment parameters α^[l+1].

4.1 E-Step

In our case, the E-step can easily be implemented exactly. Since the prior over β is uniform, the expectations ψ follow directly from (5):

  ψ_{k,j}^[l] = p(β_k = j | θ^[l], ϕ^[l]) = p(θ_k^[l] | ϕ_j^[l]) / Σ_{j'} p(θ_k^[l] | ϕ_{j'}^[l])    (19)

and, similarly, ξ_{i,t,k}^[l] is the posterior probability that snapshot δ_{i,t} was generated by object θ_k. Because of the mutual exclusivity constraint (7), this expectation involves a summation over entire correspondence functions μ_t:

  ξ_{i,t,k}^[l] = Σ_{μ_t : μ_t(i) = k} ∏_{i'} p(δ_{i',t} | θ_{μ_t(i')}^[l], α_{i',t}^[l])
                / Σ_{μ_t} ∏_{i'} p(δ_{i',t} | θ_{μ_t(i')}^[l], α_{i',t}^[l])    (20)

where both sums range only over functions μ_t satisfying (7). The summation over μ_t in calculating the expectations ξ_{i,t,k}^[l] is necessary because of the mutual exclusion constraint described above, namely that no object can be seen twice in the same map. The summation is exponential in the number of observed objects n_t—however, n_t is rarely larger than 10. If summing over μ_t becomes too costly, efficient (and provably polynomial) sampling schemes can be applied for approximating the desired expectations [4, 13].

4.2 Model M-Step

Our M-step first generates a new hierarchical model (θ^[l+1], ϕ^[l+1]) by maximizing (18) under fixed expectations ψ_{k,j}^[l] and ξ_{i,t,k}^[l] and fixed alignment parameters α^[l]. It is an appealing property of our model that this part of the M-step can be executed efficiently in closed form.

Our first observation is that the expression in (18) decomposes into a set of decoupled optimization problems over individual pixels, which can be optimized for each pixel x individually:

  (θ^[l+1][x], ϕ^[l+1][x]) = argmin_{θ[x], ϕ[x]}  Σ_k Σ_j ψ_{k,j}^[l] (θ_k[x] - ϕ_j[x])² / τ²
                             + Σ_t Σ_i Σ_k ξ_{i,t,k}^[l] (g(δ_{i,t}, α_{i,t}^[l])[x] - θ_k[x])² / σ²    (21)

We then observe that (21) is a quadratic optimization problem, which therefore possesses a convenient closed-form solution [15].
In particular, we can reformulate (21) as a standard least-squares optimization problem:

  v^[l+1][x] = argmin_{v[x]} ‖ A v[x] - b[x] ‖²    (22)

where v[x] = (ϕ_1[x], ..., ϕ_M[x], θ_1[x], ..., θ_K[x])^T is a vector comprising the x-th cell value of all models at both levels. The constraint matrix A has the block form

  A = ( A_ψ ; A_ξ )    (23)

where A_ψ and A_ξ are submatrices generated from the expectations calculated in the E-step: A_ψ contains one row per pair (k, j), weighted by τ^{-1} (ψ_{k,j}^[l])^{1/2}, and A_ξ one row per triple (i, t, k), weighted by σ^{-1} (ξ_{i,t,k}^[l])^{1/2}. Generating such matrices from a quadratic optimization problem such as (21) is straightforward, and the details are omitted here due to space limitations. The vector b[x] is of the form

  b[x] = ( 0 ; σ^{-1} Ξ^[l] m̄[x] )    (24)

where m̄[x] is a vector constructed from the aligned x-th map cell values of the snapshots δ_{i,t}. The solution to (21) is then

  v^[l+1][x] = (A^T A)^{-1} A^T b[x]    (25)

Thus, the new model θ^[l+1], ϕ^[l+1] is the result of a sequence of simple matrix operations, one for each pixel x.

4.3 Alignment M-Step

A final step of our M-step involves the optimization of the alignment parameters α. Those are obtained by maximizing the relevant parts of the expected log likelihood (18). Of significance is the fact that the alignment variables depend only on the object level θ, and not on the template level ϕ. This leads to a powerful decomposition by which each α_{i,t} can be calculated separately, by minimizing:

  α_{i,t}^[l+1] = argmin_{α_{i,t}} Σ_k ξ_{i,t,k}^[l] Σ_x ( g(δ_{i,t}, α_{i,t})[x] - θ_k^[l][x] )²    (26)

Since the projection g is non-linear, these optimization problems are solved using hill climbing, and the alignment values can be found quickly. In practice, we find that after less than 5 iterations of EM, the alignment values α_{i,t} have practically converged to their final values, which experimentally confirms that their optimization can be performed in a separate M-step.

4.4 Improving Global Convergence

Our approach inherits from EM the property that it is a hill climbing algorithm, subject to local maxima. In our experiments, we found that a straightforward implementation of EM frequently led to suboptimal maps. Our algorithm therefore employs deterministic annealing [14] to smooth the likelihood function and improve convergence. In our case, we anneal by varying the noise variances σ and τ in the sensor noise model. Larger variances induce a smoother likelihood function, but ultimately result in fuzzier shape models. Smaller variances lead to crisper maps, but at the expense of an increased number of sub-optimal local maxima. Consequently, our approach anneals the variances slowly towards the desired values of σ and τ, starting from large initial values that are gradually annealed down with an annealing factor γ < 1:

  σ^[l+1] = γ σ^[l] + (1 - γ) σ    (27)
  τ^[l+1] = γ τ^[l] + (1 - γ) τ    (28)

The values σ^[l] and τ^[l] are used in the l-th iteration of EM.

4.5 Determining the Number of Objects

A final and important component of our mapping algorithm determines the number of class templates M and the number of objects K. So far, we have silently assumed that both M and K are given. In practice, both values are unknown and have to be estimated from the data.

The number of objects K is bounded below by the number of objects seen in each individual map, and above by the sum of all objects ever seen:

  max_t n_t ≤ K ≤ Σ_t n_t    (29)

The number of class templates M is upper-bounded by the number of objects K.

Our approach applies a Bayesian prior for selecting the right K and M, effectively transforming the learning problem into a maximum a posteriori (MAP) estimation problem. At both levels, we use an exponential prior, which in log-form penalizes the log-likelihood in proportion to the number of objects K and object templates M:

  (K, M) = argmax_{K, M} [ log p(Θ, m) - c_1 K - c_2 M ]    (30)

with appropriate constant penalties c_1 and c_2. Hence, our approach applies EM for plausible values of K and M, through a separate EM optimization for each value of K and M. It finally selects those values for K and M that maximize (30). At first glance this exhaustive search procedure might appear computationally wasteful, but in practice K is usually small (and M is even smaller), so that the optimal values can be found quickly.

5 Experimental Results

Our algorithm was evaluated extensively using data collected by a Pioneer robot inside an office environment. Figure 2 shows the robot, the environment, and some of the non-stationary objects encountered by the robot. As in [2], maps were acquired in two different environments. Figures 3a and 4 show four and nine example maps extracted in these environments, respectively. The concurrent mapping and localization algorithm used for generating these maps is described in [17].
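The exhaustive model-selection loop of Section 4.5 searches the range (29) and keeps the pair (K, M) that maximizes the penalized score (30). A minimal sketch follows; the scoring function is a stand-in for a full EM run, and the penalty constants and score values are illustrative assumptions, not the paper's settings:

```python
def select_model(n_per_map, score_fn, c1=1.0, c2=1.0):
    """Search the range given by (29): max_t n_t <= K <= sum_t n_t, with
    M <= K. score_fn(K, M) stands in for log p(Theta, m), which in the
    paper comes from a separate EM run; (30) penalizes it linearly in
    K and M with constants c1, c2."""
    k_lo, k_hi = max(n_per_map), sum(n_per_map)
    best, best_km = float("-inf"), None
    for K in range(k_lo, k_hi + 1):
        for M in range(1, K + 1):
            posterior = score_fn(K, M) - c1 * K - c2 * M
            if posterior > best:
                best, best_km = posterior, (K, M)
    return best_km

# Illustrative stand-in score: the log-likelihood stops improving once
# K >= 4 and M >= 3, mimicking an environment with 4 objects of 3 shapes.
def fake_score(K, M):
    return -10.0 * max(0, 4 - K) - 10.0 * max(0, 3 - M)

# Three maps containing 2, 3, and 3 snapshots, so 3 <= K <= 8:
print(select_model([2, 3, 3], fake_score))  # prints (4, 3)
```

Because the saturating likelihood gain and the linear penalties trade off, the search settles on the smallest (K, M) that still explains the data, which is exactly the behavior the penalized posterior (30) is designed to produce.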
[Figure 7: four panels (Training Data Set 1, Testing Data Set 1, Training Data Set 2, Testing Data Set 2), each plotting log-likelihood against EM iterations 1 through 7.]

Figure 7: Log-likelihood of the training and testing data of both real-world data sets, as determined by 1-fold cross
validation. The dashed line is the result of the shallow, non-hierarchical approach, which performs significantly worse
than the hierarchical approach (solid line).

Figure 2: (a) The Pioneer robot used to collect laser range data. (b) The robotics lab where the second data set was collected. (c) Actual images of dynamic objects used in the second data set.

The individual object snapshots were extracted from regular occupancy grid maps using map differencing, a technique closely related to image differencing, which is commonly used in the field of computer vision. In particular, our approach identifies occupied grid cells which, at other points in time, were free. Such cells are candidates for snapshots of moving objects. A subsequent low-pass filtering removes artifacts that occur along the boundary of occupied and free space. Finally, a region growing technique leads to a set of distinct object snapshots [19]. Empirically, our approach found all non-stationary objects with 100% reliability as long as they are spaced apart by at least one grid cell (5 cm). Figures 3b and 4b show overlays of the individual maps, and Figures 3c and 4c provide examples of object snapshots extracted from those maps. Clearly, more sophisticated methods are needed if objects can touch each other.

Figure 3: (a) Four maps used for learning models of dynamic objects using a fixed number of objects per map. (b) Overlay of optimally aligned maps. (c) Difference map before low-pass filtering.

In a first series of experiments, we trained our hierarchical model from data collected in the two robot environments. Figure 5 shows an example run of EM using the correct number of objects (K = 4) and shape templates (M = 3). As is easily seen, the final object models are highly accurate—in fact, they are more accurate than the individual object snapshots used for their construction. In a series of 20 experiments using different starting points, we found that the hierarchical model converges in all cases to a model of equal quality, whose result is visually indistinguishable from the one presented here.

In a second set of experiments we evaluated our approach to model selection, estimating how well our approach can determine the right number of objects. Using a fixed penalty term in (30), which was held constant throughout all our experiments without much optimization, we found the following log posterior, which shows a clear peak at the correct values, shown in bold face:
[Table of log-posterior values for candidate numbers of objects K and templates M; the maximum occurs at the correct values K = 4 and M = 3.]

Figure 5: Results of the hierarchical model (7 iterations of EM): (a) class templates, and (b) object models.

Figure 4: (a) Nine maps used for learning models of dynamic objects using a variable number of objects per map. (b) Overlay of optimally aligned maps. (c) Difference map before low-pass filtering. The objects are clearly identifiable.

Notice that while the correct value for K is 4, none of the training maps possessed 4 objects. The number had to be estimated exclusively based on the fact that, over time, the robot faced 4 different objects with 3 different shapes.

Next, we evaluated our approach in comparison with the non-hierarchical technique described in [2]. The purpose of these experiments was to quantify the relative advantage of our hierarchical object model over a shallow model that does not allow for cross-object transfer. We noticed several deficiencies of the non-hierarchical model. First, the resulting object models were systematically inferior to those generated using our hierarchical approach. Figure 6 shows two examples of results, obtained with different initial random seeds. While the first of these results looks visually adequate, the second does not: It contains a wrong collection of objects (three circles, one box). Unfortunately, in 11 out of 20 runs, the flat approach converged to such a suboptimal solution.

However, even the visually accurate model is inferior. Figure 7 plots the log-likelihood of each model for the training data and for cross-validation data, using 1-fold cross validation. Despite the high visual accuracy of the model generated by the non-hierarchical approach, its accuracy lags significantly behind that of the model generated by our hierarchical algorithm. We attribute this difference to the fact that the non-hierarchical approach lacks cross-object generalization.

In summary, our experiments indicate that our algorithm learns highly accurate shape models at both levels of the hierarchy, and it consistently identifies the 'right' number of objects and object templates. In comparison with the flat approach described in [2], it yields significantly more accurate object models and also converges more frequently to an accurate solution.

6 Conclusion

We have presented an algorithm for learning a hierarchy of object models of non-stationary objects with mobile robots. Our approach is based on a generative model which assumes that objects are instantiations of object templates, and are observed by mobile robots when acquiring maps of their environments. An approximate EM algorithm was developed, capable of learning models of objects and object templates from snapshots of non-stationary objects, extracted from occupancy grid maps acquired at different points in time. Systematic experiments using a physical robot illustrate that our approach works well in practice, and that it outperforms a previously developed non-hierarchical algorithm for learning models of non-stationary objects.

Our approach possesses several limitations that warrant future research. For identifying non-stationary objects, our present segmentation approach mandates that objects do not move during robotic mapping, and that they are spaced far enough apart from each other (e.g., 5 cm). Beyond that, our approach currently does not learn attributes of objects other than shape, such as persistence, relations between multiple objects, and non-rigid object structures.
Finally, exploring different generative models involving more complex transformations (e.g., scaling of templates) constitutes another worthwhile research direction.

Nevertheless, we believe that this work is unique in its ability to learn hierarchical object models in mobile robotics. We believe that the framework of hierarchical models can be applied to a broader range of mapping problems in robotics, and we conjecture that capturing the object nature of robot environments will ultimately lead to much superior perception algorithms in mobile robotics, along with more appropriate symbolic descriptions of physical environments.

Figure 6: Comparison with the flat model: Both diagrams show the result of seven iterations of EM using the flat, non-hierarchical model, as described in [2]: (a) successful convergence, (b) unsuccessful convergence to a poor model. 11 out of 20 runs converged poorly.

References

[1] J. Berger. Statistical Decision Theory and Bayesian Analysis. Springer Verlag, 1985.
[2] R. Biswas, B. Limketkai, S. Sanner, and S. Thrun. Towards object mapping in dynamic environments with mobile robots. Submitted to the IEEE Conference on Intelligent Robots and Systems (IROS), 2002.
[3] J.A. Castellanos, J.M.M. Montiel, J. Neira, and J.D. Tardós. The SPmap: A probabilistic framework for simultaneous localization and map building. IEEE Transactions on Robotics and Automation, 15(5):948–953, 1999.
[4] F. Dellaert, S. Seitz, S. Thrun, and C. Thorpe. Feature correspondence: A Markov Chain Monte Carlo approach. In T.K. Leen, T. Dietterich, and B. Van Roy, editors, Advances in Neural Information Processing Systems 13. MIT Press, 2001.
[5] A. Elfes. Sonar-based real-world mapping and navigation. IEEE Journal of Robotics and Automation, RA-3(3):249–265, June 1987.
[6] A. Gelman, J.B. Carlin, H.S. Stern, and D.B. Rubin. Bayesian Data Analysis. Chapman and Hall, 1995.
[7] J.J. Leonard and H.J.S. Feder. A computationally efficient method for large-scale concurrent mapping and localization. In J. Hollerbach and D. Koditschek, editors, Proceedings of the Ninth International Symposium on Robotics Research, Salt Lake City, Utah, 1999.
[8] F. Lu and E. Milios. Globally consistent range scan alignment for environment mapping. Autonomous Robots, 4:333–349, 1997.
[9] C. Martin and S. Thrun. Real-time acquisition of compact volumetric maps with mobile robots. In IEEE International Conference on Robotics and Automation (ICRA), Washington, DC, 2002.
[10] A. McCallum, R. Rosenfeld, T. Mitchell, and A.Y. Ng. Improving text classification by shrinkage in a hierarchy of classes. In Proceedings of the International Conference on Machine Learning (ICML), 1998.
[11] G.J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. Wiley Series in Probability and Statistics, New York, 1997.
[12] H.P. Moravec. Sensor fusion in certainty grids for mobile robots. AI Magazine, 9(2):61–74, 1988.
[13] H. Pasula, S. Russell, M. Ostland, and Y. Ritov. Tracking many objects with many sensors. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 1999.
[14] K. Rose. Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proceedings of the IEEE, 86(11):2210–2239, November 1998.
[15] G. Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press, 1998.
[16] C. Thorpe and H. Durrant-Whyte. Field robots. In Proceedings of the 10th International Symposium of Robotics Research (ISRR'01), Lorne, Australia, 2001.
[17] S. Thrun. A probabilistic online mapping algorithm for teams of mobile robots. International Journal of Robotics Research, 20(5):335–363, 2001.
[18] S. Williams, G. Dissanayake, and H.F. Durrant-Whyte. Towards terrain-aided navigation for underwater robotics. Advanced Robotics, 15(5), 2001.
[19] S.W. Zucker. Region growing: Childhood and adolescence. Computer Graphics and Image Processing, 5:382–399, 1976.
