Anguelov Uai02
Abstract

Building models, or maps, of robot environments is a highly active research area; however, most existing techniques construct unstructured maps and assume static environments. In this paper, we present an algorithm for learning object models of non-stationary objects found in office-type environments. Our algorithm exploits the fact that many objects found in office environments look alike (e.g., chairs, trash bins). It does so through a two-level hierarchical representation, which links individual objects with generic shape templates of object classes. We derive an approximate EM algorithm for learning shape parameters at both levels of the hierarchy, using local occupancy grid maps for representing shape. Additionally, we develop a Bayesian model selection algorithm that enables the robot to estimate the total number of objects and object templates in the environment. Experimental results using a real robot indicate that our approach performs well at learning object-based maps of simple office environments. The approach outperforms a previously developed non-hierarchical algorithm that models objects but lacks class templates.

1 Introduction

Building environmental maps with mobile robots is a key prerequisite of truly autonomous robots [16]. State-of-the-art algorithms focus predominantly on building maps in static environments [17]. Common map representations range from lists of landmarks [3, 7, 18] and fine-grained grids of numerical occupancy values [5, 12] to collections of point obstacles [8] and sets of polygons [9]. Decades of research have produced ample experimental evidence that these representations are appropriate for mobile robot navigation in static environments.

Real environments, however, consist of objects. For example, office environments possess chairs, doors, garbage cans, etc. Many of these objects are non-stationary; that is, their locations may change over time. This observation motivates research on a new generation of mapping algorithms, which represent environments as collections of objects. At a minimum, such object models would enable a robot to track changes in the environment. For example, a cleaning robot entering an office at night might realize that a garbage can has moved from one location to another. It might do so without the need to learn a model of this garbage can from scratch, as would be necessary with existing robot mapping techniques [17].

Object representations offer a second, important advantage, which is due to the fact that many office environments possess large collections of objects of the same type. For example, most office chairs are instances of the same generic chair and therefore look alike, as do most doors, garbage cans, and so on. As these examples suggest, attributes of objects are shared by entire classes of objects, and understanding the nature of object classes is of significant interest to mobile robotics. In particular, algorithms that learn properties of object classes would be able to transfer learned parameters (e.g., appearance, motion parameters) from one object to another in the same class. This would have a profound impact on the accuracy of object models, and on the speed at which such models can be acquired. If, for example, a cleaning robot enters a room it has never visited before, it might realize that a specific object in the room possesses the same visual appearance as other objects seen in other rooms (e.g., chairs). The robot would then be able to acquire a map of this object much faster. It would also be able to predict properties of this newly seen object, such as the fact that a chair is non-stationary, without ever seeing this specific object move.

In previous work, we developed an algorithm that has successfully been demonstrated to learn shape models of non-stationary objects [2]. This approach works by comparing occupancy grid maps acquired at different points in time. A straightforward segmentation algorithm was developed that extracts object footprints from occupancy grid maps. It uses these footprints to learn shape models of objects in the environment, represented by occupancy grid maps.

This paper goes one step further. It proposes an algorithm that identifies classes of objects, in addition to learning plain object models. In particular, our approach learns shape models of individual object classes from multiple occurrences of objects of the same type. By learning shape models of object types, in addition to shape models of individual objects, …
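The map-differencing idea behind our previous work [2], comparing occupancy grid maps acquired at different points in time and segmenting the changed regions into object footprints, can be sketched in a few lines. The snippet below is our own illustrative reconstruction, not the code of [2]; the function name, the 5×5 toy maps, and the 0.5 change threshold are all assumptions:

```python
import numpy as np

def object_footprints(grid_a, grid_b, threshold=0.5):
    """Cells whose occupancy changed between two registered maps are
    grouped into 4-connected components ("object footprints")."""
    changed = np.abs(np.asarray(grid_a, float) - np.asarray(grid_b, float)) > threshold
    labels = np.zeros(changed.shape, dtype=int)
    n_components = 0
    for seed in zip(*np.nonzero(changed)):
        if labels[seed]:
            continue                      # cell already assigned to a footprint
        n_components += 1
        stack = [tuple(seed)]
        while stack:                      # flood fill over changed cells
            r, c = stack.pop()
            if (0 <= r < changed.shape[0] and 0 <= c < changed.shape[1]
                    and changed[r, c] and not labels[r, c]):
                labels[r, c] = n_components
                stack += [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return labels, n_components

# Map at time 0: an object in the top-left; at time 1 it has moved bottom-right.
m0 = np.zeros((5, 5))
m0[0:2, 0:2] = 1.0
m1 = np.zeros((5, 5))
m1[3:5, 3:5] = 1.0
labels, n = object_footprints(m0, m1)
```

Each connected component of changed cells is a candidate object footprint; a real implementation would first register the two maps and filter out sensor noise before differencing.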
[Figure 1: (a) class templates φ; (b) the two-level generative hierarchy linking class templates φ, objects θ, and the snapshots extracted from the individual maps, all represented as occupancy grid maps.]

In our estimation procedure, each template φ_m will be represented by an occupancy grid map [5, 12, 17], that is, an array of values in [0, 1] that represent the occupancy of a grid cell.

The object level contains shape models of concrete objects in the world, denoted θ_1, …, θ_N. Here N is the total number of objects (with …). Each object model θ_n is represented by an occupancy grid map, just like at the template level. The key difference between object models and templates is that each object model corresponds to exactly one object in the world, whereas a template may correspond to more than one object.

    p(z_{t,m} | θ_n, δ_{t,m}) = ∏_k (2πσ_z²)^(−1/2) exp( −(α(z_{t,m}, δ_{t,m})[k] − θ_n[k])² / (2σ_z²) )    (1)

The function α(z_{t,m}, δ_{t,m}) denotes the aligned snapshot z_{t,m}, and α(z_{t,m}, δ_{t,m})[k] denotes its k-th grid cell. The parameter σ_z² is the variance of the noise.
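To make the grid-based measurement model concrete, here is a minimal numpy sketch of the per-cell Gaussian likelihood at the snapshot level. It is illustrative only (the function name, grid contents, and the sigma value are our assumptions, not the paper's code), and it takes the snapshot as already aligned, i.e., its input plays the role of α(z_{t,m}, δ_{t,m}):

```python
import numpy as np

def log_gaussian_grid_likelihood(aligned_snapshot, obj_model, sigma=0.1):
    """Log of the per-cell Gaussian model: every cell of the aligned snapshot
    is treated as a noisy observation of the object model's cell value."""
    a = np.asarray(aligned_snapshot, dtype=float)
    th = np.asarray(obj_model, dtype=float)
    n_cells = a.size
    return (-0.5 * n_cells * np.log(2.0 * np.pi * sigma ** 2)
            - np.sum((a - th) ** 2) / (2.0 * sigma ** 2))

# A 3x3 occupancy-grid "object" and a slightly noisy, already-aligned snapshot.
obj = np.array([[0.0, 1.0, 0.0],
                [1.0, 1.0, 1.0],
                [0.0, 1.0, 0.0]])
snap = obj + 0.05
ll_match = log_gaussian_grid_likelihood(snap, obj)
ll_mismatch = log_gaussian_grid_likelihood(1.0 - obj, obj)
```

A snapshot close to the object model scores a much higher log-likelihood than a mismatched one, which is what drives the correspondence expectations below.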
It will prove useful to make explicit the correspondence between objects θ and object snapshots z_{t,m}. This will be established by the following correspondence variables:

    c_{t,m} ∈ {1, …, N}    (2)

Since each z_t is an entire set of snapshots, each c_t is in fact a function:

    c_t : {1, …, M_t} → {1, …, N}    (3)

where M_t denotes the number of snapshots extracted from map z_t.

A similar model governs the relationship between templates and individual objects. Let θ_n be a concrete object generated according to object template φ_m, for some n and m. We assume that the probability of a grid cell value is normally distributed with variance σ_θ²:

    p(θ_n[k] | φ_m[k]) = (2πσ_θ²)^(−1/2) exp( −(θ_n[k] − φ_m[k])² / (2σ_θ²) )    (4)

Equation (4) defines a probabilistic model for individual grid cells, which is easily extended to entire maps:

    p(θ_n | φ_m) = ∏_k p(θ_n[k] | φ_m[k]) = ∏_k (2πσ_θ²)^(−1/2) exp( −(θ_n[k] − φ_m[k])² / (2σ_θ²) )    (5)

Again, we introduce explicit variables for the correspondence between objects θ and templates φ:

    β_n    (6)

with β_n ∈ {1, …, M}. The statement β_n = m means that object θ_n is an instantiation of the template φ_m. The correspondences are unknown in the hierarchical learning problem, which is a key complicating factor in our attempt to learn hierarchical object models.

There is an important distinction between the correspondence variables β_n and c_{t,m}, arising from the fact that each object can only be observed once when acquiring a local map z_t. This induces a mutual exclusivity constraint on the set of valid correspondences at the object level:

    ∀t ∀m ∀m′ : m ≠ m′ ⟹ c_t(m) ≠ c_t(m′)    (7)

Thus, we see that objects in the object level are physical objects that can only be observed at most once, whereas objects at the class level are templates of objects, which might be instantiated more than once. For example, an object at the class level might be a prototypical chair, which might be mapped to multiple concrete chairs at the object level, and usually multiple observations over time of any of those concrete chairs at the snapshot level.

3 Hierarchical EM

Our goal in this paper is to learn the model ω = {φ, θ, δ} given the data z. Unlike many EM implementations, however, we do not simply want to maximize the probability of the data given the model: we also want to take into consideration the probabilistic relationships between the two levels of the hierarchy. Thus, we want to maximize the joint probability over the data and the model ω:

    argmax_ω p(ω | z) = argmax_ω p(ω, z)    (8)

Note that we treat the latent alignment parameters δ as model parameters, which we maximize during learning. EM is an iterative procedure that can be used to maximize a likelihood function. Starting with some initial model, EM generates a sequence of models of non-decreasing likelihood:

    ω^{[0]}, ω^{[1]}, ω^{[2]}, …    (9)

Let ω^{[i]} = {φ^{[i]}, θ^{[i]}, δ^{[i]}} be the i-th such model. Our desire is to find an (i+1)-st model ω^{[i+1]} for which

    p(ω^{[i+1]}, z) ≥ p(ω^{[i]}, z)    (10)

As is common in the EM literature [11], this goal is achieved by maximizing an expected log likelihood of the form

    ω^{[i+1]} = argmax_ω E_{β,c}[ log p(β, c, ω, z) | ω^{[i]}, z ]    (11)

Here E_{β,c} is the mathematical expectation over the latent correspondence variables β and c, relative to the distribution p(β, c | ω^{[i]}, z).

As our framework is not the fully standard variant of EM, we derive the correctness of (11). As in the standard derivation, we lower bound the difference between the log-likelihood of the new and the old model:

    log p(ω^{[i+1]}, z) − log p(ω^{[i]}, z)
        = log [ p(ω^{[i+1]}, z) / p(ω^{[i]}, z) ]
        = log Σ_{β,c} [ p(β, c, ω^{[i+1]}, z) / p(ω^{[i]}, z) ]
        = log Σ_{β,c} p(β, c | ω^{[i]}, z) [ p(β, c, ω^{[i+1]}, z) / p(β, c, ω^{[i]}, z) ]
        = log E_{β,c}[ p(β, c, ω^{[i+1]}, z) / p(β, c, ω^{[i]}, z) | ω^{[i]}, z ]    (12)

This is bounded from below using Jensen's inequality:

        ≥ E_{β,c}[ log ( p(β, c, ω^{[i+1]}, z) / p(β, c, ω^{[i]}, z) ) | ω^{[i]}, z ]
        = E_{β,c}[ log p(β, c, ω^{[i+1]}, z) | ω^{[i]}, z ] − E_{β,c}[ log p(β, c, ω^{[i]}, z) | ω^{[i]}, z ]    (13)

Thus, by optimizing (11) we also maximize the desired log likelihood, log p(ω, z).
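The mutual exclusivity constraint (7) says that each c_t must be injective: no two snapshots in one map may be assigned to the same object. A small illustrative sketch (names are our own) enumerates exactly this set of valid correspondence functions:

```python
from itertools import permutations

def valid_correspondences(num_snapshots, num_objects):
    """Enumerate all injective maps c_t from snapshot indices
    {0..num_snapshots-1} to object indices {0..num_objects-1},
    i.e. assignments in which no object is observed twice."""
    return [dict(zip(range(num_snapshots), perm))
            for perm in permutations(range(num_objects), num_snapshots)]

cs = valid_correspondences(2, 3)  # 2 snapshots, 3 objects
```

For M_t snapshots and N objects this yields N!/(N − M_t)! assignments, which is the summation domain that makes the exact E-step exponential in M_t.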
Returning to the problem of calculating (11), we note that the probability inside the logarithm factors into two terms, one for each level of the hierarchy (multiplied by a constant):

    p(β, c, ω, z) ∝ p(θ | φ, β) p(z | θ, c, δ)    (14)

Exploiting the independences shown in Figure 1b, and the uniform priors over φ, β, c, and δ, we obtain:

    p(β, c, ω, z) = p(φ) p(β) p(θ | φ, β) p(c) p(δ) p(z | θ, c, δ)
                  ∝ p(θ | φ, β) p(z | θ, c, δ)    (15)

The probability p(θ | φ, β) of the objects given the object templates and the correspondences is essentially defined via (5). Here we recast it using a notation that makes the conditioning on β explicit:

    p(θ | φ, β) = ∏_n ∏_m [ ∏_k (2πσ_θ²)^(−1/2) exp( −(θ_n[k] − φ_m[k])² / (2σ_θ²) ) ]^{I(β_n = m)}    (16)

where I(·) is an indicator function which is 1 if its argument is true and 0 otherwise, and, similarly,

    p(z | θ, c, δ) = ∏_t ∏_m ∏_n [ ∏_k (2πσ_z²)^(−1/2) exp( −(α(z_{t,m}, δ_{t,m})[k] − θ_n[k])² / (2σ_z²) ) ]^{I(c_{t,m} = n)}    (17)

Substituting the product (15) with (16) and (17) into the expected log likelihood (11) gives us:

    ω^{[i+1]} = argmax_ω [ − Σ_n Σ_m e_{nm}^{[i]} Σ_k (θ_n[k] − φ_m[k])² / (2σ_θ²)
                           − Σ_t Σ_m Σ_n e_{tmn}^{[i]} Σ_k (α(z_{t,m}, δ_{t,m})[k] − θ_n[k])² / (2σ_z²) ]    (18)

where e_{nm}^{[i]} = E[I(β_n = m) | ω^{[i]}, z] and e_{tmn}^{[i]} = E[I(c_{t,m} = n) | ω^{[i]}, z] are the expectations of the correspondences. In deriving this expression, we exploit the linearity of the expectation, which allows us to replace the indicator variables through probabilities (expectations). It is easy to see that the expected log-likelihood in (18) consists of two main terms. The first enforces consistency between the template and the object level, and the second between the object and the data level.

Our EM algorithm thus alternates an E-step, in which the expectations of the correspondences are calculated given the i-th model and alignment, and two M-steps, one that generates a new hierarchical model {φ^{[i+1]}, θ^{[i+1]}, σ^{[i+1]}}, and one for finding new alignment parameters δ^{[i+1]}.

4.1 E-Step

In our case, the E-step can easily be implemented exactly:

    e_{nm}^{[i]} = p(β_n = m | ω^{[i]}, z)
                 = p(θ_n^{[i]} | φ_m^{[i]}) / Σ_{m′} p(θ_n^{[i]} | φ_{m′}^{[i]})
                 = exp( −Σ_k (θ_n^{[i]}[k] − φ_m^{[i]}[k])² / (2σ_θ²) ) / Σ_{m′} exp( −Σ_k (θ_n^{[i]}[k] − φ_{m′}^{[i]}[k])² / (2σ_θ²) )    (19)

The corresponding expectations at the object level are

    e_{tmn}^{[i]} = p(c_{t,m} = n | ω^{[i]}, z)
                  = Σ_{c_t : c_t(m) = n} p(z_t | θ^{[i]}, c_t, δ^{[i]}) / Σ_{c_t} p(z_t | θ^{[i]}, c_t, δ^{[i]})    (20)

where both sums range over the correspondence functions c_t that satisfy the mutual exclusivity constraint (7).

The summation over c_t in calculating the expectations e_{tmn}^{[i]} is necessary because of the mutual exclusion constraint described above, namely that no object can be seen twice in the same map. The summation is exponential in the number of observed objects M_t; however, M_t is rarely larger than 10. If summing over c_t becomes too costly, efficient (and provably polynomial) sampling schemes can be applied for approximating the desired expectations [4, 13].

4.2 Model M-Step

Our M-step first generates a new hierarchical model by maximizing (18) under fixed expectations e_{nm}^{[i]} and e_{tmn}^{[i]} and fixed alignment parameters δ. It is an appealing property of our model that this part of the M-step can be executed efficiently in closed form.

Our first observation is that the expression in (18) decomposes into a set of decoupled optimization problems over individual pixels, which can be optimized for each pixel k individually:

    {φ^{[i+1]}[k], θ^{[i+1]}[k]} = argmax_{φ[k], θ[k]} [ − Σ_n Σ_m e_{nm}^{[i]} (θ_n[k] − φ_m[k])² / (2σ_θ²)
                                                        − Σ_t Σ_m Σ_n e_{tmn}^{[i]} (α(z_{t,m}, δ_{t,m})[k] − θ_n[k])² / (2σ_z²) ]    (21)
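As a concrete illustration of the template-level expectations (19) and the per-pixel template update, the following numpy sketch runs one E-step and one simplified M-step. It is our own toy reconstruction: it holds the object maps θ fixed and updates only the templates (a coordinate-ascent simplification of the joint closed-form solution in the text), and all maps and names are invented:

```python
import numpy as np

def estep_template_expectations(thetas, phis, sigma_theta=0.1):
    """e[n, m] = p(beta_n = m | current model): softmax over templates of
    per-object Gaussian log-likelihoods (uniform prior over templates)."""
    T = np.stack(thetas)[:, None, :]          # (N, 1, K) flattened object maps
    P = np.stack(phis)[None, :, :]            # (1, M, K) flattened templates
    loglik = -np.sum((T - P) ** 2, axis=2) / (2.0 * sigma_theta ** 2)  # (N, M)
    loglik -= loglik.max(axis=1, keepdims=True)   # stabilize the softmax
    e = np.exp(loglik)
    return e / e.sum(axis=1, keepdims=True)

def mstep_templates(thetas, e):
    """Template update with objects held fixed: each template pixel becomes
    the expectation-weighted mean of the corresponding object pixels."""
    T = np.stack(thetas)                      # (N, K)
    return (e.T @ T) / e.sum(axis=0)[:, None]  # (M, K)

# Two identical "chairs" and one distinct "bin", plus two rough templates.
chair = np.array([1.0, 1.0, 0.0, 0.0])
bin_ = np.array([0.0, 0.0, 1.0, 1.0])
thetas = [chair, chair, bin_]
phis = [np.array([0.9, 0.9, 0.1, 0.1]), np.array([0.1, 0.1, 0.9, 0.9])]
e = estep_template_expectations(thetas, phis)
new_phis = mstep_templates(thetas, e)
```

With near-deterministic expectations, each template pixel collapses to the expectation-weighted mean of the object pixels assigned to it, so the chair template sharpens toward the two chair maps and the bin template toward the bin map.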
where the matrices involved are submatrices generated from the expectations calculated in the E-step. Generating such matrices from a quadratic optimization problem such as (21) is straightforward, and the details are omitted here due to space limitations. The vector b[k] is of the form (24), where z̄[k] is a vector constructed from the aligned k-th map cell values of the snapshots z. The solution to (21) is …

The values …^{[i]} and σ^{[i]} are used in the i-th iteration of EM.

4.5 Determining the Number of Objects

A final and important component of our mapping algorithm determines the number of class templates M and the number of objects N. So far, we have silently assumed that both M and N are given. In practice, both values are unknown and have to be estimated from the data. The number of objects is bounded below by the number … The resulting log posterior penalizes model complexity with appropriate constant penalties on M and N (Equation (26)). Hence, our …
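The model-selection idea, trading data log-likelihood against constant complexity penalties as in the penalty terms around (26), can be sketched as follows. The likelihood values and penalty weights below are invented purely for illustration:

```python
def penalized_score(log_likelihood, num_templates, num_objects,
                    c_templates=5.0, c_objects=2.0):
    """Model-selection score: data log-likelihood minus constant penalties
    that grow with model size. The penalty weights are made-up constants."""
    return log_likelihood - c_templates * num_templates - c_objects * num_objects

# Hypothetical fits: likelihood improves with more templates but saturates.
loglik_by_M = {1: -300.0, 2: -220.0, 3: -190.0, 4: -170.0, 5: -168.0}
scores = {M: penalized_score(ll, M, num_objects=9) for M, ll in loglik_by_M.items()}
best_M = max(scores, key=scores.get)
```

The penalized score peaks at an intermediate model size even though the raw likelihood keeps improving, which is the behavior the penalties are designed to produce.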
Figure 7: Log-likelihood of the training and testing data of both real-world data sets, as determined by 1-fold cross
validation. The dashed line is the result of the shallow, non-hierarchical approach, which performs significantly worse
than the hierarchical approach (solid line).
… Figure 5 shows an example run of EM using the correct number of objects and shape templates. As is easily seen, the final object models are highly accurate; in fact, they are more accurate than the individual object snapshots used for their construction. In a series of 20 experiments using different starting points, we found …

… approach to model selection, estimating how well our approach can determine the right number of objects. Using the penalty term (26), which was used throughout all our experiments without much optimization, we found a log posterior that shows a clear peak at the correct values. Notice that while the correct value for … is 4, none of the …

[Figure: (a) hierarchical model, class templates]

… of objects and object templates. In comparison with the flat approach described in [2], it yields significantly more accurate object models and also converges more frequently to an accurate solution.
References
[1] J. Berger. Statistical Decision Theory and Bayesian analy-
sis. Springer Verlag, 1985.
[2] R. Biswas, B. Limketkai, S. Sanner, and S. Thrun. To-
wards object mapping in dynamic environments with mo-
bile robots. Submitted to the IEEE Conference on Intelli-
gent Robots and Systems (IROS), 2002.
[3] J.A. Castellanos, J.M.M. Montiel, J. Neira, and J.D. Tardós.
The SPmap: A probabilistic framework for simultane-
ous localization and map building. IEEE Transactions on
Robotics and Automation, 15(5):948–953, 1999.
[4] F. Dellaert, S. Seitz, S. Thrun, and C. Thorpe. Feature cor-
respondence: A Markov Chain Monte Carlo approach. In
T.K. Leen, T. Dietterich, and B. Van Roy, editors, Advances
in Neural Information Processing Systems 13. MIT Press,
2001.