Springer Tracts in Advanced Robotics 135

Pascal Meißner

Indoor Scene Recognition by 3-D Object Search
For Robot Programming by Demonstration
Springer Tracts in Advanced Robotics

Volume 135

Series Editors
Bruno Siciliano, Dipartimento di Ingegneria Elettrica e Tecnologie dell’Informazione, Università degli Studi di Napoli Federico II, Napoli, Italy
Oussama Khatib, Artificial Intelligence Laboratory, Department of Computer Science, Stanford University, Stanford, CA, USA

Advisory Editors
Nancy Amato, Computer Science & Engineering, Texas A&M University, College Station, TX, USA
Oliver Brock, Fakultät IV, TU Berlin, Berlin, Germany
Herman Bruyninckx, KU Leuven, Heverlee, Belgium
Wolfram Burgard, Institute of Computer Science, University of Freiburg, Freiburg, Baden-Württemberg, Germany
Raja Chatila, ISIR, Paris cedex 05, France
Francois Chaumette, IRISA/INRIA, Rennes, Ardennes, France
Wan Kyun Chung, Robotics Laboratory, Mechanical Engineering, POSTECH, Pohang, Korea (Republic of)
Peter Corke, Science and Engineering Faculty, Queensland University of Technology, Brisbane, QLD, Australia
Paolo Dario, LEM, Scuola Superiore Sant’Anna, Pisa, Italy
Alessandro De Luca, DIAGAR, Sapienza Università di Roma, Roma, Italy
Rüdiger Dillmann, Humanoids and Intelligence Systems Lab, KIT - Karlsruher Institut für Technologie, Karlsruhe, Germany
Ken Goldberg, University of California, Berkeley, CA, USA
John Hollerbach, School of Computing, University of Utah, Salt Lake, UT, USA
Lydia E. Kavraki, Department of Computer Science, Rice University, Houston, TX, USA
Vijay Kumar, School of Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, PA, USA
Bradley J. Nelson, Institute of Robotics and Intelligent Systems, ETH Zurich, Zürich, Switzerland
Frank Chongwoo Park, Mechanical Engineering Department, Seoul National University, Seoul, Korea (Republic of)
S. E. Salcudean, The University of British Columbia, Vancouver, BC, Canada
Roland Siegwart, LEE J205, ETH Zürich, Institute of Robotics & Autonomous Systems Lab, Zürich, Switzerland
Gaurav S. Sukhatme, Department of Computer Science, University of Southern California, Los Angeles, CA, USA
The Springer Tracts in Advanced Robotics (STAR) publish new developments and
advances in the fields of robotics research, rapidly and informally but with high
quality. The intent is to cover all the technical contents, applications, and
multidisciplinary aspects of robotics, embedded in the fields of Mechanical
Engineering, Computer Science, Electrical Engineering, Mechatronics, Control, and
Life Sciences, as well as the methodologies behind them. Within the scope of the
series are monographs, lecture notes, selected contributions from specialized
conferences and workshops, as well as selected PhD theses.
Special offer: For all clients with a print standing order we offer free access to the
electronic volumes of the Series published in the current year.
Indexed by DBLP, Compendex, EI-Compendex, SCOPUS, Zentralblatt Math,
Ulrich’s, MathSciNet, Current Mathematical Publications, Mathematical Reviews,
MetaPress and Springerlink.

More information about this series at http://www.springer.com/series/5208


Pascal Meißner

Indoor Scene Recognition by 3-D Object Search
For Robot Programming by Demonstration

Pascal Meißner
IAR-IPR
Karlsruhe Institute of Technology
Karlsruhe, Germany

Dissertation approved by the KIT Department of Informatics. Oral examination on July 6th,
2018 at Karlsruhe Institute of Technology (KIT).

ISSN 1610-7438 ISSN 1610-742X (electronic)
Springer Tracts in Advanced Robotics
ISBN 978-3-030-31851-2 ISBN 978-3-030-31852-9 (eBook)
https://doi.org/10.1007/978-3-030-31852-9
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To Carlo Bourlet (Professor at CNAM,
Paris, France), a role model in dedication
and determination.
Foreword by Rüdiger Dillmann

Today’s artificial intelligence systems can be found in simple embedded systems as well as in
cloud-computing data centers and already have a huge impact on both
economic growth and the structures of our societies. In daily life, coexistence and
cooperation between humans and increasingly intelligent machines have become a
reality. A concept which could play an important role in further advancing this
man–machine symbiosis is so-called anthropomorphism. Applied to robotics, it
fosters the development of humanoid robots endowed with anthropomorphic
skills that enable them to interact with their human counterparts. However, how to
reach the intuitiveness and richness of interpersonal communication is still
an open question.
Roboticists around the world are considering a large variety of modalities and
algorithms for programming robots in the most natural manner, for instance,
through voice commands, gestures or even physical demonstrations of everyday
tasks. Their common goal is to overcome the need for explicit modeling of daily
human–robot interaction by expert programmers. This book is part of these attempts in that
it contributes to the vast field of Robot Programming by Demonstration
(PbD). The main goal of PbD research is to enable humans to teach robots
real-world tasks through physical demonstrations. In this way, a future autonomous
robot should be able to select actions according to their suitability for the
environmental conditions it encounters. However, it will not be sufficient to merely
program actions through demonstrations. Capabilities for deciding whether or not
an action is appropriate in a given situation are also required. To make such decisions,
in turn, characteristic models of the scenes to be expected have to be available.
The absence of scene models suitable for that purpose prompted the author
to develop the active-vision-based approach for recognizing scenes that
he fully introduces in this book. Derived from the Implicit Shape
Model, his novel scene representation models both which objects occur in a scene
and how they co-occur in terms of the 6-DoF spatial relations they are engaged in.
This scene representation—named Implicit Shape Model (ISM) trees—can be
learnt from the very demonstrations recorded for task learning through PbD. His
method for recognizing scenes with this representation favors a modular approach,
starting off from results of third-party object-localization algorithms instead of
trying to recognize scenes from raw image data. Although it runs counter to current
computer vision trends, proceeding in such a modular manner has already yielded
strong results in other contexts. Meißner’s approach allows for the precise modeling
of 3-D relationships, a requirement specific to scene modeling that is to precede
the execution of manipulation tasks. The author’s partly symbolic representation
also offers considerable generalization capabilities while avoiding symbol-grounding issues
thanks to its minimalistic design. Other issues the author deals with in this book
include how to model spatial relations in indoor scenes as well as how to assess
deviations between expected and actual layouts of scenes. His work also addresses
the question of which spatial restrictions in a scene should be considered and
which ones should be left out. This important yet neglected question is directly
linked to the complexity of scene recognition and thus, indirectly, to that of
decision-making for robots.
The author designed his scene representation with the goal of enabling efficient
object search. Beyond pure scene recognition, he proposes an active scene recog-
nition system for mobile robots in this book. The system integrates the scene
recognition algorithms he developed with a novel algorithm for guiding the
focus-of-attention of such a mobile robot. Active scene recognition on the basis of
object search fills an important gap in the overall PbD workflow. But well beyond
PbD, the introduction of ISM trees is an important step forward towards developing
service robots which reliably and robustly operate in dynamic household scenarios
in accordance with the principle of explainable artificial intelligence.

Karlsruhe, Germany Rüdiger Dillmann
July 2019
Foreword by Bruno Siciliano

At the dawn of the century’s third decade, robotics is reaching an elevated level of
maturity and continues to benefit from the advances and innovations in its enabling
technologies. These are all contributing to an unprecedented effort to bring
robots into human environments in hospitals and homes, factories and schools, and into the
field, with robots fighting fires, making goods and products, picking fruit and
watering farmland, saving time and lives. Robots today hold the promise of
making a considerable impact on a wide range of real-world applications, from
industrial manufacturing to health care, transportation, and the exploration of deep
space and the deep sea. Tomorrow, robots will become pervasive and touch upon many
aspects of modern life.
The Springer Tracts in Advanced Robotics (STAR) is devoted to bringing to the
research community the latest advances in the robotics field on the basis of their
significance and quality. Through a wide and timely dissemination of critical
research developments in robotics, our objective with this series is to promote more
exchanges and collaborations among the researchers in the community and con-
tribute to further advancements in this rapidly growing field.
The monograph by Pascal Meissner is based on the author’s doctoral thesis. It
focuses on Robot Programming by Demonstration (PbD), which enables humans to teach
robots real-world tasks through physical demonstrations. The concept of Implicit
Shape Model (ISM) trees is introduced to derive scene representation models in
terms of the spatial relations among the objects to be manipulated. Then, an
optimization algorithm for Active Scene Recognition (ASR) allows canonical scene
recognition to be embedded in a decision-making system that selects the best camera
views for 3-D object localization.
Based on numerous experiments in a setup mimicking a kitchen, the results demonstrate the
good performance of ISM trees as scene classifiers for a large number of object
arrangements. A very fine addition to the STAR series!

Naples, Italy Bruno Siciliano
July 2019 STAR Editor
Preface

While the purpose of this thesis is to convey the most important findings of my
Ph.D. research, I want to take this preface as an opportunity to report on the very
nature of doctoral studies as I came to know it. Some may argue that
finding the right institution and getting admitted there is the main challenge for a
graduate. From my point of view, the former is a matter of personality, while a
good recipe for the latter is to carry out one’s studies at a lab of one’s own choice as
continuously as possible. I think, however, that being a researcher is a major challenge that
has little in common with succeeding in one’s university studies. In my
experience, many of my colleagues went through disappointment and frustration
as Ph.D. candidates, even though they worked under what I consider good
conditions. I suggest that such issues result from misconceptions of what it
actually means to do doctoral studies. In an attempt to clarify this, at least for my
field in Germany, I want to draw an analogy between being a Ph.D. candidate and
being an entrepreneur, based on the work of Long et al. from 1983. More precisely, I propose
that Ph.D. students consider themselves entrepreneurs. According to Long
et al., a first defining aspect of entrepreneurship is self-employment. While the
colleagues at my lab and I were employees in the public service, I still think
that this attribute applied to us, e.g. because we were continuously expected to
come up with new research challenges on our own. Far beyond mere interest in
technology, it was essential to have the ambition to discover research questions as
well as to develop and present appropriate answers. In terms of my entrepreneur
metaphor, we had to identify promising business opportunities, develop offers
and sell them. In my opinion, the fact that research findings are mostly attributed
to individuals is closely linked to self-employment in academia and is thus an
indication of it. For example, Nobel Prizes are to this day awarded to individuals
rather than to collaborative achievements. The impact of findings from Ph.D. research
is commonly regarded as a good measure of the achievement they represent. If one
considers a publication containing such findings as an offer and its authors as the
supplier, the impact can be equated with the benefit Ph.D. students can strive for.
As entrepreneurs, Ph.D. candidates should, therefore, keep the actual purpose
of their endeavor in mind: maximizing impact by appropriately publishing
relevant results.
Without further proof, I claim that publications are offered on a market where
their authors compete with others. This means that working in academia involves
facing highly competitive situations that many graduates may encounter
for the first time. Maximizing benefit is only possible if one has good knowledge
of the market one participates in and continuously adapts to it. In concrete terms,
one should carefully assess which conferences or journals are best suited to one’s
results regarding their thematic focus and reputation. Besides, one should present
one’s findings so that they are as easily accessible as possible to reviewers
and other potential readers. When designing a publication, it is indispensable to
adopt their perspective in terms of, for instance, their knowledge of the topic in
question, the associations they may have with the vocabulary used and the time
they are willing to invest in order to understand the publication. Assessing a market
furthermore involves estimating how many competitors one has on a
research problem and who they are. A Ph.D. candidate should ask himself: Does
it make more sense for him to work on a popular topic with a large community, but
presumably under time pressure and with a considerable risk that his results may be
overlooked? Or does he prefer to look for a niche, with the consequence that little
exchange will be possible or that the relevance of either the problem he addresses or
the solution he proposes may be challenged as such? Another question he might
have to answer in the second case is whether the state of the art is advanced enough
for him to generate substantial results in the short time span of doctoral studies.
Returning to the economic perspective, it seems obvious that investments have
to be made in order to create benefits. Whether and to what degree investing
one’s time pays off as scientific impact is highly speculative. Besides activity on
markets, considerable uncertainty is another attribute of entrepreneurship, thus
supporting my analogy. The last aspect of research entrepreneurship I want
to address is management and how to optimize its cost–benefit ratio. At my lab,
management applied not just to ourselves but also to the undergraduates we
supervised. Whether the additional resources these undergraduates provided for
solving problems outweighed the costs of attracting and supervising them depended
on our strengths, i.e. on whether one of us made better progress working alone or
in a team with contributing undergraduates. In my opinion, negative experiences from
supervising students often stem from ignoring that supervision is an
investment in obtaining contributions to research problems or to tasks of lesser
scientific benefit. As an investment, supervision has to be treated accordingly.
Of course, attracting and supervising undergraduates lend themselves to optimization,
especially since both involve participating in a market. Acquisition
can be optimized by thinking about how, where and when to make offers that match
the interests of undergraduates, i.e. their thematic interests (one shouldn’t
underestimate the importance of trends), their insecurities and their need for reliability.
Optimizing supervision means optimizing the outcome of the time that both the
supervisor and the undergraduates invest. Of course, this happens under the
constraint that the quality of supervision is maintained, which is, in my view, the
foremost priority for any Ph.D. candidate as soon as he starts supervising. We tried
to optimize our supervision efforts with various concepts such as chaining fixed-length
appointments, undergraduates working together on larger problems and experiments,
undergraduates supervising each other, groupware-supported supervision or
the use of development frameworks such as Scrum. What proved essential
to us was not relying solely on the aforementioned mechanistic approaches but also
taking into account the specific traits of each individual undergraduate in order to
adapt their respective tasks and working conditions, as well as our leadership style,
during their stay at our lab.
Given sufficient expertise as well as the toughness and perseverance to remain
focused on obtaining research findings despite the numerous distractions and
interruptions one encounters, I am convinced that anyone who can identify with
being a research entrepreneur can find fulfillment in my field. To conclude, I
wish everyone an insightful and perhaps even enjoyable read of this thesis.
This book is equivalent to the Ph.D. thesis I submitted under the title “Indoor
Scene Recognition by 3-D Object Search for Robot Programming by
Demonstration” to the KIT Department of Informatics. I defended this thesis at
Karlsruhe Institute of Technology (KIT) on July 6th, 2018. The source code for all
contributions of this approved thesis is freely available at https://github.com/asr-ros.

Karlsruhe, Germany Pascal Meißner
August 2018
Acknowledgements

My sincere gratitude goes to Dr. Stefan Gächter Toya, Prof. John K. Tsotsos,
Dr. Robert Eidenberger and Prof. Antonio Torralba for inspiring me with their
research. They laid the foundations for the contributions of my thesis.
I am very grateful to my advisor Prof. Rüdiger Dillmann for putting his trust in me
while I pursued my doctoral studies. I particularly thank him for supporting my vision
while granting me complete freedom in defining and implementing it. Moreover, I
would like to thank Prof. Michael Beetz for the interesting conversations we had about my
research problems. My special thanks go to Prof. Torsten Kröger for his
tremendous support towards the end of my doctoral studies.
My deepest gratitude goes to my mentor Dr. Sven R. Schmidt-Rohr. First as my
supervisor, then as a colleague, he provided decisive support in word and deed
throughout highs and lows. I also thank him for broadening my horizon in unex-
pected directions with his compelling enthusiasm and strategic foresight. Many
thanks to Dr. Rainer Jäkel for his expert advice as well as for his friendly, calm and
consistently helpful manner. My thanks additionally go to Dr. Martin Lösch for
being such a committed leader of our research group at the beginning of my
doctoral studies.
My gratitude goes to my student co-workers Tobias Allgeyer,
Florian Aumann-Cleres, Jocelyn Borella, Souheil Dehmani, Benny Fuhry,
Nikolai Gaßner, Joachim Gehrung, Fabian Hanselmann, Heinrich Heizmann,
Florian Heller, Robin Hutmacher, David Kahles, Oliver Karrenbauer,
Daniel Kleinert, Felix Marek, Matthias Mayr, Jonas Mehlhaus, Sebastian Münzner,
Trung Nguyen, Reno Reckling, Ralf Schleicher, Patrick Schlosser, Patrick Stöckle,
Daniel Stroh, Jeremias Trautmann, Richard Weiss and Valerij Wittenbeck for
spending countless days and nights in my two labs and joining me in struggling with
both hard- and software.

Special thanks go to Armin Dürr, former owner of the “1001 Computer” store
in the small town of Bretten, for introducing me to the world of IT. I conclude by
thanking my family, in particular, Antje Lossin as well as Corinne and
Jürgen Meißner, for providing great assistance through the eventful years of my
doctoral studies.
Contents

1 Introduction
1.1 Motivation
1.1.1 Programming by Demonstration
1.1.2 Passive Scene Recognition
1.1.3 Active Scene Recognition
1.2 Thesis Statements
1.3 Thesis Contributions
1.4 Document Outline
References
2 Related Work
2.1 Scene Recognition
2.1.1 Convolutional Neural Networks and Image Databases
2.1.2 Applicability of Convolutional Neural Networks and Conclusion
2.2 Part-Based Object Recognition
2.2.1 Overview
2.2.2 Constellation Models
2.2.3 Implicit Shape Models
2.2.4 Pictorial Structures Models
2.2.5 Comparison and Conclusion
2.3 View Planning
2.3.1 Overview
2.3.2 Selected Approaches to Three-Dimensional Object Search
2.3.3 Comparison and Conclusion
References
3 Passive Scene Recognition
3.1 Concept Overview of Passive Scene Recognition
3.2 Concept Overview of Relation Topology Selection
3.3 Scene-Related Definitions and Data Acquisition from Demonstrations
3.4 Implicit Shape Models as Star-Shaped Scene Classifiers
3.4.1 Scene Classifier Learning—Pose Normalization for Rotationally Symmetric Objects
3.4.2 Scene Classifier Learning—Generation of an ISM Table
3.4.3 Scene Recognition—Voting for Scene Category Instances
3.4.4 Scene Recognition—Verifying Buckets for Scene Category Instances
3.4.5 Discussion
3.5 Trees of Implicit Shape Models as Hierarchical Scene Classifiers
3.5.1 Generation of an ISM Tree by Heuristic Depth-First Search
3.5.2 Scene Recognition
3.5.3 Discussion
3.6 The Learning of Optimized Trees of Implicit Shape Models
3.6.1 Implicit Shape Model Trees for Complete Relation Topologies
3.6.2 Overview of Relation Topology Selection
3.6.3 Generation of Test Configurations for False Positives
3.6.4 Generation of Successors of a Relation Topology
3.6.5 Relation Topology Selection with Hill-Climbing
3.6.6 Relation Topology Selection with Simulated Annealing
3.6.7 Discussion
References
4 Active Scene Recognition
4.1 Concept Overview
4.2 Robot Software Architecture for Active Scene Recognition
4.3 Data Acquisition from Demonstrations of Scene Variations
4.4 Object-Search-Related Definitions
4.5 Prediction of Object Poses with Trees of Implicit Shape Models
4.5.1 Object Pose Prediction Algorithm
4.5.2 Sampling of Scene Models
4.5.3 Discussion
4.6 Estimation of Next-Best-Views from Predicted Object Poses
4.6.1 Objective Function for the Rating of Camera Views
4.6.2 Optimization Algorithm for Next-Best-View Estimation
4.6.3 Invalidation of Lines of Sight in Clouds of Predicted Poses
4.6.4 Discussion
References
5 Evaluation
5.1 Overview
5.2 Evaluation of Passive Scene Recognition
5.2.1 Influence of Object Pose on Passive Scene Recognition
5.2.2 Influence of Object Occurrence on Passive Scene Recognition
5.2.3 Runtime of Passive Scene Recognition
5.2.4 Runtime of Relation Topology Selection
5.2.5 Conclusion
5.3 Evaluation of Active Scene Recognition
5.3.1 Scene Category Models from Relation Topology Selection
5.3.2 Story 1—Mobile Robot Searching Utensils and Dishes
5.3.3 Story 2—Mobile Robot Searching Food and Beverages
5.3.4 Efficiency-Oriented Comparison of Three Approaches to ASR
5.3.5 Runtime of Pose Prediction Algorithm
5.3.6 Runtime of Next-Best-View Estimation
5.3.7 Conclusion
References
6 Summary
6.1 Progress Beyond the State of the Art
6.2 Limitations and Outlook
6.3 Conclusions
References
Appendix: Collaborations
References
Chapter 1
Introduction

From 2014 onwards, the European Commission decided to invest up to 700 million euros [16] in research and innovation in the field of robotics over seven years. In addition to this considerable public funding, the private side of the robotics community in Europe, together with other parties, invests another 2.1 billion euros [16]. The objective of this overall public-private investment of 2.8 billion euros is
defined in the Strategic Research Agenda (SRA) for robotics in Europe. The SRA
document describes robots as a key technology to address long-term societal issues
“such as healthcare and demographic change, food security and sustainable agriculture, smart and integrated transport and secure societies” [1, p. 7]. For future robots to
succeed in that respect, SRA in particular requires them to have perception abilities
and decisional autonomy [1, p. 41]. For instance, a robot should be able to decide on its own which action to perform next. In the real world, decisional autonomy depends heavily on perception, as the following definition of the term situation illustrates:
A situation is the entirety of circumstances which are to be considered for the selection of
an appropriate behavior pattern at a particular point of time [42].

The authors claim that a situation is derived from an underlying scene, which they
define as follows:
A scene describes a snapshot of the environment including the scenery and dynamic elements
as well as all actors’ and observers’ self-representations and the relationships among those
entities [42].

This definition distinguishes between the scenery and dynamic elements, depending on whether an object is stationary or capable of moving. As these definitions show, scene knowledge is necessary for reliable decision-making. Moreover, scenes should not be regarded simply as sets of objects. In fact, the spatial relations [18] among objects are as important for correct decision-making as the mere presence of objects. For instance, the different configurations of silverware depicted in Fig. 1.1 each convey a different message to a waiter.
© Springer Nature Switzerland AG 2020 1
P. Meißner, Indoor Scene Recognition by 3-D Object Search, Springer Tracts
in Advanced Robotics 135, https://doi.org/10.1007/978-3-030-31852-9_1

[Figure 1.1 labels: four skill symbols (S, E, Scene References) with the questions “Shall I adjust the place setting?”, “Shall I clear the table?”, “Shall I add utensils?” and “Shall I bring cereals?”]

Fig. 1.1 1: Mobile robot observing a breakfast scene in our laboratory setup. It reasons about which action (skill) [25] to apply. 2: Different configurations of the same utensils and dishes, each representing a different scene category [23]

Figure 1.1 should be considered a special case of the overall application scenario of this thesis: indoor scenes. It is important to note at this point that each of the depicted silverware configurations is a stereotypical example standing for a broader scene category. This concept of categories originates from object category
recognition [21]. In the context of scenes, categories subsume a set of scenes under
a common label, be it a string or a pictogram. Such a set of scenes is made up
of variations of the same configuration of objects, e.g. one of the depicted object
configurations. Scene recognition, as we define it, corresponds to estimating how
well a sensed configuration of objects matches a scene category. This is achieved
by applying a scene classifier, trained on all scenes of a category, to the object configuration. Assume that a classifier for scene category “X”1 is given. With respect to 1 in Fig. 1.1, where a robot observes a table, we can verbalize our recognition problem with the following questions: “Are these utensils and dishes an example of ‘X’?”, “How well does each of these objects match our model of ‘X’?”, “Which objects on the table belong to ‘X’?”, and “How many objects from ‘X’ are missing from the table?”. This thesis provides the means to answer such questions.
Models that represent scenes by means of objects and spatial relations are especially suitable for human-centered indoor environments [35], i.e. our application scenario. They can be grouped into symbolic and subsymbolic approaches, a common distinction in Artificial Intelligence [37]. Symbolic approaches commonly rely on Qualitative Spatial Relations (QSR) [19], which correspond to the prepositions used in natural language to describe spatial relations. Prior work on subsymbolic approaches has instead applied part-based representations from object recognition, mainly probabilistic methods [17], to scenes. As subsymbolic approaches pursue an objective that is structurally more similar to ours and have shown excellent results, we adopt such an approach in this thesis.

1 “X” stands for any configuration in 2 in Fig. 1.1, from “Pause” to “Do not like”.

Since SRA requires autonomy [37, p. 35] from future robots, they will have to learn models of scenes on their own and in a multitude of domains. The following requirements on the generality and flexibility of a suitable model for scene recognition follow directly from this observation.
1. A uniform representation of spatial relations that is sufficiently generic to describe
each type of relation but still captures the details in its variations.
2. Freedom in choosing which pairs of objects within a scene category are interconnected by spatial relations.
A mobile household robot, which we take as an example, will face both missing objects and clutter when trying to match a learnt scene classifier to its percepts. Hence, another requirement for scene classification follows:
3. Robustness against missing objects and clutter.
Existing work on probabilistic methods for part-based object recognition suffers from severe limitations with respect to all these requirements. As an alternative, we propose to derive scene recognition from another method for part-based recognition, the Implicit Shape Models (ISMs)2 [30], a variant of the Generalized Hough Transform [8].
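To illustrate the voting idea that ISMs inherit from the Generalized Hough Transform, here is a minimal, position-only sketch. It assumes one instance per object name and a fixed reference object, stores the offsets from each object to the reference seen during training, and lets every detected object vote for candidate reference positions in a discretized accumulator. The function names and the `bin_size` parameter are hypothetical, and orientations, which the thesis models in full 6-DoF, are left out.

```python
from collections import defaultdict

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def train_ism(demonstrations: list[dict[str, Vec]], reference: str) -> dict[str, list[Vec]]:
    """Store, per object name, every offset from the object to the reference seen in training."""
    table: dict[str, list[Vec]] = defaultdict(list)
    for demo in demonstrations:
        ref_pos = demo[reference]
        for name, pos in demo.items():
            table[name].append(sub(ref_pos, pos))
    return dict(table)


def recognize(table: dict[str, list[Vec]], detections: dict[str, Vec], bin_size: float = 0.05):
    """Let every detected object vote for reference positions; return the best-supported bin."""
    accumulator: dict[tuple[int, ...], set[str]] = defaultdict(set)
    for name, pos in detections.items():
        for offset in table.get(name, []):  # objects without stored offsets cast no votes
            vote = add(pos, offset)
            key = tuple(round(c / bin_size) for c in vote)
            accumulator[key].add(name)  # an object supports a bin at most once
    if not accumulator:
        return None, set()
    best = max(accumulator, key=lambda k: len(accumulator[k]))
    return tuple(k * bin_size for k in best), accumulator[best]
```

Missing objects simply contribute no votes, and objects unknown to the model are ignored, so even this toy version degrades gracefully in the way requirement 3 asks for.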
In a typical household, scenes will not be visible as a whole from a single point of view. To gather evidence about existing scenes nevertheless, robots will have to integrate successive estimates about the presence of objects while freely traversing their environment. Thus, a scene category representation that fulfills the following requirements should be favored:
4. Independence from the viewpoint from which a scene is perceived.
5. Low time consumption, since scene recognition is executed repeatedly during
evidence-gathering.
Requirement 4 is best met with a scene category model in which both object poses and spatial relations are specified in six degrees of freedom (6-DoF). In the literature, such three-dimensional models are usually limited to modeling object positions without orientations. In addition, we expect scene category representations to consider uncertainties in spatial relations. Since modeling uncertainties in 6-DoF with parametric distributions is tedious, a non-parametric approach such as ISMs is more appropriate for us. However, it is an open issue how to adapt their representation and the algorithms operating on it in order to model scenes in full 6-DoF while maintaining efficiency. The same holds for requirement 2, as ISMs in the field of part-based object recognition are only able to represent relations between a single so-called reference part and all other parts of the object, instead of relating arbitrary combinations of parts [21, p. 70].
2 Even though the inventors of the Implicit Shape Models present a probabilistic motivation for their approach in [30], we do not regard it as a probabilistic approach in a strict sense.

[Figure 1.2 labels: “Setting - Ready for Breakfast”, “Cupboard - Filled”, “Drinks - on Shelf”, “Sandwich - on Shelf”, “Dishwasher Basket - Filled”, “Cereals - on Shelf”]

Fig. 1.2 Example configuration of objects, i.e. a scene, in our laboratory setup. It is composed of subscenes. The objects of each subscene are surrounded by boxes whose color depends on the subscene they belong to

Household scenes like the one in Fig. 1.2 can, e.g., be described as a single global scene or as a combination of local scenes of presumably different categories. Since local scenes of the same category tend to be less specialized to a certain environment than global scenes, they are more likely to be found in different environments. Thus, they yield better reusability. This is why we favor an approach based on local scenes. In order to take decisions autonomously, a robot needs as much information as possible about the combination of scenes in the environment. For example, in Fig. 1.2, the robot cannot decide whether to set or to clear the table just from noticing that the scene category “Setting—Ready for Breakfast” is not complete. Instead, it should additionally determine whether the missing utensils or dishes are part of either the category “Cupboard—Filled” or “Dishwasher Basket—Filled”.
Because objects in indoor scenes may be widely distributed or occluded by clutter, scene recognition generally cannot deliver good results without a multi-view strategy [10, p. 1] or, more precisely, a method to visually search for objects. Object search in three-dimensional environments, a term coined by [46], is dealt with in the field of view planning [40] for visual sensors. In this thesis, we adopt a common formalization of object search as a succession of optimization problems, each of which consists of estimating a Next-Best-View, i.e. the next best viewpoint from which a mobile robot should search for objects. In general, three-dimensional object search in the real world must cope with vast, high-dimensional search spaces and a large time consumption for each search step. Under such circumstances, blind search turns out to be intractable in practice. On a formal level, view planning for object search has been shown to be NP-complete in [46]. Thus, informed search algorithms are favored. Existing work on informed object search focuses on the design of application-specific objective functions rather than on the development of optimization algorithms. The defined optimization problems usually come with extensive simplifications to reduce computational costs. It is generally assumed that hypotheses about the emplacement of searched objects are known in advance, either from prior knowledge [32] or by way of intermediate objects [29].
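The formalization of object search as a succession of Next-Best-View problems can be sketched as a greedy loop: at each step, pick the candidate viewpoint with the best utility, then update the remaining hypotheses. In this sketch the utility is a hypothetical ratio of covered object hypotheses to viewpoint cost, and `covers`, `cost` and the candidate set are assumptions supplied by the caller; this is not the objective function developed in the thesis.

```python
from typing import Callable, Hashable, Optional


def next_best_view(
    candidates: list[Hashable],
    hypotheses: set,
    covers: Callable[[Hashable, Hashable], bool],
    cost: Callable[[Hashable], float],
) -> Optional[Hashable]:
    """Return the candidate view with the best (covered hypotheses / cost) ratio, or None."""
    best, best_utility = None, 0.0
    for view in candidates:
        gain = sum(1 for h in hypotheses if covers(view, h))
        utility = gain / (1.0 + cost(view))  # +1 keeps zero-cost views well-defined
        if utility > best_utility:
            best, best_utility = view, utility
    return best


def plan_search(candidates, hypotheses, covers, cost, max_steps: int = 10) -> list:
    """Solve one Next-Best-View problem per step until nothing is left to cover."""
    remaining = set(hypotheses)
    plan = []
    for _ in range(max_steps):
        view = next_best_view(candidates, remaining, covers, cost)
        if view is None:  # no candidate covers any remaining hypothesis
            break
        plan.append(view)
        # Simplification: treat every hypothesis seen from this view as resolved.
        remaining -= {h for h in remaining if covers(view, h)}
    return plan
```

Each iteration of `plan_search` corresponds to one optimization problem in the succession; a realistic planner would replace the perfect-detection update with the outcome of actually looking.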
In contrast, autonomy [37, p. 35] demands that robots be able to adapt their knowledge to arbitrary environments. This holds in particular for object search. Consequently, we require the following for this problem:
6. Hypotheses about the 6-DoF poses of searched objects should not be predefined
but rather predicted at runtime from estimates about present scenes.
7. A realistic model of how to search for objects with visual sensors. The choice of
sensor viewpoints during the search should consider three-dimensional space,
taking into account both sensor position and orientation. It should also precisely
model how the interdependence between sensor viewpoints and the 6-DoF poses
of the searched object affects visual perception.
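Requirement 7 ties visibility to both the sensor pose and the 6-DoF object pose. A minimal sketch of such a test, assuming a simplified sensor model: an object counts as visible if it lies within a range interval, inside a cone-shaped field of view around the viewing direction, and if an assumed characteristic face of the object points back toward the sensor. All thresholds and the `face_normal` convention are hypothetical and do not reproduce the sensor model of the thesis.

```python
import math

Vec = tuple[float, float, float]


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _angle(a: Vec, b: Vec) -> float:
    """Angle in radians between two non-zero vectors."""
    cos = _dot(a, b) / math.sqrt(_dot(a, a) * _dot(b, b))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding errors


def is_visible(
    sensor_pos: Vec,
    view_dir: Vec,
    obj_pos: Vec,
    face_normal: Vec,
    fov: float = math.radians(30.0),
    near: float = 0.4,
    far: float = 3.0,
    max_face_angle: float = math.radians(60.0),
) -> bool:
    """Hypothetical visibility test that couples sensor pose and object pose."""
    to_obj = (obj_pos[0] - sensor_pos[0], obj_pos[1] - sensor_pos[1], obj_pos[2] - sensor_pos[2])
    dist = math.sqrt(_dot(to_obj, to_obj))
    if not near <= dist <= far:  # outside the usable range of the sensor
        return False
    if _angle(view_dir, to_obj) > fov / 2:  # outside the viewing cone
        return False
    to_sensor = (-to_obj[0], -to_obj[1], -to_obj[2])
    return _angle(face_normal, to_sensor) <= max_face_angle  # object must face the sensor
```

The last check is what makes the object orientation matter: the same position can be visible or not depending on which way the object faces.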
In order to meet these requirements, we designed an optimization problem and an algorithm for object search and combined them with our method for scene recognition. This allows information about partially recognized scenes to guide the subsequent search for missing objects. For the special case of the place setting in 1 in Fig. 1.1, this procedure could be verbalized by a robot as follows: “I have found a milk carton on the table that should belong to a scene of the category ‘Cereals—Setting’. Let’s estimate where I should look for the missing cereals box that belongs to this category, too.” In this thesis, we call such an approach Active Scene Recognition (ASR). In cognitive science, this term [43] was coined for a process in which a human observer improves his capabilities in visual perception by deliberately changing between points of view. This contrasts with scene recognition in which the observer is immobile. In the computer science literature, recognizing scenes and searching for objects are usually investigated as separate problems in different research fields. Scene recognition is generally performed on a single sensor reading, which [43] refers to as a passive approach. In order to stress the difference between scene recognition with an immobile observer and recognition with a mobile observer, we designate scene recognition without object search as Passive Scene Recognition (PSR).

1.1 Motivation

1.1.1 Programming by Demonstration

Programming by Demonstration (PbD) [14] is a paradigm that aims at providing non-expert users with the means to intuitively program robots. Examples of capabilities that have been programmed by demonstration in the literature are manipulation skills such as pouring from a bottle into a cup [25]. When performing PbD, the first step is to continuously record a demonstration of a skill with the help of sensors, in particular by means of visual perception. The skill can be demonstrated either by the user himself or by directly controlling the robot that is to be taught [14]. Our work is based on the first approach, transferring skills from humans to robots. Demonstrations by users can be mapped to different robot systems, whereas demonstrations with a robot are restricted to that target system [14]. The different steps of this approach, shown in Fig. 1.3, are as follows. It is usual to demonstrate several variations of the same skill, all annotated by the user. This allows a subsequent step, based on learning algorithms, to abstract from the concrete setup in which each demonstration takes place to a generalized concept of the skill itself. This conceptual representation encodes the goals of a learnt skill and enables robots to adapt their skill knowledge to deviating situations. In that respect, PbD differs from the related field of Imitation Learning, which focuses on reproducing and adapting demonstrated motions rather than abstracting them to a conceptual representation. Before the robot can execute such conceptual knowledge, an additional step in PbD transfers the skill model to the kinematics of the target system.

[Figure 1.3 labels: Human; Robot; Perception; Sensors; User demonstration: how; User intention: why; Interpretation: Segmentation, Abstraction, Generalization; User interaction; Background knowledge; Modeling + Simulation; Model-based Transfer; Simulation + Validation; Execution, Evaluation and Adaption; Actors]

Fig. 1.3 Overview of the steps in the principal method for PbD of skills from user demonstration. Derived from [14]

Assuming that an exemplary robot has four different skills from PbD at its disposal, as in Fig. 1.1, the next problem is whether any of these skills should be used in the presence of a specific scene and, if so, which one. Given the scene in Fig. 1.1, the robot could opt for bringing cereals because, e.g., the present milk carton has always been observed next to a cereals box in a scene category named “Cereals—Setting”. Or it could choose to clear the table, as a knife lies on top of the plate in the middle of the place setting, which is typical of the scene category “Setting—Clear the Table”. Deciding which of these skills is applicable leads us back to perception and decisional autonomy. The applicability of a skill can be formalized as a set of preconditions [25] that have to be fulfilled. As the aforementioned example illustrates, the presence of scenes is an important cue among those preconditions. Since skills generated by PbD are expected to adapt to changing environmental conditions during execution, the same must be expected of their preconditions and in particular of the scene category models to which some of these preconditions refer.
We designed the contributions of this thesis to integrate seamlessly into the approach of Jaekel et al. [24] for PbD of manipulation skills. How the two systems are interrelated is visualized by the connected pair of arcs in Fig. 1.4. The learning of skill models takes place in two steps, visible in the upper arc: First, the demonstration of a user is generalized to a skill model. Then this model is specialized to the target robot system. Just before the skill is executed on the target system, its model references representations of those scene categories that are among the preconditions for this skill. We assume that the preconditions of a skill model generated by PbD should rely on scene category models that are specialized to this skill rather than on scene category models trained independently of it. Nevertheless, this does not preclude the same category model from being referenced by multiple skills. For example, tea drinkers would want a category model named “Drinks—Setting” for breakfast to associate milk with tea rather than with coffee. Such user-specific preferences can, e.g., also apply to the specific order in which food or silverware is arranged in a cupboard. This especially affects the spatial relations between the objects in a scene. In order to create user-specific scene category models in a non-expert-friendly manner, we adopt the principle of learning from user demonstrations for our scene category models. Learning such category models and using them in scene recognition is depicted in the lower arc in Fig. 1.4. For each scene category that is a precondition to a specific manipulation skill, the user presents a number of possible variations of object configurations in the course of a demonstration. Every variation is recorded by sensors and interpreted by means of visual perception. Based on estimates about the names and locations of the concerned objects, a learning algorithm for scene classifiers first decides which objects to connect pairwise by spatial relations, before the actual classifier is deduced from the recorded estimates under consideration of the selected relations.
Once scene classifiers for all preconditions of a given skill have been acquired, they can be used for Active Scene Recognition: A robot that wants to apply a skill at runtime first uses the set of related scene classifiers to check for the presence of the required local scenes. More precisely, it tries to extract scene models from its percepts before it starts to employ the learnt skill model. This is mainly achieved by repeatedly performing Passive Scene Recognition and object search in an alternating fashion.

[Figure 1.4 labels. Upper arc, Programming by Demonstration of Manipulation Skills: Demonstration: Skill; Learning: Skill Model (Skill Model Generalization, Skill Model Specialization); Execution: Skill. Between the arcs: Grounding in scenes. Lower arc, Scene Category Models for Programming by Demonstration: Demonstration: Scenes; Selection of Spatial Relations; Execution: Active Scene Recognition (Passive Scene Recognition, Three-dimensional Object Search); Scenes, recognized by robot]

Fig. 1.4 Manipulation skills from PbD have to be grounded in recognized scenes. The upper arc is derived from [24, p. 8]
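The alternation between Passive Scene Recognition and object search can be summarized as a loop. The sketch below is a hypothetical skeleton rather than the implementation of this thesis: `detect`, `recognize_scenes`, `predict_poses` and `next_best_view` are placeholder functions passed in by the caller, and the stopping rule (all scene confidences above a threshold, no further view, or a step budget) is an assumption.

```python
from typing import Callable, Hashable, Optional

Objects = dict[str, object]  # object name -> pose estimate (opaque here)
Scenes = dict[str, float]    # scene category -> confidence


def active_scene_recognition(
    initial_view: Hashable,
    detect: Callable[[Hashable], Objects],
    recognize_scenes: Callable[[Objects], Scenes],
    predict_poses: Callable[[Objects, Scenes], list],
    next_best_view: Callable[[list], Optional[Hashable]],
    threshold: float = 0.9,
    max_steps: int = 20,
) -> Scenes:
    """Alternate Passive Scene Recognition and object search until scenes are confident."""
    objects: Objects = dict(detect(initial_view))
    scenes = recognize_scenes(objects)  # PSR on everything found so far
    for _ in range(max_steps):
        if scenes and all(c >= threshold for c in scenes.values()):
            break  # every scene estimate is confident enough
        hypotheses = predict_poses(objects, scenes)  # where could missing objects be?
        view = next_best_view(hypotheses)            # object search: choose where to look
        if view is None:
            break
        objects.update(detect(view))
        scenes = recognize_scenes(objects)
    return scenes
```

The key design point mirrored here is that the poses to search for are predicted from the current scene estimates rather than fixed in advance, as requirement 6 demands.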


Only a restricted number of training examples, i.e. variations of scenes, was at our disposal for learning classifiers, since such examples have to be demonstrated specifically for a given skill. Consequently, we had to derive a concept of scene classifiers that delivers accurate models even from a small number of examples. We took ISMs as a starting point, as they are pointed out in the literature [21, p. 100] as an approach for part-based object recognition that fulfills this requirement.

1.1.2 Passive Scene Recognition

We derived our definition of scenes from that of Ulbrich et al. [42], presented at the beginning of this chapter. Like them, we regard scenes as snapshots of the environment and not as processes with a start and end point in time. According to the authors, scenes can on the highest level be subdivided into their elements and actors, also called observers. Representations of actors focus on their skills. Since skill modeling can be delegated to PbD of manipulation skills as developed by Jaekel et al. [24], we deliberately leave actors out of our scene definition. Furthermore, we only consider those scene elements that are relevant to the successful execution of a given skill. It is up to the human demonstrator to pick out the relevant elements in the real world. Thus, the learning of scene category models takes place in a supervised manner [15, p. 16]. So that robots can use scene classifiers in locations other than the places where the demonstrations take place, we usually omit the scenery and restrict ourselves to dynamic scene elements and their interrelations. If elements of the scenery are indispensable to a skill, they can nevertheless be considered as well.

[Figure 1.5 labels: Object Models (Object 1 ... Object n, each with Name, 3-D-Position, 3-D-Orientation); Scene Models (Scene 1 ... Scene m); Scene Reference; Element; Object; Subscene Reference; Name, 3-D-Position, 3-D-Orientation; "relation with"; cardinalities (1,m), (0,n), (0,m), (1,n)]

Fig. 1.5 Entity-relationship models [9] of the data structures that are input to and output of scene recognition
The input to our scene classifiers, which carry out scene recognition, is a set of object models, shown on the left of Fig. 1.5. Object models are usually obtained from estimates about those objects that are present in the environment of the robot. As features, each object model includes a 6-DoF pose and a name tag. Using position and orientation information in combination instead of limiting ourselves to positions is particularly important when it comes to manipulation. For example, when a robot wants to pour something into a cup, it has to take into account not only that the cup is within its reach but also that it is standing upright. The name tag in turn grants access to object-specific information like training data for its visual localization or surface models for its visualization.
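An object model as described above can be represented as a small record type. A minimal sketch, assuming positions in meters in a world frame and orientations as unit quaternions (w, x, y, z); the field names are hypothetical. The `is_upright` helper illustrates the cup example: it checks whether the object's local z-axis points roughly upward, something a position alone cannot tell.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectModel:
    """Input to a scene classifier: a name tag plus a 6-DoF pose (field names hypothetical)."""
    name: str
    position: tuple[float, float, float]            # meters in a world frame (assumption)
    orientation: tuple[float, float, float, float]  # unit quaternion (w, x, y, z)

    def up_component(self) -> float:
        """World z-component of the object's local z-axis, taken from the rotation matrix."""
        _, x, y, _ = self.orientation
        return 1.0 - 2.0 * (x * x + y * y)

    def is_upright(self, tolerance: float = math.radians(10.0)) -> bool:
        """True if the local z-axis deviates from world up by at most `tolerance`."""
        return self.up_component() >= math.cos(tolerance)
```

Two cups at the same position but with different quaternions are distinguished here, which is exactly the information a position-only model would lose.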
For each scene that a scene classifier recognizes, it outputs a scene model such as the one on the right of Fig. 1.5. Those models consist of objects that are connected by binary spatial relations [18]. The objects in a scene model are a subset of the input to the classifier and adopt their features. Strictly speaking, spatial relations do not connect the objects within a scene. Instead, all objects are connected to a common scene reference. We define objects as being the elements of a scene. In contrast, the reference is a placeholder for the scene itself. Its location is nearly identical to that of one of the objects. The name of the reference is identical to the name of the scene category, prescribed by the demonstrator when learning the scene category model for the employed classifier. Moreover, the reference has a confidence value that expresses the confidence of the classifier in the existence of the scene. This confidence is derived from confidences about how well the relations within the considered scene category are fulfilled. While the representation of each relation is encapsulated in the employed scene category model, it is the scene classifier that compares this knowledge to given object models in order to calculate confidences. Scene confidences are crucial when it comes to integrating our approach with manipulation skill execution: their values decide whether the preconditions of a skill are met.
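The recursive scene model on the right of Fig. 1.5 and the role of scene confidences as skill preconditions can be sketched as follows. The class names, the flat threshold of 0.8 and the `preconditions_met` function are hypothetical illustrations; the text above does not prescribe how confidences are thresholded.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class ObjectModel:
    name: str
    pose: tuple[float, ...]  # 6-DoF pose, left opaque in this sketch


@dataclass
class SceneModel:
    category: str            # name of the scene reference = scene category name
    confidence: float        # classifier's confidence that the scene exists
    reference_pose: tuple[float, ...]
    elements: list[Union[ObjectModel, SceneModel]] = field(default_factory=list)

    def objects(self) -> list[ObjectModel]:
        """All objects of this scene, including those inside subscenes."""
        found: list[ObjectModel] = []
        for element in self.elements:
            if isinstance(element, SceneModel):
                found.extend(element.objects())  # recurse into the subscene
            else:
                found.append(element)
        return found


def preconditions_met(required: set[str], recognized: list[SceneModel], threshold: float = 0.8) -> bool:
    """True if every required scene category was recognized with enough confidence."""
    best: dict[str, float] = {}
    for scene in recognized:
        best[scene.category] = max(best.get(scene.category, 0.0), scene.confidence)
    return all(best.get(category, 0.0) >= threshold for category in required)
```

Because elements may themselves be `SceneModel` instances, the same type captures the hierarchies of subscenes that recognition can return.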
Going back to the distinction between symbolic and subsymbolic approaches to scene modeling, the question is which of the two approaches is better suited to calculating scene models as defined above. Symbolic approaches make it possible to abstract numerical data about scenes, like estimates from visual perception, into natural language descriptions. In order to describe spatial configurations of objects in a symbolic manner, as our scene model does, the concept of Qualitative Spatial Relations (QSR) [12] has been introduced. Mathematical definitions for QSRs such as “on top of”, “left of”, “inside” have been developed, as well as corresponding computational models to decide about their presence in numerical data like images or point clouds. QSRs allow for estimating qualitative information about isolated aspects of scenes. Combining them with each other and with attributes of objects delivers rich, language-based scene descriptions. Encoding them in probabilistic frameworks, for instance, allows for classifying scenes [31]. Even though classification derives information such as the type of a scene, this can only be done on a qualitative level. Besides, the computational models for relations that are part of those descriptions have to be designed and parameterized by expert users. This procedure is error-prone, since inappropriate design or parameterization can lead to models that either overlook the decisive details that distinguish scenes altogether or are too coarse to capture those details when scenes are similar. For example, in 2 in Fig. 1.1, a description like “Fork and knife touch each other” does not suffice to identify which of the five scene categories is meant. Working in a quantitative manner, a classifier may not provide scene descriptions as general as those of a symbolic approach, but it can rely on more informative data [19] when it comes to scene classification. Since the calculations of our classifiers rely entirely on the subsymbolic information encapsulated in scene category models, symbols such as the names of spatial relations are irrelevant in our approach to scene recognition. The only symbols employed are name tags for objects on the input side and for scenes on the output side.
According to [18], all binary spatial relations have in common that they describe relative poses between pairs of objects. In Euclidean space, six parameters for translation and rotation are required to express the pose of a rigid body [41]. Each of the various mathematical formalisms for translation and rotation [41] therefore provides the most generic way to characterize spatial relations, though none of them provides a means to organize relations into categories. In contrast, there is no mathematical formalism for QSRs that covers a sufficiently large number of types of spatial relations to realistically model scenes in real-world environments [12]. Instead, a variety of concepts for different types of spatial relations [12] coexist, such as those for topological spatial relations [11]. Each concept only allows for distinguishing among the subset of spatial relations that it represents. Since we require that scene category models are learnt from demonstrations, a human demonstrator would be expected to assign and parameterize mathematical definitions of spatial relations for each given real-world scene. This in turn would require expert knowledge about QSRs, which is contrary to the PbD paradigm. We therefore define a unified representation of spatial relations by means of translations and rotations with no further abstraction, as demanded by requirement 1.

[Figure 1.6: panels 1 to 4, each a graph over three objects whose edges are labeled with relations R; panel 1 contains all three relations, panels 2 to 4 two relations each]

Fig. 1.6 All possible relation topologies for scene categories with three objects in which all objects are connected
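Representing a spatial relation as a relative 6-DoF pose can be illustrated with homogeneous transforms: if $T_A$ and $T_B$ are the 4x4 poses of objects A and B in a common frame, the pose of B relative to A is $T_{AB} = T_A^{-1} T_B$, which does not change when the pair is moved as a whole. The sketch below computes this standard rigid-body formula in plain Python; the helper names are hypothetical, and the example only uses rotations about the vertical axis for brevity.

```python
import math

Matrix = list[list[float]]


def pose(x: float, y: float, z: float, yaw: float) -> Matrix:
    """Homogeneous 4x4 transform: rotation by `yaw` about z, then translation (x, y, z)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rigid_inverse(t: Matrix) -> Matrix:
    """Inverse of a rigid transform [R | p]: [R^T | -R^T p]."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    q = [-sum(rt[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [rt[0] + [q[0]], rt[1] + [q[1]], rt[2] + [q[2]], [0.0, 0.0, 0.0, 1.0]]


def relative_pose(t_a: Matrix, t_b: Matrix) -> Matrix:
    """Pose of object B expressed in the frame of object A: T_A^-1 * T_B."""
    return matmul(rigid_inverse(t_a), t_b)
```

A plate and a fork placed 0.2 m apart along the plate's x-axis yield the same relative pose wherever the pair stands and however it is rotated, which is what makes this representation usable as a generic spatial relation.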
The maximum number of binary relations that any scene category model for $n$ objects can represent corresponds to the number of edges in a complete graph [44], $\frac{n \cdot (n-1)}{2}$. In general, we visualize the specific combination of relations that a scene category model represents as a graph. Undirected graphs resulting from this kind of visualization are shown in Fig. 1.6. Their vertices stand for objects and their edges for relations like $\{R_1, R_2, R_3\}$. While 1 in Fig. 1.6 depicts a complete graph, three additional combinations with fewer relations are shown on its right. We call these combinations relation topologies. Figure 1.6 shows all connected relation topologies that are possible for three objects.
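Figure 1.6 can be reproduced by brute force: enumerate every subset of the $\frac{n \cdot (n-1)}{2}$ possible relations and keep those whose graph connects all objects. This sketch only illustrates how quickly the number of connected relation topologies grows; the enumeration is exponential in the number of possible relations and is not intended as a method for selecting a topology.

```python
from itertools import combinations


def is_connected(n: int, edges) -> bool:
    """Depth-first search from object 0; connected if every object is reached."""
    neighbors: dict[int, set[int]] = {v: set() for v in range(n)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in neighbors[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


def connected_topologies(n: int) -> list[tuple[tuple[int, int], ...]]:
    """All subsets of the n(n-1)/2 pairwise relations whose graph connects every object."""
    all_relations = list(combinations(range(n), 2))
    return [
        subset
        for k in range(n - 1, len(all_relations) + 1)  # fewer than n-1 edges cannot connect n objects
        for subset in combinations(all_relations, k)
        if is_connected(n, subset)
    ]
```

For three objects this yields the four topologies of Fig. 1.6 (the complete graph and three topologies with two relations); for four objects there are already 38.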
In order to verify whether a configuration of objects is consistent with a scene category, each of the represented relations has to be checked for being fulfilled by a corresponding object pair in the configuration. Computational costs in scene recognition increase disproportionately as the number of objects in the scene to be recognized grows. This results from an equally disproportionate increase in the maximum number of modeled relations. Thus, relation topologies with few relations should be favored. Not every relation that can be defined on a set of objects is equally relevant to scene recognition. For example, in 1 in Fig. 1.1, the depicted place setting contains two forks on the left of the plate. When we look at guidelines [45] for laying a place setting, two rules commonly concern the relative poses of these three objects. The first relates the two forks to each other, the second relates the plate to the fork lying closest to it. Consequently, if we connect the fork in the middle to both the plate and the other fork by spatial relations, there is no need to relate the other fork to the plate as well. Efficiency is not the only issue to consider when deciding which relation topology to use for creating a scene category model. False positives [37, p. 770] may occur in scene recognition, depending on the relation topology employed. This issue is discussed in more detail in Sect. 3.6.1. In order to effectively optimize the efficiency and accuracy of scene recognition by choosing suitable relation topologies, we ought to base our scene category model on a method that is able to represent as many different relation topologies as possible. In other words, we have to fulfill requirement 2. At first sight, the scene model we define only relates the reference of a scene to its elements and not the elements to each other. This seems to contradict requirement 2. However, elements in our scene models do not just stand for objects but can also represent other scenes. Scenes that are elements of other scenes are what we call subscenes. In fact, our scene category model3 is recursively defined so that it may return entire hierarchies of scene models as recognition results. Using hierarchical scene category models enables us to generate scene classifiers for any connected relation topology. Without loss of generality, we consider connected relation topologies as the entirety of all topologies that a single scene category model should be able to represent. Disconnected topologies can be subdivided into connected subtopologies, each of which can be modeled by an individual scene category model as long as every object participates in at least one relation.
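Splitting a disconnected topology into connected subtopologies, each modeled separately, amounts to computing connected components. A minimal union-find sketch follows; the object names are hypothetical, and the check that every object takes part in at least one relation mirrors the condition stated above.

```python
def split_topology(objects: list[str], relations: list[tuple[str, str]]) -> list[set[str]]:
    """Split a relation topology into connected subtopologies (union-find)."""
    related = {o for pair in relations for o in pair}
    unrelated = set(objects) - related
    if unrelated:
        raise ValueError(f"objects without any relation: {sorted(unrelated)}")
    parent = {o: o for o in objects}

    def find(o: str) -> str:
        while parent[o] != o:
            parent[o] = parent[parent[o]]  # path halving keeps the trees shallow
            o = parent[o]
        return o

    for a, b in relations:
        parent[find(a)] = find(b)  # merge the two components
    groups: dict[str, set[str]] = {}
    for o in objects:
        groups.setdefault(find(o), set()).add(o)
    return list(groups.values())
```

Each returned set would then receive its own scene category model; an object without any relation is rejected because it could not be covered by any of them.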

1.1.3 Active Scene Recognition

In Fig. 1.2, we gave an example of the cluttered indoor scenarios that we address in
this thesis and interpret as combinations of several local scenes. A mobile robot is
expected to recognize such scenes based on scene category models that are learnt
from demonstrations. A demonstration can, for example, be performed in front of an
observing robot, as indicated in the lower middle of Fig. 1.7. Since demonstrations
usually require significant effort from the human demonstrator, reusability of
category models is of great importance. To address this, we introduced the
concept of local scenes, in which portions of a vast global scene are independently
modeled by their own category models. The place setting on the table in Fig. 1.1
could be such a local scene. We call it “Setting—Ready for Breakfast”. On both the
left and the right of Fig. 1.7, “Setting—Ready for Breakfast” appears at two different
locations on the same table. Spatial relations are not the only means of defining a
category model; this can also be done with the help of absolute object poses.⁴
Symbols representing scene category models that could have been generated by either
approach are visible in the upper middle of Fig. 1.7.
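The reusability argument above rests on the difference between relative and absolute poses. When the same local scene is placed at two locations on the table, its absolute poses change, but the poses of its objects relative to each other stay the same. The following minimal sketch illustrates this under simplifying assumptions that do not come from the thesis:

- Poses are planar (x, y, heading) rather than full three-dimensional poses.
- A spatial relation is reduced to the pose of one object in the frame of another.
- All names and coordinates are hypothetical.

```python
import math
from typing import NamedTuple


class Pose2D(NamedTuple):
    """Planar pose: position (x, y) in metres and heading theta in radians."""
    x: float
    y: float
    theta: float


def _wrap(angle):
    # Normalise an angle to the range [-pi, pi].
    return math.atan2(math.sin(angle), math.cos(angle))


def compose(a, b):
    """Express pose b, given in frame a, in the parent frame of a."""
    c, s = math.cos(a.theta), math.sin(a.theta)
    return Pose2D(a.x + c * b.x - s * b.y,
                  a.y + s * b.x + c * b.y,
                  _wrap(a.theta + b.theta))


def inverse(a):
    """Pose of the parent frame as seen from frame a."""
    c, s = math.cos(a.theta), math.sin(a.theta)
    return Pose2D(-c * a.x - s * a.y, s * a.x - c * a.y, _wrap(-a.theta))


def relative_pose(reference, obj):
    """Pose of obj in the frame of reference (both given as absolute poses)."""
    return compose(inverse(reference), obj)


def same_pose(p, q, tol=1e-9):
    """Component-wise comparison with an absolute tolerance."""
    return all(math.isclose(u, v, abs_tol=tol) for u, v in zip(p, q))


# Hypothetical place setting: absolute poses in a frame fixed to the room.
plate_a = Pose2D(1.0, 0.5, 0.0)
cup_a = Pose2D(1.2, 0.6, 0.0)

# The same setting moved to another spot on the table and turned by 90 degrees.
move = Pose2D(2.0, 0.0, math.pi / 2)
plate_b, cup_b = compose(move, plate_a), compose(move, cup_a)

print(same_pose(plate_a, plate_b))  # compares absolute poses
print(same_pose(relative_pose(plate_a, cup_a), relative_pose(plate_b, cup_b)))
```

The first comparison fails and the second succeeds. This suggests why a relation-based model learnt once can match both settings in Fig. 1.7. A model built from absolute poses (footnote 4) would, in this simplified picture, be tied to a single location.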
We assume that the mobile robot with the pivoting head of visual sensors shown
in Fig. 1.1 explores its environment by alternating scene recognition and
three-dimensional object search, i.e. by performing Active Scene Recognition (ASR).
When doing so with the help of spatial relations, the robot checks whether
the absolute object poses acquired during ASR comply with the spatial relations in
the employed category model. The absolute poses are not directly checked. Instead,

3 In the following, we use the term scene (category) model to designate the entire hierarchical model, including all subscenes.


4 In the latter case, object poses are not defined relative to each other but are all defined relative to a coordinate frame fixed in the environment.


[Goes to writing-table, and writes. After writing a page, he blots it on blotter and turns over and writes on second sheet.

Dennis. If it works I don’t go back to the city by a long sight. The
governor may go it alone till I have seen the fun.
George (rising and imitating English accent and using his watch
as an eye-glass). I say, Steve, cawnt he make the heavy English
noticeable?
Dennis. Yes; tell him to come out strong on that.
George. And remember he’s in the hands of an oculist, doncher
know. That will be a good excuse for goggles.
Dennis. Tell him we’ll share the expense if he will only come.
Steven. What was his third name?
George. George Augustus Guelph Dunstan—otherwise Dust-pan.
Dennis. When is an earl a small thing?
George (with disgust). He never is, when he’s in this country.
Dennis. You never could guess a conundrum!
George. Give it up, old man.
Dennis. When he’s a little early.
George. Hurry up, Steve. Dennis is in sad need of dinner.
Steven (reading letter). How’s this?

“Dear Frank,—We hear you are to come up here on Tuesday. Now,
if you want a soft thing pay heed to what I write. We expect a howling
English Lord up here the last of the week, and the girls are going to
lay themselves out for his benefit, just to spite us poor republicans.
Put on goggles, a beard and wig; get a big pattern suit and a leather
hat-box, and telegraph Mrs. Wycherly (in the name of Ferrol), that
you will arrive on the 5.15 train Tuesday. You will be met, coddled,
caressed, etc. etc., till we shall all call you tenderfoot. But a word in
your ear! Make yourself rather disagreeable. Dress in the wrong
clothes at meals. Use the words ‘nasty’ and ‘beastly’ frequently, and
of all things meet the girls more than half-way in their attentions.
Your name is George Augustus Guelph Dunstan, Earl of Ferrol and
Staunton. Your papa is the Marquess of D-a-c-h-a-n-t (pronounced
Jaunt). Your dear mama is no more. You have been in Florida, where
you hurt your eyes, and are just from Washington—‘a beastly bore,
you know.’ I would give untold gold if I could do it instead of you.

“Always yours, Steve.”

Dennis. I say, boys, we must have a kodak ready for the unveiling,
and catch the girls’ faces on the fly.

George, Steven, Dennis (together, shaking hands and laughing heartily). Oh! won’t it be rich!
Enter Rose, r. d.
Rose (crossing up stage to r.). Why, you wretched boys, haven’t
you gone up yet?

[Men jump and turn with consternation.

Steven (concealing letter behind him). Why—ah—is it late?
Enter Helen, r. d., and crosses to tea-table, which she draws back to l.
Rose. Late! You’ve just ten minutes to dress. Be quick! Mrs.
Wycherly has been stopped in the hall by a telegram, and if she
catches you here you’ll never hear the last of it.

[Men exit hurriedly and awkwardly l. d.
Helen. Talk of the tardiness of women!
Rose. I know they’ve been talking about us. Did you see how guilty
they looked?
[Crosses to desk.
Enter Amy, r. d.
Amy. After what Mrs. Wycherly said of tardiness, they ought to
look guilty.
Rose (seating herself at desk and arranging pens, etc.). If they are
not late, it’s Seymour’s fault, not theirs.
Helen. I hope mama won’t wait for them. I have a good mind to tell
Seymour to put a lump of ice in the soup.
Amy. I should rather see those good for nothing, gossiping, over-
spoiled men there.

[Rose begins to study blotter with great interest.

Helen. They deserve some kind of penance for their behaviour this
afternoon.
Amy. Yes, even in addition to our intended neglect when Lord
Ferrol arrives.
Helen. Oh, we can make it a capital joke, and if Lord Ferrol is only
nice we can have both the joke and a good time.
Amy. Well, I don’t care what Lord Ferrol is; I am going to use him
to punish—them.
Helen. Oh! Amy, why that significant pause? We all know how
them spells his name.
Rose (springing to her feet with a scream). Girls! Girls!!
Amy (startled). What’s the matter?
Rose (melodramatically). My Lords! My Lords! There are traitors
in the camp and treachery stalks rampant.

[Comes to centre with blotter.
Helen. Oh, come off that roof!
Rose. No, really, I’m in dead earnest.
Amy. What is it, Rose?
Rose (evidently reading with difficulty from blotter). Listen.
“Dear Frank,—We hear you are to come up here on Tuesday. Now, if
you want a soft thing, pay heed to what I write—” Oh, I can’t read it
backwards. Where is a mirror?
Helen (rushing to mantel). Here, Here.

[Holds mirror in front of blotter.
Rose (reading). “We are expecting a howling English Lord up here
the last of the week, and the girls are going to lay themselves out for
his benefit.”

Helen, Amy (with intense anger). What!!!
Rose (reading). “Just to spite us poor republicans. Put on goggles,
a beard and wig; get a big pattern suit and a leather hat-box.
Telegraph Mrs. Wycherly (in the name of Ferrol) that you will arrive
on the 5.15 train Tuesday. You will be met, coddled, CARESSED!!

[Drops blotter in rage.
Amy (shrieking). Oh!
Helen (intensely). What!! (Grabs at blotter eagerly.) Here, you
read too slowly, let me. (Amy holds mirror.) “Coddled, caressed, till
we shall call you tenderfoot. But a word in your ear! Make yourself
rather disagreeable. Dress in the wrong clothes at meals. Use the
words ‘nasty’ and ‘beastly’ frequently, and of all things meet the girls
half-way in their attentions. Your name is George Augustus—” It
ends there.

[Girls look at each other indignantly.
Amy (dangerously). It was about time!
[Going to the mantel and replacing mirror.
Helen. What shall we do?
Amy.
“And he said can this be?
We are ruined by Chinese cheap labour (pause)
We will go for them heathen Chinee.”
Helen (turning). Yes!—but how?
Amy. Girls, put on your thinking-caps, and hunt for some terrible
punishment.
Rose. Something “lingering, with boiling oil or melted lead.”
Enter Mrs. W. r. d., with telegraph blank in hand.
Mrs. W. Why, girls, what were those shrieks about?
Rose (with embarrassment). Oh, nothing, Mrs. Wycherly. That is
Amy. I hope we didn’t frighten you, Mrs. Wycherly.
Mrs. W. Oh, no! I was only coming in to speak to Helen. (Helen
comes to centre.) I have just received a despatch from Frank Parker.
He has been called back to San Diego by the illness of his mother, so
we shall not have his visit after all. (Hands telegram to Helen and
sits at desk r. Rose sits at desk l. Helen and Amy cross to r. and
evidently consult over telegram.) I really am very sorry, for I wanted
to renew with the son a very old family friendship, but there is no
chance, for he has gone West already.
Helen (crossing to Mrs. W. and pleading). Oh, mama! Will you
not keep it a secret from the boys? Only George and Steven would
care, and we have a really good reason for not wanting them to know.
Oh, please, mama!

[Puts arms round Mrs. W.’s neck.
Amy (beseechingly). Oh, Mrs. Wycherly, please do!
Rose (kneeling imploringly). Do, Mrs. Wycherly!
Mrs. W. (suspiciously). What mischief are you concocting now?
(rising and going to l. d., followed by all the girls). Well, I won’t
promise not to, but I will hold my tongue till I see that I had better
speak.
Helen. Oh, you dear mama!
Mrs. W. (laughing). Temper your justice with mercy.
[Exits l. d.
Helen (melodramatically coming down c.). Who talks to me of
justice and mercy!
Rose. Helen, can’t you arrange to have Burgess drive over to that
5.15 train? It would be so lovely to see the men’s faces when the
carriage came back empty.
Amy. Gracious! If we only could get the real Ferrol here, in place of
the fictitious, and yet make the men think it was Mr. Parker.
Rose. But Lord Ferrol won’t be here till Friday, and by that time
the boys will have either found it out, or suspect from the time that it
really is the genuine article.
Amy. I’ll tell you what to do. Let me wire my cousin Jack Williams
to get himself up as an Englishman, and come up here on Tuesday. I
can coach him so that he can pass himself off for Mr. Parker, and the
two are enough alike, judging from the description, if disguised, to
fool the boys.
Helen. But the moment they were alone with him they would find
Rose (interrupting). We’ll arrange it so that until we are ready for
developments, they shall have no chance to find out.
Rose. But how about Mrs. Wycherly? She knows Mr. Williams,
doesn’t she?
Amy. We’ll let her into the secret—she’ll enjoy it as much as any of
us.
Helen. And she’s always wanted to have your cousin here.
Rose. Quick, Amy. Write the telegram.

[All rush to desk. Amy sits in chair l.
Helen. Mercy! but you’ll ruin yourself with such a one.
Rose. We’ll have to share the expense.
Amy (getting paper and pencil). No, I shall only send a short
despatch, and write full particulars by letter. Let me see—(Aloud.)
“Come up here disguised as an Englishman—goggles, beard, wig,
loud clothes—”
Rose. And hat-box.
Amy. “And hat-box, by the train that gets here?—”
[Looks at Helen inquiringly.
Helen. Five fifteen.
Amy. “That gets here at 5.15 Tuesday. Wire Mrs. Wycherly in name
of Ferrol that you will be here at that time. Further particulars by
post, but don’t fail.—Amy.”
[Rises and folds telegram.
Rose. If he will only come! Think of those boys watching our
attention to him, and laughing in their sleeves.
Rose. And we all the time laughing at them.
Helen. And think of their faces when the discovery is made!
Rose. Oh, Helen! You must have your camera ready, and take them
at that moment.
[All laugh.

Curtain
ACT II

Scene.—Same room, and same arrangement, except that tea-table is up back to r., and the easy-chair l. is down centre. Mrs. W. sits chair c. sewing. Rose sits on arm of easy-chair r. Amy walking up and down at back. Helen sits chair r. of fireplace.
Amy (restlessly). I am so excited I can’t keep still. If Jack hadn’t
telegraphed when he did, I could never have survived the nervous
strain—but weren’t the men’s faces lovely when you read the
despatch at luncheon! Sly dogs!
Helen. I hope it will take the boys so long to clear the snow off
Silverspoon that we can have your cousin alone for a few minutes.
Rose. No such luck as that! Our evening’s skating will hardly weigh
with them, compared to the danger of our greeting the supposed Mr.
Parker without their moral support to carry him through.
Helen. I almost wish it were Mr. Parker instead of Mr. Williams
who is coming. How we could torture them all by awkward
questions!
Rose. I don’t think I ever appreciated before how deliciously the
Indian must feel when he takes his enemy’s scalp.
Mrs. W. Why, you blood-thirsty little wretch!
Helen. Mama, we must make our arrangements so that they will
have no chance to interview him this evening. Then, to-morrow, we
will either fully coach him, or let them find out the trick—according
to our wishes.
Mrs. W. Let me see,—I will meet him at the front door; the
moment the carriage drives up—
Helen. Yes, and you must bring him in here to tea. We won’t let
him go till the bell rings for dressing. Then we will all see him
upstairs.
Mrs. W. But you can’t watch him after he is once in his room, and
any of the men can go to him.
Rose. “Not if the court understand himself, and he thinks he do.”
We will spell each other, so that one of us shall sit in the upper hall
till Mr. Williams comes downstairs. The boys would never dare to
run such a battery without a better excuse than they can invent for
going to the room of an entire stranger.
Mrs. W. That makes it safe till we leave them to their cigars.
Helen (coming down, and sitting on the arm of Mrs. W.’s chair).
Mama, you will have to tell the boys that for a particular reason,
cause unspecified, you want to let the servants clear the dining-room
early, so as to set them free. Tell them to smoke in the library; we will
sit with them and put up with the smoke for once.
Rose. That will do, and you must break up the party at our usual
bed-time with the excuse that Lord Ferrol, after his journey, will
want to retire early. Take no denial, and we will escort him upstairs.
Then we girls will sit on the divan in the hall and gossip till we feel
sure that all is safe.
Amy. And we’ll write a note making an early appointment with
him in the valley summer-house; and then—(Sounds of laughter
outside.) Hush!
Enter George, Steven, and Dennis, r. d., and cross over to
fireplace, where they stand and warm their hands.
Mrs. W. Ah, what a breath of winter freshness you bring in with
you!
Steven. It is a simply glorious afternoon. How you girls could stay
indoors and roast over a fire is a puzzle to me!
Dennis. You forget, Steve, that telegram which came at luncheon.
They were afraid they might lose a few moments of his society!
George. If his ludship isn’t afraid of a little frost, we will show him
how to spend an evening on the ice.
Dennis. I’ll bet a box of chocolates that he doesn’t know how to
skate. (Aside to men.) They don’t have ice in Southern California.
Amy. Ten pounds and taken. (Aside to girls.) Jack is a superb
skater!
Steven. Two to one that Dennis wins.
Rose. I suppose you think you are betting on a certainty, so I shall
take you up, just to make you feel ashamed when I lose.
Steven. Mrs. Wycherly, can’t we have our tea without waiting for
his giblets? I am simply famished!
Helen (crossing to l.). I wonder if men ever really think of anything
besides eating.
George. If you think that clearing the drifts off that lake is a light
and ornamental position under the government, try it.
Mrs. W. (rising and reseating herself at desk chair r.). Well,
Helen, you may make it now, only save a cup for Lord Ferrol.

[George pulls easy-chair c. back to r., while Dennis and Steven bring tea-table to former position by chair. Rose exits l. d.

Helen (coming to tea-table and holding cup up). Lord Ferrol’s cup.
Steven. Oh, no!
Dennis. Never!

[They try to obtain possession of it.

Helen (going round table and sitting, still holding cup). Not for
you.
Enter Rose with hot water pot. Men return to fireplace. Amy sits
easy-chair l. of tea-table.
Rose (rubbing teapot against Dennis’s hand as she passes). Hot
water.
Dennis (jumping and looking at his hand). Not the least doubt of
it.
Helen. Make the most of it, boys: it’s the last time our tea will be
sweet to you!
Dennis. Why is Helen like a “P. & O.” steamer?
Helen (indignantly). I’m not!
Steven. Because she’s steaming the tea?
Dennis. No.
Amy. Don’t keep us in suspense.
Steven. Because she’s full of tease.
George. You make me tired.
Steven. Is that why you sat down so often on the ice?
Helen. Isn’t that just like George,—sitting round, while the rest do
the work.
George. If you think there’s any particular pleasure in sitting in a
snowdrift, there’s one outside, right against the verandah.
Steven. That would never do at present. It might result in a cold,
and so destroy our little plan of winning the maiden affections of—
well, I won’t give him a name till I have seen him!
Helen. It is hard to put up with foreign titles, but as long as our
government will not protect that industry, the home product is so
rude, boorish, VULGAR, and YOUNG, that we cannot help—
Rose (interrupting). Listen! (Pause.) There’s the carriage.

[All rise and start toward r. door.
Mrs. W. (rising and intercepting them at door). Now, don’t all
come running out to frighten the poor man. (Men return to
fireplace; girls reseat themselves.) Let his first greeting be with me,
and then I will bring him in and let him see you and get a cup of tea.
[Exit r.
Dennis (stalking down stage).
Fe, Fo, Fi, Fum,
I smell the blood of an Englishman;
Be he alive, or be he dead,
I’ll grind his bones to make me bread.
Rose (pointing at Dennis).
Ping Wing, the pieman’s son,
Was the very worst boy in all Canton,
He stole his mother’s—
Mrs. W. (outside). No, I’m sure—
Enter Mrs. W. and Lord F. (in goggles and wig) r. d. and come
down c.
Mrs. W. You are chilled by your ride, so you must have a cup of tea
before going to your room. Helen, this is Lord Ferrol. My daughter,
Miss Wycherly, Miss Newcome, Miss Sherman, Lord Ferrol—.
Lord F. (bowing). Charmed, I assure you!
Mrs. W. My nephews, Mr. George and Steven Harold, and Mr.
Grant. There! the formidable host is reviewed, and you can now
make yourself as comfortable as possible.
Lord F. Er, thanks, but if you will allow me, I will go to my room
first,—I am so filthy.
Mrs. W. Oh, but you really must have tea first.
Lord F. You’re awfully good, I’m sure. Er, will you pardon my
glasses, but I burned my eyes shooting alligators, and, er! that was
why I couldn’t make a more positive date, for I was in the hands of an
oculist.
Amy (aside). Oh! Jack, what a lie!
Steven (aside). Didn’t I tell you the old fellow would come out
strong? I shouldn’t know him myself!
Amy (rising from easy-chair l.). Here, Lord Ferrol, I have been
sitting in the easiest chair to prevent the others from taking it, so that
you should have it when you came.
Lord F. Er, thanks, awfully!

[Sits. Amy stands in devoted attitude just at back of his chair.
Rose (rising and bringing hassock). Let me give you this hassock
—one is so uncomfortable in these deep chairs without one.
Lord F. Er, thanks! You’re very kind.
Helen (tenderly). Lord Ferrol, will you tell me how you like your
tea?
Lord F. Strong, please, with plenty of cream and sugar.
Amy (admiringly). Ah, how nice it is to find a man who takes his
tea as it should be taken! (looking at men scornfully). It is really a
mental labor to pour tea for the average man.
Dennis. Average is a condition common to many; therefore we are
common. Yet somebody said the common people were never wrong.
Helen. Well, they may never be wrong, but they can be
uncommonly disagreeable!
Lord F. Yes, that’s very true. You know, at home we don’t have
much to do with that class, but out here you can’t keep away from
them.
Amy (turning to men). There! I hope you are properly crushed?
Lord F. (turning to Amy). Eh!
Amy (leaning over Lord F. tenderly). Oh, I wasn’t speaking to you,
dear Lord Ferrol!
Mrs. W. I fear that you have had some unpleasant experiences
here, from the way you speak.
Lord F. Rather. (Helen hands cup with winning smile.) Thanks,
awfully!
George. Perhaps Lord Ferrol will tell us some of them; we may be
able to free him from a wrong impression.
Lord F. The awful bore over here is, that every one tries to make
jokes. Now, a joke is very jolly after dinner, or when one goes to
“Punch” for it.
Steven. To what?
Lord F. To “Punch,” don’t you know,—the paper.
Steven. Oh! Excuse my denseness; I thought we were discussing
jokes.
Lord F. I beg pardon?
Amy. Don’t mind him, Lord Ferrol.
George. No, like “Punch,” he’s only trying to be humorous.
Lord F. Er, is that an American joke?
Dennis. I always thought Punch was a British joke!
Lord F. Er, then you Americans do think it funny?
George. Singularly!
Lord F. What I object to in this country is the way one’s inferiors
joke. It’s such bad form.
Rose (horrified). Surely they haven’t tried to joke you?
Lord F. Yes. Now to-day, coming up here, I took my luggage to the
station, and got my brasses, but forgot your direction that it must be
re-labelled at the Junction, so they wer’n’t put off there. I spoke to
the guard, and he was so vastly obliging in promising to have them
sent back that I gave him a deem.
Omnes. A what?
Lord F. A deem—your small coin that’s almost as much as our
sixpence, don’t you know.
Omnes. Oh, yes!
Lord F. Well, the fellow looked at it, and then he smiled, and said
loud enough for the whole car to hear: “My dear John Bull, don’t you
sling your wealth about in this prodigal way. You take it home, and
put it out at compound interest, and some day you’ll buy out Gould
or Rockefeller.”
Helen. How shockingly rude! What did you do?
Lord F. I told him if he didn’t behave himself, I’d give him in
charge. (Men all laugh.) Now, is that another of your American
jokes?
Dennis (aside). Oh! isn’t this rich?
Amy (aside to Lord F.). Oh, you are beautiful!
Lord F. (bewildered and starting). Thanks awfully,—if you really
mean it!
Steven (coming down to back of Lord F.’s chair). What did she
say, Lord Ferrol? You must take Miss Sherman with a grain of
allowance.
Amy. I’m not a pill, thank you.
Lord F. Why, who said you were?
Dennis. Only a homœopathic sugarplum.
Lord F. I don’t understand.
Steven (aside to Lord F.). Keep it up, old man. It’s superb!
Lord F. I beg pardon,—did you speak to me?
Steven (retreating to fireplace). Oh, no! only addressing vacancy.
Mrs. W. I hope, Lord Ferrol, that there has been enough pleasant
in your trip to make you forget what has been disagreeable.
Lord F. Er, quite so. The trip has been vastly enjoyable.
Rose. Where have you been?
Lord F. I landed in New York and spent the night there, but it was
such a bore that I went on to Niagara the next day. From there I
travelled through the Rockies, getting some jolly sport, and then
went to Florida.
Mrs. W. Why, you have seen a large part of our country; even more
than your father did. I remember his amazement at our autumn
foliage. He said it was the most surprising thing in the trip.
Amy. What did you think of it, Lord Ferrol?
Lord F. It struck me as rather gaudy.
Rose. Why, I had never thought of it, but perhaps it is a little vivid.
Dennis (aside to men). Oh, how I should like to kick him!
Steven (aside to Dennis). Hush! You forget that “Codlin’s your
friend—not Short.”
George. Didn’t you ever see a Venetian sunset?
Lord F. Oh, yes. Why do you ask?
George (sarcastically). I merely thought it might be open to the
same objection!
Lord F. It might—I don’t remember. I’ll look it up in my journal
when I get home, and see if it impressed me at the time.
Helen. Do you keep a journal? (Rises and sits on footstool at Lord
F.’s feet.) How delightful! (Beseechingly.) Oh, won’t you let me look
at what you have with you?
Rose. Please, Lord Ferrol!
Amy. Ah, do!
Lord F. It would bore you, I’m sure.
Dennis (aside). I don’t care if he isn’t a double-barrelled earl, I
should like to kick him all the same!
Helen. Lord Ferrol, you must let us hear some of it.
Rose. If you don’t we shall think you have said something
uncomplimentary of the American women.
Lord F. No, I assure you I have been quite delighted.
Amy. Then why won’t you let us see it?
Lord F. Er, I couldn’t, you know; but if you really are in earnest, I’ll
read you some extracts.
Omnes. Oh, do!
Lord F. I ought to explain that I started with the intention of
writing a book on America, so this (producing book) is not merely
what I did and saw, but desultory notes on the States.
Rose. How interesting!
Lord F. After your suggestion of what I have written of the
American women, I think it best to give you some of my notes on
them.
Mrs. W. By all means!
Lord F. (reading). “Reached Washington, the American capital,
and went direct to Mrs. ——. Cabman charged me sixteen shillings.
When I made a row, butler sent for my host, who, instead of calling a
constable, made me pay the fellow, by insisting on paying it himself.
Mr. —— is a Senator, and is seen very little about the house, from
which I infer the American men are not domestic—presumably,
because of their wild life—”
Mrs. W. (with anxiety). Their what?
Lord F. Their wild life,—spending so much of their time on the
plains, don’t you know.
Mrs. W. (relieved). Oh! Excuse my misapprehension.
Lord F. (reading). “The daughter is very pretty, which Mrs. ——
tells me is unusual in Washington society—as if I could be taken in
by such an obvious Dowager puff! (Men all point at Mrs. W. and
laugh. Mrs. W. shakes her finger reprovingly.) Miss —— says the
Boston girls are plain and thin, due to their living almost wholly on
fads, which are very unhealthy.” (Speaking.) I couldn’t find that
word in the dictionary.
Steven. Sort of intellectual chewing-gum, Lord Ferrol.
Dennis. Yes, and like gum, you never get beyond a certain point
with it. It’s very fatiguing to the jaw.
Lord F. (reading). “She says the New York girls are the best
dressed in the country, being hired by the dressmakers to wear
gowns, to make the girls of other cities envious, and that this is
where they get all the money they spend. Very remarkable!”
Helen. Something like sandwich men, evidently.
Lord F. (reading). “The Philadelphia girls, she says, are very fast,
but never for long at a time, because the men get sleepy and must
have afternoon naps.”
Amy. Did she tell you that insomnia is thought to make one very
distinguished there?
Lord F. (making note in book). Er, thanks, awfully. (Reading.)
“She says that the Baltimore girls are great beauties, and marry so
quickly that there is generally a scarcity. It is proposed to start a joint
stock company to colonise that city with the surplus from Boston,
and she thinks there ought to be lots of money in it! Another extreme
case of American dollar worship! The Western girls, she told me, are
all blizzards.” (Speaking.) I don’t think I could have mistaken the
word, for I made her spell it. Yet the American dictionary defines
blizzard as a great wind or snow storm.
George. That is it, Lord Ferrol. They talk so much that it gives the
effect of a wind storm.
Lord F. Ah! much obliged. (Reading.) “Went to eight receptions in
one afternoon, where I was introduced to a lot of people, and talked
to nobody. Dined out somewhere, but can’t remember the name.
Took in a Miss ——, a most charming and lovely—”
Dennis (interrupting). Ah, there!
Lord F. I beg pardon.
Rose. You must forgive his rude interruption, Lord Ferrol.
Lord F. Oh, certainly! You’re sure you’re not bored?
Omnes. By no means. Do go on.
Lord F. “A most charming and lovely girl from New York. She
thinks Miss —— characterised the cities rightly, except her own.
Asked me if I thought she was only a dressmaking advertisement? As
scarcely any of her dress was to be seen, I replied that as I couldn’t
look below the table, I was sure it was the last thing one would
accuse her of being. She blushed so violently that I had to tell her
that I had seen much worse dresses in London; but that didn’t please
her any better, and she talked to the man next her for the rest of the
evening. (All have difficulty in suppressing their laughter.) I met a
Boston girl afterwards who—”

[Bell rings.
Mrs. W. Lord Ferrol, there is our summons to the upper regions.
We will not make a formal guest of you, but will all guide you to your
room.
[All rise.
Lord F. Er, thanks.
Mrs. W. (taking Lord F.’s arm). Your trunks not having arrived
(exit r. d. with Lord F.) we will none of us—

[Exit Amy and Helen r. d., evidently laughing. Rose exits l. d. Men all go off into paroxysms of laughter.
Steven (suddenly). Well, I must go and coach him.
Dennis. My dear fellow! you can’t paint the lily.
Enter Rose, quietly, l. d. Men all check their laughter.
Rose. I came back for my skates. Why, what are you laughing
about! And pray what lily are you going to paint?
George. My dear cousin, when a person enters a room already
occupied, without due warning, she must not ask questions relative
to the subject under discussion.
Rose (talking down stage to conceal her laughter). I know very
well what you were talking about. You were making fun of Lord
Ferrol.
Steven. Give you my solemn word we were not making fun of Lord
Ferrol.
Men. No! How suspicious you girls are!
[All laugh. Helen tries to suppress her laughter, and then rushes out r. d., followed by Steven.
Dennis. That journal was a mighty clever dodge of Parker’s. It
staved off all dangerous questions till Steve could coach him.
George. There were some capital notions in it, too. If he will only
give us a few more risqué anecdotes, none of the girls will dare talk to
him.
Dennis. Did you see Mrs. Wycherly’s horrified expression when he
alluded to the wild life of the American men? I am sure she thought
he was going to give us some “exposures in high life.”
Enter Steven hurriedly, r. d.
Steven. Look here, fellows, you’ve got to help me. The girls have
planted themselves on the divan upstairs, and I can’t go to Ferrol’s
room without their seeing me. Come up and occupy them, while I
slip in.
Dennis. Decoy ducks, eh?
Steven. That’s it. Come along, George.

[All exit r. d.,—slight pause.
Enter Lord F. l. d., dressed as before.
Lord F. (looking about). I must have made a mistake in the door,
for I got into the butler’s pantry; but this is right, I am sure. Queer
place and queer manners! Will make interesting reading, though. Ah,
a good chance to fill up my journal. (Seats himself at desk, takes out
book, and writes, speaking aloud and soliloquising as he does so.)
“At 5.15 reached some unpronounceable and unspellable place. Was
met by Mrs. Wycherly at front door”—curious fashion that! It made
me take her for the housekeeper at first. “She insisted, in spite of my
protests,”—I suppose it was an American idea of hospitality,—“in
taking me at once into the drawing-room and presenting me to the
house-party, and giving me a cup of tea. I felt very disagreeable, both
from the condition I was in, and the fact that all of them kept making
remarks which were entirely unintelligible to me. The young ladies
were very kind, but more forward even than they are in England,
though in a different way.”—I confess I rather liked it.—“Read some
of my journal aloud and had no corrections. Blizzard applied to
Western girls means that they talk a great deal. Was shown to my
room by Mrs. Wycherly and the young ladies, which was rather
embarrassing, especially as they seemed inclined to linger, and only
hurried out on the appearance of the gentlemen. On leaving, one of
the girls slipped her hand into mine and gave it a distinct squeeze, at
the same time asking in a whisper, ‘Did your sister send her love?’”—
Now the idea of Sappho sending her love to a girl of whom she had
never heard!—“I pretended not to hear, but she evidently knew that
she had been too free, for as she left she jerked her head towards the
gentlemen and said, ‘They didn’t see.’ Could not change my travelling
suit, my boxes having gone astray. Found a letter pinned to my pin-
cushion, and when the valet brought the hot water, he gave me
another. Both, judging from the hand-writing and paper, seem to be
written by ladies and gentlemen.”—I should like to know what they
mean? I wonder if it’s good form in America to play jokes on guests?
(Produces notes and reads.) “Dear F.”—(Rises and comes to c.) Now
the idea of the fellow writing to me in that way on the acquaintance
of a single afternoon—why, even my best friends only say “Dear
Ferrol.”—“You were simply marvellous. I would have staked my
bottom dollar on your identity, if I had not known who you were.”—
Now what does he mean by that, I wonder?—“You were so real that
Dennis wanted to kick you, and nothing but the presence of the
ladies prevented him.”—Gad! I wonder if these fellows can be
gentlemen, and if so, whether they are a fair specimen—kick me!
(Pause.) Well, I suppose they’re jealous.—“So don’t be too hard on
us. Now as to the future. If we do not see each other this evening, you
must get up before breakfast, go out of the side door, and strike
across the lawn toward the river. Three minutes’ walk will bring you
in sight of a little summer-house. Come to it, and some of us will be
there prepared to instruct you as to yourself, and put you on your
guard as to the girls, who, you see, are making a dead set at you.”—
You know, that’s just what I thought.—“Remember, in the bright
lexicon, etc., etc., Steve.”—Now what does he mean by “bright
lexicon?” And does he think I’m going to tramp through the snow,
when it’s so evidently a joke? (Opens other note.) “You dear love of a
snob”—Now I should vastly like to know how that is meant. I don’t
